18 best Python Speech Recognition Libraries for 2023
by naveen.kumar@openweaver.com Updated: Jan 11, 2023
Speech recognition is the process of converting spoken words to text. Python libraries can connect to many speech recognition engines and APIs, including Google Speech Engine, Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text. Python is a multipurpose language used to build many kinds of applications, including web apps, and it has many libraries dedicated to speech recognition, text-to-speech conversion, and text analysis. In this article, I have listed some of the best Python speech recognition libraries with their key features.
In this kit, we will go through some of the best Python speech recognition libraries, such as Real-Time-Voice-Cloning, which clones a voice in 5 seconds to generate arbitrary speech; speech_recognition, a speech recognition module for Python that supports several engines; and wav2letter, Facebook AI Research's automatic speech recognition toolkit. The full list of 18 libraries follows.
Real-Time-Voice-Cloning by CorentinJ
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Python
40505
Version: Current
License: Others (Non-SPDX)
speech_recognition by Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python
6884
Version: 3.9.0
License: Permissive (BSD-3-Clause)
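As a rough illustration of the speech_recognition entry above, here is a minimal sketch of transcribing a WAV file. It assumes the package is installed (pip install SpeechRecognition) and that network access is available for the Google Web Speech backend. The helper names write_silent_wav and transcribe_wav are hypothetical and exist only for this example; the silent file is a placeholder input, not real speech.

```python
import wave


def write_silent_wav(path, seconds=1.0, rate=16000):
    """Write a mono 16-bit PCM WAV of silence (placeholder input only).

    Returns the number of audio frames written.
    """
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)
    return n_frames


def transcribe_wav(path):
    """Return the transcript of a WAV file, or None if nothing was understood.

    Assumption: uses the online Google Web Speech backend; other
    recognize_* methods on Recognizer select other engines.
    """
    # Imported lazily so the WAV helper above works without the package.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The engine could not make out any speech in the audio.
        return None
    except sr.RequestError as exc:
        # The API was unreachable or rejected the request.
        raise RuntimeError(f"speech service request failed: {exc}") from exc


# Example usage (needs the package and a network connection):
#   write_silent_wav("sample.wav")
#   print(transcribe_wav("sample.wav"))  # replace sample.wav with a real recording
```

The import sits inside transcribe_wav so that the file-writing helper can be tried without the third-party package installed.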
wav2letter by facebookresearch
Facebook AI Research's Automatic Speech Recognition Toolkit
Python
5531
Version: v0.1
License: Others (Non-SPDX)
ASRT_SpeechRecognition by nl8590687
A deep-learning-based Chinese speech recognition system
Python
6395
Version: v1.3.0
License: Strong Copyleft (GPL-3.0)
ffsubsync by smacke
Automagically synchronize subtitles with video.
Python
5878
Version: 0.4.22
License: Permissive (MIT)
speechbrain by speechbrain
A PyTorch-based Speech Toolkit
Python
5566
Version: v0.5.14
License: Permissive (Apache-2.0)
speech-to-text-wavenet by buriburisuri
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Python
3746
Version: Current
License: Permissive (Apache-2.0)
Automatic_Speech_Recognition by zzw922cn
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Python
2729
Version: Current
License: Permissive (MIT)
tacotron by keithito
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Python
2762
Version: v0.2.0
License: Permissive (MIT)
TensorFlowTTS by TensorSpeech
TensorFlowTTS: Real-time state-of-the-art speech synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Python
3179
Version: v1.8
License: Permissive (Apache-2.0)
tensorflow-speech-recognition by pannous
🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks
Python
2132
Version: Current
License: Others (Non-SPDX)
say_what by joshnewlan
Using speech-to-text to fully check out during conference calls
Python
2080
Version: Current
License: No License
pytorch-kaldi by mravanelli
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Python
2267
Version: Current
License: No License
deepspeech.pytorch by SeanNaren
Speech Recognition using DeepSpeech2.
Python
1994
Version: V3.0
License: Permissive (MIT)
aeneas by readbeyond
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Python
2122
Version: v1.7.3
License: Strong Copyleft (AGPL-3.0)
waveglow by NVIDIA
A Flow-based Generative Network for Speech Synthesis
Python
2072
Version: Current
License: Permissive (BSD-3-Clause)
lip-reading-deeplearning by astorfi
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Python
1730
Version: 1.2
License: Permissive (Apache-2.0)