18 best Python Speech Recognition Libraries for 2023
by naveen.kumar@openweaver.com Updated: Jul 31, 2023
Speech recognition converts spoken words into text. Python's speech_recognition module, for example, works with several engines and APIs, including Google Speech Recognition, Google Cloud Speech API, Bing Voice Recognition, and IBM Speech to Text.
Python is a general-purpose language used to build many kinds of applications, including web apps. It has many libraries dedicated to speech recognition, text-to-speech conversion, and text analysis.
This kit lists some of the best Python speech recognition libraries with their key features. They include Real-Time-Voice-Cloning, which clones a voice in 5 seconds to generate arbitrary speech; speech_recognition, a speech recognition module for Python that supports several engines; and wav2letter, Facebook AI Research's automatic speech recognition toolkit. Find the top Python speech recognition libraries for 2023 below.
Real-Time-Voice-Cloning by CorentinJ
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Python
42399
Version: Current
License: Others (Non-SPDX)
speech_recognition by Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python
7239
Version: 3.10.0
License: Permissive (BSD-3-Clause)
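As a rough illustration of how speech_recognition is typically used, the sketch below transcribes a WAV file with the library's Google Web Speech backend. It assumes the package is installed (pip install SpeechRecognition) and that network access is available. The helper names write_silence_wav and transcribe_wav are hypothetical, chosen only for this example.

```python
import wave


def write_silence_wav(path, seconds=1.0, rate=16000):
    """Write a mono 16-bit PCM WAV of silence (a placeholder input file)."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames


def transcribe_wav(path, language="en-US"):
    """Return the recognized text for a WAV file, or None if nothing was understood."""
    # Imported here so the rest of the sketch works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The engine could not make out any speech in the audio.
        return None
    except sr.RequestError as exc:
        # The online service could not be reached or rejected the request.
        raise RuntimeError(f"Speech service request failed: {exc}") from exc
```

Calling transcribe_wav on a recording returns the recognized text, or None when the engine cannot make out any speech. Other backends are exposed through similarly named recognize_* methods on the same Recognizer object.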
wav2letter by facebookresearch
Facebook AI Research's Automatic Speech Recognition Toolkit
Python
5531
Version: v0.1
License: Others (Non-SPDX)
ASRT_SpeechRecognition by nl8590687
A Deep-Learning-Based Chinese Speech Recognition System
Python
6646
Version: v1.3.0
License: Strong Copyleft (GPL-3.0)
ffsubsync by smacke
Automagically synchronize subtitles with video.
Python
5990
Version: 0.4.22
License: Permissive (MIT)
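To give a feel for what synchronizing subtitles with video involves, here is a toy sketch. It is not ffsubsync's actual code. It assumes the audio and the subtitles have each been reduced to a list of 0/1 flags, one per fixed-length time frame (1 where speech or a subtitle is active), and it tries every shift in a small window, keeping the one with the most overlap. The function name best_offset, the max_shift parameter, and the sample data are made up for illustration.

```python
def best_offset(reference, subtitle, max_shift):
    """Return the frame shift of `subtitle` that overlaps best with `reference`.

    Both inputs are lists of 0/1 flags, one per time frame. A positive result
    means the subtitles are early and should be delayed by that many frames.
    """
    def overlap(shift):
        # Count frames where a shifted subtitle and speech are both active.
        total = 0
        for i, flag in enumerate(subtitle):
            j = i + shift
            if 0 <= j < len(reference):
                total += flag * reference[j]
        return total

    return max(range(-max_shift, max_shift + 1), key=overlap)


# Speech is active in frames 3-5 and 8-9; the subtitles fire two frames early.
reference = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
subtitle = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]

print(best_offset(reference, subtitle, max_shift=5))  # prints 2
```

Multiplying the returned shift by the frame length gives the delay in seconds to apply to every subtitle timestamp. A brute-force search like this is only practical for short inputs and small windows.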
speechbrain by speechbrain
A PyTorch-based Speech Toolkit
Python
6123
Version: v0.5.14
License: Permissive (Apache-2.0)
speech-to-text-wavenet by buriburisuri
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Python
3746
Version: Current
License: Permissive (Apache-2.0)
Automatic_Speech_Recognition by zzw922cn
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Python
2729
Version: Current
License: Permissive (MIT)
tacotron by keithito
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Python
2787
Version: v0.2.0
License: Permissive (MIT)
TensorFlowTTS by TensorSpeech
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Python
3375
Version: v1.8
License: Permissive (Apache-2.0)
tensorflow-speech-recognition by pannous
🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks
Python
2142
Version: Current
License: Others (Non-SPDX)
say_what by joshnewlan
Using speech-to-text to fully check out during con calls
Python
2080
Version: Current
License: No License
pytorch-kaldi by mravanelli
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Python
2267
Version: Current
License: No License
deepspeech.pytorch by SeanNaren
Speech Recognition using DeepSpeech2.
Python
2023
Version: V3.0
License: Permissive (MIT)
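DeepSpeech2-style models are trained with CTC, so their raw output is one label per audio frame, including repeats and a special blank symbol. As a minimal sketch of the simplest way to turn that into text (greedy decoding, which is not necessarily what deepspeech.pytorch does by default), the functions below collapse repeated labels and drop blanks. The function names, the toy alphabet, and the frame labels are assumptions made for this example.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks, from per-frame labels."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def labels_to_text(labels, alphabet):
    """Map label indices to characters using a hypothetical alphabet list."""
    return "".join(alphabet[i] for i in labels)


# Toy alphabet: index 0 is the CTC blank.
ALPHABET = ["_", "a", "c", "t"]

# Most likely label for each audio frame: c c _ a a _ _ t t
frames = [2, 2, 0, 1, 1, 0, 0, 3, 3]

print(labels_to_text(ctc_greedy_decode(frames), ALPHABET))  # prints cat
```

The blank is what lets the decoder tell a doubled letter from a held one: the frames [1, 0, 1] decode to "aa", while [1, 1] decodes to a single "a".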
aeneas by readbeyond
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Python
2169
Version: v1.7.3
License: Strong Copyleft (AGPL-3.0)
waveglow by NVIDIA
A Flow-based Generative Network for Speech Synthesis
Python
2110
Version: Current
License: Permissive (BSD-3-Clause)
lip-reading-deeplearning by astorfi
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Python
1730
Version: 1.2
License: Permissive (Apache-2.0)