18 best Python Speech Recognition Libraries for 2023
by naveen.kumar@openweaver.com Updated: Jul 31, 2023
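Speech recognition is the process of converting spoken words to text. Python libraries such as speech_recognition can connect to several engines and services, including Google Speech Recognition, Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text.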
Python is a multipurpose language used to build many kinds of applications, including web apps. It has many libraries dedicated to speech recognition, text-to-speech conversion, and text analysis.
This kit lists some of the best Python speech recognition libraries with their key features. They include Real-Time-Voice-Cloning, which clones a voice in 5 seconds to generate arbitrary speech; speech_recognition, a speech recognition module for Python that supports several engines; and wav2letter, Facebook AI Research's Automatic Speech Recognition Toolkit. Find the top 18 Python speech recognition libraries for 2023 below.
Real-Time-Voice-Cloning by CorentinJ
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Python 42399 Version:Current License: Others (Non-SPDX)
speech_recognition by Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python 7239 Version:3.10.0 License: Permissive (BSD-3-Clause)
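As a rough illustration of what "several engines and APIs, online and offline" can look like in practice, here is a minimal sketch of transcribing a WAV file with speech_recognition. It assumes the package is installed (pip install SpeechRecognition), that the online path can reach Google's web speech service, and that PocketSphinx is installed for the offline path. The helper names write_silent_wav and transcribe_wav are hypothetical and not part of the library.

```python
import wave


def write_silent_wav(path, seconds=1.0, sample_rate=16000):
    """Write a mono 16-bit PCM WAV of silence (hypothetical helper for placeholder input)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)
    return n_frames


def transcribe_wav(path, engine="google"):
    """Return the recognized text for a WAV file, or None if nothing intelligible was heard.

    Hypothetical helper. engine="google" uses an online service;
    engine="sphinx" runs offline and assumes PocketSphinx is installed.
    """
    # Imported lazily so the rest of this sketch works without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        if engine == "sphinx":
            return recognizer.recognize_sphinx(audio)
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The engine could not make out any speech in the audio.
        return None
    except sr.RequestError as err:
        # The engine is unreachable or not installed.
        raise RuntimeError(f"Speech engine '{engine}' is unavailable: {err}") from err
```

A call such as transcribe_wav("meeting.wav") would try the online engine, and transcribe_wav("meeting.wav", engine="sphinx") the offline one; the file name here is only a placeholder. A silent file produced by write_silent_wav should simply yield None, since there is no speech to recognize.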
wav2letter by facebookresearch
Facebook AI Research's Automatic Speech Recognition Toolkit
Python 5531 Version:v0.1 License: Others (Non-SPDX)
ASRT_SpeechRecognition by nl8590687
A Deep-Learning-Based Chinese Speech Recognition System
Python 6646 Version:v1.3.0 License: Strong Copyleft (GPL-3.0)
ffsubsync by smacke
Automagically synchronize subtitles with video.
Python 5990 Version:0.4.22 License: Permissive (MIT)
speechbrain by speechbrain
A PyTorch-based Speech Toolkit
Python 6123 Version:v0.5.14 License: Permissive (Apache-2.0)
speech-to-text-wavenet by buriburisuri
Speech-to-Text-WaveNet: end-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Python 3746 Version:Current License: Permissive (Apache-2.0)
Automatic_Speech_Recognition by zzw922cn
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Python 2729 Version:Current License: Permissive (MIT)
tacotron by keithito
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Python 2787 Version:v0.2.0 License: Permissive (MIT)
TensorFlowTTS by TensorSpeech
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt for other languages)
Python 3375 Version:v1.8 License: Permissive (Apache-2.0)
tensorflow-speech-recognition by pannous
🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks
Python 2142 Version:Current License: Others (Non-SPDX)
say_what by joshnewlan
Using speech-to-text to fully check out during con calls
Python 2080 Version:Current License: No License
pytorch-kaldi by mravanelli
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Python 2267 Version:Current License: No License
deepspeech.pytorch by SeanNaren
Speech Recognition using DeepSpeech2.
Python 2023 Version:V3.0 License: Permissive (MIT)
aeneas by readbeyond
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Python 2169 Version:v1.7.3 License: Strong Copyleft (AGPL-3.0)
waveglow by NVIDIA
A Flow-based Generative Network for Speech Synthesis
Python 2110 Version:Current License: Permissive (BSD-3-Clause)
lip-reading-deeplearning by astorfi
Lip Reading: Cross Audio-Visual Recognition using 3D Architectures
Python 1730 Version:1.2 License: Permissive (Apache-2.0)