18 best Python Speech Recognition Libraries for 2023

by naveen.kumar@openweaver.com Updated: Jul 31, 2023


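Speech recognition is the process of converting spoken words into text. Python's speech_recognition library, for example, supports engines and APIs such as Google Speech Recognition, Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text.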

Python is a multipurpose language used to build many kinds of applications, including web apps, and it has many libraries dedicated to speech recognition, text-to-speech conversion, and text analysis.

In this kit, I have listed some of the best Python speech recognition and related speech-processing libraries, along with their key features. They include Real-Time-Voice-Cloning, which clones a voice from 5 seconds of audio to generate arbitrary speech; speech_recognition, a speech recognition module for Python that supports several engines; and wav2letter, Facebook AI Research's automatic speech recognition toolkit. Here are the 18 best Python speech recognition libraries for 2023.

Real-Time-Voice-Cloning by CorentinJ

Python | Stars: 42399 | Version: Current | License: Others (Non-SPDX)

Clone a voice in 5 seconds to generate arbitrary speech in real-time
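Real-Time-Voice-Cloning implements SV2TTS, in which a speaker encoder turns a few seconds of reference audio into a fixed-size speaker embedding that conditions the speech synthesizer. Speaker embeddings of this kind are usually compared with cosine similarity. The sketch below is plain Python, not the repository's API; the embedding values are made up for illustration, and real speaker encoders output vectors with far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional speaker embeddings, for illustration only.
ref_speaker = [0.20, 0.71, 0.05, 0.67]
same_speaker = [0.22, 0.69, 0.07, 0.68]
other_speaker = [0.81, 0.02, 0.58, 0.10]
print(cosine_similarity(ref_speaker, same_speaker))   # close to 1.0
print(cosine_similarity(ref_speaker, other_speaker))  # noticeably lower
```

A high score suggests two clips come from the same voice, which is the property the synthesizer relies on when it is conditioned on a new speaker's embedding.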

speech_recognition by Uberi

Python | Stars: 7239 | Version: 3.10.0 | License: Permissive (BSD-3-Clause)

Speech recognition module for Python, supporting several engines and APIs, online and offline.
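A minimal transcription sketch, assuming the package is installed from PyPI as `SpeechRecognition` and the machine has network access, since `recognize_google()` sends the audio to Google's free Web Speech API. The function name `transcribe_wav` and the file name in the usage note are hypothetical.

```python
def transcribe_wav(path: str) -> str:
    """Transcribe a WAV/AIFF/FLAC file with the SpeechRecognition package (sketch)."""
    # Lazy import so this module still loads if the package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # Read the entire file into an AudioData object.
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The engine could not make out any speech.
        return ""
    except sr.RequestError as exc:
        # Network failure or an error response from the API.
        raise RuntimeError(f"Speech API request failed: {exc}") from exc
```

For example, `transcribe_wav("meeting.wav")` returns the recognized text, or an empty string when nothing intelligible is found. Switching engines means calling a different `recognize_*` method, such as `recognize_sphinx()` for offline recognition with PocketSphinx installed.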

wav2letter by facebookresearch

Python | Stars: 5531 | Version: v0.1 | License: Others (Non-SPDX)

Facebook AI Research's Automatic Speech Recognition Toolkit

ASRT_SpeechRecognition by nl8590687

Python | Stars: 6646 | Version: v1.3.0 | License: Strong Copyleft (GPL-3.0)

A Deep-Learning-Based Chinese Speech Recognition System

ffsubsync by smacke

Python | Stars: 5990 | Version: 0.4.22 | License: Permissive (MIT)

Automagically synchronize subtitles with video.
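As described in its README, ffsubsync works roughly by splitting both the audio and the subtitles into short time windows, marking which windows contain speech (or an on-screen subtitle), and searching for the time offset that makes the two agree best. The sketch below illustrates that search on binary per-frame masks using brute force, where a real tool would use a faster FFT-based correlation; `best_offset` and the masks are hypothetical, not part of ffsubsync.

```python
from typing import Sequence


def best_offset(speech: Sequence[int], subs: Sequence[int], max_shift: int) -> int:
    """Return the frame shift of `subs` that overlaps most with `speech`.

    Both inputs are binary per-frame masks (1 = speech detected / subtitle shown).
    A positive result means the subtitles should be delayed by that many frames.
    """
    best_shift, best_score = 0, -1
    for shift in range(-max_shift, max_shift + 1):
        score = 0
        for i, shown in enumerate(subs):
            j = i + shift  # audio frame this subtitle frame lands on after shifting
            if shown and 0 <= j < len(speech):
                score += speech[j]
        if score > best_score:  # ties keep the earliest (most negative) shift
            best_shift, best_score = shift, score
    return best_shift


# Hypothetical masks: the subtitles appear two frames before the speech.
speech_mask = [0, 0, 0, 1, 1, 0, 0, 1, 1, 0]
subtitle_mask = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0]
print(best_offset(speech_mask, subtitle_mask, max_shift=3))  # 2
```

Multiplying the returned shift by the window length (for example, 10 ms per frame) converts it into the time offset to apply to the subtitle file.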

espnet by espnet

Python | Stars: 6684 | Version: v.202304 | License: Permissive (Apache-2.0)

End-to-End Speech Processing Toolkit

speechbrain by speechbrain

Python | Stars: 6123 | Version: v0.5.14 | License: Permissive (Apache-2.0)

A PyTorch-based Speech Toolkit
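A minimal sketch of transcribing a file with one of SpeechBrain's pretrained English models. It assumes SpeechBrain 0.5.x (the version listed here), where `EncoderDecoderASR` lives in `speechbrain.pretrained`, plus PyTorch and network access so the model can be downloaded from the Hugging Face Hub on first use. Newer 1.x releases reorganized these modules, so check the documentation for your version. The wrapper name `transcribe_with_speechbrain` is hypothetical.

```python
def transcribe_with_speechbrain(path: str) -> str:
    """Transcribe an audio file with a pretrained SpeechBrain ASR model (sketch)."""
    # Lazy import: the module loads even when SpeechBrain is not installed.
    from speechbrain.pretrained import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",  # LibriSpeech English model
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",  # local cache dir
    )
    return asr_model.transcribe_file(path)
```

Loading the model inside the function keeps the sketch short; in a real application you would load it once and reuse it across files.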

speech-to-text-wavenet by buriburisuri

Python | Stars: 3746 | Version: Current | License: Permissive (Apache-2.0)

Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
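WaveNet-style models stack dilated causal 1-D convolutions so that the receptive field grows exponentially with depth while the number of layers grows only linearly. For kernel size k and per-layer dilations d1 ... dL, the receptive field is 1 + (k - 1) * (d1 + ... + dL) samples. The sketch below computes it; the 1, 2, 4, ..., 512 block with kernel size 2 follows the original WaveNet paper, and this repository's own hyperparameters may differ.

```python
from typing import Iterable


def receptive_field(kernel_size: int, dilations: Iterable[int]) -> int:
    """Receptive field (in samples) of stacked dilated causal 1-D convolutions."""
    # A layer with dilation d widens the window by (kernel_size - 1) * d samples.
    return 1 + (kernel_size - 1) * sum(dilations)


one_block = [2 ** i for i in range(10)]  # dilations 1, 2, 4, ..., 512
print(receptive_field(2, one_block))      # 1024
print(receptive_field(2, one_block * 3))  # 3070 (three stacked blocks)
```

Repeating the dilation block, rather than continuing to double the dilation, is what lets these models widen their context without creating overly sparse filters.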

Automatic_Speech_Recognition by zzw922cn

Python | Stars: 2729 | Version: Current | License: Permissive (MIT)

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
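Mandarin and English recognizers are usually scored differently: English with word error rate (WER) over whitespace-separated words, and Mandarin with character error rate (CER), because written Chinese has no spaces between words. Both are the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and the hypothesis, divided by the reference length. A plain-Python sketch; the function names and example sentences are illustrative, not part of this repository.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated words (suits English)."""
    words = ref.split()  # assumes a non-empty reference
    return edit_distance(words, hyp.split()) / len(words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate, ignoring whitespace (suits Mandarin)."""
    chars = [c for c in ref if not c.isspace()]  # assumes a non-empty reference
    return edit_distance(chars, [c for c in hyp if not c.isspace()]) / len(chars)


print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
print(cer("今天天气很好", "今天天汽很好"))                    # 1 substitution / 6 characters
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.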

tacotron by keithito

Python | Stars: 2787 | Version: v0.2.0 | License: Permissive (MIT)

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
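Tacotron-style synthesizers predict mel-scale spectrogram frames, which space frequency bands roughly the way human pitch perception does. A common (HTK-style) definition is mel = 2595 * log10(1 + f / 700). The sketch below converts between Hz and mel and computes the center frequencies of a mel filterbank. The formula choice and the 80-band, 0 to 8000 Hz setup are assumptions for illustration; libraries such as librosa default to a slightly different (Slaney-style) mel scale, and this repository's own audio settings may differ.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels triangular filters equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)  # n_mels + 2 edge points, centers in between
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_mels)]


print(round(hz_to_mel(1000.0)))  # 1000 (1 kHz is about 1000 mel by design)
centers = mel_band_centers(80, 0.0, 8000.0)
print(centers[:3], centers[-1])  # narrow bands at low Hz, wide bands at high Hz
```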

TensorFlowTTS by TensorSpeech

Python | Stars: 3375 | Version: v1.8 | License: Permissive (Apache-2.0)

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

tensorflow-speech-recognition by pannous

Python | Stars: 2142 | Version: Current | License: Others (Non-SPDX)

Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks

say_what by joshnewlan

Python | Stars: 2080 | Version: Current | License: No License

Uses speech-to-text so you can fully check out during conference calls

pytorch-kaldi by mravanelli

Python | Stars: 2267 | Version: Current | License: No License

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is handled by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
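In hybrid DNN-HMM systems like the ones pytorch-kaldi builds, the neural network estimates HMM-state posteriors p(s|x), but the HMM decoder needs likelihoods p(x|s). By Bayes' rule, p(x|s) is proportional to p(s|x) / p(s), so a standard step is to subtract log state priors, estimated from state counts in the training alignments, from the network's log-posteriors. The sketch below shows that conversion in plain Python; the counts and posteriors are hypothetical, and pytorch-kaldi's own implementation details may differ.

```python
import math
from typing import List, Sequence


def log_priors_from_counts(counts: Sequence[int]) -> List[float]:
    """Log prior of each HMM state, estimated from alignment counts."""
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def scaled_log_likelihoods(log_posteriors: Sequence[float],
                           log_priors: Sequence[float]) -> List[float]:
    """log p(x|s) up to a per-frame constant: log p(s|x) - log p(s)."""
    return [lp - pr for lp, pr in zip(log_posteriors, log_priors)]


counts = [700, 200, 100]      # hypothetical state occupancy counts
posteriors = [0.6, 0.3, 0.1]  # hypothetical network output for one frame
scaled = scaled_log_likelihoods([math.log(p) for p in posteriors],
                                log_priors_from_counts(counts))
print(scaled)  # state 1 now scores highest, although state 0 had the top posterior
```

Dividing by the prior stops frequent states (such as silence) from dominating the decoder simply because they were common in training.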

deepspeech.pytorch by SeanNaren

Python | Stars: 2023 | Version: V3.0 | License: Permissive (MIT)

Speech Recognition using DeepSpeech2.
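DeepSpeech2 is trained with the CTC (Connectionist Temporal Classification) loss and outputs, for every audio frame, a probability distribution over characters plus a special blank symbol. The simplest way to turn those frame outputs into text is greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. This is a plain-Python sketch of that rule, not deepspeech.pytorch's decoder class; the label string and frame probabilities are hypothetical, with the blank at index 0.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      labels: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit only when the symbol changes and is not the blank.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


labels = "_helo"  # "_" stands for the CTC blank at index 0
path = [1, 1, 0, 2, 3, 3, 0, 3, 4]  # h h _ e l l _ l o
frames = [[1.0 if k == i else 0.0 for k in range(len(labels))] for i in path]
print(ctc_greedy_decode(frames, labels))  # hello
```

The blank between the two `l` runs is what lets CTC spell double letters; without it, the repeats would merge into a single `l`. Beam search with a language model usually gives better transcripts than this greedy rule.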

aeneas by readbeyond

Python | Stars: 2169 | Version: v1.7.3 | License: Strong Copyleft (AGPL-3.0)

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
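According to its documentation, aeneas computes forced alignment by synthesizing the text with a TTS engine, extracting MFCC features from both the synthesized and the real audio, and aligning the two feature sequences with dynamic time warping (DTW). Below is a plain-Python DTW sketch on 1-D sequences; real MFCC frames are vectors, so the local distance would typically be Euclidean rather than `abs()`. `dtw_path` is a hypothetical helper, not part of aeneas.

```python
from typing import List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> List[Tuple[int, int]]:
    """Optimal DTW alignment between two 1-D feature sequences.

    Returns (index_in_a, index_in_b) pairs from the start to the end of both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


synth = [0, 1, 2, 3]       # hypothetical features of the synthesized audio
real = [0, 0, 1, 2, 2, 3]  # the same content spoken more slowly
print(dtw_path(synth, real))
```

Once each synthesized frame is mapped to a real-audio frame, the known boundaries of each text fragment in the synthesized audio can be carried over to timestamps in the real recording.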

waveglow by NVIDIA

Python | Stars: 2110 | Version: Current | License: Permissive (BSD-3-Clause)

A Flow-based Generative Network for Speech Synthesis
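WaveGlow is a normalizing flow: it is built from invertible layers, so the same network maps audio to a Gaussian latent during training and maps latent samples back to audio during synthesis. Its main building block, borrowed from Glow, is the affine coupling layer: half of the input passes through unchanged and parameterizes a scale and shift for the other half, which keeps the layer exactly invertible. The sketch below shows that mechanism on a plain Python list; `toy_net` is a hypothetical stand-in for WaveGlow's WaveNet-like network, which in the real model is also conditioned on the mel spectrogram.

```python
import math
from typing import List, Sequence, Tuple


def toy_net(x_a: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in that predicts a log-scale and shift from x_a."""
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x: Sequence[float]) -> Tuple[List[float], float]:
    """Affine coupling: keep x_a, transform x_b; also return log|det J|."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    x_a, x_b = list(x[:half]), list(x[half:])
    log_s, t = toy_net(x_a)
    y_b = [xb * math.exp(ls) + tt for xb, ls, tt in zip(x_b, log_s, t)]
    return x_a + y_b, sum(log_s)  # log-determinant feeds the flow's likelihood


def coupling_inverse(y: Sequence[float]) -> List[float]:
    """Exact inverse: y_a equals x_a, so the same scale and shift are recovered."""
    half = len(y) // 2
    y_a, y_b = list(y[:half]), list(y[half:])
    log_s, t = toy_net(y_a)
    x_b = [(yb - tt) * math.exp(-ls) for yb, ls, tt in zip(y_b, log_s, t)]
    return y_a + x_b


x = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(x)
print(coupling_inverse(y))  # recovers x up to floating-point rounding
```

Because `toy_net` is only ever evaluated on the untouched half, it can be an arbitrarily complex network without breaking invertibility.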

lip-reading-deeplearning by astorfi

Python | Stars: 1730 | Version: 1.2 | License: Permissive (Apache-2.0)

Lip Reading: Cross Audio-Visual Recognition using 3D Architectures
