Speech Libraries - Page 1

Real-Time-Voice-Cloning by CorentinJ

Python 42399 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

Clone a voice in 5 seconds to generate arbitrary speech in real time

whisper by openai

Python 39256 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Robust Speech Recognition via Large-Scale Weak Supervision

DeepSpeech by mozilla

C++ 22108 Version: Current
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

so-vits-svc by svc-develop-team

Python 15411 Version: Current
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

SoftVC VITS Singing Voice Conversion

leon by leon-ai

TypeScript 12924 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

🧠 Leon is your open-source personal assistant.

kaldi by kaldi-asr

Shell 12835 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

kaldi-asr/kaldi is the official location of the Kaldi project.

TTS by coqui-ai

Python 12468 Version: Current
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

🐸💬 A deep learning toolkit for text-to-speech, battle-tested in research and production

zeal by zealdocs

C++ 10486 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

Offline documentation browser inspired by Dash

AudioGPT by AIGC-Audio

Python 8722 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

PaddleSpeech by PaddlePaddle

Python 7725 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Easy-to-use speech toolkit including a self-supervised learning model, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.

TTS by mozilla

Jupyter Notebook 7519 Version: Current
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

tortoise-tts by neonbjb

Python 7408 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

A multi-voice TTS system trained with an emphasis on quality

pydub by jiaaro

Python 7332 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Manipulate audio with a simple and easy high-level interface

speech_recognition by Uberi

Python 7239 Version: Current
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

Speech recognition module for Python, supporting several engines and APIs, online and offline.

NeMo by NVIDIA

Python 7027 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

NeMo: a toolkit for conversational AI

espnet by espnet

Python 6684 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

End-to-End Speech Processing Toolkit

annyang by TalAter

JavaScript 6366 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

💬 Speech recognition for your site

buzz by chidiwilliams

Python 6327 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

wav2letter by flashlight

C++ 6241 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

Facebook AI Research's Automatic Speech Recognition Toolkit

vosk-api by alphacep

Jupyter Notebook 5750 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Offline speech recognition API for Android, iOS, Raspberry Pi, and servers, with Python, Java, C#, and Node

wav2letter by facebookresearch

Python 5531 Version: Current
Updated: 4 y ago
License: Proprietary (Proprietary)

Facebook AI Research's Automatic Speech Recognition Toolkit

Retrieval-based-Voice-Conversion-WebUI by RVC-Project

Python 4863 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Voice data of 10 minutes or less can also be used to train a good voice conversion (VC) model!

lucida by claritylab

Java 4839 Version: Current
Updated: 4 y ago
License: Proprietary (Proprietary)

Speech and Vision Based Intelligent Personal Assistant

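wukong-robot by wzpan

Python 4664 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

🤖 wukong-robot is a simple, flexible, and elegant Chinese-language voice dialogue robot / smart speaker project. It supports multi-turn conversations with ChatGPT and may also be the first open-source smart speaker project to support brain-computer interaction.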

tacotron2 by NVIDIA

Jupyter Notebook 4497 Version: Current
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

Tacotron 2: PyTorch implementation with faster-than-realtime inference

vits by jaywalnut310

Python 4351 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

ecoute by SevaSk

Python 4253 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Ecoute is a live transcription tool that provides real-time transcripts of both the user's microphone input (You) and the user's speaker output (Speaker) in a textbox. It also uses OpenAI's GPT-3.5 to generate a suggested response for the user to say, based on the live transcription of the conversation.

speech-to-text-wavenet by buriburisuri

Python 3746 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow

silero-models by snakers4

Jupyter Notebook 3739 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

pocketsphinx by cmusphinx

C 3387 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

A small speech recognizer

TensorFlowTTS by TensorSpeech

Python 3375 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

common-voice by common-voice

TypeScript 3156 Version: Current
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

Common Voice is part of Mozilla's initiative to help teach machines how real people speak.

pyannote-audio by pyannote

Jupyter Notebook 3116 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

wenet by wenet-e2e

C++ 3072 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Production First and Production Ready End-to-End Speech Recognition Toolkit

porcupine by Picovoice

Python 3030 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

On-device wake word detection powered by deep learning

enhancements by kubernetes

Go 2907 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Enhancements tracking repo for Kubernetes

tacotron by keithito

Python 2787 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Automatic_Speech_Recognition by zzw922cn

Python 2729 Version: Current
Updated: 4 y ago
License: Permissive (MIT)

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

lingvo by tensorflow

Python 2727 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

Lingvo

juliusjs by zzmp

JavaScript 2578 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

A speech recognition library for the web

speechgpt by hahahumble

TypeScript 2508 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

💬 SpeechGPT is a web application that enables you to converse with ChatGPT.

Resemblyzer by resemble-ai

Python 2291 Version: Current
Updated: 2 y ago
License: Permissive (Apache-2.0)

A Python package to analyze and compare voices with deep learning

wavenet_vocoder by r9y9

Python 2170 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

WaveNet vocoder

aeneas by readbeyond

Python 2169 Version: Current
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Tacotron-2 by Rayhane-mamah

Python 2166 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

DeepMind's Tacotron-2 TensorFlow implementation

tensorflow-speech-recognition by pannous

Python 2142 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks

espeak-ng by espeak-ng

C 2099 Version: Current
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.

spot-sdk by boston-dynamics

Python 2077 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

Spot SDK repo

rhasspy by rhasspy

Shell 2000 Version: Current
Updated: 2 y ago
License: Permissive (MIT)

Offline private voice assistant for many human languages

marytts by marytts

Java 1992 Version: Current
Updated: 2 y ago
License: Proprietary (Proprietary)

MARY TTS: an open-source, multilingual text-to-speech synthesis system written in pure Java

Open Weaver – Develop Applications Faster with Open Source
