Lingvo
WaveNet vocoder

Real-Time-Voice-Cloning by CorentinJ
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Python
42399
Updated: 2 y ago
License: Proprietary (Proprietary)

whisper by openai
Robust Speech Recognition via Large-Scale Weak Supervision
Python
39256
Updated: 2 y ago
License: Permissive (MIT)

DeepSpeech by mozilla
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
C++
22108
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

so-vits-svc by svc-develop-team
SoftVC VITS Singing Voice Conversion
Python
15411
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

leon by leon-ai
🧠 Leon is your open-source personal assistant.
TypeScript
12924
Updated: 2 y ago
License: Permissive (MIT)

kaldi by kaldi-asr
kaldi-asr/kaldi is the official location of the Kaldi project.
Shell
12835
Updated: 2 y ago
License: Proprietary (Proprietary)

TTS by coqui-ai
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Python
12468
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

zeal by zealdocs
Offline documentation browser inspired by Dash
C++
10486
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

AudioGPT by AIGC-Audio
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Python
8722
Updated: 2 y ago
License: Proprietary (Proprietary)

PaddleSpeech by PaddlePaddle
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Python
7725
Updated: 2 y ago
License: Permissive (Apache-2.0)

TTS by mozilla
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Jupyter Notebook
7519
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

tortoise-tts by neonbjb
A multi-voice TTS system trained with an emphasis on quality
Python
7408
Updated: 2 y ago
License: Permissive (Apache-2.0)

pydub by jiaaro
Manipulate audio with a simple and easy high level interface
Python
7332
Updated: 2 y ago
License: Permissive (MIT)

speech_recognition by Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python
7239
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

NeMo by NVIDIA
NeMo: a toolkit for conversational AI
Python
7027
Updated: 2 y ago
License: Permissive (Apache-2.0)

espnet by espnet
End-to-End Speech Processing Toolkit
Python
6684
Updated: 2 y ago
License: Permissive (Apache-2.0)

annyang by TalAter
💬 Speech recognition for your site
JavaScript
6366
Updated: 2 y ago
License: Permissive (MIT)

buzz by chidiwilliams
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Python
6327
Updated: 2 y ago
License: Permissive (MIT)

wav2letter by flashlight
Facebook AI Research's Automatic Speech Recognition Toolkit
C++
6241
Updated: 2 y ago
License: Proprietary (Proprietary)

vosk-api by alphacep
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Jupyter Notebook
5750
Updated: 2 y ago
License: Permissive (Apache-2.0)

wav2letter by facebookresearch
Facebook AI Research's Automatic Speech Recognition Toolkit
Python
5531
Updated: 4 y ago
License: Proprietary (Proprietary)

Retrieval-based-Voice-Conversion-WebUI by RVC-Project
Voice data <= 10 mins can also be used to train a good VC model!
Python
4863
Updated: 2 y ago
License: Permissive (MIT)

lucida by claritylab
Speech and Vision Based Intelligent Personal Assistant
Java
4839
Updated: 4 y ago
License: Proprietary (Proprietary)
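
wukong-robot by wzpan
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports multi-turn ChatGPT conversations and may also be the first open-source smart speaker project to support brain-computer interaction.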
Python
4664
Updated: 2 y ago
License: Permissive (MIT)

tacotron2 by NVIDIA
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Jupyter Notebook
4497
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

vits by jaywalnut310
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Python
4351
Updated: 2 y ago
License: Permissive (MIT)

ecoute by SevaSk
Ecoute is a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation.
Python
4253
Updated: 2 y ago
License: Permissive (MIT)

speech-to-text-wavenet by buriburisuri
Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow
Python
3746
Updated: 2 y ago
License: Permissive (Apache-2.0)

silero-models by snakers4
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Jupyter Notebook
3739
Updated: 2 y ago
License: Proprietary (Proprietary)

pocketsphinx by cmusphinx
A small speech recognizer
C
3387
Updated: 2 y ago
License: Proprietary (Proprietary)

TensorFlowTTS by TensorSpeech
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German; easy to adapt for other languages)
Python
3375
Updated: 2 y ago
License: Permissive (Apache-2.0)

common-voice by common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
TypeScript
3156
Updated: 2 y ago
License: Weak Copyleft (MPL-2.0)

pyannote-audio by pyannote
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Jupyter Notebook
3116
Updated: 2 y ago
License: Permissive (MIT)

wenet by wenet-e2e
Production First and Production Ready End-to-End Speech Recognition Toolkit
C++
3072
Updated: 2 y ago
License: Permissive (Apache-2.0)

porcupine by Picovoice
On-device wake word detection powered by deep learning
Python
3030
Updated: 2 y ago
License: Permissive (Apache-2.0)

enhancements by kubernetes
Enhancements tracking repo for Kubernetes
Go
2907
Updated: 2 y ago
License: Permissive (Apache-2.0)

tacotron by keithito
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Python
2787
Updated: 2 y ago
License: Permissive (MIT)

Automatic_Speech_Recognition by zzw922cn
End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
Python
2729
Updated: 4 y ago
License: Permissive (MIT)

juliusjs by zzmp
A speech recognition library for the web
JavaScript
2578
Updated: 2 y ago
License: Proprietary (Proprietary)

speechgpt by hahahumble
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
TypeScript
2508
Updated: 2 y ago
License: Permissive (MIT)

Resemblyzer by resemble-ai
A python package to analyze and compare voices with deep learning
Python
2291
Updated: 2 y ago
License: Permissive (Apache-2.0)

aeneas by readbeyond
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Python
2169
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

Tacotron-2 by Rayhane-mamah
DeepMind's Tacotron-2 Tensorflow implementation
Python
2166
Updated: 2 y ago
License: Permissive (MIT)

tensorflow-speech-recognition by pannous
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
Python
2142
Updated: 2 y ago
License: Proprietary (Proprietary)

espeak-ng by espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
C
2099
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)

spot-sdk by boston-dynamics
Spot SDK repo
Python
2077
Updated: 2 y ago
License: Proprietary (Proprietary)

rhasspy by rhasspy
Offline private voice assistant for many human languages
Shell
2000
Updated: 2 y ago
License: Permissive (MIT)

marytts by marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Java
1992
Updated: 2 y ago
License: Proprietary (Proprietary)