Real-Time-Voice-Cloning by CorentinJ
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Python 42399 Updated: 1 y ago License: Proprietary (Proprietary)
whisper by openai
Robust Speech Recognition via Large-Scale Weak Supervision
Python 39256 Updated: 1 y ago License: Permissive (MIT)
DeepSpeech by mozilla
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
C++ 22108 Updated: 1 y ago License: Weak Copyleft (MPL-2.0)
so-vits-svc by svc-develop-team
SoftVC VITS Singing Voice Conversion
Python 15411 Updated: 1 y ago License: Strong Copyleft (AGPL-3.0)
leon by leon-ai
🧠 Leon is your open-source personal assistant.
TypeScript 12924 Updated: 1 y ago License: Permissive (MIT)
kaldi by kaldi-asr
kaldi-asr/kaldi is the official location of the Kaldi project.
Shell 12835 Updated: 1 y ago License: Proprietary (Proprietary)
TTS by coqui-ai
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Python 12468 Updated: 1 y ago License: Weak Copyleft (MPL-2.0)
zeal by zealdocs
Offline documentation browser inspired by Dash
C++ 10486 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
AudioGPT by AIGC-Audio
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Python 8722 Updated: 1 y ago License: Proprietary (Proprietary)
PaddleSpeech by PaddlePaddle
Easy-to-use speech toolkit including a self-supervised learning model, SOTA/streaming ASR with punctuation, streaming TTS with text frontend, speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.
Python 7725 Updated: 1 y ago License: Permissive (Apache-2.0)
TTS by mozilla
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Jupyter Notebook 7519 Updated: 1 y ago License: Weak Copyleft (MPL-2.0)
tortoise-tts by neonbjb
A multi-voice TTS system trained with an emphasis on quality
Python 7408 Updated: 1 y ago License: Permissive (Apache-2.0)
pydub by jiaaro
Manipulate audio with a simple and easy high-level interface
Python 7332 Updated: 1 y ago License: Permissive (MIT)
speech_recognition by Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Python 7239 Updated: 1 y ago License: Permissive (BSD-3-Clause)
NeMo by NVIDIA
NeMo: a toolkit for conversational AI
Python 7027 Updated: 1 y ago License: Permissive (Apache-2.0)
espnet by espnet
End-to-End Speech Processing Toolkit
Python 6684 Updated: 1 y ago License: Permissive (Apache-2.0)
annyang by TalAter
💬 Speech recognition for your site
JavaScript 6366 Updated: 2 y ago License: Permissive (MIT)
buzz by chidiwilliams
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Python 6327 Updated: 1 y ago License: Permissive (MIT)
wav2letter by flashlight
Facebook AI Research's Automatic Speech Recognition Toolkit
C++ 6241 Updated: 1 y ago License: Proprietary (Proprietary)
vosk-api by alphacep
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Jupyter Notebook 5750 Updated: 1 y ago License: Permissive (Apache-2.0)
wav2letter by facebookresearch
Facebook AI Research's Automatic Speech Recognition Toolkit
Python 5531 Updated: 4 y ago License: Proprietary (Proprietary)
Retrieval-based-Voice-Conversion-WebUI by RVC-Project
Voice data <= 10 mins can also be used to train a good VC model!
Python 4863 Updated: 1 y ago License: Permissive (MIT)
lucida by claritylab
Speech and Vision Based Intelligent Personal Assistant
Java 4839 Updated: 3 y ago License: Proprietary (Proprietary)
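wukong-robot by wzpan
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports multi-turn conversation with ChatGPT and may be the first open-source smart speaker project to support brain-computer interaction.
Python 4664 Updated: 1 y ago License: Permissive (MIT)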
tacotron2 by NVIDIA
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Jupyter Notebook 4497 Updated: 1 y ago License: Permissive (BSD-3-Clause)
vits by jaywalnut310
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Python 4351 Updated: 1 y ago License: Permissive (MIT)
ecoute by SevaSk
Ecoute is a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speaker output (Speaker) in a textbox. It also uses OpenAI's GPT-3.5 to generate a suggested response for the user to say, based on the live transcription of the conversation.
Python 4253 Updated: 1 y ago License: Permissive (MIT)
speech-to-text-wavenet by buriburisuri
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Python 3746 Updated: 2 y ago License: Permissive (Apache-2.0)
silero-models by snakers4
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Jupyter Notebook 3739 Updated: 1 y ago License: Proprietary (Proprietary)
pocketsphinx by cmusphinx
A small speech recognizer
C 3387 Updated: 1 y ago License: Proprietary (Proprietary)
TensorFlowTTS by TensorSpeech
😝 TensorFlowTTS: Real-time state-of-the-art speech synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Python 3375 Updated: 1 y ago License: Permissive (Apache-2.0)
common-voice by common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
TypeScript 3156 Updated: 1 y ago License: Weak Copyleft (MPL-2.0)
pyannote-audio by pyannote
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Jupyter Notebook 3116 Updated: 1 y ago License: Permissive (MIT)
wenet by wenet-e2e
Production First and Production Ready End-to-End Speech Recognition Toolkit
C++ 3072 Updated: 1 y ago License: Permissive (Apache-2.0)
porcupine by Picovoice
On-device wake word detection powered by deep learning
Python 3030 Updated: 1 y ago License: Permissive (Apache-2.0)
enhancements by kubernetes
Enhancements tracking repo for Kubernetes
Go 2907 Updated: 1 y ago License: Permissive (Apache-2.0)
tacotron by keithito
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Python 2787 Updated: 2 y ago License: Permissive (MIT)
Automatic_Speech_Recognition by zzw922cn
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Python 2729 Updated: 3 y ago License: Permissive (MIT)
Lingvo
juliusjs by zzmp
A speech recognition library for the web
JavaScript 2578 Updated: 2 y ago License: Proprietary (Proprietary)
speechgpt by hahahumble
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
TypeScript 2508 Updated: 1 y ago License: Permissive (MIT)
Resemblyzer by resemble-ai
A Python package to analyze and compare voices with deep learning
Python 2291 Updated: 1 y ago License: Permissive (Apache-2.0)
WaveNet vocoder
aeneas by readbeyond
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Python 2169 Updated: 2 y ago License: Strong Copyleft (AGPL-3.0)
Tacotron-2 by Rayhane-mamah
DeepMind's Tacotron-2 Tensorflow implementation
Python 2166 Updated: 1 y ago License: Permissive (MIT)
tensorflow-speech-recognition by pannous
🎙 Speech recognition using the TensorFlow deep learning framework, sequence-to-sequence neural networks
Python 2142 Updated: 2 y ago License: Proprietary (Proprietary)
espeak-ng by espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
C 2099 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
spot-sdk by boston-dynamics
Spot SDK repo
Python 2077 Updated: 1 y ago License: Proprietary (Proprietary)
rhasspy by rhasspy
Offline private voice assistant for many human languages
Shell 2000 Updated: 1 y ago License: Permissive (MIT)
marytts by marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Java 1992 Updated: 1 y ago License: Proprietary (Proprietary)