dictate.js by Kaljurand
A small JavaScript library for browser-based real-time speech recognition. It uses Recorderjs for audio capture and a WebSocket connection to the Kaldi GStreamer server for recognition.
JavaScript
195
Updated: 4 y ago
License: Permissive (BSD-3-Clause)

masr by binzhouchn
A Chinese speech recognition series. Readers can use it to quickly train their own Chinese speech recognition models, or test the pretrained models directly.
Python
195
Updated: 2 y ago
License: No License (No License)

expressive_tacotron by Kyubyong
TensorFlow implementation of Expressive Tacotron
Python
194
Updated: 4 y ago
License: No License (No License)

doc2audiobook by danthelion
Convert text documents to high fidelity audio(books).
Python
194
Updated: 2 y ago
License: Permissive (MIT)

Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab
Tacotron 2 - PyTorch implementation with faster-than-realtime inference modified to enable cross lingual voice cloning.
Jupyter Notebook
193
Updated: 3 y ago
License: Permissive (BSD-3-Clause)

ddsp-singing-vocoders by YatingMusic
Official implementation of SawSing (ISMIR'22)
Python
193
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

Tacotron-pytorch by soobinseo
PyTorch implementation of Tacotron
Python
192
Updated: 4 y ago
License: Permissive (Apache-2.0)

onssen by speechLabBcCuny
An open-source speech separation and enhancement library
Python
192
Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)

kaldi-offline-transcriber by alumae
Offline transcription system for Estonian using Kaldi
Python
191
Updated: 4 y ago
License: Proprietary (Proprietary)

SpeechToText-WebSockets-Javascript by Azure-Samples
SDK and sample for speech recognition using WebSockets in JavaScript
TypeScript
191
Updated: 4 y ago
License: Permissive (MIT)

pychain by YiwenShaoStephen
PyTorch implementation of LF-MMI for End-to-end ASR
C++
191
Updated: 4 y ago
License: No License (No License)

simple-ehm by morrolinux
A simple tool for a simple task: remove filler sounds ("ehm") from pre-recorded speeches. AI powered.
Jupyter Notebook
191
Updated: 4 y ago
License: Permissive (MIT)

A WaveRNN implementation

parrots by shibing624
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine for Chinese. Built on a speech library and easy to extend.
Python
190
Updated: 2 y ago
License: Permissive (Apache-2.0)

baby_cry_detection by giulbia
Recognition of baby cry audio signal
Python
189
Updated: 2 y ago
License: No License (No License)

kaldi-onnx by XiaoMi
Kaldi model converter to ONNX
Python
189
Updated: 4 y ago
License: Permissive (Apache-2.0)

pytorch-StarGAN-VC by hujinsen
A full reproduction of the StarGAN-VC paper, with stable training and better audio quality.
Python
189
Updated: 4 y ago
License: No License (No License)

pytorch_xvectors by manojpamk
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
Python
189
Updated: 4 y ago
License: Permissive (MIT)

Listen-Attend-Spell by kaituoxu
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Python
188
Updated: 2 y ago
License: No License (No License)

chatbot-watson-android by IBM-Cloud
An Android ChatBot powered by Watson Services - Assistant, Speech-to-Text and Text-to-Speech on IBM Cloud.
Java
186
Updated: 2 y ago
License: Proprietary (Proprietary)

voicefixer_main by haoheliu
General Speech Restoration
Python
186
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

pyannote-whisper by yinruiqing
Python
186
Updated: 2 y ago
License: No License (No License)

2D-TAN by microsoft
AAAI'20 - Learning 2D Temporal Localization Networks for Moment Localization with Natural Language
Python
185
Updated: 4 y ago
License: Proprietary (Proprietary)

mbelib by szechyjs
P25 Phase 1 and ProVoice vocoder
C++
185
Updated: 4 y ago
License: Proprietary (Proprietary)

Scyclone by Torsion-Audio
Real-time Neural Timbre Transfer
C++
185
Updated: 2 y ago
License: Proprietary (Proprietary)

deepspeech-server by MainRo
A test server for a speech-to-text service based on Mozilla DeepSpeech
Python
184
Updated: 4 y ago
License: Weak Copyleft (MPL-2.0)

gst-kaldi-nnet2-online by alumae
GStreamer plugin around Kaldi's online neural network decoder
C++
184
Updated: 2 y ago
License: Permissive (Apache-2.0)

transcribe-anything by zackees
Input a local file or URL and this service will transcribe it using Whisper AI. Completely private and free 🤯🤯🤯
Python
184
Updated: 2 y ago
License: Permissive (MIT)

transcriber_app by davabase
Real-time speech-to-text transcription app.
Python
183
Updated: 2 y ago
License: No License (No License)

Wave-U-Net-For-Speech-Enhancement by craigmacartney
Improved speech enhancement with the Wave-U-Net, a deep convolutional neural network architecture for audio source separation, implemented for the task of speech enhancement in the time-domain.
Python
182
Updated: 2 y ago
License: No License (No License)

end-to-end-SLU by lorenlugosch
PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning
Python
182
Updated: 4 y ago
License: Permissive (Apache-2.0)

Deep_VoiceChanger by pstuvwx
A repository for building a voice changer using deep learning and similar techniques
Python
181
Updated: 4 y ago
License: Permissive (MIT)

rtmonoaudio2midi by aniawsz
Real-time note recognition in a monophonic audio stream
Python
180
Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)

One-Shot-Voice-Cloning by CMsmartvoice
One-shot voice cloning based on Unet-TTS
Jupyter Notebook
180
Updated: 2 y ago
License: No License (No License)

ASR-Audio-Data-Links by robmsmt
A list of publicly available audio data that anyone can download for ASR or other speech activities
Shell
180
Updated: 2 y ago
License: Permissive (Apache-2.0)

speech-input by Daniel-Hug
Simple speech input for <input> elements; replaces the now-defunct x-webkit-speech attribute
JavaScript
179
Updated: 4 y ago
License: No License (No License)

pits by anonymous-pits
PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
Python
179
Updated: 2 y ago
License: Permissive (MIT)

VocGAN by rishikksh20
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Python
178
Updated: 4 y ago
License: Permissive (MIT)

ovos-buildroot by OpenVoiceOS
Open Voice Operating System - Buildroot edition is a minimalistic Linux OS bringing the open source voice assistant Mycroft A.I. to embedded, low-spec headless and/or small (touch)screen devices.
Python
177
Updated: 2 y ago
License: Permissive (Apache-2.0)

asr-server by dialogflow
FastCGI support for Kaldi ASR
C++
176
Updated: 4 y ago
License: Permissive (Apache-2.0)

Expressive-FastSpeech2 by keonlee9420
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Python
176
Updated: 2 y ago
License: Proprietary (Proprietary)

malaya-speech by huseinzol05
Speech Toolkit for Bahasa Malaysia, https://malaya-speech.readthedocs.io/
Jupyter Notebook
176
Updated: 2 y ago
License: Permissive (MIT)

androidspeech by mozilla
An Android library module to Mozilla's Speech-To-Text services
C
175
Updated: 4 y ago
License: No License (No License)

speech-emotion-recognition by harry-7
Speaker-independent emotion recognition
Python
174
Updated: 4 y ago
License: Permissive (MIT)

Speaker-Identification-Python by Atul-Anand-Jha
Speaker Identification System (up to 100% accuracy); built using Python 2.7 and the python_speech_features library
Python
174
Updated: 2 y ago
License: Weak Copyleft (LGPL-3.0)

cld by jtoy
Compact language detection in Ruby
C++
174
Updated: 3 y ago
License: Proprietary (Proprietary)

MBROLA by numediart
MBROLA is a speech synthesizer based on the concatenation of diphones
C
170
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

cognitive-services-speech-sdk-js by microsoft
Microsoft Azure Cognitive Services Speech SDK for JavaScript
TypeScript
169
Updated: 2 y ago
License: Proprietary (Proprietary)

speechly by speechly
Client libraries, examples and demos of Speechly API for the Web.
TypeScript
169
Updated: 2 y ago
License: Permissive (MIT)

audio-SNR by Sato-Kunihiko
Mixing an audio file with a noise file at any Signal-to-Noise Ratio (SNR)
Python
168
Updated: 2 y ago
License: No License (No License)