dictate.js by Kaljurand
A small JavaScript library for browser-based real-time speech recognition, which uses Recorderjs for audio capture and a WebSocket connection to the Kaldi GStreamer server for speech recognition.
JavaScript 195 | Updated: 4 y ago | License: Permissive (BSD-3-Clause)
masr by binzhouchn
A Chinese speech recognition series: readers can use it to quickly train their own Chinese speech recognition model, or test the pretrained models directly.
Python 195 | Updated: 1 y ago | License: No License (No License)
expressive_tacotron by Kyubyong
TensorFlow implementation of Expressive Tacotron
Python 194 | Updated: 4 y ago | License: No License (No License)
doc2audiobook by danthelion
Convert text documents to high-fidelity audio(books).
Python 194 | Updated: 1 y ago | License: Permissive (MIT)
Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab
Tacotron 2: a PyTorch implementation with faster-than-realtime inference, modified to enable cross-lingual voice cloning.
Jupyter Notebook 193 | Updated: 3 y ago | License: Permissive (BSD-3-Clause)
ddsp-singing-vocoders by YatingMusic
Official implementation of SawSing (ISMIR'22)
Python 193 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)
Tacotron-pytorch by soobinseo
PyTorch implementation of Tacotron
Python 192 | Updated: 3 y ago | License: Permissive (Apache-2.0)
onssen by speechLabBcCuny
An open-source speech separation and enhancement library
Python 192 | Updated: 3 y ago | License: Strong Copyleft (GPL-3.0)
kaldi-offline-transcriber by alumae
Offline transcription system for Estonian using Kaldi
Python 191 | Updated: 3 y ago | License: Proprietary (Proprietary)
SpeechToText-WebSockets-Javascript by Azure-Samples
SDK and sample for speech recognition using WebSockets in JavaScript
TypeScript 191 | Updated: 4 y ago | License: Permissive (MIT)
pychain by YiwenShaoStephen
PyTorch implementation of LF-MMI for end-to-end ASR
C++ 191 | Updated: 3 y ago | License: No License (No License)
simple-ehm by morrolinux
A simple tool for a simple task: remove filler sounds ("ehm") from pre-recorded speeches. AI-powered.
Jupyter Notebook 191 | Updated: 3 y ago | License: Permissive (MIT)
A WaveRNN implementation
parrots by shibing624
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine for Chinese. Chinese speech recognition and text-to-speech, built on a speech library and easy to extend.
Python 190 | Updated: 2 y ago | License: Permissive (Apache-2.0)
baby_cry_detection by giulbia
Recognition of baby cry audio signal
Python 189 | Updated: 2 y ago | License: No License (No License)
kaldi-onnx by XiaoMi
Kaldi model converter to ONNX
Python 189 | Updated: 3 y ago | License: Permissive (Apache-2.0)
pytorch-StarGAN-VC by hujinsen
A full reproduction of the StarGAN-VC paper, with stable training and better audio quality.
Python 189 | Updated: 3 y ago | License: No License (No License)
pytorch_xvectors by manojpamk
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
Python 189 | Updated: 3 y ago | License: Permissive (MIT)
Listen-Attend-Spell by kaituoxu
A PyTorch implementation of Listen, Attend and Spell (LAS), an end-to-end ASR framework.
Python 188 | Updated: 2 y ago | License: No License (No License)
chatbot-watson-android by IBM-Cloud
An Android chatbot powered by Watson services (Assistant, Speech-to-Text, and Text-to-Speech) on IBM Cloud.
Java 186 | Updated: 2 y ago | License: Proprietary (Proprietary)
voicefixer_main by haoheliu
General Speech Restoration
Python 186 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)
pyannote-whisper by yinruiqing
Python 186 | Updated: 1 y ago | License: No License (No License)
2D-TAN by microsoft
AAAI'20: Learning 2D Temporal Localization Networks for Moment Localization with Natural Language
Python 185 | Updated: 3 y ago | License: Proprietary (Proprietary)
mbelib by szechyjs
P25 Phase 1 and ProVoice vocoder
C++ 185 | Updated: 3 y ago | License: Proprietary (Proprietary)
Scyclone by Torsion-Audio
Real-time Neural Timbre Transfer
C++ 185 | Updated: 1 y ago | License: Proprietary (Proprietary)
deepspeech-server by MainRo
A testing server for a speech-to-text service based on Mozilla DeepSpeech
Python 184 | Updated: 3 y ago | License: Weak Copyleft (MPL-2.0)
gst-kaldi-nnet2-online by alumae
GStreamer plugin around Kaldi's online neural network decoder
C++ 184 | Updated: 2 y ago | License: Permissive (Apache-2.0)
transcribe-anything by zackees
Input a local file or URL and this service will transcribe it using Whisper AI. Completely private and free 🤯🤯🤯
Python 184 | Updated: 1 y ago | License: Permissive (MIT)
transcriber_app by davabase
Real-time speech-to-text transcription app.
Python 183 | Updated: 2 y ago | License: No License (No License)
Wave-U-Net-For-Speech-Enhancement by craigmacartney
Improved speech enhancement with the Wave-U-Net, a deep convolutional neural network architecture for audio source separation, implemented for the task of speech enhancement in the time domain.
Python 182 | Updated: 2 y ago | License: No License (No License)
end-to-end-SLU by lorenlugosch
PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning
Python 182 | Updated: 3 y ago | License: Permissive (Apache-2.0)
Deep_VoiceChanger by pstuvwx
A repository for building a voice changer using deep learning and related techniques
Python 181 | Updated: 4 y ago | License: Permissive (MIT)
rtmonoaudio2midi by aniawsz
Real-time note recognition in a monophonic audio stream
Python 180 | Updated: 3 y ago | License: Strong Copyleft (GPL-3.0)
One-Shot-Voice-Cloning by CMsmartvoice
One-shot voice cloning based on Unet-TTS
Jupyter Notebook 180 | Updated: 2 y ago | License: No License (No License)
ASR-Audio-Data-Links by robmsmt
A list of publicly available audio data that anyone can download for ASR or other speech activities
Shell 180 | Updated: 2 y ago | License: Permissive (Apache-2.0)
speech-input by Daniel-Hug
Simple speech input for <input>s; replaces the now-defunct x-webkit-speech attribute
JavaScript 179 | Updated: 4 y ago | License: No License (No License)
pits by anonymous-pits
PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
Python 179 | Updated: 2 y ago | License: Permissive (MIT)
VocGAN by rishikksh20
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Python 178 | Updated: 3 y ago | License: Permissive (MIT)
ovos-buildroot by OpenVoiceOS
Open Voice Operating System - Buildroot edition is a minimalistic Linux OS that brings the open-source voice assistant Mycroft A.I. to embedded, low-spec headless and/or small (touch)screen devices.
Python 177 | Updated: 2 y ago | License: Permissive (Apache-2.0)
asr-server by dialogflow
FastCGI support for Kaldi ASR
C++ 176 | Updated: 3 y ago | License: Permissive (Apache-2.0)
Expressive-FastSpeech2 by keonlee9420
PyTorch implementation of non-autoregressive expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Python 176 | Updated: 2 y ago | License: Proprietary (Proprietary)
malaya-speech by huseinzol05
Speech toolkit for Bahasa Malaysia, https://malaya-speech.readthedocs.io/
Jupyter Notebook 176 | Updated: 2 y ago | License: Permissive (MIT)
androidspeech by mozilla
An Android library module for Mozilla's speech-to-text services
C 175 | Updated: 3 y ago | License: No License (No License)
speech-emotion-recognition by harry-7
Speaker-independent emotion recognition
Python 174 | Updated: 4 y ago | License: Permissive (MIT)
Speaker-Identification-Python by Atul-Anand-Jha
Speaker identification system (up to 100% accuracy), built using Python 2.7 and the python_speech_features library
Python 174 | Updated: 2 y ago | License: Weak Copyleft (LGPL-3.0)
cld by jtoy
Compact language detection in Ruby
C++ 174 | Updated: 3 y ago | License: Proprietary (Proprietary)
MBROLA by numediart
MBROLA is a speech synthesizer based on the concatenation of diphones
C 170 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)
cognitive-services-speech-sdk-js by microsoft
Microsoft Azure Cognitive Services Speech SDK for JavaScript
TypeScript 169 | Updated: 2 y ago | License: Proprietary (Proprietary)
speechly by speechly
Client libraries, examples, and demos of the Speechly API for the web.
TypeScript 169 | Updated: 2 y ago | License: Permissive (MIT)
audio-SNR by Sato-Kunihiko
Mixing an audio file with a noise file at any signal-to-noise ratio (SNR)
Python 168 | Updated: 2 y ago | License: No License (No License)