Polly-Samples by App-vNext
Provides sample implementations of the Polly library. The intent of this project is to help newcomers kick-start their use of Polly within their own projects.
C#
235
Updated: 2 y ago
License: Strong Copyleft (GPL-2.0)
VQ-VAE-Speech by swasun
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Python
234
Updated: 2 y ago
License: Permissive (MIT)
wekws by wenet-e2e
Production First and Production Ready End-to-End Keyword Spotting Toolkit
Python
234
Updated: 2 y ago
License: Permissive (Apache-2.0)
MoeTTS by luoyily
Speech synthesis model repo for galgame characters based on Tacotron 2 and HiFi-GAN
Python
234
Updated: 3 y ago
License: Permissive (BSD-3-Clause)
vocoders by openvpi
DiffSinger community vocoders release page
HTML
234
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)
vosk-browser by ccoreilly
A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
JavaScript
231
Updated: 2 y ago
License: Permissive (Apache-2.0)
nonparaSeq2seqVC_code by jxzhanggg
Implementation code of non-parallel sequence-to-sequence VC
Python
230
Updated: 2 y ago
License: Permissive (MIT)
volute by webfansplz
Raspberry Pi + Node.js = Speech Robot
JavaScript
228
Updated: 4 y ago
License: Permissive (MIT)
wavegrad by lmnt-com
A fast, high-quality neural vocoder.
Python
228
Updated: 2 y ago
License: Permissive (Apache-2.0)
K6nele by Kaljurand
An Android app that offers speech-to-text user interfaces to other apps
Java
227
Updated: 2 y ago
License: Permissive (Apache-2.0)
rnnt-speech-recognition by noahchalifour
End-to-end speech recognition using RNN Transducers in TensorFlow 2.0
Python
227
Updated: 2 y ago
License: Permissive (MIT)
p5.js-speech by IDMNYU
Web Audio Speech Synthesis / Recognition for p5.js
JavaScript
227
Updated: 3 y ago
License: Permissive (MIT)
Tacotron by barronalex
Implementation of Google's Tacotron in TensorFlow
Python
226
Updated: 4 y ago
License: No License (No License)
asr-evaluation by belambert
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
Python
223
Updated: 3 y ago
License: Permissive (Apache-2.0)
spectrographic by LeviBorodenko
Turn an image into sound whose spectrogram looks like the image.
Python
222
Updated: 2 y ago
License: Permissive (MIT)
webaudio-modem by martme
Encode and decode text using the Web Audio API to enable offline data transfer between devices.
HTML
222
Updated: 2 y ago
License: Permissive (MIT)
voice-assistant by b4rtaz
Voice assistant for Visual Studio Code.
TypeScript
222
Updated: 4 y ago
License: Permissive (MIT)
kaldiio by nttcslab-sp
A pure Python module for reading and writing Kaldi ark files
Python
220
Updated: 2 y ago
License: Proprietary (Proprietary)
edgedict by theblackcat102
Working online speech recognition based on RNN Transducer (trained model available in the releases).
Python
219
Updated: 4 y ago
License: Permissive (Apache-2.0)
GAN-TTS by yanggeng1995
A PyTorch implementation of GAN-TTS: High Fidelity Speech Synthesis with Adversarial Networks
Python
218
Updated: 2 y ago
License: No License (No License)
datasets-CMU_Wilderness by festvox
CMU Wilderness Multilingual Speech Dataset
Shell
218
Updated: 4 y ago
License: No License (No License)
TTS-Cube by tiberiu44
End-to-end speech synthesis with recurrent neural networks
Python
216
Updated: 2 y ago
License: Permissive (Apache-2.0)
DNN-based_source_separation by tky823
A PyTorch implementation of DNN-based source separation.
Python
216
Updated: 2 y ago
License: No License (No License)
AutoPST by auspicious3000
Global Rhythm Style Transfer Without Text Transcriptions
Python
216
Updated: 2 y ago
License: Permissive (MIT)
multi-speaker-tacotron by nii-yamagishilab
VCTK multi-speaker Tacotron for ICASSP 2020
Python
214
Updated: 4 y ago
License: Permissive (BSD-3-Clause)
my-voice-analysis by Shahabks
My-Voice Analysis is a Python library for analyzing voice (simultaneous speech, high entropy) without the need for a transcription. It breaks utterances and detects syllable boundaries, fundamental frequency contours, and formants.
Python
213
Updated: 2 y ago
License: Permissive (MIT)
ronor by mlang
Sonos smart speaker controller API and command-line tools
Rust
213
Updated: 4 y ago
License: No License (No License)
Naomi by NaomiProject
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Python
211
Updated: 2 y ago
License: Permissive (MIT)
kaldi-lstm by dophist
C++ implementation of LSTM (Long Short-Term Memory) in Kaldi's nnet1 framework. Used for automatic speech recognition and possibly language modeling; training can be switched between CPU and GPU (CUDA). This repo has been merged into the official Kaldi codebase (Karel's setup) and is no longer maintained; please check out the Kaldi project instead.
C++
211
Updated: 2 y ago
License: No License (No License)
Dictater by Nosrac
Replacement for built-in Speech services. Supports playing, skipping, progress, and more
Swift
211
Updated: 4 y ago
License: Permissive (MIT)
MelGAN-VC by marcoppasini
MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Jupyter Notebook
210
Updated: 2 y ago
License: Permissive (MIT)
BigCiDian by speechio
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
Python
209
Updated: 2 y ago
License: No License (No License)
DynamicAudioNormalizer by lordmulder
Dynamic Audio Normalizer
C++
209
Updated: 2 y ago
License: Proprietary (Proprietary)
MS-SNSD by microsoft
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
HTML
209
Updated: 4 y ago
License: Permissive (MIT)
ttslearn by r9y9
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Jupyter Notebook
209
Updated: 2 y ago
License: Permissive (MIT)
CREMA-D by CheyneyComputerScience
Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D)
R
208
Updated: 2 y ago
License: Proprietary (Proprietary)
klaam by ARBML
Arabic speech recognition, classification and text-to-speech.
Jupyter Notebook
208
Updated: 2 y ago
License: Permissive (MIT)
Speech_emotion_recognition_BLSTM by RayanWang
Bidirectional LSTM network for speech emotion recognition.
Python
207
Updated: 4 y ago
License: Permissive (MIT)
speaker-recognition-py3 by crouchred
Based on MFCC and GMM (speech recognition based on MFCC and Gaussian mixture models)
Python
206
Updated: 4 y ago
License: Permissive (Apache-2.0)
waveglow by npuichigo
A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis
Python
205
Updated: 4 y ago
License: Permissive (Apache-2.0)
speaker-id by google
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
Python
205
Updated: 2 y ago
License: Permissive (Apache-2.0)
spectrology by solusipse
Images to audio files with corresponding spectrograms encoder.
Python
204
Updated: 4 y ago
License: Permissive (MIT)
UniversalVocoding by bshall
A PyTorch implementation of "Robust Universal Neural Vocoding"
Python
203
Updated: 4 y ago
License: Permissive (MIT)
multimodal-speech-emotion by david-yoon
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Jupyter Notebook
203
Updated: 2 y ago
License: Permissive (MIT)
nnnoiseless by jneem
Recurrent neural network for audio noise reduction
Rust
201
Updated: 2 y ago
License: Permissive (BSD-3-Clause)
Isolated-word speech recognition based on STM32
html2Dash by selfboot
Generate a docset from any HTML documentation. Written in Python.
Python
198
Updated: 2 y ago
License: No License (No License)
wetts by wenet-e2e
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Python
198
Updated: 2 y ago
License: Permissive (Apache-2.0)
vosk by alphacep
VOSK Speech Recognition Toolkit
C
197
Updated: 4 y ago
License: Permissive (Apache-2.0)
WGANSing by MTG
Multi-voice singing voice synthesis
Python
195
Updated: 3 y ago
License: No License (No License)