Isolated-word speech recognition based on STM32
Polly-Samples by App-vNext
Provides sample implementations of the Polly library. The intent of this project is to help newcomers kick-start their use of Polly within their own projects.
C# 235 | Updated: 2 y ago | License: Strong Copyleft (GPL-2.0)
VQ-VAE-Speech by swasun
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Python 234 | Updated: 2 y ago | License: Permissive (MIT)
wekws by wenet-e2e
Production First and Production Ready End-to-End Keyword Spotting Toolkit
Python 234 | Updated: 2 y ago | License: Permissive (Apache-2.0)
MoeTTS by luoyily
Speech synthesis model repo for galgame characters based on Tacotron2 and Hifigan
Python 234 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)
vocoders by openvpi
DiffSinger community vocoders release page
HTML 234 | Updated: 1 y ago | License: Strong Copyleft (AGPL-3.0)
vosk-browser by ccoreilly
A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
JavaScript 231 | Updated: 2 y ago | License: Permissive (Apache-2.0)
nonparaSeq2seqVC_code by jxzhanggg
Implementation code of non-parallel sequence-to-sequence VC
Python 230 | Updated: 2 y ago | License: Permissive (MIT)
volute by webfansplz
Raspberry Pi + Nodejs = Speech Robot
JavaScript 228 | Updated: 3 y ago | License: Permissive (MIT)
wavegrad by lmnt-com
A fast, high-quality neural vocoder.
Python 228 | Updated: 2 y ago | License: Permissive (Apache-2.0)
K6nele by Kaljurand
An Android app that offers speech-to-text user interfaces to other apps
Java 227 | Updated: 2 y ago | License: Permissive (Apache-2.0)
rnnt-speech-recognition by noahchalifour
End-to-end speech recognition using RNN Transducers in TensorFlow 2.0
Python 227 | Updated: 2 y ago | License: Permissive (MIT)
p5.js-speech by IDMNYU
Web Audio Speech Synthesis / Recognition for p5.js
JavaScript 227 | Updated: 3 y ago | License: Permissive (MIT)
Tacotron by barronalex
Implementation of Google's Tacotron in TensorFlow
Python 226 | Updated: 4 y ago | License: No License (No License)
asr-evaluation by belambert
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
Python 223 | Updated: 3 y ago | License: Permissive (Apache-2.0)
spectrographic by LeviBorodenko
Turn an image into sound whose spectrogram looks like the image.
Python 222 | Updated: 2 y ago | License: Permissive (MIT)
webaudio-modem by martme
Encode and decode text using the Web Audio API to enable offline data transfer between devices.
HTML 222 | Updated: 2 y ago | License: Permissive (MIT)
voice-assistant by b4rtaz
Voice assistant for Visual Studio Code.
TypeScript 222 | Updated: 3 y ago | License: Permissive (MIT)
kaldiio by nttcslab-sp
A pure Python module for reading and writing Kaldi ark files
Python 220 | Updated: 2 y ago | License: Proprietary (Proprietary)
edgedict by theblackcat102
Working online speech recognition based on RNN Transducer. (Trained model available in the releases.)
Python 219 | Updated: 3 y ago | License: Permissive (Apache-2.0)
GAN-TTS by yanggeng1995
A PyTorch implementation of GAN-TTS: High Fidelity Speech Synthesis with Adversarial Networks
Python 218 | Updated: 2 y ago | License: No License (No License)
datasets-CMU_Wilderness by festvox
CMU Wilderness Multilingual Speech Dataset
Shell 218 | Updated: 3 y ago | License: No License (No License)
TTS-Cube by tiberiu44
End-to-end speech synthesis with recurrent neural networks
Python 216 | Updated: 2 y ago | License: Permissive (Apache-2.0)
DNN-based_source_separation by tky823
A PyTorch implementation of DNN-based source separation.
Python 216 | Updated: 2 y ago | License: No License (No License)
AutoPST by auspicious3000
Global Rhythm Style Transfer Without Text Transcriptions
Python 216 | Updated: 2 y ago | License: Permissive (MIT)
multi-speaker-tacotron by nii-yamagishilab
VCTK multi-speaker tacotron for ICASSP 2020
Python 214 | Updated: 3 y ago | License: Permissive (BSD-3-Clause)
my-voice-analysis by Shahabks
My-Voice Analysis is a Python library for the analysis of voice (simultaneous speech, high entropy) without the need for a transcription. It breaks utterances and detects syllable boundaries, fundamental frequency contours, and formants.
Python 213 | Updated: 2 y ago | License: Permissive (MIT)
ronor by mlang
Sonos smart speaker controller API and command-line tools
Rust 213 | Updated: 4 y ago | License: No License (No License)
Naomi by NaomiProject
The Naomi Project is an open source, technology-agnostic platform for developing always-on, voice-controlled applications!
Python 211 | Updated: 1 y ago | License: Permissive (MIT)
kaldi-lstm by dophist
C++ implementation of LSTM (Long Short-Term Memory) in Kaldi's nnet1 framework. Used for automatic speech recognition and possibly language modeling; training can be switched between CPU and GPU (CUDA). This repo has been merged into the official Kaldi codebase (Karel's setup) and is no longer maintained; please check out the Kaldi project instead.
C++ 211 | Updated: 2 y ago | License: No License (No License)
Dictater by Nosrac
Replacement for built-in Speech services. Supports playing, skipping, progress, and more
Swift 211 | Updated: 3 y ago | License: Permissive (MIT)
MelGAN-VC by marcoppasini
MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Jupyter Notebook 210 | Updated: 1 y ago | License: Permissive (MIT)
BigCiDian by speechio
Pronunciation lexicon covering both English and Chinese for Automatic Speech Recognition.
Python 209 | Updated: 2 y ago | License: No License (No License)
DynamicAudioNormalizer by lordmulder
Dynamic Audio Normalizer
C++ 209 | Updated: 2 y ago | License: Proprietary (Proprietary)
MS-SNSD by microsoft
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
HTML 209 | Updated: 3 y ago | License: Permissive (MIT)
ttslearn by r9y9
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Jupyter Notebook 209 | Updated: 2 y ago | License: Permissive (MIT)
CREMA-D by CheyneyComputerScience
Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D)
R 208 | Updated: 2 y ago | License: Proprietary (Proprietary)
klaam by ARBML
Arabic speech recognition, classification and text-to-speech.
Jupyter Notebook 208 | Updated: 2 y ago | License: Permissive (MIT)
Speech_emotion_recognition_BLSTM by RayanWang
Bidirectional LSTM network for speech emotion recognition.
Python 207 | Updated: 4 y ago | License: Permissive (MIT)
speaker-recognition-py3 by crouchred
Based on MFCC and GMM (speech recognition based on MFCC and Gaussian mixture models)
Python 206 | Updated: 3 y ago | License: Permissive (Apache-2.0)
waveglow by npuichigo
A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis
Python 205 | Updated: 3 y ago | License: Permissive (Apache-2.0)
speaker-id by google
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
Python 205 | Updated: 2 y ago | License: Permissive (Apache-2.0)
spectrology by solusipse
Encoder that turns images into audio files with corresponding spectrograms.
Python 204 | Updated: 3 y ago | License: Permissive (MIT)
UniversalVocoding by bshall
A PyTorch implementation of "Robust Universal Neural Vocoding"
Python 203 | Updated: 3 y ago | License: Permissive (MIT)
multimodal-speech-emotion by david-yoon
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Jupyter Notebook 203 | Updated: 1 y ago | License: Permissive (MIT)
nnnoiseless by jneem
Recurrent neural network for audio noise reduction
Rust 201 | Updated: 1 y ago | License: Permissive (BSD-3-Clause)
html2Dash by selfboot
Generate a docset from any HTML documentation. Written in Python.
Python 198 | Updated: 2 y ago | License: No License (No License)
wetts by wenet-e2e
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Python 198 | Updated: 1 y ago | License: Permissive (Apache-2.0)
vosk by alphacep
VOSK Speech Recognition Toolkit
C 197 | Updated: 3 y ago | License: Permissive (Apache-2.0)
WGANSing by MTG
Multi-voice singing voice synthesis
Python 195 | Updated: 3 y ago | License: No License (No License)