WaveVAE by ksw0306
A PyTorch implementation of WaveVAE ("Parallel Neural Text-to-Speech")
Python | 116 | Updated: 4 y ago | License: Permissive (MIT)

tfg-voice-conversion by albertaparicio
Deep Learning-based Voice Conversion system
Python | 116 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)

panns_inference by qiuqiangkong
Python | 116 | Updated: 1 y ago | License: Permissive (MIT)

HoloBot by ActiveNick
HoloBot is a reusable 3D interface that allows HoloLens & VR users to interact with any bot using Mixed Reality & Speech.
C# | 116 | Updated: 4 y ago | License: Permissive (MIT)

kaldipdnn by yajiemiao
Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN
Shell | 116 | Updated: 4 y ago | License: Permissive (Apache-2.0)

desktopAssistant by jg-fisher
Desktop assistant that uses speech recognition and gTTS to execute commands and talk back to the user.
Python | 115 | Updated: 3 y ago | License: No License

SpeakerRecognition_tutorial by jymsuper
Simple d-vector based Speaker Recognition (verification and identification) using PyTorch
Python | 115 | Updated: 3 y ago | License: Permissive (MIT)

DCCRN by maggie0830
PyTorch implementation of "DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement"
Python | 115 | Updated: 2 y ago | License: No License

WaveEdit by AndrewBelt
Synthesis Technology WaveEdit for the E370 and E352 Eurorack synthesizer modules
C++ | 115 | Updated: 3 y ago | License: Proprietary

LIA_RAL by ALIZE-Speaker-Recognition
A high-level toolkit for speaker recognition, built on top of ALIZE-Core.
C++ | 115 | Updated: 4 y ago | License: Weak Copyleft (LGPL-3.0)

Jarvis by thevickypedia
Fully Functional Voice Based Natural Language UI
Python | 113 | Updated: 2 y ago | License: Permissive (MIT)

audioswitch by twilio
An Android audio management library for real-time communication apps.
Kotlin | 112 | Updated: 2 y ago | License: Permissive (Apache-2.0)

speech-emotion-recognition by amanbasu
Detecting emotions from MFCC features of human speech using deep learning
Jupyter Notebook | 112 | Updated: 2 y ago | License: Strong Copyleft (GPL-3.0)

athena by didi
A release version for https://github.com/athena-team/athena
Python | 111 | Updated: 3 y ago | License: Permissive (Apache-2.0)

vcc20_baseline_cyclevae by bigpon
Voice Conversion Challenge 2020 CycleVAE baseline system
Python | 111 | Updated: 3 y ago | License: Permissive (MIT)

spokestack-python by spokestack
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.
Python | 111 | Updated: 3 y ago | License: Permissive (Apache-2.0)

pytorch-kaldi-neural-speaker-embeddings by jefflai108
A lightweight neural speaker embedding extractor based on Kaldi and PyTorch.
Perl | 111 | Updated: 3 y ago | License: Permissive (BSD-3-Clause)

FreeSWITCH ASR APP

picopi by DougGore
Port of Android Pico TTS to the Raspberry Pi
C | 111 | Updated: 4 y ago | License: No License

py_speech_seg by wblgers
A toolkit for speech segmentation based on BIC and neural networks such as BiLSTM
Python | 110 | Updated: 4 y ago | License: No License

vid2speech by arielephrat
Code for "Vid2speech: Speech Reconstruction from Silent Video" ICASSP '17
Python | 110 | Updated: 3 y ago | License: No License

Voice-synthesis by smoke-trees
This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model trained to generalize to new voices.
Python | 110 | Updated: 2 y ago | License: No License

midi2voice by mathigatti
Singing synthesis from MIDI file
Python | 110 | Updated: 3 y ago | License: Permissive (MIT)

htgo-tts by hegedustibor
Text to speech package for Golang.
Go | 110 | Updated: 1 y ago | License: Permissive (MIT)

picotts by naggety
Pico TTS: text to speech voice synthesizer from SVox, included in Android AOSP
C | 110 | Updated: 2 y ago | License: No License

VIMABench by vimalabs
Official Task Suite Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
Python | 110 | Updated: 1 y ago | License: Permissive (MIT)

resampler by cpuimage
A Simple and Efficient Audio Resampler Implementation in C
C | 109 | Updated: 2 y ago | License: Permissive (MIT)

CASR-DEMO by lihanghang
A Flask-based web demo system for Chinese automatic speech recognition, covering speech recognition, speech synthesis, and voiceprint-based speaker recognition.
CSS | 109 | Updated: 2 y ago | License: No License

browser-client by speechly
The web browser client library for Speechly API
TypeScript | 109 | Updated: 3 y ago | License: Permissive (MIT)

Bangla-deep-speech-Recognition by Qyum
Jupyter Notebook | 109 | Updated: 3 y ago | License: No License

cobra by Picovoice
On-device voice activity detection (VAD) powered by deep learning.
Python | 109 | Updated: 2 y ago | License: Permissive (Apache-2.0)

talk-with-gpt3 by JavaFXpert
App that leverages GPT-3 to facilitate new language listening and speaking practice.
JavaScript | 109 | Updated: 2 y ago | License: No License

VI-SVC by PlayVoice
VITS singing voice conversion based on PPG and HuBERT; singing voice cloning.
Python | 109 | Updated: 2 y ago | License: Permissive (MIT)

Factorized-TDNN by cvqluu
PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi
Python | 108 | Updated: 3 y ago | License: Permissive (MIT)

E2E-ASR by HawkAaron
PyTorch Implementations for End-to-End Automatic Speech Recognition
Python | 108 | Updated: 3 y ago | License: No License

speechd by brailcom
Common high-level interface to speech synthesis
C | 108 | Updated: 1 y ago | License: Strong Copyleft (GPL-2.0)

kaldi-gop by jimbozhang
Computes the GMM-based Goodness of Pronunciation (GOP). Based on Kaldi.
C++ | 108 | Updated: 4 y ago | License: Proprietary

wavegrad2 by mindslab-ai
Unofficial PyTorch Implementation of WaveGrad2
Jupyter Notebook | 108 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)

python-Speech_Recognition by zthxxx
A simple example of using the Baidu speech recognition API with Python.
Python | 107 | Updated: 4 y ago | License: No License

ctc-asr by mdangschat
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Python | 107 | Updated: 3 y ago | License: Permissive (MIT)

Aurio by protyposis
Audio Fingerprinting & Retrieval for .NET
C# | 107 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)

mlp-singer by neosapience
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (IEEE MLSP 2021)
Python | 107 | Updated: 2 y ago | License: Permissive (MIT)

editts by neosapience
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022)
Python | 107 | Updated: 2 y ago | License: Proprietary

TTSController by ksasao
A library for operating various Text-to-Speech engines in a unified way.
C# | 106 | Updated: 2 y ago | License: Permissive (Apache-2.0)

Translator by muaz-khan
Translator.js is a JavaScript library built on top of the Google Speech-Recognition & Translation API to transcribe and translate voice and text. It supports many locales and brings globalization to WebRTC! https://www.webrtc-experiment.com/Translator/
HTML | 106 | Updated: 3 y ago | License: No License

voice_conversion by ebadawy
Python | 106 | Updated: 1 y ago | License: Permissive (MIT)

Tacotron-pytorch by ttaoREtw
A PyTorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model
Python | 105 | Updated: 4 y ago | License: Permissive (MIT)

WebAudioEvaluationTool by BrechtDeMan
A tool based on the HTML5 Web Audio API to perform perceptual audio evaluation tests locally or on remote machines over the web.
JavaScript | 105 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)

Tacotron2-PyTorch by BogiHsu
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.
Python | 104 | Updated: 3 y ago | License: Permissive (MIT)

KTSpeechCrawler by EgorLakomkin
Automatically constructs a corpus for automatic speech recognition from YouTube videos
Python | 104 | Updated: 4 y ago | License: Permissive (MIT)