gst-tacotron by syang1993
A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
Python 358 | Updated: 2 y ago | License: No License (No License)

FastDiff by Rongjiehuang
PyTorch Implementation of FastDiff (IJCAI'22)
Python 352 | Updated: 2 y ago | License: No License (No License)

Speech recognition

Neural_Network_Voices by llSourcell
This is the code for "Neural Network Voices" by Siraj Raval on YouTube
Python 349 | Updated: 4 y ago | License: No License (No License)

setk by funcwj
Tools for Speech Enhancement integrated with Kaldi
Python 347 | Updated: 1 y ago | License: Permissive (Apache-2.0)

TimeSide by Ircam-WAM
Scalable audio processing framework and server written in Python
Python 347 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)

Thorsten-Voice by thorstenMueller
Thorsten-Voice: a free-to-use, offline, high-quality German TTS voice that should be available for every project without licensing struggles.
Python 344 | Updated: 2 y ago | License: Permissive (CC0-1.0)

StarGAN-Voice-Conversion by liusongxiang
A PyTorch implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks"
Python 342 | Updated: 3 y ago | License: No License (No License)

FullSubNet by haoxiangsnr
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Python 341 | Updated: 2 y ago | License: Permissive (MIT)

Multi-Tacotron-Voice-Cloning by vlomme
Phoneme multilingual (Russian-English) voice cloning based on
Python 336 | Updated: 1 y ago | License: Proprietary (Proprietary)

Voice_Converter_CycleGAN by leimao
Voice Converter Using CycleGAN and Non-Parallel Data
Python 336 | Updated: 4 y ago | License: Permissive (MIT)

surfboard by novoic
Novoic's audio feature extraction library
Python 335 | Updated: 3 y ago | License: Strong Copyleft (GPL-3.0)

dragonfly by dictation-toolbox
Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi, and CMU Pocket Sphinx
Python 334 | Updated: 2 y ago | License: Weak Copyleft (LGPL-3.0)

pyctcdecode by kensho-technologies
A fast and lightweight Python-based CTC beam search decoder for speech recognition.
Python 333 | Updated: 2 y ago | License: Permissive (Apache-2.0)

nodejs-text-to-speech by googleapis
Node.js client for Google Cloud Text-to-Speech
TypeScript 332 | Updated: 3 y ago | License: Permissive (Apache-2.0)

WenetSpeech by wenet-e2e
A 10,000+ hour dataset for Chinese speech recognition
Shell 332 | Updated: 2 y ago | License: Permissive (Apache-2.0)

ECAPA-TDNN by TaoRuijie
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)
Python 330 | Updated: 2 y ago | License: Permissive (MIT)

pysptk by r9y9
A Python wrapper for Speech Signal Processing Toolkit (SPTK).
Python 329 | Updated: 3 y ago | License: Proprietary (Proprietary)

self-supervised-speech-recognition by mailong25
Speech-to-text with self-supervised learning based on the wav2vec 2.0 framework
Python 329 | Updated: 2 y ago | License: No License (No License)

AI-Personal-Voice-assistant-using-Python by mmirthula02
Python 328 | Updated: 2 y ago | License: No License (No License)

music-spectrogram-diffusion by magenta
Jupyter Notebook 328 | Updated: 1 y ago | License: Permissive (Apache-2.0)

realtime-yukarin by Hiroshiba
An application for real-time voice conversion
Python 326 | Updated: 2 y ago | License: Permissive (MIT)

speechbrain.github.io by speechbrain
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
HTML 326 | Updated: 2 y ago | License: No License (No License)

go-fluent-ffmpeg by modfy
A Go implementation of fluent-ffmpeg
Go 325 | Updated: 2 y ago | License: Permissive (Apache-2.0)

ActionCLIP by sallymmx
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
Python 322 | Updated: 2 y ago | License: Permissive (MIT)

Dual-Path-RNN-Pytorch by JusperLee
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch
Python 321 | Updated: 2 y ago | License: Permissive (Apache-2.0)

Whisperboard by Saik0s
iOS app to record and transcribe speech to text with the help of the OpenAI Whisper model
Swift 320 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)

Pi-Voice by rob-mccann
A hackday project. Run the program, speak into your microphone and hear the response from your speakers.
Python 319 | Updated: 4 y ago | License: No License (No License)

NeuralSVB by MoonInTheRiver
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
Python 317 | Updated: 2 y ago | License: Strong Copyleft (GPL-3.0)

Caster by dictation-toolbox
Dragonfly-Based Voice Programming and Accessibility Toolkit
Python 315 | Updated: 2 y ago | License: Proprietary (Proprietary)

Self-Supervised-Speech-Pretraining-and-Representation-Learning by andi611
The S3PRL speech toolkit: self-supervised pre-training and representation learning of Mockingjay, TERA, A-ALBERT, APC, and more to come. Includes easy-to-use standard downstream evaluation scripts for phone classification, speaker recognition, and ASR. (All in PyTorch!)
Python 313 | Updated: 4 y ago | License: Permissive (MIT)

ffmprovisr by amiaopensource
Repository of useful FFmpeg commands for archivists!
HTML 313 | Updated: 3 y ago | License: No License (No License)

ctc_tensorflow_example by igormq
CTC + TensorFlow Example for ASR
Python 311 | Updated: 3 y ago | License: Permissive (MIT)

Prosodylab-Aligner by prosodylab
Python interface for forced audio alignment using HTK and SoX
Python 309 | Updated: 2 y ago | License: Permissive (MIT)

tensorflow_end2end_speech_recognition by hirofumi0810
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Python 306 | Updated: 3 y ago | License: Permissive (MIT)

watson-word-watcher by dannguyen
A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions
Python 303 | Updated: 4 y ago | License: Proprietary (Proprietary)

dl-for-emo-tts by Emotional-Text-to-Speech
A summary of our attempts at using Deep Learning approaches for Emotional Text to Speech
Jupyter Notebook 303 | Updated: 1 y ago | License: Permissive (MIT)

audio-fingerprint-identifying-python by itspoma
A Shazam-like app that identifies songs using audio fingerprints, spectrum analysis, and the fast Fourier transform
Python 302 | Updated: 2 y ago | License: Permissive (MIT)

HanTTS by junzew
Chinese Text-to-Speech web service
Python 300 | Updated: 1 y ago | License: Permissive (MIT)

GST-Tacotron by KinglittleQ
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Python 300 | Updated: 2 y ago | License: Permissive (MIT)

spchcat by petewarden
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.
C 300 | Updated: 2 y ago | License: Weak Copyleft (MPL-2.0)

google-tts by hiddentao
JavaScript API for the Google Text-to-Speech engine
JavaScript 298 | Updated: 3 y ago | License: Permissive (MIT)

Multi-user video download and transcoding with Aria2 and FFmpeg

deepspeech-rs by RustAudio
Rust bindings for the deepspeech library
Rust 295 | Updated: 1 y ago | License: Proprietary (Proprietary)

Aggregation-Cross-Entropy by summerlvsong
Aggregation Cross-Entropy for Sequence Recognition. CVPR 2019.
Python 292 | Updated: 2 y ago | License: No License (No License)

TTS-Voice-Wizard by VRCWizard
Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System)
C# 292 | Updated: 1 y ago | License: Permissive (MIT)

deepspeech-german by AASHISHAG
Automatic Speech Recognition (ASR) - German
Python 291 | Updated: 2 y ago | License: Permissive (Apache-2.0)

MB-iSTFT-VITS by MasayaKawamura
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Python 291 | Updated: 1 y ago | License: Permissive (Apache-2.0)

pycom-libraries by pycom
MicroPython libraries and examples that work out of the box on Pycom's IoT modules
Python 286 | Updated: 3 y ago | License: No License (No License)

whichlang by quickwit-oss
A blazingly fast and lightweight language detection library for Rust
Rust 286 | Updated: 1 y ago | License: Permissive (MIT)