Free online text-to-speech API
Speech Algorithms
Voice Conversion Tool Kit
General Speech Restoration
asteroid by mpariente
The PyTorch-based audio source separation toolkit for researchers
Python 611 Updated: 4 y ago License: Permissive (MIT)
SpecAugment by DemisEom
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow and PyTorch
Python 596 Updated: 2 y ago License: Permissive (Apache-2.0)
speech-denoising-wavenet by drethage
A neural network for end-to-end speech denoising
Python 594 Updated: 2 y ago License: Permissive (MIT)
cboard by cboard-org
Augmentative and Alternative Communication (AAC) system with text-to-speech for the browser
JavaScript 594 Updated: 1 y ago License: Strong Copyleft (GPL-3.0)
diffwave by lmnt-com
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Python 593 Updated: 2 y ago License: Permissive (Apache-2.0)
sonus by evancohen
:speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection
JavaScript 592 Updated: 2 y ago License: Permissive (MIT)
WavAugment by facebookresearch
A library for speech data augmentation in the time domain
Python 585 Updated: 1 y ago License: Permissive (MIT)
Parakeet by PaddlePaddle
PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN)
Python 584 Updated: 1 y ago License: Proprietary (Proprietary)
inaSpeechSegmenter by ina-foss
CNN-based audio segmentation toolkit. Detects speech, music and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Python 584 Updated: 1 y ago License: Permissive (MIT)
vosk-android-demo by alphacep
Offline speech recognition for Android with the Vosk library.
Java 578 Updated: 1 y ago License: Permissive (Apache-2.0)
dialogflow-android-client by dialogflow
Android SDK for Dialogflow
Java 577 Updated: 3 y ago License: Permissive (Apache-2.0)
ark-tweet-nlp by brendano
CMU ARK Twitter Part-of-Speech Tagger
Java 573 Updated: 2 y ago License: Proprietary (Proprietary)
openspeech by openspeech-team
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Python 572 Updated: 2 y ago License: Permissive (MIT)
MoeVoiceStudio by NaruseMioShirakana
An audio processing application written in C++
C++ 572 Updated: 1 y ago License: Strong Copyleft (AGPL-3.0)
concrete5-legacy by concretecms
Legacy repository for concrete5
PHP 566 Updated: 2 y ago License: No License (No License)
dialogflow-python-client by dialogflow
Python library for Dialogflow
Python 559 Updated: 4 y ago License: Permissive (Apache-2.0)
av_hubert by facebookresearch
A self-supervised learning framework for audio-visual speech
Python 559 Updated: 1 y ago License: Proprietary (Proprietary)
Conv-TasNet by kaituoxu
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
Python 552 Updated: 1 y ago License: Permissive (MIT)
language-detector by optimaize
Language Detection Library for Java
Java 543 Updated: 2 y ago License: Permissive (Apache-2.0)
YourTTS by Edresson
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Jupyter Notebook 541 Updated: 2 y ago License: Proprietary (Proprietary)
rhino by Picovoice
On-device Speech-to-Intent engine powered by deep learning
Python 533 Updated: 2 y ago License: Permissive (Apache-2.0)
opentts by synesthesiam
Open Text to Speech Server
Python 530 Updated: 1 y ago License: Permissive (MIT)
Multilingual_Text_to_Speech by Tomiinek
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Python 528 Updated: 3 y ago License: Permissive (MIT)
voice-builder by google
An open-source text-to-speech (TTS) voice building tool
JavaScript 528 Updated: 2 y ago License: Permissive (Apache-2.0)
SpeechSplit by auspicious3000
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Python 527 Updated: 1 y ago License: Permissive (MIT)
FunASR by alibaba-damo-academy
A Fundamental End-to-End Speech Recognition Toolkit
Python 524 Updated: 1 y ago License: Proprietary (Proprietary)
tacotron by google
Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.
HTML 522 Updated: 2 y ago License: Proprietary (Proprietary)
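PaddlePaddle-DeepSpeech by yeyupiaoling
Speech recognition implemented with PaddlePaddle, targeting Chinese speech recognition. The project is complete and recognition quality is good. Supports training and inference on Windows and Linux, and inference on Nvidia Jetson development boards.
Python 519 Updated: 1 y ago License: Permissive (Apache-2.0)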
GigaSpeech by SpeechColab
Large, modern dataset for speech recognition
Shell 515 Updated: 2 y ago License: Permissive (Apache-2.0)
mir_eval by craffel
Evaluation functions for music/audio information retrieval/signal processing algorithms.
Python 512 Updated: 1 y ago License: Permissive (MIT)
hark by otalk
Converts an audio stream to speech events in the browser
JavaScript 504 Updated: 2 y ago License: No License (No License)
CTCWordBeamSearch by githubharald
Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
C++ 504 Updated: 2 y ago License: Permissive (MIT)
gantts by r9y9
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Jupyter Notebook 503 Updated: 2 y ago License: Proprietary (Proprietary)
speech-to-text-benchmark by Picovoice
Speech-to-text benchmark framework
Python 502 Updated: 3 y ago License: Permissive (Apache-2.0)
java-speech-api by lkuza2
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
Java 496 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
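PPASR by yeyupiaoling
End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to practical use, with very simple introductory examples and practical enterprise projects. Supports the currently popular DeepSpeech2, Conformer, and Squeezeformer models.
Python 495 Updated: 2 y ago License: Permissive (Apache-2.0)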
melgan by seungwonpark
MelGAN vocoder (compatible with NVIDIA/tacotron2)
Python 494 Updated: 3 y ago License: Permissive (BSD-3-Clause)
cheetah by Picovoice
On-device streaming speech-to-text engine powered by deep learning
Python 491 Updated: 2 y ago License: Permissive (Apache-2.0)
cn2an by Ailln
📦 Quickly converts between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)
Python 486 Updated: 1 y ago License: Permissive (MIT)
kospeech by sooftware
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Python 485 Updated: 2 y ago License: Permissive (Apache-2.0)
FloWaveNet by ksw0306
A PyTorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Python 476 Updated: 3 y ago License: Permissive (MIT)
AutoSub by abhirooptalasila
A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui
Python 470 Updated: 2 y ago License: Permissive (MIT)
neural_sp by hirofumi0810
End-to-end ASR/LM implementation with PyTorch
Python 469 Updated: 3 y ago License: Permissive (Apache-2.0)
Python-Wrapper-for-World-Vocoder by JeremyCCHsu
A Python wrapper for the high-quality vocoder "World"
Python 468 Updated: 3 y ago License: Permissive (MIT)
spec_augment by zcaceres
🔦 A PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Jupyter Notebook 467 Updated: 1 y ago License: Permissive (MIT)
flutter_tts by dlutton
Flutter Text to Speech package
C++ 461 Updated: 1 y ago License: Permissive (MIT)