Free online text-to-speech API
Speech Algorithms
Voice Conversion Tool Kit
General Speech Restoration
asteroid by mpariente
The PyTorch-based audio source separation toolkit for researchers
Python
611
Updated: 4 y ago
License: Permissive (MIT)
SpecAugment by DemisEom
An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
Python
596
Updated: 2 y ago
License: Permissive (Apache-2.0)
speech-denoising-wavenet by drethage
A neural network for end-to-end speech denoising
Python
594
Updated: 2 y ago
License: Permissive (MIT)
cboard by cboard-org
Augmentative and Alternative Communication (AAC) system with text-to-speech for the browser
JavaScript
594
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
diffwave by lmnt-com
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Python
593
Updated: 2 y ago
License: Permissive (Apache-2.0)
sonus by evancohen
/so.nus/ STT (speech to text) for Node with offline hotword detection
JavaScript
592
Updated: 2 y ago
License: Permissive (MIT)
WavAugment by facebookresearch
A library for speech data augmentation in the time domain
Python
585
Updated: 2 y ago
License: Permissive (MIT)
Parakeet by PaddlePaddle
PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN)
Python
584
Updated: 2 y ago
License: Proprietary (Proprietary)
inaSpeechSegmenter by ina-foss
CNN-based audio segmentation toolkit. Detects speech, music, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Python
584
Updated: 2 y ago
License: Permissive (MIT)
vosk-android-demo by alphacep
Offline speech recognition for Android with the Vosk library.
Java
578
Updated: 2 y ago
License: Permissive (Apache-2.0)
dialogflow-android-client by dialogflow
Android SDK for Dialogflow
Java
577
Updated: 4 y ago
License: Permissive (Apache-2.0)
ark-tweet-nlp by brendano
CMU ARK Twitter Part-of-Speech Tagger
Java
573
Updated: 2 y ago
License: Proprietary (Proprietary)
openspeech by openspeech-team
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Python
572
Updated: 2 y ago
License: Permissive (MIT)
MoeVoiceStudio by NaruseMioShirakana
An audio processing application written in C++
C++
572
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)
concrete5-legacy by concretecms
Legacy repository for concrete5
PHP
566
Updated: 2 y ago
License: No License (No License)
dialogflow-python-client by dialogflow
Python library for Dialogflow
Python
559
Updated: 4 y ago
License: Permissive (Apache-2.0)
av_hubert by facebookresearch
A self-supervised learning framework for audio-visual speech
Python
559
Updated: 2 y ago
License: Proprietary (Proprietary)
Conv-TasNet by kaituoxu
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
Python
552
Updated: 2 y ago
License: Permissive (MIT)
language-detector by optimaize
Language Detection Library for Java
Java
543
Updated: 2 y ago
License: Permissive (Apache-2.0)
YourTTS by Edresson
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Jupyter Notebook
541
Updated: 2 y ago
License: Proprietary (Proprietary)
rhino by Picovoice
On-device Speech-to-Intent engine powered by deep learning
Python
533
Updated: 2 y ago
License: Permissive (Apache-2.0)
opentts by synesthesiam
Open Text to Speech Server
Python
530
Updated: 2 y ago
License: Permissive (MIT)
Multilingual_Text_to_Speech by Tomiinek
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Python
528
Updated: 3 y ago
License: Permissive (MIT)
voice-builder by google
An open-source text-to-speech (TTS) voice building tool
JavaScript
528
Updated: 2 y ago
License: Permissive (Apache-2.0)
SpeechSplit by auspicious3000
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Python
527
Updated: 2 y ago
License: Permissive (MIT)
FunASR by alibaba-damo-academy
A Fundamental End-to-End Speech Recognition Toolkit
Python
524
Updated: 2 y ago
License: Proprietary (Proprietary)
tacotron by google
Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.
HTML
522
Updated: 2 y ago
License: Proprietary (Proprietary)
PaddlePaddle-DeepSpeech by yeyupiaoling
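Speech recognition implemented with PaddlePaddle, targeting Chinese speech recognition. The project is mature and recognition quality is good. Supports training and inference on Windows and Linux, and inference on Nvidia Jetson development boards.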
Python
519
Updated: 2 y ago
License: Permissive (Apache-2.0)
GigaSpeech by SpeechColab
Large, modern dataset for speech recognition
Shell
515
Updated: 2 y ago
License: Permissive (Apache-2.0)
mir_eval by craffel
Evaluation functions for music/audio information retrieval/signal processing algorithms.
Python
512
Updated: 2 y ago
License: Permissive (MIT)
hark by otalk
Converts an audio stream to speech events in the browser
JavaScript
504
Updated: 2 y ago
License: No License (No License)
CTCWordBeamSearch by githubharald
Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
C++
504
Updated: 2 y ago
License: Permissive (MIT)
gantts by r9y9
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Jupyter Notebook
503
Updated: 2 y ago
License: Proprietary (Proprietary)
speech-to-text-benchmark by Picovoice
Speech-to-text benchmark framework
Python
502
Updated: 3 y ago
License: Permissive (Apache-2.0)
java-speech-api by lkuza2
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
Java
496
Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)
PPASR by yeyupiaoling
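End-to-end Chinese speech recognition implemented with PaddlePaddle, from beginner to hands-on practice, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.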
Python
495
Updated: 2 y ago
License: Permissive (Apache-2.0)
melgan by seungwonpark
MelGAN vocoder (compatible with NVIDIA/tacotron2)
Python
494
Updated: 4 y ago
License: Permissive (BSD-3-Clause)
cheetah by Picovoice
On-device streaming speech-to-text engine powered by deep learning
Python
491
Updated: 2 y ago
License: Permissive (Apache-2.0)
cn2an by Ailln
📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)
Python
486
Updated: 2 y ago
License: Permissive (MIT)
kospeech by sooftware
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Python
485
Updated: 2 y ago
License: Permissive (Apache-2.0)
FloWaveNet by ksw0306
A PyTorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Python
476
Updated: 4 y ago
License: Permissive (MIT)
AutoSub by abhirooptalasila
A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui
Python
470
Updated: 2 y ago
License: Permissive (MIT)
neural_sp by hirofumi0810
End-to-end ASR/LM implementation with PyTorch
Python
469
Updated: 4 y ago
License: Permissive (Apache-2.0)
Python-Wrapper-for-World-Vocoder by JeremyCCHsu
A Python wrapper for the high-quality vocoder "World"
Python
468
Updated: 4 y ago
License: Permissive (MIT)
spec_augment by zcaceres
🔦 A PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Jupyter Notebook
467
Updated: 2 y ago
License: Permissive (MIT)
flutter_tts by dlutton
Flutter Text to Speech package
C++
461
Updated: 2 y ago
License: Permissive (MIT)