wavencoder by shangeth
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with a PyTorch backend.
Python 36 Updated: 3 y ago License: Permissive (MIT)
pywsj0-mix by mpariente
wsj0-{2, 3, 4, 5} mix generation scripts, in Python.
Python 36 Updated: 2 y ago License: Permissive (MIT)
go-snowboy by brentnd
Go wrapper for Kitt-AI's snowboy audio detection library.
Go 36 Updated: 3 y ago License: Permissive (MIT)
watts by almazan
Word Spotting and Recognition with Embedded Attributes
C 36 Updated: 4 y ago License: Permissive (MIT)
chatgpt-web by liuw5367
ChatGPT web application that uses the official OpenAI API. Supports multiple conversations, a large prompt library, PWA, ASR, and TTS.
TypeScript 36 Updated: 1 y ago License: Permissive (MIT)
kaldi-python-io by funcwj
A Python I/O interface for accessing data in Kaldi
Python 35 Updated: 4 y ago License: Permissive (Apache-2.0)
lstm_ctc by mobvoi
LSTM CTC End2End Speech Recognition.
Python 35 Updated: 4 y ago License: Proprietary
CycleGAN-VC2 by onejiin
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Python 35 Updated: 4 y ago License: Permissive (MIT)
phoneme_ctc by tbornt
Bidirectional dynamic RNN + CTC for phoneme recognition
Python 35 Updated: 4 y ago License: No License
voice-conversion by vsimkus
Voice conversion (VC) investigation using three variants of VAE
Python 35 Updated: 4 y ago License: No License
QPPWG by bigpon
Quasi-Periodic Parallel WaveGAN PyTorch implementation
Python 35 Updated: 4 y ago License: Permissive (MIT)
ProMo by timmahrt
Prosody Morph: A Python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.
Python 35 Updated: 4 y ago License: Permissive (MIT)
self-supervised-da by Jiaolong
Self-supervised domain adaptation
Python 35 Updated: 4 y ago License: Permissive (MIT)
vGraph by fanyun-sun
Code for NeurIPS paper "vGraph: A Generative Model for Joint Community Detection and Node Representation Learning"
Python 35 Updated: 3 y ago License: No License
yoruba-text by Niger-Volta-LTI
Yorùbá language training text for NLP, ASR and TTS tasks
Python 35 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
Deep-Encoder-Decoder-Conv-TasNet by JusperLee
A PyTorch implementation of "AN EMPIRICAL STUDY OF CONV-TASNET"
Python 35 Updated: 2 y ago License: No License
speech_to_text by m-nathani
How to use the Google Cloud Speech API to transcribe audio/video files.
PHP 35 Updated: 4 y ago License: Strong Copyleft (AGPL-3.0)
TTS-dataset-tools by youmebangbang
Automatically generates a TTS dataset from audio and associated text. Makes cuts under a custom length. Uses the Google Speech-to-Text API to perform diarization and transcription, or aeneas to force-align text to audio.
Python 35 Updated: 2 y ago License: Permissive (MIT)
salutejs by sberdevices
SmartApp Framework for building skills for the "Salute" family of virtual assistants in JavaScript
TypeScript 35 Updated: 3 y ago License: Proprietary
THE-2020-PERSONALIZED-VOICE-TRIGGER-CHALLENGE-BASELINE-SYSTEM by lenovo-voice
Shell 35 Updated: 4 y ago License: No License
Sinsy-Remix by hyperzlib
The HMM-Based Singing Voice Synthesis System Remix "Sinsy-r"
C++ 35 Updated: 4 y ago License: Proprietary
ctc_beam_search_lm by Sundy1219
CTC+Beam_Search+kenlm is a decoding system for acoustic models that use Chinese characters as the modeling unit
C++ 35 Updated: 4 y ago License: No License
snd by dskinner
Package snd provides methods and types for sound processing and synthesis.
Go 35 Updated: 4 y ago License: Permissive (BSD-3-Clause)
Voice-conversion-evaluation by tzuhsien
An evaluation toolkit for voice conversion models.
Python 35 Updated: 2 y ago License: No License
REPET-Matlab by zafarrafii
REPeating Pattern Extraction Technique (REPET) in Matlab for audio source separation: original REPET, REPET extended, adaptive REPET, REPET-SIM, REPET-SIM online
Jupyter Notebook 35 Updated: 2 y ago License: No License
ssl_speech_restoration by Takaaki-Saeki
SelfRemaster: SSL Speech Restoration
Python 35 Updated: 2 y ago License: Permissive (MIT)
speech-to-intent-dataset by skit-ai
Dataset Release for Intent Classification from Speech
Python 35 Updated: 2 y ago License: Proprietary
Attention-Is-All-You-Need-In-Speech-Separation by Zhongyang-debug
Speech Separation
Python 35 Updated: 1 y ago License: No License
chainer-Fast-WaveNet by dhgrs
A Chainer implementation of Fast WaveNet (mel-spectrogram vocoder).
Python 34 Updated: 4 y ago License: No License
Rev AI Python SDK
multiband_melgan by AppleHolic
An unofficial implementation of https://arxiv.org/abs/2005.05106
Python 34 Updated: 3 y ago License: Permissive (MIT)
ivona-node by tmanderson
Ivona Cloud (via Amazon services) client library for Node
JavaScript 34 Updated: 5 y ago License: No License
gcloud_speech_voice_recorder by taekb
Flask-based web application that records sound (as PCM/WAV) and converts speech to text via Google Cloud Speech API using HTML, JavaScript, and Python
JavaScript 34 Updated: 4 y ago License: Permissive (Apache-2.0)
voice-activated-microbit by edgeimpulse
Bleep, bloop, I'm a computer that responds to your voice
C 34 Updated: 2 y ago License: Permissive (MIT)
spect by lennes
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
HTML 34 Updated: 3 y ago License: Strong Copyleft (GPL-3.0)
athena by LianjiaTech
An open-source implementation of a sequence-to-sequence based speech processing engine
C++ 34 Updated: 2 y ago License: Permissive (Apache-2.0)
2022-DL-Audio-Course by severilov
Deep Learning Audio Course, 2022
Jupyter Notebook 34 Updated: 2 y ago License: No License
vosk-unity-asr by alphacep
Automatic Speech Recognition in Unity using Vosk library
C# 34 Updated: 2 y ago License: No License
kaldi-adapt-lm by gooofy
Adapt Kaldi-ASR nnet3 chain models from Zamia-Speech.org to a different language model
Python 33 Updated: 4 y ago License: Permissive (Apache-2.0)
melosynth by justinsalamon
Synthesize a continuous pitch sequence
Python 33 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
morse-speak-demo by googlecreativelab
Text-to-Speech (TTS) demo web app that converts written text into spoken words via Morse code
JavaScript 33 Updated: 3 y ago License: Permissive (Apache-2.0)
ASRT_SpeechClient_WPF by nl8590687
A Windows WPF client for the ASRT speech recognition system.
C# 33 Updated: 4 y ago License: Permissive (Apache-2.0)
ConferencingSpeech2021 by ConferencingSpeech
Conferencing Speech Challenge
Python 33 Updated: 3 y ago License: Permissive (Apache-2.0)
voce-browser by trabdlkarim
Voice Controlled Chromium Web Browser
Python 33 Updated: 3 y ago License: Strong Copyleft (GPL-3.0)
scription by smlum
An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech
JavaScript 33 Updated: 3 y ago License: Strong Copyleft (AGPL-3.0)
flow-synth by nwoeanhinnogaehr
*UNMAINTAINED* A modular digital audio workstation for synthesis, sequencing, live coding, visuals, etc
Rust 33 Updated: 3 y ago License: Permissive (MIT)
wavutils by smallmuou
wavutils is a tool set that processes WAV files
Shell 33 Updated: 4 y ago License: No License
multilingual-g2p by jcsilva
Multilingual Grapheme to Phoneme
Shell 33 Updated: 3 y ago License: No License
gochroma by go-fingerprint
Go bindings and high-level API to the acoustic fingerprinting library chromaprint
Go 33 Updated: 4 y ago License: Permissive (MIT)
RaspiAsteriskGoogle by rgrokett
Integrating Asterisk with Google Assistant Voice Service on a Raspberry Pi Zero using AGI
Perl 33 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)