Jarvis Tutorial
Front-end noise reduction module for embedded device environments
DeepSpeech2-Keras by ShankHarinath
DeepSpeech, Speech To Text, ASR, Speech recognition, Keras, Tensorflow
Python 30 | Updated: 4 y ago | License: No License

NER-with-LS by ghaddarAbs
Python 30 | Updated: 4 y ago | License: No License

speechapi-examples by iandevlin
A small collection of examples that use the Web Speech API.
JavaScript 30 | Updated: 4 y ago | License: Permissive (MIT)

wiki2ssml by baxtree
Wiki2SSML provides the WikiVoice markup language used for fine-tuning synthesised voice.
JavaScript 30 | Updated: 3 y ago | License: Permissive (Apache-2.0)

vue-waveform by chenqiaoen521
waveform wavesurfer -waveform js html audio waveform chart
JavaScript 30 | Updated: 3 y ago | License: No License

voicekit-examples by TinkoffCreditSystems
Examples on how to use Tinkoff Voicekit
C# 30 | Updated: 4 y ago | License: Permissive (Apache-2.0)

universal-vocoder by yistLin
A PyTorch implementation of the universal neural vocoder
Python 30 | Updated: 3 y ago | License: No License

BingSpeech by davrous
A small JavaScript library to call Bing Speech-To-Text API with continuous detection and Text-To-Speech API
TypeScript 30 | Updated: 4 y ago | License: Permissive (MIT)

pytorch_MLP_for_ASR by mravanelli
This code implements a basic MLP for speech recognition. The MLP is trained with PyTorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
Perl 30 | Updated: 4 y ago | License: No License

SpeakerVoiceIdentifier by FragJage
SpeakerVoiceIdentifier can recognize the voice of a speaker by learning.
C++ 30 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)

lpctron-tts-cpp by alokprasad
C++ implementation of end-to-end TTS that combines Tacotron2 and the LPCNet vocoder.
C 30 | Updated: 4 y ago | License: No License

speech.ko by homink
Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language
Shell 30 | Updated: 4 y ago | License: No License

SpeechRecognition by OAID
A local automatic speech recognition project based on Kaldi and ALSA.
C++ 30 | Updated: 4 y ago | License: No License

freeswitch-sounds-tts by jpawlowski
FreeSWITCH TTS Voice Prompt Generator
Shell 30 | Updated: 3 y ago | License: Proprietary

Python-Assistant by Umesh-01
Python Assistant (PA) is a voice-command-based assistant service written in Python 3.9+. It can recognize human speech, talk to the user, and execute basic commands.
Python 30 | Updated: 3 y ago | License: Permissive (MIT)

Scribosermo by Jaco-Assistant
Train fast Speech-to-Text networks in different languages
Python 30 | Updated: 3 y ago | License: Weak Copyleft (GNU LGPLv3)

kaldifst by k2-fsa
Python wrapper for OpenFST and its extensions from Kaldi. Also supports reading and writing ark/scp files.
C++ 30 | Updated: 2 y ago | License: Proprietary

baf-dataset by guillemcortes
Reproducibility kit for "BAF: An Audio Fingerprinting Dataset for Broadcast Monitoring" by Guillem Cortès, Álex Ciurana, Emilio Molina, Marius Miron, Owen Meyers, Joren Six and Xavier Serra.
Python 30 | Updated: 2 y ago | License: Permissive (Apache-2.0)

AudioToText by Carleslc
Transcribe and translate audio to text using Whisper and DeepL.
Jupyter Notebook 30 | Updated: 2 y ago | License: No License

PPG-Diff-VC by seahore
A diffusion-based cross-lingual voice conversion model, as my bachelor's thesis
Python 30 | Updated: 2 y ago | License: No License

tafrigh by ieasybooks
Transcribing visual or audio materials into text
Python 30 | Updated: 2 y ago | License: Permissive (MIT)

whisper-writer by savbell
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
Python 30 | Updated: 2 y ago | License: Strong Copyleft (GPL-3.0)

tal-asrd by calclavia
Code for the paper "Speech Recognition and Multi-Speaker Diarization of Long Conversations"
Python 30 | Updated: 2 y ago | License: No License

tg_bot_stt_tts by tochilkinva
Telegram bot with voice message recognition and generation. Speech to Text and Text to Speech
Python 30 | Updated: 2 y ago | License: Permissive (MIT)

wav2train by talonvoice
Automatically align transcribed audio and generate a wav2letter training corpus
Python 29 | Updated: 4 y ago | License: Permissive (MIT)

Persian-Speech-Recognition by amirfrsd
Persian Speech Recognition using Google APIs
Python 29 | Updated: 4 y ago | License: Permissive (MIT)

download_audioset by jim-schwoebel
📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Python 29 | Updated: 4 y ago | License: Proprietary

pepperspeechrecognition by JBramauer
Google Speech Recognition Module for Naoqi and the Pepper Robot by Aldebaran
Python 29 | Updated: 2 y ago | License: Permissive (MIT)

ember-autoserve by ebryn
Runs `ember serve` and will automatically restart it when necessary
JavaScript 29 | Updated: 5 y ago | License: Permissive (MIT)

SpeechSynthesisRecorder by guest271314
Get audio output from window.speechSynthesis.speak() call as ArrayBuffer, AudioBuffer, Blob, MediaSource, MediaStream, ReadableStream, other object or data types
JavaScript 29 | Updated: 4 y ago | License: No License

speaktome-web by mozilla
JavaScript modules for Mozilla's cloud speech recognition API.
JavaScript 29 | Updated: 3 y ago | License: Weak Copyleft (MPL-2.0)

langue by yuhr
A modern platform for conlanging. Currently in the planning stage.
TypeScript 29 | Updated: 4 y ago | License: Proprietary

Multiband-WaveRNN by Rongjiehuang
An unofficial implementation of the autoregressive vocoder Multiband-WaveRNN. Audio samples at https://rongjiehuang.github.io/Multiband-WaveRNN/
Python 29 | Updated: 2 y ago | License: Permissive (MIT)

Phase_aware_Deep_Complex_UNet by Doyosae
Implementation of Phase-aware Speech Enhancement with Deep Complex U-Net
Python 29 | Updated: 3 y ago | License: No License

auditory-slow-fast by ekazakos
Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch
Python 29 | Updated: 3 y ago | License: Proprietary

hmm-for-emo-tts by Emotional-Text-to-Speech
A repository with comprehensive instructions for using the Festvox toolkit to generate emotional speech from text
CSS 29 | Updated: 2 y ago | License: Permissive (MIT)

speech by ng-web-apis
A library for using Web Speech API with Angular
TypeScript 29 | Updated: 2 y ago | License: Permissive (MIT)

AUDIO-SPEECH-TO-SIGN-LANGUAGE-CONVERTER by jigargajjar55
A web-based application that accepts audio speech or text as input and converts it to the corresponding Indian Sign Language for people with speech or hearing impairments and deaf people.
HTML 29 | Updated: 2 y ago | License: Permissive (MIT)

en-pos by FinNLP
⚙️ [Processor] A better English POS tagger written in JavaScript
TypeScript 29 | Updated: 4 y ago | License: No License

DiDiSpeech by athena-team
HTML 29 | Updated: 4 y ago | License: No License

e6870 by placebokkk
Assignments for the e6870 ASR class
C 29 | Updated: 4 y ago | License: No License

ttskit by KuangDD
Speech synthesis toolbox (Text To Speech Toolkit): a speech synthesis tool with a choice of multiple voices.
Python 29 | Updated: 3 y ago | License: No License

speech-to-text by sdhilip200
Speech to Text with Hugging Face and Wav2vec 2.0
Jupyter Notebook 29 | Updated: 2 y ago | License: No License

EaBNet by Andong-Li-speech
This is the repo for the manuscript "Embedding and Beamforming: All-Neural Causal Beamformer for Multichannel Speech Enhancement", which was submitted to ICASSP 2022.
Python 29 | Updated: 3 y ago | License: No License

UnsupTTS by lwang114
Shell 29 | Updated: 2 y ago | License: No License

HOSCY by PaciStardust
Companion for OSC and Communication
C# 29 | Updated: 1 y ago | License: Strong Copyleft (GPL-2.0)

MimicMania by everydaycodings
MimicMania is a web application that allows you to generate speech and clone voices using text-to-speech technology. With MimicMania, you can create custom voices in a variety of languages and use them for a range of applications, from voiceovers to chatbots.
Python 29 | Updated: 2 y ago | License: Permissive (MIT)

DisCo by PantoMatrix
Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gestures Synthesis [ACMMM 2022]
Python 29 | Updated: 2 y ago | License: No License