brouhaha-vad by marianne-m
Predicts the level of noise and reverberation in your audio files
Jupyter Notebook 59 | Updated: 2 y ago | License: Permissive (MIT)

AuxiliaryASR by yl4579
Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
Python 59 | Updated: 2 y ago | License: Permissive (MIT)

speech-mfcc by education-service
Speech feature extraction and recognition based on MFCC
Java 58 | Updated: 4 y ago | License: No License

Speech_Feature_Extraction by pchao6
Feature extraction of speech signal is the initial stage of any speech recognition system.
Python 58 | Updated: 4 y ago | License: No License

Lip_Reading_in_the_Wild_AVSR by ajinkyaT
Audio-Visual Speech Recognition using Deep Learning
Python 58 | Updated: 4 y ago | License: No License

fastwer by kahne
A PyPI package for fast word/character error rate (WER/CER) calculation
Python 58 | Updated: 2 y ago | License: Permissive (MIT)

lm_build by srvk
Adapting your own Language Model for Kaldi
Shell 58 | Updated: 4 y ago | License: No License

libcharsetdetect by batterseapower
A dependency-free C interface to the Mozilla Universal Character Set Detector
C++ 58 | Updated: 3 y ago | License: No License

glate by keshavbhatt
Open Source Google Translator and TTS App for Linux Desktop
C++ 58 | Updated: 2 y ago | License: Permissive (MIT)

ElundusCoreApp by SietseT
Desktop application that converts text to speech to preview Twitch donations.
JavaScript 58 | Updated: 3 y ago | License: Strong Copyleft (GPL-3.0)

PromptingWhisper by jasonppy
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Python 58 | Updated: 1 y ago | License: No License

Transformer-Transducer by okkteam
A pytorch_lightning reimplementation of the Transducer module from ESPnet.
Python 57 | Updated: 4 y ago | License: No License

JELVIS by kiahamedi
Intelligent audio assistant like Iron Man Jarvis
Python 57 | Updated: 2 y ago | License: Strong Copyleft (GPL-3.0)

Speech_Emotion_Recognition_DNN-ELM by eesungkim
Implementation of Speech Emotion Recognition using DNN-ELM
Python 57 | Updated: 4 y ago | License: No License

vue-speech-streaming by aofdev
Vue 2 streaming speech recognition (speech-to-text) with Google Cloud Speech
JavaScript 57 | Updated: 4 y ago | License: Permissive (MIT)

Cognitive-SpeakerRecognition-Windows by microsoft
Windows SDK for the Microsoft Speaker Recognition API, part of Cognitive Services
C# 57 | Updated: 4 y ago | License: Proprietary

syn-speech by SynHub
Syn.Speech is a flexible, speaker-independent, continuous speech recognition engine for Mono and the .NET Framework
C# 57 | Updated: 4 y ago | License: Proprietary

Audio-Denoising by AP-Atul
Noise removal/reduction for audio files in Python. De-noising is done using wavelets, with thresholding by the VisuShrink technique.
Python 57 | Updated: 1 y ago | License: Permissive (MIT)

docker-kaldi-android by jcsilva
Dockerfile for compiling Kaldi for Android.
Shell 57 | Updated: 4 y ago | License: No License

cognitive-services-speech-sdk-go by microsoft
Go bindings for the Microsoft Cognitive Services Speech SDK
Go 57 | Updated: 1 y ago | License: Permissive (MIT)

AdaVocoder by yuan1615
Adaptive Vocoder for Custom Voice
Python 57 | Updated: 2 y ago | License: Permissive (MIT)

viet-asr by dangvansam98
VietASR - Vietnamese Automatic Speech Recognition
Python 57 | Updated: 2 y ago | License: No License

Context-aware-ZSR by ruotianluo
Official code for the paper Context-aware Zero-shot Recognition (https://arxiv.org/abs/1904.09320, to appear at AAAI2020)
Python 56 | Updated: 4 y ago | License: Permissive (MIT)

jovo-cli by jovotech
🛠 Command Line Interface for the Jovo Framework: Makes voice experience deployment a breeze, including features like local development and staging.
TypeScript 56 | Updated: 2 y ago | License: Permissive (Apache-2.0)

tdmelodic by PKSHATechnology-Research
A Japanese accent dictionary generator
Python 56 | Updated: 3 y ago | License: Permissive (BSD-3-Clause)

SHIRO by Sleepwalking
Phoneme-to-speech alignment toolkit based on liblrhsmm
C 56 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)

tts-util-app by Danesprite
TTS Util: text-to-speech utility Android app for synthesising text into audible speech
Kotlin 56 | Updated: 2 y ago | License: Permissive (Apache-2.0)

AliMeeting by yufan-aslp
The project is associated with the recently launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in conference scenarios.
Python 56 | Updated: 3 y ago | License: No License

deepgram-python-sdk by deepgram
Official Python SDK for Deepgram's automated speech recognition APIs.
Python 56 | Updated: 1 y ago | License: Permissive (MIT)

SVCC23_FastSVC by lesterphillip
Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation
Python 56 | Updated: 2 y ago | License: No License

whisper_streaming by ufal
Whisper realtime streaming for long speech-to-text transcription and translation
Python 56 | Updated: 2 y ago | License: Permissive (MIT)

logmmse by rajivpoddar
LogMMSE speech enhancement/noise reduction
Python 55 | Updated: 4 y ago | License: Permissive (MIT)

java-google-speech-api by goxr3plus
🙊 Speech Recognition, Text To Speech, Google Translate
Java 55 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)

python_kaldi_features by ZitengWang
Python code to extract MFCC and FBANK speech features for Kaldi
Python 55 | Updated: 3 y ago | License: Permissive (MIT)

webspeech by ranacseruet
HTML5 Speech Recognition API Wrapper Library
JavaScript 55 | Updated: 4 y ago | License: No License

aimybox-android-sdk by just-ai
Voice assistant SDK for Android
Kotlin 55 | Updated: 3 y ago | License: Permissive (Apache-2.0)

iflytek_awaken_asr by HaoQChen
Uses iFlytek's technology to implement wake-up and command recognition
C 55 | Updated: 3 y ago | License: No License

ParametricSpeaker by NiklasFauth
Design files and software for my parametric ultrasonic speaker.
C++ 55 | Updated: 4 y ago | License: Permissive (Apache-2.0)

android_device_moto_jordan-common by Quarx2k
Common repo for MB520/MB525/MB526
C 55 | Updated: 5 y ago | License: No License

juicer by idiap
Juicer is a Weighted Finite State Transducer (WFST) based decoder for Automatic Speech Recognition (ASR).
C++ 55 | Updated: 4 y ago | License: Proprietary

brn by Brain-up
The idea of this project is to design and build a web application (in cooperation with scientists) containing a series of special audio training exercises that help people with central auditory skills deficits train themselves to listen better.
Kotlin 55 | Updated: 2 y ago | License: Permissive (CC0-1.0)

SoundStorm-pytorch by rishikksh20
Google's SoundStorm: Efficient Parallel Audio Generation
Python 55 | Updated: 1 y ago | License: Permissive (MIT)

Transformer-TTS by xcmyz
TTS model based on Transformer.
Python 54 | Updated: 4 y ago | License: No License

bangla-tts by zabir-nabil
Bangla text-to-speech: a multilingual (Bangla, English), [almost] real-time (on a GPU) speech synthesis library
Python 54 | Updated: 3 y ago | License: No License

ASR by shiyuzh2007
Python 54 | Updated: 4 y ago | License: Permissive (Apache-2.0)

gespeaker by muflone
A text-to-speech GTK+ front-end for eSpeak and mbrola that plays text in many languages, with settings for voice, pitch, volume and speed.
Python 54 | Updated: 4 y ago | License: No License

speech by unk1911
Text-to-speech synthesis
JavaScript 54 | Updated: 4 y ago | License: No License

simple-speaker-embedding by RF5
A speaker embedding network in PyTorch that is very quick to set up and use for any purpose.
Jupyter Notebook 54 | Updated: 2 y ago | License: Proprietary

quasar-speech-api by patrickmonteiro
🎤 🔉 A single-page application (SPA) built with Quasar Framework 1.0 + Speech API to capture audio and convert it to text, or to use text as the basis for the application to emit audio.
JavaScript 54 | Updated: 3 y ago | License: Permissive (MIT)

voices by mklement0
macOS CLI for changing the default TTS (text-to-speech) voice, printing information about voices, and speaking text with multiple voices.
Shell 54 | Updated: 2 y ago | License: No License