Pitch Controllable DDSP Vocoders
HAL-9000 Speech Simulator
RestSharp with Polly
UDP-CPP by UnknownDetectionParty
Unknown Detection Party
C++ 25 Updated: 4 y ago License: Permissive (MIT)

DroidAudio by yqpan1991
Useful for learning the Android audio system
C 25 Updated: 3 y ago License: Permissive (Apache-2.0)

Listen-Attend-Spell-v2 by foamliu
PyTorch implementation of Listen Attend and Spell Automatic Speech Recognition (ASR).
Shell 25 Updated: 3 y ago License: Permissive (MIT)

Khronos by syb0rg
The open source intelligent personal assistant
C 25 Updated: 5 y ago License: Strong Copyleft (GPL-2.0)

voice-commands by baitsart
Voice commands (command your PC with spoken commands)
Shell 25 Updated: 4 y ago License: No License (No License)

tritium by syb0rg
A free, premium quality speech synthesis engine written completely in C.
C 25 Updated: 3 y ago License: Permissive (MIT)

tf-flowavenet by gvashkevich
TensorFlow implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Jupyter Notebook 25 Updated: 3 y ago License: Permissive (MIT)

SRT-To-SSML by ThioJoe
Converts SRT subtitle file to SSML file with speech durations
Python 25 Updated: 2 y ago License: Permissive (MIT)

Only-Noisy-Training by liqingchunnnn
A self-supervised speech denoising strategy named Only-Noisy Training (ONT), which solves the speech denoising problem with only noisy audio signals in audio space for the first time.
Python 25 Updated: 2 y ago License: No License (No License)

Google-speech-to-text-python-websocket-server-using-microphone-stream by dawntcherian
Python WebSocket server which converts input audio stream from microphone to text using Google speech to text
Python 24 Updated: 4 y ago License: No License (No License)

ppt_presenter by chaonan99
Convert ppt to video with audio track, using text to speech synthesis
Python 24 Updated: 4 y ago License: No License (No License)

koremo by warnikchow
5-class Korean speech emotion classifier
Python 24 Updated: 4 y ago License: Permissive (MIT)

speech_separation by xuchenglin28
Constrained Permutation Invariant Training, Speech Separation
Python 24 Updated: 3 y ago License: No License (No License)

speech-training-recorder by daanzu
A simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition.
Python 24 Updated: 4 y ago License: Strong Copyleft (AGPL-3.0)

voice-conversion by azraelkuan
A tutorial implementation of voice conversion using PyTorch
Python 24 Updated: 4 y ago License: No License (No License)

nala by jim-schwoebel
🦁 Nala is an agile open-source voice assistant framework (20+ actions).
Python 24 Updated: 4 y ago License: Proprietary (Proprietary)

nlp_speech by nguyenhuyanhh
Speech recognition using Google Cloud Speech API
Python 24 Updated: 4 y ago License: No License (No License)

watson-ipa-web-nodejs by biosopher
Create a web-based intelligent personal assistant (IPA) in NodeJS using two Watson services: Natural Language Classifier (NLC) and Dialog. https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/nl-classifier/ www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/dialog.html
JavaScript 24 Updated: 6 y ago License: Permissive (Apache-2.0)

text2wav.node.js by abbr
Self-contained multilingual TTS speech synthesizer for Node.js in pure JavaScript
JavaScript 24 Updated: 3 y ago License: Permissive (MIT)

voice-command by baluubas
Node module for voice commands using native, offline speech recognition.
C# 24 Updated: 4 y ago License: Permissive (MIT)

WeBAD by solyarisoftware
Web Browser Audio Detection/Speech Recording Events API
JavaScript 24 Updated: 2 y ago License: Permissive (MIT)

ngx-speech-recognition by kamiazya
Angular 5+ speech recognition service (based on browser implementation such as Chrome).
TypeScript 24 Updated: 3 y ago License: Permissive (MIT)

tf_multispeakerTTS_fc by caizexin
The TensorFlow version of multi-speaker TTS training with feedback constraint
Python 24 Updated: 4 y ago License: Permissive (MIT)

AI-Grand-Challenge-2020 by NeuroAI-PI
AI grand challenge 2020 Repo (Speech Recognition Track)
Python 24 Updated: 4 y ago License: Permissive (MIT)

ExtensibleTTS-PyTorch by huiw39
An extensible speech synthesis system built with PyTorch; the original code is from r9y9's https://github.com/r9y9/nnmnkwii_gallery
Python 24 Updated: 4 y ago License: No License (No License)

hassio-addons by rhasspy
Add-ons for Home Assistant's Hass.IO
Shell 24 Updated: 2 y ago License: Permissive (MIT)

KWS-Scripts by lallubharteja
Keyword Search Recipe for Subword ASR
Shell 24 Updated: 4 y ago License: Permissive (MIT)

Say by youknowone
Convert text to audible speech. Play it or save it to an audio file.
Swift 24 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)

Few-Shot-KWS by ArchitParnami
Few-Shot Keyword Spotting
Jupyter Notebook 24 Updated: 3 y ago License: No License (No License)

asterisk-voicekit-modules by Tinkoff
Non-blocking Asterisk modules for accessing VoiceKit services for speech recognition and speech synthesis.
Shell 24 Updated: 3 y ago License: No License (No License)

SpeechSplit2 by biggytruck
Official implementation of SpeechSplit2
Python 24 Updated: 3 y ago License: No License (No License)

speaker-anonymization by DigitalPhonetics
Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.
Python 24 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

spaceslounge by avie-dev
Twitter Spaces Host and Speaker's Lounge
JavaScript 24 Updated: 2 y ago License: Permissive (MIT)

ConvS2S-VC by kamepong
Python 24 Updated: 2 y ago License: No License (No License)

OneReality by DogeLord081
A virtual waifu that you can speak to through your mic and it'll speak back to you!
Python 24 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

miPhysics_Processing by mi-creative
Mass interaction physics library for Processing, including audio and haptic capabilities. Latest compiled release for the Processing environment: https://github.com/mi-creative/miPhysics_Processing/releases/tag/2.0.0
Java 23 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)

sbrt2017 by igormq
Towards an end-to-end speech recognizer for Portuguese using deep neural networks
Python 23 Updated: 4 y ago License: Permissive (MIT)

py-espeak-ng by gooofy
Some simple wrappers around eSpeak NG intended to make using this excellent TTS for waveform and IPA generation as convenient as possible.
Python 23 Updated: 4 y ago License: Permissive (Apache-2.0)

snickery by CSTR-Edinburgh
Hybrid speech synthesiser
Python 23 Updated: 4 y ago License: Permissive (Apache-2.0)

Speech-Separation-TF2 by r06944010
TensorFlow 2 implementation of Speech Separation Methods
Python 23 Updated: 4 y ago License: No License (No License)

PAGAN by Zihang97
PAGAN: a phase-adapted GAN for speech enhancement
Python 23 Updated: 4 y ago License: Permissive (MIT)

tts-kor by jw9730
[KAIST CS420] Transfer Learning from Speaker Verification to Zero-Shot Multispeaker Korean Text-To-Speech Synthesis
Python 23 Updated: 3 y ago License: No License (No License)

subword-kaldi by aalto-speech
Properly handle position-dependent phones in a subword lexicon FST
Python 23 Updated: 4 y ago License: Permissive (MIT)

Joint-Slot-Filling by pengshuang
PyTorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"
Python 23 Updated: 4 y ago License: No License (No License)

Speaker-Recognition by mialrr
Voiceprint recognition (VPR), also known as speaker recognition, has two types: speaker identification and speaker verification.
Python 23 Updated: 3 y ago License: No License (No License)

ConvolutionaNeuralNetworksToEnhanceCodedSpeech by ansleliu
In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral domain features. The proposed postprocessors in both domains are evaluated for various narrowband and wideband speech codecs in a wide range of conditions. The proposed postprocessor improves speech quality (PESQ) by up to 0.25 MOS-LQO points for G.711, 0.30 points for G.726, 0.82 points for G.722, and 0.26 points for the adaptive multirate wideband codec (AMR-WB). In a subjective CCR listening test, the proposed postprocessor on G.711-coded speech exceeds the speech quality of an ITU-T-standardized postfilter by 0.36 CMOS points, and obtains a clear preference of 1.77 CMOS points compared to G.711, even on par with uncoded speech.
Python 23 Updated: 4 y ago License: Permissive (BSD-3-Clause)

wavelet-denoiser by actonDev
A wavelet audio denoiser written in Python
Python 23 Updated: 4 y ago License: No License (No License)