speaker_extraction by xuchenglin28
Target speaker extraction and verification for multi-talker speech
Python | 92 | Updated: 2 y ago | License: Strong Copyleft (GPL-3.0)
vad.js by kdavis-mozilla
Voice activity detection in JavaScript
JavaScript | 92 | Updated: 4 y ago | License: Permissive (BSD-3-Clause)
efficientspeech by roatienza
PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.
Jupyter Notebook | 92 | Updated: 1 y ago | License: Permissive (Apache-2.0)
vue-pwa-speech by aofdev
A Vue 2 progressive web app that performs synchronous speech recognition (speech to text) with Google Cloud Speech
JavaScript | 90 | Updated: 4 y ago | License: Permissive (MIT)
Chrome-Web-Speech-API by bensonruan
Chrome Web Speech API
JavaScript | 90 | Updated: 1 y ago | License: Permissive (MIT)
speaker-recognition-papers by bjfu-ai-institute
Share some recent speaker recognition papers and their implementations.
Python | 89 | Updated: 4 y ago | License: No License (No License)
uPIT-for-speech-separation by funcwj
Speech separation with utterance-level PIT experiments
Python | 89 | Updated: 4 y ago | License: No License (No License)
TheWarholOutLoud by CMP-Studio
An inclusive audio guide for The Andy Warhol Museum
JavaScript | 89 | Updated: 4 y ago | License: Permissive (MIT)
speechless by juliuskunze
Speech-to-text based on wav2letter built for transfer learning
Python | 89 | Updated: 4 y ago | License: Permissive (MIT)
recasepunc by benob
Model for recasing and repunctuating ASR transcripts
Python | 89 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)
ElevateAIDotNetSDK by NICEElevateAI
.NET Core 6 SDK for ElevateAI
C# | 89 | Updated: 2 y ago | License: Permissive (MIT)
PLDA by RicherMans
An LDA/PLDA estimator using Kaldi in Python for speaker verification tasks
Python | 88 | Updated: 4 y ago | License: No License (No License)
speechless by JuliusKunze
Speech-to-text based on wav2letter built for transfer learning
Python | 88 | Updated: 4 y ago | License: Permissive (MIT)
MaskCycleGAN-VC by GANtastic3
Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.
Python | 88 | Updated: 2 y ago | License: Permissive (MIT)
vits-mandarin-biaobei by AlexandaJerry
Application of VITS to Mandarin TTS
Jupyter Notebook | 88 | Updated: 2 y ago | License: Permissive (MIT)
festvox by festvox
Festvox voice building tools
Python | 87 | Updated: 2 y ago | License: Proprietary (Proprietary)
speech2singing by ericwudayi
Implementation of the speech-to-singing paper from Interspeech 2020.
Python | 87 | Updated: 3 y ago | License: No License (No License)
hermod by syntithenai
Voice services stack from audio hardware through hotword, ASR, NLU, AI routing and TTS, bound by a messaging protocol over MQTT
Python | 87 | Updated: 2 y ago | License: Proprietary (Proprietary)
WaveNet-Enhancement by auspicious3000
Speech Enhancement using Bayesian WaveNet
Python | 87 | Updated: 3 y ago | License: No License (No License)
Deep-Expression by ttsunion
An Attention Based Open-Source End to End Speech Synthesis Framework, No CNN, No RNN, No MFCC!!!
Python | 87 | Updated: 4 y ago | License: No License (No License)
voice-activity-detection by Jam3
Voice activity detection
JavaScript | 87 | Updated: 4 y ago | License: Permissive (MIT)
ParallelTTS by atomicoo
A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (tested so far).
Python | 87 | Updated: 3 y ago | License: Permissive (MIT)
whisper-vits-japanese by AlexandaJerry
VITS Japanese with Whisper as the data processor (you can train your VITS even if you only have audio)
Jupyter Notebook | 87 | Updated: 2 y ago | License: Permissive (MIT)
B.E.N.J.I. by the-ethan-hunt
B.E.N.J.I. - The Impossible Missions Force's digital assistant
Python | 86 | Updated: 3 y ago | License: Permissive (MIT)
tts by c2h2
A Ruby gem for text-to-speech using the Google Translate service.
Ruby | 86 | Updated: 4 y ago | License: Permissive (MIT)
tts by eheikes
Tools to convert text to speech
JavaScript | 86 | Updated: 3 y ago | License: Permissive (Apache-2.0)
nativescript-speech-recognition by EddyVerbruggen
Speech to text, using the awesome engines readily available on the device.
TypeScript | 86 | Updated: 2 y ago | License: Proprietary (Proprietary)
torch-pitch-shift by KentoNishi
Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
Python | 86 | Updated: 2 y ago | License: Permissive (MIT)
april-asr by abb128
Speech-to-text library in C
C | 86 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)
face-vid2vid by zhengkw18
Unofficial implementation of One-Shot Free-View Neural Talking Head Synthesis
Python | 86 | Updated: 2 y ago | License: No License (No License)
idiolect by OpenASR
🎙️ Handsfree Audio Development Interface
Kotlin | 86 | Updated: 2 y ago | License: Permissive (Apache-2.0)
VBDiarization by Jamiroquai88
Speaker diarization based on Kaldi x-vectors, tuned for 16k microphone data
Python | 85 | Updated: 3 y ago | License: Permissive (Apache-2.0)
audio-visual-speech-enhancement by avivga
Official Implementation of "Visual Speech Enhancement", Interspeech 2018.
Python | 85 | Updated: 4 y ago | License: No License (No License)
SEGAN by leftthomas
A PyTorch implementation of SEGAN based on the INTERSPEECH 2017 paper "SEGAN: Speech Enhancement Generative Adversarial Network"
Python | 85 | Updated: 4 y ago | License: No License (No License)
pyfasst by wslihgt
Python implementation of the Flexible Audio Source Separation Toolbox (FASST)
Python | 85 | Updated: 4 y ago | License: Strong Copyleft (GPL-2.0)
butler by 720kb
I/O customizable voice driven butler - http://720kb.github.io/butler/
JavaScript | 85 | Updated: 4 y ago | License: Permissive (MIT)
arabic-tacotron-tts by youssefsharief
End-to-end Arabic TTS system based on Tacotron
Python | 84 | Updated: 4 y ago | License: Permissive (MIT)
Android-TTS-STT by hiteshsahu
One-line solution for the Android text-to-speech (TTS) and speech-to-text (STT) translation problem
Kotlin | 84 | Updated: 2 y ago | License: No License (No License)
OSSSpeechKit by AppDevGuy
OSSSpeechKit offers a native iOS Speech wrapper for AVFoundation and Apple's Speech.
Swift | 84 | Updated: 2 y ago | License: Permissive (MIT)
tacotron2 by ide8
Multispeaker & Emotional TTS based on Tacotron 2 and WaveGlow
Jupyter Notebook | 84 | Updated: 3 y ago | License: Permissive (BSD-3-Clause)
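laibot-client by jjwang
Open-source AI: build voice dialogue robots, smart speakers and more on open-source software and hardware. Human-machine dialogue, natural interaction: 来宝 (laibot) has unlimited possibilities. Note that 来宝 runs on Python 3!
Python | 83 | Updated: 4 y ago | License: Permissive (MIT)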
voicer by antirek
AGI-server voice recognizer for #Asterisk
JavaScript | 83 | Updated: 2 y ago | License: Permissive (MIT)
ms-bing-speech-service by noopkat
Node.js service wrapper for Microsoft Speech API and Custom Speech Service
JavaScript | 83 | Updated: 4 y ago | License: Permissive (MIT)
FFTNet by erogol
FFTNet vocoder implementation
Jupyter Notebook | 83 | Updated: 3 y ago | License: Weak Copyleft (MPL-2.0)
SRD-VC by YoungSeng
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (Interspeech 2022)
Python | 83 | Updated: 2 y ago | License: No License (No License)
icassp19 by edufonseca
Public repository for the paper "Learning Sound Event Classifiers from Web Audio with Noisy Labels"
Python | 82 | Updated: 3 y ago | License: Permissive (MIT)
odmpy by ping
A simple command line manager for OverDrive/Libby loans. Download your library loans from the command line.
Python | 82 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)
sail_align by nassosoassos
SailAlign is an open-source software toolkit for robust long speech-text alignment. It implements an adaptive, iterative speech recognition and text alignment scheme that can process very long (and possibly noisy) audio and is robust to transcription errors. It is mainly written as a Perl library, but its functionality also depends on freely available software, namely HTK, SRILM and sclite.
Perl | 82 | Updated: 3 y ago | License: No License (No License)
VectorQuantizedCPC by bshall
Vector-Quantized Contrastive Predictive Coding for Acoustic Unit Discovery and Voice Conversion
Python | 81 | Updated: 3 y ago | License: Permissive (MIT)
pykaldi by UFAL-DSG
Python wrapper for Kaldi decoders (Kaldi https://sourceforge.net/projects/kaldi/)
Python | 81 | Updated: 4 y ago | License: Proprietary (Proprietary)