ASR with PyTorch
CCAligner by saurabhshri
🔮 Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.
C++ 150 Updated: 2 y ago License: No License (No License)
iSTFTNet-pytorch by rishikksh20
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Python 149 Updated: 2 y ago License: Permissive (Apache-2.0)
Speech-Translate by Dadangdut33
A realtime speech transcription and translation application using OpenAI Whisper and a free translation API. Interface made using Tkinter. Code written fully in Python.
Python 149 Updated: 1 y ago License: Permissive (MIT)
ngcc-validation by angular
Angular Ivy library compatibility validation project
TypeScript 148 Updated: 2 y ago License: No License (No License)
speech2text by shenasa-ai
A Deep-Learning-Based Persian Speech Recognition System
Jupyter Notebook 148 Updated: 2 y ago License: Permissive (MIT)
Panako by JorenSix
The Panako acoustic fingerprinting system.
Java 147 Updated: 2 y ago License: Strong Copyleft (AGPL-3.0)
tagger by yanshao9798
A Joint Chinese segmentation and POS tagger based on bidirectional GRU-CRF
Python 147 Updated: 4 y ago License: No License (No License)
Amplituda by lincollincol
Audio processing library, which provides waveform data
C 147 Updated: 2 y ago License: Permissive (Apache-2.0)
SER-datasets by SuperKogito
A collection of datasets for the purpose of emotion recognition/detection in speech.
HTML 147 Updated: 2 y ago License: Permissive (MIT)
leon-cli by leon-ai
⌨️ Command-line interface (CLI) for a better use of Leon, your open-source personal assistant. GNU/Linux, macOS and Windows supported.
TypeScript 147 Updated: 2 y ago License: Permissive (MIT)
voice-engine by voice-engine
Building blocks to create voice interface applications
Python 146 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
VideoSync by allisonnicoledeal
Automatically synchronize crowd-sourced concert videos
Python 146 Updated: 4 y ago License: No License (No License)
UnityWav by deadlyfingers
WAV utility for saving and loading wav files in Unity
C# 146 Updated: 3 y ago License: Permissive (MIT)
SpeechRecognizerButton by alexruperez
UIButton subclass with push to talk recording, speech recognition and Siri-style waveform view.
Swift 146 Updated: 3 y ago License: Permissive (MIT)
speech_separation by bill9800
Includes core functions and a model for speech separation
Python 144 Updated: 2 y ago License: Permissive (MIT)
sova-asr by sovaai
SOVA ASR (Automatic Speech Recognition)
Python 144 Updated: 2 y ago License: Permissive (Apache-2.0)
SpeechToTextDemo by appcoda
A Speech-to-text Demo App
Swift 144 Updated: 4 y ago License: Permissive (MIT)
awesome-ai-services by sekwiatkowski
An overview of the AI-as-a-service landscape
Java 142 Updated: 2 y ago License: No License (No License)
go-astideepspeech by asticode
Golang bindings for Mozilla's DeepSpeech speech-to-text library
Go 142 Updated: 3 y ago License: Permissive (MIT)
Comprehensive-Transformer-TTS by keonlee9420
A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
Python 142 Updated: 3 y ago License: Permissive (MIT)
pitchtron by hash2430
TTS for pitch-accented language. Korean dialect DB.
Python 141 Updated: 2 y ago License: Proprietary (Proprietary)
hubert by bshall
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Python 141 Updated: 2 y ago License: Permissive (MIT)
griffin_lim by bkvogel
Implementation of the Griffin and Lim algorithm to recover an audio signal from a magnitude-only spectrogram.
Python 140 Updated: 4 y ago License: Permissive (BSD-3-Clause)
pikachu2 by chenghuige
WeChat Big Data 2021 1st place, QQ Browser 2021 3rd place, MIND News Recommendation 2020 1st place, NAIC2020 AI + Remote Sensing Imagery 2nd place
Jupyter Notebook 140 Updated: 2 y ago License: No License (No License)
dual-path-RNNs-DPRNNs-based-speech-separation by ShiZiqiang
A PyTorch implementation of dual-path RNNs (DPRNNs) based speech separation described in "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation".
Python 139 Updated: 2 y ago License: No License (No License)
transducer by awni
A Fast Sequence Transducer Implementation with PyTorch Bindings
Python 139 Updated: 3 y ago License: Permissive (Apache-2.0)
Praat_Scripts by feelins
Some basic Praat scripts.
Python 139 Updated: 2 y ago License: No License (No License)
Flite-TTS-Engine-for-Android by happyalu
Port of the Festival-lite (Flite TTS) speech-synthesis engine to Android
Java 138 Updated: 3 y ago License: Proprietary (Proprietary)
pb_bss by fgnt
Collection of EM algorithms for blind source separation of audio signals
Python 138 Updated: 3 y ago License: Permissive (MIT)
deep_avsr by smeetrs
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Python 138 Updated: 2 y ago License: Permissive (MIT)
SoundSourceSeparation by sekiguchi92
Code for multi-channel source separation and dereverberation methods such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.
Python 137 Updated: 2 y ago License: Proprietary (Proprietary)
AndroidMaryTTS by AndroidMaryTTS
Android MARY TTS - an open-source, offline HMM-Based text-to-speech synthesis system based on MaryTTS
Java 137 Updated: 3 y ago License: No License (No License)
hifigan-denoiser by rishikksh20
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Python 137 Updated: 2 y ago License: Permissive (Apache-2.0)
isabella by chrisvfritz
A voice-computing assistant built in Ruby.
Ruby 136 Updated: 4 y ago License: No License (No License)
audio_recognition by JiahuiYu
Audio fingerprinting and recognition in C++
C++ 136 Updated: 4 y ago License: No License (No License)
PiTranslate by dconroy
Raspberry Pi Translation Tool
Python 135 Updated: 4 y ago License: No License (No License)
ubicoustics by FIGLAB
Accompanying repository for Ubicoustics: Plug-and-Play Acoustic Activity Recognition
Python 135 Updated: 4 y ago License: Proprietary (Proprietary)
A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement by haoxiangsnr
A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch
Python 134 Updated: 3 y ago License: No License (No License)
tensorflow-wavenet by Deeperjia
Speech recognition based on TensorFlow 1.0.0
Python 134 Updated: 4 y ago License: No License (No License)
SpeechCmdRecognition by douglas125
A neural attention model for speech command recognition
Jupyter Notebook 134 Updated: 3 y ago License: Permissive (MIT)
AdaSpeech by rishikksh20
AdaSpeech: Adaptive Text to Speech for Custom Voice
Jupyter Notebook 134 Updated: 2 y ago License: Permissive (Apache-2.0)
Speech-Recognition-Unity by LightBuzz
Speech recognition in Unity3D.
C# 133 Updated: 2 y ago License: Permissive (MIT)
santoku by hughjonesd
A versatile cutting tool for R
JavaScript 133 Updated: 2 y ago License: Proprietary (Proprietary)
GPT-Automator by chidiwilliams
Your voice-controlled Mac assistant
Python 133 Updated: 1 y ago License: No License (No License)
ttsmms by wannaphong
TTS with The Massively Multilingual Speech (MMS) project
Python 133 Updated: 1 y ago License: Permissive (MIT)
angle by pannous
⦠ Angle: new speakable syntax for Python 💡
Python 132 Updated: 1 y ago License: No License (No License)
elpis by CoEDL
🙊 Software for creating speech recognition models.
Python 132 Updated: 2 y ago License: Permissive (Apache-2.0)
Looking-to-Listen-at-the-Cocktail-Party by JusperLee
Executable code based on Google articles
Python 132 Updated: 3 y ago License: Permissive (MIT)
aws_transcribe_to_docx by kibaffo33
Produce Word Document, CSV or SQLite transcriptions using the automatic speech recognition from AWS Transcribe.
Python 132 Updated: 3 y ago License: Permissive (MIT)