ASR with PyTorch
CCAligner by saurabhshri
🔮 Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.
C++
150
Updated: 2 y ago
License: No License (No License)
iSTFTNet-pytorch by rishikksh20
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Python
149
Updated: 2 y ago
License: Permissive (Apache-2.0)
Speech-Translate by Dadangdut33
A real-time speech transcription and translation application using OpenAI Whisper and a free translation API. The interface is built with Tkinter, and the code is written entirely in Python.
Python
149
Updated: 2 y ago
License: Permissive (MIT)
ngcc-validation by angular
Angular Ivy library compatibility validation project
TypeScript
148
Updated: 2 y ago
License: No License (No License)
speech2text by shenasa-ai
A Deep-Learning-Based Persian Speech Recognition System
Jupyter Notebook
148
Updated: 2 y ago
License: Permissive (MIT)
Panako by JorenSix
The Panako acoustic fingerprinting system.
Java
147
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)
tagger by yanshao9798
A Joint Chinese segmentation and POS tagger based on bidirectional GRU-CRF
Python
147
Updated: 4 y ago
License: No License (No License)
Amplituda by lincollincol
Audio processing library, which provides waveform data
C
147
Updated: 2 y ago
License: Permissive (Apache-2.0)
SER-datasets by SuperKogito
A collection of datasets for the purpose of emotion recognition/detection in speech.
HTML
147
Updated: 2 y ago
License: Permissive (MIT)
leon-cli by leon-ai
⌨️ Command-line interface (CLI) for a better use of Leon, your open-source personal assistant. GNU/Linux, macOS and Windows supported.
TypeScript
147
Updated: 2 y ago
License: Permissive (MIT)
voice-engine by voice-engine
Building blocks for creating voice interface applications
Python
146
Updated: 4 y ago
License: Strong Copyleft (GPL-3.0)
VideoSync by allisonnicoledeal
Automatically synchronize crowd-sourced concert videos
Python
146
Updated: 4 y ago
License: No License (No License)
UnityWav by deadlyfingers
WAV utility for saving and loading wav files in Unity
C#
146
Updated: 4 y ago
License: Permissive (MIT)
SpeechRecognizerButton by alexruperez
UIButton subclass with push-to-talk recording, speech recognition, and a Siri-style waveform view.
Swift
146
Updated: 4 y ago
License: Permissive (MIT)
speech_separation by bill9800
Includes core functions and a model for speech separation
Python
144
Updated: 2 y ago
License: Permissive (MIT)
sova-asr by sovaai
SOVA ASR (Automatic Speech Recognition)
Python
144
Updated: 2 y ago
License: Permissive (Apache-2.0)
SpeechToTextDemo by appcoda
A Speech-to-text Demo App
Swift
144
Updated: 4 y ago
License: Permissive (MIT)
awesome-ai-services by sekwiatkowski
An overview of the AI-as-a-service landscape
Java
142
Updated: 2 y ago
License: No License (No License)
go-astideepspeech by asticode
Golang bindings for Mozilla's DeepSpeech speech-to-text library
Go
142
Updated: 4 y ago
License: Permissive (MIT)
Comprehensive-Transformer-TTS by keonlee9420
A non-autoregressive Transformer-based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
Python
142
Updated: 3 y ago
License: Permissive (MIT)
pitchtron by hash2430
TTS for pitch-accented language. Korean dialect DB.
Python
141
Updated: 2 y ago
License: Proprietary (Proprietary)
hubert by bshall
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Python
141
Updated: 2 y ago
License: Permissive (MIT)
griffin_lim by bkvogel
Implementation of the Griffin and Lim algorithm to recover an audio signal from a magnitude-only spectrogram.
Python
140
Updated: 4 y ago
License: Permissive (BSD-3-Clause)
pikachu2 by chenghuige
WeChat Big Data Challenge 2021 1st place, QQ Browser 2021 3rd place, MIND News Recommendation 2020 1st place, NAIC 2020 AI + Remote Sensing Imagery 2nd place
Jupyter Notebook
140
Updated: 2 y ago
License: No License (No License)
dual-path-RNNs-DPRNNs-based-speech-separation by ShiZiqiang
A PyTorch implementation of dual-path RNNs (DPRNNs) based speech separation described in "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation".
Python
139
Updated: 2 y ago
License: No License (No License)
transducer by awni
A Fast Sequence Transducer Implementation with PyTorch Bindings
Python
139
Updated: 4 y ago
License: Permissive (Apache-2.0)
Praat_Scripts by feelins
Some basic Praat scripts.
Python
139
Updated: 2 y ago
License: No License (No License)
Flite-TTS-Engine-for-Android by happyalu
Port of the Festival-lite (Flite TTS) speech-synthesis engine to Android
Java
138
Updated: 4 y ago
License: Proprietary (Proprietary)
pb_bss by fgnt
Collection of EM algorithms for blind source separation of audio signals
Python
138
Updated: 4 y ago
License: Permissive (MIT)
deep_avsr by smeetrs
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Python
138
Updated: 2 y ago
License: Permissive (MIT)
SoundSourceSeparation by sekiguchi92
Code for multi-channel source separation and dereverberation methods such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.
Python
137
Updated: 2 y ago
License: Proprietary (Proprietary)
AndroidMaryTTS by AndroidMaryTTS
Android MARY TTS - an open-source, offline HMM-Based text-to-speech synthesis system based on MaryTTS
Java
137
Updated: 4 y ago
License: No License (No License)
hifigan-denoiser by rishikksh20
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Python
137
Updated: 2 y ago
License: Permissive (Apache-2.0)
isabella by chrisvfritz
A voice-computing assistant built in Ruby.
Ruby
136
Updated: 4 y ago
License: No License (No License)
audio_recognition by JiahuiYu
Audio fingerprinting and recognition in C++
C++
136
Updated: 4 y ago
License: No License (No License)
PiTranslate by dconroy
Raspberry Pi Translation Tool
Python
135
Updated: 4 y ago
License: No License (No License)
ubicoustics by FIGLAB
Accompanying repository for Ubicoustics: Plug-and-Play Acoustic Activity Recognition
Python
135
Updated: 4 y ago
License: Proprietary (Proprietary)
A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement by haoxiangsnr
A minimal unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN)
Python
134
Updated: 4 y ago
License: No License (No License)
tensorflow-wavenet by Deeperjia
Speech recognition based on TensorFlow 1.0.0
Python
134
Updated: 4 y ago
License: No License (No License)
SpeechCmdRecognition by douglas125
A neural attention model for speech command recognition
Jupyter Notebook
134
Updated: 3 y ago
License: Permissive (MIT)
AdaSpeech by rishikksh20
AdaSpeech: Adaptive Text to Speech for Custom Voice
Jupyter Notebook
134
Updated: 2 y ago
License: Permissive (Apache-2.0)
Speech-Recognition-Unity by LightBuzz
Speech recognition in Unity3D.
C#
133
Updated: 2 y ago
License: Permissive (MIT)
santoku by hughjonesd
A versatile cutting tool for R
JavaScript
133
Updated: 2 y ago
License: Proprietary (Proprietary)
GPT-Automator by chidiwilliams
Your voice-controlled Mac assistant
Python
133
Updated: 2 y ago
License: No License (No License)
ttsmms by wannaphong
TTS with The Massively Multilingual Speech (MMS) project
Python
133
Updated: 2 y ago
License: Permissive (MIT)
angle by pannous
⦠ Angle: new speakable syntax for Python 💡
Python
132
Updated: 2 y ago
License: No License (No License)
elpis by CoEDL
🙊 Software for creating speech recognition models.
Python
132
Updated: 2 y ago
License: Permissive (Apache-2.0)
Looking-to-Listen-at-the-Cocktail-Party by JusperLee
Executable code based on Google articles
Python
132
Updated: 3 y ago
License: Permissive (MIT)
aws_transcribe_to_docx by kibaffo33
Produce Word Document, CSV or SQLite transcriptions using the automatic speech recognition from AWS Transcribe.
Python
132
Updated: 3 y ago
License: Permissive (MIT)