nerd-dictation by ideasman42
Simple, hackable offline speech to text - using the VOSK-API.
Python | 956 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)

subsync by sc0ty
Subtitle Speech Synchronizer
C++ | 943 | Updated: 1 y ago | License: Strong Copyleft (GPL-3.0)

botium-speech-processing by codeforequity-at
Botium Speech Processing
JavaScript | 939 | Updated: 1 y ago | License: Permissive (MIT)

pykaldi by pykaldi
A Python wrapper for Kaldi
Python | 936 | Updated: 2 y ago | License: Permissive (Apache-2.0)

voicefilter by mindslab-ai
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Python | 912 | Updated: 2 y ago | License: No License

Creates audio supercuts.

zhrtvc by KuangDD
Chinese real-time voice cloning (VC) and Chinese text to speech (TTS). An easy-to-use Chinese voice cloning and speech synthesis system, including a speech encoder, speech synthesizer, vocoder, and visualization module.
Python | 890 | Updated: 3 y ago | License: No License

espresso by freewym
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Python | 887 | Updated: 3 y ago | License: Proprietary

athena by athena-team
An open-source implementation of a sequence-to-sequence based speech processing engine
C++ | 869 | Updated: 2 y ago | License: Permissive (Apache-2.0)

amodem by romanz
Audio MODEM Communication Library in Python
Python | 864 | Updated: 2 y ago | License: Proprietary

speechpy by astorfi
SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
Python | 839 | Updated: 3 y ago | License: Permissive (Apache-2.0)

TensorFlowASR by TensorSpeech
TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can use characters or subwords.
Jupyter Notebook | 839 | Updated: 1 y ago | License: Permissive (Apache-2.0)

waveform-data.js by bbc
Audio Waveform Data Manipulation API: resample, offset and segment waveform data in JavaScript.
JavaScript | 824 | Updated: 3 y ago | License: Weak Copyleft (LGPL-3.0)

flowtron by NVIDIA
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
Jupyter Notebook | 817 | Updated: 2 y ago | License: Permissive (Apache-2.0)

eesen by srvk
The official repository of the Eesen project
C++ | 816 | Updated: 2 y ago | License: Permissive (Apache-2.0)

piper by rhasspy
A fast, local neural text to speech system
C++ | 794 | Updated: 1 y ago | License: Permissive (MIT)

stephanie-va by SlapBot
Stephanie is an open-source platform built specifically for voice-controlled applications and for automating daily tasks, imitating much of a virtual assistant's work.
Python | 788 | Updated: 2 y ago | License: Permissive (MIT)

FastSpeech by xcmyz
An implementation of FastSpeech based on PyTorch.
Python | 785 | Updated: 1 y ago | License: Permissive (MIT)

OBS-captions-plugin by ratwithacompiler
Closed Captioning OBS plugin using Google Speech Recognition
C++ | 785 | Updated: 1 y ago | License: Strong Copyleft (GPL-2.0)

jarvis by alexylem
Jarvis.sh is a simple configurable multi-lang assistant.
Shell | 780 | Updated: 2 y ago | License: Permissive (MIT)

mellotron by NVIDIA
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
Jupyter Notebook | 773 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)

CTCDecoder by githubharald
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
Python | 766 | Updated: 1 y ago | License: Permissive (MIT)

larynx by rhasspy
End-to-end text to speech system using gruut and ONNX
Python | 766 | Updated: 2 y ago | License: Permissive (MIT)

quillman by modal-labs
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
JavaScript | 730 | Updated: 1 y ago | License: Permissive (MIT)

whisper-asr-webservice by ahmetoner
OpenAI Whisper ASR Webservice API
Python | 728 | Updated: 1 y ago | License: Permissive (MIT)

Open STT

ekho by hgneng
Chinese text-to-speech engine
C++ | 719 | Updated: 3 y ago | License: Proprietary

Chinese speech recognition

Speech-Transformer by kaituoxu
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Python | 709 | Updated: 2 y ago | License: No License

Adapt Intent Parser

stream-audio-fingerprint by adblockradio
Audio landmark fingerprinting as a Node Stream module
JavaScript | 704 | Updated: 3 y ago | License: Weak Copyleft (MPL-2.0)

mycroft-precise by MycroftAI
A lightweight, simple-to-use, RNN wake word listener
Python | 700 | Updated: 1 y ago | License: Permissive (Apache-2.0)

wenet by mobvoi
Production First and Production Ready End-to-End Speech Recognition Toolkit
Python | 687 | Updated: 4 y ago | License: Permissive (Apache-2.0)

lhotse by lhotse-speech
Tools for handling speech data in machine learning projects.
Python | 686 | Updated: 1 y ago | License: Permissive (Apache-2.0)

nodejs-speech by googleapis
Node.js client for Google Cloud Speech: Speech to text conversion powered by machine learning.
TypeScript | 683 | Updated: 2 y ago | License: Permissive (Apache-2.0)

mimic3 by MycroftAI
A fast local neural text to speech engine for Mycroft
Python | 676 | Updated: 2 y ago | License: Strong Copyleft (AGPL-3.0)

ttskit by kuangdd
Text to speech toolkit. An easy-to-use Chinese speech synthesis toolkit, including a speech encoder, speech synthesizer, vocoder, and visualization module.
Python | 661 | Updated: 2 y ago | License: Permissive (MIT)

Python-ai-assistant by ggeop
Python AI assistant 🧠
Python | 660 | Updated: 2 y ago | License: Permissive (MIT)

voice-assistant-scripts by alan-ai
Example scripts for voice assistants created with the Alan AI Platform.
JavaScript | 659 | Updated: 1 y ago | License: No License

speech by awni
A PyTorch Implementation of End-to-End Models for Speech-to-Text
Python | 654 | Updated: 2 y ago | License: Permissive (Apache-2.0)

speaker-recognition by ppwwyyxx
A Speaker Recognition System
C++ | 645 | Updated: 3 y ago | License: Permissive (Apache-2.0)

LibreASR by iceychris
An On-Premises, Streaming Speech Recognition System
Python | 642 | Updated: 3 y ago | License: Permissive (MIT)

Cognitive-Speech-TTS by Azure-Samples
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
C# | 635 | Updated: 1 y ago | License: Proprietary

mysam by mysamai
An open "intelligent" assistant for the web that can listen to you and learn.
JavaScript | 628 | Updated: 4 y ago | License: Proprietary

auditok by amsehili
An audio/acoustic activity detection and audio segmentation tool
Python | 626 | Updated: 2 y ago | License: Permissive (MIT)

Speech API examples

kuromoji.js by takuyaa
JavaScript implementation of Japanese morphological analyzer
JavaScript | 620 | Updated: 3 y ago | License: Permissive (Apache-2.0)

aiexperiments-drum-machine by googlecreativelab
Thousands of everyday sounds, organized using machine learning.
JavaScript | 615 | Updated: 3 y ago | License: Permissive (Apache-2.0)

mimic1 by MycroftAI
Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
C | 615 | Updated: 3 y ago | License: Proprietary

Lip2Wav by Rudrabha
Repository containing the code for the CVPR 2020 paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
Python | 613 | Updated: 1 y ago | License: Permissive (MIT)