Speech Libraries - Page 6

gst-tacotron by syang1993

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

Python

358

Updated: 2 y ago

License: No License (No License)

FastDiff by Rongjiehuang

PyTorch Implementation of FastDiff (IJCAI'22)

Python

352

Updated: 2 y ago

License: No License (No License)

esp-sr by espressif

Speech recognition

C

350

Updated: 2 y ago

License: Proprietary (Proprietary)

Neural_Network_Voices by llSourcell

This is the code for "Neural Network Voices" by Siraj Raval on YouTube

Python

349

Updated: 4 y ago

License: No License (No License)

setk by funcwj

Tools for Speech Enhancement integrated with Kaldi

Python

347

Updated: 2 y ago

License: Permissive (Apache-2.0)

TimeSide by Ircam-WAM

Scalable audio processing framework and server written in Python

Python

347

Updated: 2 y ago

License: Strong Copyleft (AGPL-3.0)

Thorsten-Voice by thorstenMueller

Thorsten-Voice: a free-to-use, offline, high-quality German TTS voice that should be available for every project without any licensing struggles.

Python

344

Updated: 2 y ago

License: Permissive (CC0-1.0)

StarGAN-Voice-Conversion by liusongxiang

A PyTorch implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks"

Python

342

Updated: 4 y ago

License: No License (No License)

FullSubNet by haoxiangsnr

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Python

341

Updated: 2 y ago

License: Permissive (MIT)

Multi-Tacotron-Voice-Cloning by vlomme

Phoneme multilingual (Russian-English) voice cloning based on

Python

336

Updated: 2 y ago

License: Proprietary (Proprietary)

Voice_Converter_CycleGAN by leimao

Voice Converter Using CycleGAN and Non-Parallel Data

Python

336

Updated: 5 y ago

License: Permissive (MIT)

surfboard by novoic

Novoic's audio feature extraction library

Python

335

Updated: 4 y ago

License: Strong Copyleft (GPL-3.0)

dragonfly by dictation-toolbox

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx

Python

334

Updated: 2 y ago

License: Weak Copyleft (LGPL-3.0)

pyctcdecode by kensho-technologies

A fast and lightweight Python-based CTC beam search decoder for speech recognition.

Python

333

Updated: 2 y ago

License: Permissive (Apache-2.0)

nodejs-text-to-speech by googleapis

Node.js client for Google Cloud Text-to-Speech

TypeScript

332

Updated: 3 y ago

License: Permissive (Apache-2.0)

WenetSpeech by wenet-e2e

A 10,000+ hour dataset for Chinese speech recognition

Shell

332

Updated: 2 y ago

License: Permissive (Apache-2.0)

ECAPA-TDNN by TaoRuijie

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)

Python

330

Updated: 2 y ago

License: Permissive (MIT)

pysptk by r9y9

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

Python

329

Updated: 3 y ago

License: Proprietary (Proprietary)

self-supervised-speech-recognition by mailong25

Speech-to-text with self-supervised learning based on the wav2vec 2.0 framework

Python

329

Updated: 2 y ago

License: No License (No License)

AI-Personal-Voice-assistant-using-Python by mmirthula02

Python

328

Updated: 2 y ago

License: No License (No License)

music-spectrogram-diffusion by magenta

Jupyter Notebook

328

Updated: 2 y ago

License: Permissive (Apache-2.0)

realtime-yukarin by Hiroshiba

An application for real-time voice conversion

Python

326

Updated: 2 y ago

License: Permissive (MIT)

speechbrain.github.io by speechbrain

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end) to speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

HTML

326

Updated: 2 y ago

License: No License (No License)

go-fluent-ffmpeg by modfy

A Go implementation of fluent-ffmpeg

Go

325

Updated: 2 y ago

License: Permissive (Apache-2.0)

ActionCLIP by sallymmx

The official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"

Python

322

Updated: 2 y ago

License: Permissive (MIT)

Dual-Path-RNN-Pytorch by JusperLee

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch

Python

321

Updated: 2 y ago

License: Permissive (Apache-2.0)

Whisperboard by Saik0s

iOS app to record and transcribe speech to text with the help of the OpenAI Whisper model

Swift

320

Updated: 2 y ago

License: Strong Copyleft (GPL-3.0)

Pi-Voice by rob-mccann

A hackday project. Run the program, speak into your microphone and hear the response from your speakers.

Python

319

Updated: 4 y ago

License: No License (No License)

NeuralSVB by MoonInTheRiver

Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Python

317

Updated: 2 y ago

License: Strong Copyleft (GPL-3.0)

Caster by dictation-toolbox

Dragonfly-Based Voice Programming and Accessibility Toolkit

Python

315

Updated: 2 y ago

License: Proprietary (Proprietary)

Self-Supervised-Speech-Pretraining-and-Representation-Learning by andi611

The S3PRL speech toolkit: self-supervised pre-training and representation learning with Mockingjay, TERA, A-ALBERT, APC, and more to come. Includes easy-to-use standard downstream evaluation scripts for phone classification, speaker recognition, and ASR. (All in PyTorch!)

Python

313

Updated: 4 y ago

License: Permissive (MIT)

ffmprovisr by amiaopensource

Repository of useful FFmpeg commands for archivists!

HTML

313

Updated: 4 y ago

License: No License (No License)

ctc_tensorflow_example by igormq

CTC + TensorFlow example for ASR

Python

311

Updated: 4 y ago

License: Permissive (MIT)

Prosodylab-Aligner by prosodylab

Python interface for forced audio alignment using HTK and SoX

Python

309

Updated: 2 y ago

License: Permissive (MIT)

tensorflow_end2end_speech_recognition by hirofumi0810

End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Python

306

Updated: 4 y ago

License: Permissive (MIT)

watson-word-watcher by dannguyen

A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions

Python

303

Updated: 4 y ago

License: Proprietary (Proprietary)

dl-for-emo-tts by Emotional-Text-to-Speech

A summary of our attempts at using deep learning approaches for emotional text-to-speech

Jupyter Notebook

303

Updated: 2 y ago

License: Permissive (MIT)

audio-fingerprint-identifying-python by itspoma

A Shazam-like app that identifies songs using audio fingerprints, spectrum analysis, and the fast Fourier transform

Python

302

Updated: 2 y ago

License: Permissive (MIT)

HanTTS by junzew

Chinese Text-to-Speech web service

Python

300

Updated: 2 y ago

License: Permissive (MIT)

GST-Tacotron by KinglittleQ

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Python

300

Updated: 2 y ago

License: Permissive (MIT)

spchcat by petewarden

Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

C

300

Updated: 2 y ago

License: Weak Copyleft (MPL-2.0)

google-tts by hiddentao

JavaScript API for the Google Text-to-Speech engine

JavaScript

298

Updated: 4 y ago

License: Permissive (MIT)

yunBT by maysrp

Multi-user video downloading and transcoding with Aria2 and FFmpeg

PHP

298

Updated: 4 y ago

License: Proprietary (Proprietary)

deepspeech-rs by RustAudio

Rust bindings for the deepspeech library

Rust

295

Updated: 2 y ago

License: Proprietary (Proprietary)

Aggregation-Cross-Entropy by summerlvsong

Aggregation Cross-Entropy for Sequence Recognition. CVPR 2019.

Python

292

Updated: 2 y ago

License: No License (No License)

TTS-Voice-Wizard by VRCWizard

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System)

C#

292

Updated: 2 y ago

License: Permissive (MIT)

deepspeech-german by AASHISHAG

Automatic Speech Recognition (ASR) - German

Python

291

Updated: 2 y ago

License: Permissive (Apache-2.0)

MB-iSTFT-VITS by MasayaKawamura

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

Python

291

Updated: 2 y ago

License: Permissive (Apache-2.0)

pycom-libraries by pycom

MicroPython libraries and examples that work out of the box on Pycom's IoT modules

Python

286

Updated: 4 y ago

License: No License (No License)

whichlang by quickwit-oss

A blazingly fast and lightweight language detection library for Rust

Rust

286

Updated: 2 y ago

License: Permissive (MIT)
