DeepSpeech | open source | Speech library
Trending Discussions on DeepSpeech
QUESTION
I am trying to make a speech-to-text system using a Raspberry Pi. There are many problems with VAD. I am using DeepSpeech's VAD script. The Adafruit I2S MEMS microphone accepts only 32-bit PCM audio, so I modified the script to record 32-bit audio and then convert it to 16-bit for DeepSpeech's processing. The frame generation and conversion parts are below:
for frame in frames:
    if frame is not None:
        if spinner: spinner.start()
        # Get frame generated by PyAudio and webrtcvad
        dp_frame = np.frombuffer(frame, np.int32)
        # Convert to 16-bit PCM
        dp_frame = (dp_frame >> 16).astype(np.int16)
        # Convert speech to text
        stream_context.feedAudioContent(dp_frame)
PyAudio configs are:
'format': paInt32,
'channels': 1,
'rate': 16000,
When VAD starts, it keeps generating non-empty frames even when there is no voice around. But when I set a timer for every 5 seconds, it shows that the recording was done successfully. I think the problem is that the energy (voltage) adds some noise, which is why the microphone cannot detect silence and end frame generation. How can I solve this problem?
ANSWER
Answered 2022-Jan-26 at 13:36
I searched for DeepSpeech's VAD script and found it. The problem is connected with webrtcvad: the webrtcvad VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. So you need to convert the 32-bit frame (the PyAudio output frame) to 16-bit before processing it with webrtcvad.is_speech(). I made that change and it worked fine.
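A minimal sketch of that conversion, assuming a mono 32-bit PCM frame of 10, 20, or 30 ms read from PyAudio (the frame_is_speech helper and the aggressiveness value are illustrative, not part of the original answer):

import numpy as np
import webrtcvad

vad = webrtcvad.Vad(2)   # aggressiveness 0-3; 2 is a middle ground
SAMPLE_RATE = 16000

def frame_is_speech(raw_frame_32bit):
    # Interpret the raw PyAudio buffer as signed 32-bit samples.
    samples = np.frombuffer(raw_frame_32bit, dtype=np.int32)
    # Keep the top 16 bits to get 16-bit PCM, which is all webrtcvad accepts.
    samples_16 = (samples >> 16).astype(np.int16)
    # webrtcvad expects raw 16-bit mono PCM bytes at 8/16/32/48 kHz,
    # and each frame must cover exactly 10, 20 or 30 ms.
    return vad.is_speech(samples_16.tobytes(), SAMPLE_RATE)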
QUESTION
ANSWER
Answered 2021-Nov-25 at 03:10
These errors are not related to DeepSpeech; they're related to ALSA, which is the sound subsystem for Linux. By the looks of the error, your system is having trouble accessing the microphone. I would recommend running several ALSA tests, such as:
arecord -l
This should give you a list of recording devices that are detected, such as:
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 2: Generic_1 [HD-Audio Generic], device 0: ALC294 Analog [ALC294 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
If this is not what you expected, you can use the command alsamixer to select another sound card and/or microphone.
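If ALSA can see the device but Python still cannot, it can also help to check what PyAudio itself detects. A small sketch (not from the original answer; device indices and names differ per system):

import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    # Only input-capable devices matter when looking for a microphone.
    if info.get("maxInputChannels", 0) > 0:
        print(i, info["name"], "default rate:", info["defaultSampleRate"])
pa.terminate()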
QUESTION
I am trying to build a customised scorer (language model) for speech-to-text using DeepSpeech in Colab. While calling generate_lm.py I get this error:
    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 126, in build_lm
    binary_path,
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/content/DeepSpeech/native_client/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/DeepSpeech/data/lm/lm_filtered.arpa', '/content/DeepSpeech/data/lm/lm.binary']' died with .
Calling the script generate_lm.py like this:
!python3 generate_lm.py --input_txt hindi_tokens.txt --output_dir /content/DeepSpeech/data/lm --top_k 500000 --kenlm_bins /content/DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
ANSWER
Answered 2021-Dec-06 at 03:33
I was able to find a solution to the above question. The language model was created successfully after reducing the value of top_k to 15000. My phrases file has only about 42000 entries. We have to adjust the top_k value based on the number of phrases in our collection: top_k keeps only the most frequent words, and everything less frequent is removed before processing.
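As a rough way to pick a sensible top_k, you can count how many distinct words the input file actually contains before filtering (a hypothetical helper; hindi_tokens.txt is the input file named in the question):

from collections import Counter

counts = Counter()
with open("hindi_tokens.txt", encoding="utf-8") as f:
    for line in f:
        counts.update(line.split())

print("distinct words:", len(counts))
# Pick top_k no larger than this count, e.g. top_k = min(500000, len(counts)).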
QUESTION
I am using the below command to start training the DeepSpeech model:
%cd /content/DeepSpeech
!python3 DeepSpeech.py \
--drop_source_layers 2 --scorer /content/DeepSpeech/data/lm/kenlm-nigerian.scorer \
--train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 5 \
--export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
--train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
--learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'he_model_5' \
--max_to_keep 3
I keep getting the following error again and again:
(0) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
(1) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
ANSWER
Answered 2021-Sep-25 at 18:12
The following worked for me. Go to DeepSpeech/training/deepspeech_training/train.py and look for this particular line (normally around lines 240-250):
total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
Change it to the following:
total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)
QUESTION
During the build of the LM binary to create a scorer for the DeepSpeech model, I was getting the following error again and again:
subprocess.CalledProcessError: Command '['/content/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1.
The command I was using is below:
!python /content/DeepSpeech/data/lm/generate_lm.py \
--input_txt /content/transcripts.txt \
--output_dir /content/scorer/ \
--top_k 50000 \
--kenlm_bins /content/kenlm/build/bin/ \
--arpa_order 5 --max_arpa_memory "95%" --arpa_prune "0|0|1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie
ANSWER
Answered 2021-Sep-25 at 14:09
The following worked for me. Go to DeepSpeech -> data -> lm -> generate_lm.py and find the following block of code inside it:
subprocess.check_call(
    [
        os.path.join(args.kenlm_bins, "build_binary"),
        "-a",
        str(args.binary_a_bits),
        "-q",
        str(args.binary_q_bits),
        "-v",
        args.binary_type,
        filtered_path,
        binary_path,
    ]
)
Tweak the code by adding the "-s" flag to it, as below:
subprocess.check_call(
    [
        os.path.join(args.kenlm_bins, "build_binary"),
        "-a",
        str(args.binary_a_bits),
        "-q",
        str(args.binary_q_bits),
        "-v",
        args.binary_type,
        filtered_path,
        binary_path,
        "-s",
    ]
)
Now your command will run fine.
QUESTION
Getting the following error when trying to execute:
%cd /content/DeepSpeech
!python3 DeepSpeech.py --train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 20 \
--export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
--train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
--learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'ft_model' \
--augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
--augment volume[p=0.2,dbfs=-10:-40] \
--augment pitch[p=0.2,pitch=1~0.2] \
--augment tempo[p=0.2,factor=1~0.5]
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 798, 64, 2048] [[{{node tower_0/cudnn_lstm/CudnnRNNV3}}]] [[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_87]]
(1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 798, 64, 2048] [[{{node tower_0/cudnn_lstm/CudnnRNNV3}}]]
0 successful operations. 0 derived errors ignored.
ANSWER
Answered 2021-Sep-23 at 07:59
If I run it as below, it works fine:
%cd /content/DeepSpeech
!python3 DeepSpeech.py --train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 20 \
--export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
--train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
--learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'ft_model' \
# --augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
# --augment volume[p=0.2,dbfs=-10:-40] \
# --augment pitch[p=0.2,pitch=1~0.2] \
# --augment tempo[p=0.2,factor=1~0.5]
Basically, the augment options were doing something that broke our training partway through.
QUESTION
So a part of my code is:
Future _loadModel() async {
final bytes =
await rootBundle.load('assets/deepspeech-0.9.3-models.tflite');
final directory = (await getApplicationDocumentsDirectory()).path;
And I keep getting the error:
The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'.
Try correcting the name to the name of an existing method, or defining a method named 'getApplicationDocumentsDirectory'
What should I do? Help me, please!
ANSWER
Answered 2021-Jun-13 at 13:29
You have to install the path_provider package by running flutter pub add path_provider in your terminal. If you have already installed it, check whether you are importing it into your file.
QUESTION
Commands I used:
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl
!pip install /content/~path~/ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl
This gives me an error:
ERROR: ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
How can I solve this?
ANSWER
Answered 2021-May-26 at 00:07
You are using wget to pull down a .whl file that was built for a different version of Python. You are pulling down
ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl
but are running Python 3.7. You need a different .whl file, such as:
ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl
This is available from the DeepSpeech releases page on GitHub.
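A quick way to check which wheel tag your environment needs is to inspect the running interpreter; the cpXY part of the wheel name must match the Python minor version (a small sketch, not part of the original answer):

import sys

print(sys.version)            # e.g. 3.7.12 -> you need a cp37 wheel
print(sys.version_info[:2])   # (3, 7) means a ds_ctcdecoder-...-cp37-cp37m-... wheel is required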
QUESTION
I am currently working on a project in which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio, I create a stream that uses the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script, no rate conversion is done, and the mismatch between the microphone's sample rate and the expected sample rate of the model produces an Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
The first thing I tried was setting the PyAudio stream sample rate to 44100 and feeding the model that. But after testing I found out that the model does not work well when it gets a rate different from its requested 16000.
I have been trying to find a way to have the microphone change its rate to 16000, or at least have its rate converted to 16000 when it is used in the Python script, but to no avail.
The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know if it is possible to change the microphone's rate to 16000 within this file. This is what the file currently looks like:
pcm.!default {
type asymd
playback.pcm
{
type plug
slave.pcm "dmix"
}
capture.pcm
{
type plug
slave.pcm "usb"
}
}
ctl.!default {
type hw
card 0
}
pcm.usb {
type hw
card 1
device 0
rate 16000
}
The Python code I made works on Windows, which I guess is because Windows converts the rate of the input to the sample rate in the code. But Linux does not seem to do this.
tl;dr: the microphone rate is 44100, but it has to change to 16000 to be usable. How do you do this on Linux?
Edit 1:
I create the PyAudio stream like this:
self.paStream = self.pa.open(rate = self.model.sampleRate(), channels = 1, format= pyaudio.paInt16, input=True, input_device_index = 1, frames_per_buffer= self.model.beamWidth())
It uses the model's rate and model's beamwidth, and the number of channels of the microphone and index of the microphone.
To get the next audio frame and format it properly for the stream I create for the model, I do this:
def __get_next_audio_frame__(self):
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow=False)
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame
exception_on_overflow=False was used to test the model with an input rate of 44100; without it set to False, the same error I am currently dealing with would occur. model.beamWidth() holds the number of chunks the model expects. I then read that number of chunks and reformat them before feeding them to the model's stream, which happens like this:
modelStream.feedAudioContent(self.__get_next_audio_frame__())
ANSWER
Answered 2021-Jan-09 at 16:47
So after some more testing, I wound up editing the config file for PulseAudio. In this file you can uncomment entries that allow you to edit the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.
The file is located at /etc/pulse/daemon.conf. We can open and edit this file on Raspbian using sudo vi daemon.conf. Then we need to uncomment the line ; alternate-sample-rate = 48000, which is done by removing the ; and changing the value 48000 to 16000. Save the file and exit vim. Then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.
If you are unfamiliar with vim and Linux, there is a more elaborate guide through the process of changing the sample rate.
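To confirm the change took effect, one option is to ask PortAudio whether the microphone can now be opened at 16 kHz (a sketch, not from the original answer; the device index 1 is taken from the question's listing and may differ on other systems):

import pyaudio

pa = pyaudio.PyAudio()
try:
    ok = pa.is_format_supported(
        16000,                   # target rate expected by the DeepSpeech model
        input_device=1,          # index of the USB microphone from the question
        input_channels=1,
        input_format=pyaudio.paInt16,
    )
    print("16 kHz capture supported:", ok)
except ValueError as err:
    print("16 kHz capture not supported:", err)
pa.terminate()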
QUESTION
I'm training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its documentation. The dataset is the Common Voice dataset for the Persian language.
My configurations are as follows:
- Batch size = 2 (due to cuda OOM)
- Learning rate = 0.0001
- Num. neurons = 2048
- Num. epochs = 50
- Train set size = 7500
- Test and Dev sets size = 5000
- Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
Train and validation losses decrease through the training process, but after a few epochs the validation loss stops decreasing. The train loss is about 18 and the validation loss is about 40.
The predictions are all empty strings at the end of the process. Any ideas on how to improve the model?
ANSWER
Answered 2021-May-11 at 14:02
Maybe you need to decrease the learning rate or use a learning rate scheduler.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported