deepspeech | A PyTorch implementation of DeepSpeech and DeepSpeech2 | Speech library
kandi X-RAY | deepspeech Summary
A PyTorch implementation of DeepSpeech and DeepSpeech2.
Top functions reviewed by kandi - BETA
- Return a trained model
- Save the model to disk
- Get the path to the last state dict
- Return a dictionary of all state dictionaries
- Load all transcription files
- Processes an audio file
- Parse a transcription file
- Sort the paths by duration
- Train the model
- Return a function to clip gradients
- Log end of training
- Log a training step
- Calculate the checksum of a directory
- Calculate the checksum of a file
- Parse an algorithm
- Evaluate the model
- Evaluate the loss function
- Download the tarball of the dataset
- Check integrity of a subset
- Save a trained model
- Calculate the layer feature size
- Get argument parser
- Get a dev data loader
- Return a training data loader
- Return the decoder from the given arguments
- Calculate the output length
deepspeech Key Features
deepspeech Examples and Code Snippets
# WER averaged over the N utterances in a batch: each utterance contributes
# its edit distance divided by the length of its target.
per_utterance_wers = [edit_distance(target, predict) / len(target)
                      for target, predict in zip(targets, predictions)]
WER = (1.0 / N) * sum(per_utterance_wers)
>>> (1.0/3) * ((1.0/1) + (2.0/4) + ...)
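The snippet assumes an edit_distance helper. A minimal sketch of a sequence-level Levenshtein distance is below; this is an illustration, not necessarily the library's own implementation. It works on strings for character error rate or on lists of words for word error rate:

def edit_distance(ref, hyp):
    # Dynamic-programming Levenshtein distance: d[i][j] is the number of
    # insertions, deletions and substitutions needed to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)]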
deepspeech ds1 \
--state_dict_path $MODEL_PATH \
--log_file \
--decoder greedy \
--train_subsets \
--dev_log wer \
--dev_subsets dev-clean \
--dev_batch_size 1
make build
sudo docker run --runtime=nvidia --shm-size 512M -p 9999:9999 deepspeech
deepspeech --help
import deepspeech
Community Discussions
Trending Discussions on deepspeech
QUESTION
I am trying to make a speech-to-text system using a Raspberry Pi. There are many problems with VAD. I am using DeepSpeech's VAD script. The Adafruit I2S MEMS microphone accepts only 32-bit PCM audio, so I modified the script to record 32-bit audio and then convert it to 16-bit for DeepSpeech's processing. The frame-generation and conversion parts are below:
...ANSWER
Answered 2022-Jan-26 at 13:36 I searched for DeepSpeech's VAD script and found it. The problem is connected with webrtcvad. The webrtcvad VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. So you need to convert the 32-bit frame (I mean the PyAudio output frame) to 16-bit before passing it to webrtcvad.is_speech(). I changed it and it worked fine.
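A minimal sketch of that conversion, assuming frame_32 holds raw signed 32-bit PCM bytes from PyAudio; the variable names here are illustrative, not from the asker's script:

import numpy as np
import webrtcvad

vad = webrtcvad.Vad(3)   # aggressiveness from 0 (least) to 3 (most)
sample_rate = 16000

def to_16bit(frame_32):
    # Reinterpret the buffer as int32 samples and keep the high 16 bits,
    # turning 32-bit PCM into the 16-bit mono PCM that webrtcvad expects.
    samples = np.frombuffer(frame_32, dtype=np.int32)
    return (samples >> 16).astype(np.int16).tobytes()

# webrtcvad accepts only 10, 20 or 30 ms frames; 480 samples = 30 ms at 16 kHz.
frame_32 = np.zeros(480, dtype=np.int32).tobytes()  # placeholder: 30 ms of silence
speech = vad.is_speech(to_16bit(frame_32), sample_rate)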
QUESTION
ANSWER
Answered 2021-Nov-25 at 03:10 These errors are not related to DeepSpeech; they're related to ALSA, which is the sound subsystem for Linux. By the looks of the error, your system is having trouble accessing the microphone.
I would recommend running several ALSA tests, such as:
arecord -l
This should give you a list of recording devices that are detected, such as:
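Typical arecord -l output looks something like this; the card and device names will differ on your system:

**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0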
QUESTION
I am trying to build a customised scorer (language model) for speech-to-text using DeepSpeech in Colab. While calling generate_lm.py I get this error:
...ANSWER
Answered 2021-Dec-06 at 03:33 I was able to find a solution to the above question. I successfully created the language model after reducing the value of top_k to 15000. My phrases file has only about 42000 entries. We have to adjust the top_k value based on the number of phrases in our collection: the top_k parameter keeps only that many of the most frequent entries, and everything less frequent is removed before processing.
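For reference, an invocation along those lines might look like the following sketch; the flag names follow the DeepSpeech 0.9 training documentation, and the file paths are placeholders:

python3 generate_lm.py \
  --input_txt phrases.txt \
  --output_dir . \
  --top_k 15000 \
  --kenlm_bins /path/to/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie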
QUESTION
I am using the command below to start training the deepspeech model
...ANSWER
Answered 2021-Sep-25 at 18:12 The following worked for me:
Go to
QUESTION
During the build of the LM binary to create a scorer for the deepspeech model, I was getting the following error again and again:
...ANSWER
Answered 2021-Sep-25 at 14:09 The following worked for me. Go to
QUESTION
Getting the following error when trying to execute
...ANSWER
Answered 2021-Sep-23 at 07:59 If I try it as below, it works fine.
QUESTION
so a part of my code is
...ANSWER
Answered 2021-Jun-13 at 13:29 You have to install the path_provider package by running flutter pub add path_provider in your terminal. If you have already installed it, check whether you are importing it in your file.
QUESTION
commands i used
...ANSWER
Answered 2021-May-26 at 00:07 You are using wget to pull down a .whl file that was built for a different version of Python. You are pulling down
ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl
but are running Python 3.7. You need a different .whl file, such as:
ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl
This is available from the DeepSpeech releases page on GitHub.
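Once the matching wheel has been downloaded, installing it is a plain local install; the filename below is the cp37 wheel named in the answer:

pip3 install ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl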
QUESTION
I am currently working on a project for which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio I create a stream that uses the sample rate the model wants, which is 16000 Hz, but the microphone I am using has a sample rate of 44100 Hz. When the Python script runs, no rate conversion is done, and the mismatch between the microphone's sample rate and the model's expected sample rate produces the Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
...ANSWER
Answered 2021-Jan-09 at 16:47 So after some more testing I wound up editing the config file for PulseAudio. In this file you are able to uncomment entries which allow you to edit the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.
The file is located at /etc/pulse/daemon.conf. We can open and edit this file on Raspbian using sudo vi /etc/pulse/daemon.conf. Then we need to uncomment the line ; alternate-sample-rate = 48000 by removing the ; and change the value from 48000 to 16000. Save the file and exit vim, then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.
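After the edit, the relevant lines in /etc/pulse/daemon.conf look like this (the default rate stays commented out with ;):

; default-sample-rate = 44100
alternate-sample-rate = 16000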
If you are unfamiliar with vim and Linux, here is a more elaborate guide through the process of changing the sample rate.
QUESTION
I’m training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its docs. The dataset is a Common Voice dataset for the Persian language.
My configurations are as follows:
- Batch size = 2 (due to cuda OOM)
- Learning rate = 0.0001
- Num. neurons = 2048
- Num. epochs = 50
- Train set size = 7500
- Test and Dev sets size = 5000
- dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
Train and val losses decrease through the training process, but after a few epochs the val loss stops decreasing. Train loss is about 18 and val loss is about 40.
The predictions are all empty strings at the end of the process. Any ideas how to improve the model?
...ANSWER
Answered 2021-May-11 at 14:02 Maybe you need to decrease the learning rate or use a learning rate scheduler.
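A minimal PyTorch sketch of that suggestion using ReduceLROnPlateau; the model and the validation loss here are placeholders, not the asker's actual training script:

import torch

model = torch.nn.Linear(161, 29)  # placeholder standing in for the DeepSpeech network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the learning rate whenever the validation loss fails to improve for 3 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)

for epoch in range(50):
    val_loss = 40.0  # placeholder: compute this from the dev set each epoch
    scheduler.step(val_loss)
    print(epoch, optimizer.param_groups[0]["lr"])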
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install deepspeech
You can use deepspeech like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
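A typical setup following that advice (a sketch: the environment name is arbitrary, and the final step assumes you are installing from a checkout of the deepspeech source tree):

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install .   # run from the root of the deepspeech checkout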