DeepSpeech | A Speech Toolkit based on PaddlePaddle | Speech library

by PaddlePaddle | Python | Version: v2.1.1 | License: Apache-2.0

kandi X-RAY | DeepSpeech Summary

DeepSpeech is a Python library typically used in Institutions, Learning, Education, Artificial Intelligence, and Speech applications. DeepSpeech has no bugs, no reported vulnerabilities, a build file available, a permissive license, and medium support. You can download it from GitHub.

A Speech Toolkit based on PaddlePaddle.

            Support

              DeepSpeech has a moderately active ecosystem.
              It has 1470 star(s) with 420 fork(s). There are 119 watchers for this library.
              It had no major release in the last 12 months.
              There are 11 open issues and 541 closed issues. On average, issues are closed in 40 days. There are 4 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of DeepSpeech is v2.1.1

            Quality

              DeepSpeech has 0 bugs and 36 code smells.

            Security

              DeepSpeech has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              DeepSpeech code analysis shows 0 unresolved vulnerabilities.
              There are 8 security hotspots that need review.

            License

              DeepSpeech is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              DeepSpeech releases are available to install and integrate.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi's functional review helps you automatically verify the functionality of libraries and avoid rework. It currently covers the most popular Java, JavaScript, and Python libraries.

            DeepSpeech Key Features

            No Key Features are available at this moment for DeepSpeech.

            DeepSpeech Examples and Code Snippets

            Simple Youtube summarizer - Download Mozilla DeepSpeech model - Output:
            Python | Lines of Code: 45 | License: Permissive (Apache-2.0)
            Volodymyrs-MacBook-Pro:YoutubeSummarizer roaming$ cd /Users/Volodymyr/Projects/YoutubeSummarizer ; env "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /usr/local/bin/python3 /Users/Volodymyr/.vscode/extensions/ms-python.python-2018.11.0/pythonFiles/exp  
            DeepSpeech German - Results
            Python | Lines of Code: 42 | License: No License
            I Test of Epoch 19 - WER: 0.667205, loss: 69.56213065852289, mean edit distance: 0.287312
            I --------------------------------------------------------------------------------
            I WER: 0.333333, loss: 0.544307, mean edit distance: 0.200000
            I  - src: "p c   
            deepspeech-cleaner - Results - German Results: [de]
            Python | Lines of Code: 20 | License: Permissive (MIT)
            src: “eine neue”
            res: “einem neuen leben und neuen pflichten entgegen”
            
            src: “ausverkauft”
            res: “aus der fast”
            
            src: “riesengebirge”
            res: “riesen der berge”
            
            src: “beerdigung”
            res: “wer die un”
            
            src: “besuchstermin”
            res: “es wuchs der”
            
            src: “weiterm  

            Community Discussions

            QUESTION

            The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'
            Asked 2021-Jun-13 at 13:29

            so a part of my code is

            ...

            ANSWER

            Answered 2021-Jun-13 at 13:29

            You have to install the path_provider package by running flutter pub add path_provider in your terminal. If you have already installed it, check whether you are importing it into your file.

            Source https://stackoverflow.com/questions/67958607

            QUESTION

            While trying to train a DeepSpeech model on Google Colab, I'm getting an error saying that the .whl file is not supported
            Asked 2021-May-26 at 00:07

            Commands I used:

            ...

            ANSWER

            Answered 2021-May-26 at 00:07

            You are using wget to pull down a .whl file that was built for a different version of Python. You are pulling down

            ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl

            but are running Python 3.7. You need a different .whl file, such as:

            ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl

            This is available from the DeepSpeech releases page on GitHub.
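
            A quick way to confirm which wheel tag the running interpreter needs (a minimal illustrative check, not part of the original answer):

            import sys
            # The wheel tag must match the interpreter: cp36 wheels need Python 3.6,
            # cp37 wheels need Python 3.7, and so on.
            print(f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. "cp37" on Python 3.7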

            Source https://stackoverflow.com/questions/67671706

            QUESTION

            How to change microphone sample rate to 16000 on linux?
            Asked 2021-May-18 at 13:17

            I am currently working on a project for which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio, I create a stream that uses the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script, no rate conversion is done, and the mismatch between the microphone's sample rate and the sample rate expected by the model produces an Invalid Sample Rate error.

            The microphone info is listed like this by pyaudio:

            ...

            ANSWER

            Answered 2021-Jan-09 at 16:47

            So after some more testing I wound up editing the config file for PulseAudio. In this file you are able to uncomment entries that allow you to edit the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.

            The file is located at /etc/pulse/daemon.conf. We can open and edit this file on Raspbian using sudo vi daemon.conf. Then we need to uncomment the line ; alternate-sample-rate = 48000, which is done by removing the ;, and change the value of 48000 to 16000. Save the file and exit vim. Then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.

            If you are unfamiliar with vim and Linux here is a more elaborate guide through the process of changing the sample rate.
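
            As an alternative to changing the PulseAudio configuration, the capture can also be downsampled in the script itself. A minimal sketch, assuming pyaudio and numpy are installed and the microphone runs at 44100 Hz:

            import numpy as np
            import pyaudio

            MIC_RATE = 44100    # native rate of the microphone
            MODEL_RATE = 16000  # rate DeepSpeech expects
            CHUNK = 4410        # 0.1 s of audio at the microphone rate

            pa = pyaudio.PyAudio()
            stream = pa.open(format=pyaudio.paInt16, channels=1, rate=MIC_RATE,
                             input=True, frames_per_buffer=CHUNK)

            audio = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
            # Naive linear-interpolation downsample to 16 kHz; a dedicated resampler
            # (e.g. scipy.signal.resample_poly) gives better quality.
            idx = np.linspace(0, len(audio) - 1, int(len(audio) * MODEL_RATE / MIC_RATE))
            audio_16k = np.interp(idx, np.arange(len(audio)), audio).astype(np.int16)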

            Source https://stackoverflow.com/questions/65599012

            QUESTION

            DeepSpeech failed to learn Persian language
            Asked 2021-May-15 at 08:12

            I’m training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its documentation. The dataset is a Common Voice dataset for the Persian language.

            My configurations are as follows:

            1. Batch size = 2 (due to cuda OOM)
            2. Learning rate = 0.0001
            3. Num. neurons = 2048
            4. Num. epochs = 50
            5. Train set size = 7500
            6. Test and Dev sets size = 5000
            7. dropout for layers 1 to 5 = 0.2 (also 0.4 is experimented, same results)

            Train and validation losses decrease during training, but after a few epochs the validation loss stops decreasing. Train loss is about 18 and validation loss is about 40.

            The predictions are all empty strings at the end of the process. Any ideas how to improve the model?

            ...

            ANSWER

            Answered 2021-May-11 at 14:02

            Maybe you need to decrease the learning rate or use a learning rate scheduler.
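
            For illustration only, a generic reduce-on-plateau rule (a hypothetical sketch, not DeepSpeech's actual implementation) could look like this:

            def reduce_on_plateau(lr, val_losses, patience=3, factor=0.5, min_lr=1e-6):
                """Halve the learning rate when validation loss has not improved for `patience` epochs."""
                if len(val_losses) > patience and min(val_losses[-patience:]) >= min(val_losses[:-patience]):
                    lr = max(lr * factor, min_lr)
                return lr

            # Hypothetical usage at the end of each epoch:
            # lr = reduce_on_plateau(lr, val_losses)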

            Source https://stackoverflow.com/questions/67347479

            QUESTION

            Train DeepSpeech on Common Voice dataset gives error on gpu
            Asked 2021-Apr-27 at 08:15

            I'm trying to train a DeepSpeech model on the Common Voice dataset as stated in the documentation. But it gives the following error:

            ...

            ANSWER

            Answered 2021-Apr-24 at 00:53

            I've seen a similar error posted on the DeepSpeech Discourse and the issue there was the CUDA installation.

            What is the value of your $LD_LIBRARY_PATH environment variable?

            You can find this by doing:
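
            The exact command from the original answer is not reproduced here; for illustration, one way to inspect the variable from Python is:

            import os
            # An empty or missing value often means CUDA's library directory is not on the search path.
            print(os.environ.get("LD_LIBRARY_PATH", "<not set>"))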

            Source https://stackoverflow.com/questions/67198198

            QUESTION

            Want to activate a virtual environment from terminal with Octave/Matlab
            Asked 2021-Apr-18 at 14:28

            I would like to execute a bash command to activate a virtual environment with Octave using Linux. What I actually want to do is run DeepSpeech using Octave/Matlab.

            The command I want to use is source $HOME/tmp/deepspeech-venv/bin/activate

            The line of code I tried on my own is system("source $HOME/tmp/deepspeech-venv/bin/activate")

            And the output I'm getting is sh: 1: source: not found

            I saw this answer on a post and tried the command setenv('PATH', ['/source $HOME/tmp/deepspeech-venv/bin/activate', pathsep, getenv('PATH')]), but it was no help and returned the same error.

            ...

            ANSWER

            Answered 2021-Apr-11 at 16:00

            It's not completely clear from your question, but I'm assuming what you're trying to do is run Python commands within Octave/MATLAB, and you'd like to use a Python virtual environment for that.

            Unfortunately, when you run a system command from within octave, what most likely happens is that this creates a subshell to execute your command, which is discarded once the command has finished.

            You have several options to rectify this, but I think the easiest one would be to activate the python virtual environment first, and run your octave instance from within that environment. This then inherits all environmental variables as they existed when octave was run. You can confirm this by doing getenv( 'VIRTUAL_ENV' ).

            If that's not an option, then you could make sure that all system commands intended to run python scripts, are prefixed with a call to the virtual environment first (e.g. something like system( 'source ./my/venv/activate; python3 ./myscript.py') ).

            Alternatively, you can try to recreate the virtual environment from its exported variables manually, using the setenv command.
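
            With the first option (activating the virtual environment before launching Octave), a quick way to confirm from the Python side that the scripts run in the venv's interpreter is a minimal check like the following (assuming a standard venv layout):

            import os, sys
            # When the venv was active before Octave started, VIRTUAL_ENV is inherited and
            # sys.executable points into the venv's bin directory.
            print("executable :", sys.executable)
            print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV", "<not set>"))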

            Source https://stackoverflow.com/questions/67044176

            QUESTION

            Getting blank results from deepspeech with portaudio in Rust
            Asked 2021-Apr-16 at 13:18

            I'm attempting to use portaudio with deepspeech (both using Rust bindings) to create a speech recognition program. I can see the data when I log the buffer, but when attempting to use intermediate_decode, I always get blank results. I'm assuming I'm either configuring the audio incorrectly, or setting up the model incorrectly. I've spent a lot of time just getting to this point (fairly new to handling audio) and any help would be appreciated!

            This is the full source code:

            ...

            ANSWER

            Answered 2021-Apr-16 at 13:18

            It turns out the issue was with the process_audio callback. I needed to move the initialization of the model's stream outside of the callback.
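
            The same fix expressed in the DeepSpeech Python bindings, as a rough sketch rather than the asker's Rust code (the model path and chunk handling are placeholders):

            from deepspeech import Model

            model = Model("deepspeech-0.9.3-models.pbmm")  # placeholder model path
            stream = model.createStream()                  # create the stream once, outside the callback

            def process_audio(chunk):
                # chunk: 16 kHz, 16-bit mono samples (e.g. a numpy int16 array)
                stream.feedAudioContent(chunk)
                return stream.intermediateDecode()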

            Source https://stackoverflow.com/questions/67105792

            QUESTION

            Build TFLite without ruy for android
            Asked 2021-Apr-15 at 22:28

            I'm trying to analyze the performance of DeepSpeech (a third-party library that uses TensorFlow and TFLite) on Android devices, and I built it successfully as mentioned in its docs.

            After reading the source code, I found out that TensorFlow uses Google's ruy as the back end for matrix operations in TFLite. But I also found that there is support for different GEMM libraries, such as Eigen and GEMMLOWP, in the TFLite source code.

            But I was unable to find a way to use them to build TFLite.

            How can I use them instead of ruy?

            My build command is almost the same as the one in DeepSpeech docs.

            ...

            ANSWER

            Answered 2021-Apr-15 at 22:28

            I haven't tested this with the DeepSpeech library compilation, but the following Bazel flag can disable ruy and enable the other GEMM libraries when compiling the TensorFlow Lite library with the bazel tool.

            Source https://stackoverflow.com/questions/67116064

            QUESTION

            How to measure the context switching overhead of a very large program?
            Asked 2021-Feb-22 at 07:33

            I am trying to measure the impact of the CPU scheduler on a large AI program (https://github.com/mozilla/DeepSpeech).

            By using strace, I can see that it uses a lot of (~200) CPU threads.

            I have tried using Linux Perf to measure this, but I have only been able to find the number of context switch events, not the overhead of them.

            What I am trying to achieve is the total CPU core-seconds spent on context switching. Since it is a pretty large program, I would prefer non-invasive tools to avoid having to edit the source code of this program.

            How can I do this?

            ...

            ANSWER

            Answered 2021-Feb-22 at 07:33

            Are you sure most of those 200 threads are actually waiting to run at the same time, not waiting for data from a system call? I guess you can tell from perf stat that context-switches are actually pretty high, but part of the question is whether they're high for the threads doing the critical work.

            The cost of a context-switch is reflected in cache misses once a thread is running again. (And stopping OoO exec from finding as much ILP right at the interrupt boundary). This cost is more significant than the cost of the kernel code that saves/restores registers. So even if there was a way to measure how much time the CPUs spent in kernel context-switch code (possible with perf record sampling profiler as long as your perf_event_paranoid setting allows recording kernel addresses), that wouldn't be an accurate reflection of the true cost.

            Even making a system call has a similar (but lower and more frequent) performance cost from serializing OoO exec, as well as disturbing caches (and TLB). There's a useful characterization of this on real modern CPUs (from 2010) in a paper by Livio & Stumm, especially the graph on the first page of IPC (instructions per cycle) dropping after a system call returns, and taking time to recover: FlexSC: Flexible System Call Scheduling with Exception-Less System Calls. (Conference presentation: https://www.usenix.org/conference/osdi10/flexsc-flexible-system-call-scheduling-exception-less-system-calls)

            You might estimate context-switch cost by running the program on a system with enough cores not to need to context-switch much at all (e.g. a big many-core Xeon or Epyc), vs. on fewer cores but with the same CPUs / caches / inter-core latency and so on. So, on the same system with taskset --cpu-list 0-8 ./program to limit how many cores it can use.

            Look at the total user-space CPU-seconds used: the amount by which it is higher is the extra CPU time needed because of slowdowns from context switching. The wall-clock time will of course be higher when the same work has to compete for fewer cores, but perf stat includes a "task-clock" output which tells you the total time in CPU-milliseconds that threads of your process spent on CPUs. That would be constant for the same amount of work, with perfect scaling to more threads, and/or with the same threads competing for more / fewer cores.

            But that would tell you about context-switch overhead on that big system with big caches and higher latency between cores than on a small desktop.

            Source https://stackoverflow.com/questions/66311172

            QUESTION

            How to set an environment variable on ubuntu 18.04
            Asked 2021-Jan-07 at 17:30

            According to the DeepSpeech documentation, to avoid this error related to TensorFlow I have to proceed as follows, but I don't know how to set an environment variable. Could you explain the steps to follow? I use Ubuntu 18.04.

            ...

            ANSWER

            Answered 2021-Jan-07 at 17:30

            You can type TF_FORCE_GPU_ALLOW_GROWTH=true in your terminal before running your program.

            You can also add export TF_FORCE_GPU_ALLOW_GROWTH=true to ~/.bashrc and it will be loaded automatically when a new terminal is opened.
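
            A third option, as a small sketch, is to set the variable from inside the Python program itself, before TensorFlow is imported:

            import os
            os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"  # must be set before importing TensorFlow
            import tensorflow as tf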

            Source https://stackoverflow.com/questions/65617061

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install DeepSpeech

            The base environment for this page is listed below. If you want to set up PaddleSpeech in another environment, please see the ASR installation and TTS installation documents for all the alternatives.
            Ubuntu 16.04
            python>=3.7
            paddlepaddle==2.1.2
            Note: the current links to English ASR and English TTS are not valid. For a quick test of the functions, try English ASR and English TTS by typing a message or uploading your own audio file. Developers can try the models with only a few lines of code.
            Download LJSpeech-1.1 from the LJSpeech official website, and our prepared durations for fastspeech2, ljspeech_alignment.
            The pretrained models are separated into two parts: fastspeech2_nosil_ljspeech_ckpt and pwg_ljspeech_ckpt. Please download and unzip them to ./model/fastspeech2 and ./model/pwg respectively.
            Assuming your path to the dataset is ~/datasets/LJSpeech-1.1 and to the alignments is ./ljspeech_alignment, preprocess your data and then use the pretrained model to synthesize:
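
            The preprocessing and synthesis commands themselves are not reproduced on this page. As a quick sanity check of the base environment listed above (Python >= 3.7, paddlepaddle == 2.1.2), something like the following can be used:

            import sys
            import paddle  # installed via: pip install paddlepaddle==2.1.2

            assert sys.version_info >= (3, 7), "Python 3.7 or newer is required"
            print("paddle version:", paddle.__version__)  # expected: 2.1.2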

            Support

            You are warmly welcome to submit questions in discussions and bug reports in issues! Also, we highly appreciate it if you would like to contribute to this project!
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/PaddlePaddle/DeepSpeech.git

          • CLI

            gh repo clone PaddlePaddle/DeepSpeech

          • sshUrl

            git@github.com:PaddlePaddle/DeepSpeech.git
