deepspeech | A PyTorch implementation of DeepSpeech and DeepSpeech2 | Speech library

 by MyrtleSoftware | Python | Version: v0.2 | License: Non-SPDX

kandi X-RAY | deepspeech Summary

deepspeech is a Python library typically used in Artificial Intelligence, Speech, Deep Learning, and PyTorch applications. deepspeech has no bugs or vulnerabilities, it has a build file available, and it has low support. However, deepspeech has a Non-SPDX license. You can download it from GitHub.

A PyTorch implementation of DeepSpeech and DeepSpeech2.

            kandi-support Support

              deepspeech has a low active ecosystem.
              It has 45 star(s) with 12 fork(s). There are 8 watchers for this library.
              It had no major release in the last 12 months.
              There are 4 open issues and 2 have been closed. On average, issues are closed in 1 day. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of deepspeech is v0.2

            kandi-Quality Quality

              deepspeech has 0 bugs and 0 code smells.

            kandi-Security Security

              deepspeech has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              deepspeech code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              deepspeech has a Non-SPDX License.
              A Non-SPDX license may be an open-source license that is simply not SPDX-registered, or it may be a non-open-source license; you need to review it closely before use.

            kandi-Reuse Reuse

              deepspeech releases are available to install and integrate.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              deepspeech saves you 663 person hours of effort in developing the same functionality from scratch.
              It has 1537 lines of code, 169 functions and 45 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed deepspeech and discovered the below as its top functions. This is intended to give you an instant insight into deepspeech implemented functionality, and help decide if they suit your requirements.
            • Return a trained model
            • Save the model to disk
            • Get the path to the last state dict
            • Return a dictionary of all state dictionaries
            • Load all transcription files
            • Processes an audio file
            • Parse a transcription file
            • Sort the paths by duration
            • Train the model
            • Return a function to clip gradients
            • Log end of training
            • Log a training step
            • Calculate the checksum of a directory
            • Calculate the checksum of a file
            • Parse an algorithm
            • Evaluate the model
            • Evaluate the loss function
            • Download the tarball of the dataset
            • Check integrity of a subset
            • Save a trained model
            • Calculate the layer feature size
            • Get argument parser
            • Get a dev data loader
            • Return a training data loader
            • Return the decoder from the given arguments
            • Calculate the output length

            deepspeech Key Features

            No Key Features are available at this moment for deepspeech.

            deepspeech Examples and Code Snippets

            Myrtle Deep Speech,WER
             Python | Lines of Code: 8 | License: Non-SPDX (NOASSERTION)
             sum_edits = sum(edit_distance(target, predict)
                             for target, predict in zip(targets, predictions))
             sum_lens = sum(len(target) for target in targets)
             WER = sum_edits / sum_lens
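             The edit_distance helper used above is not shown in the snippet; a minimal Levenshtein implementation (a sketch, not necessarily the library's own code) could look like:

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via dynamic programming, keeping only two rows.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]
```

             Summing these distances over all utterances and dividing by the total target length gives the corpus-level WER.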
            Myrtle Deep Speech,Examples,Inference
             Python | Lines of Code: 8 | License: Non-SPDX (NOASSERTION)
            deepspeech ds1 \
                       --state_dict_path $MODEL_PATH \
                       --log_file \
                       --decoder greedy \
                       --train_subsets \
                       --dev_log wer \
                       --dev_subsets dev-clean \
                       --dev_batch_size 1
              
            Myrtle Deep Speech,Running
             Python | Lines of Code: 4 | License: Non-SPDX (NOASSERTION)
            make build
            
            sudo docker run --runtime=nvidia --shm-size 512M -p 9999:9999 deepspeech
            
            deepspeech --help
            
            import deepspeech
              

            Community Discussions

            QUESTION

            Adafruit I2S MEMS microphone is not working with voice activity detection system
            Asked 2022-Jan-26 at 13:36

             I am trying to make a speech-to-text system using a Raspberry Pi. There are many problems with VAD. I am using DeepSpeech's VAD script. The Adafruit I2S MEMS microphone accepts only 32-bit PCM audio, so I modified the script to record 32-bit audio and then convert it to 16-bit for DeepSpeech's processing. The frame generation and conversion parts are below:

            ...

            ANSWER

            Answered 2022-Jan-26 at 13:36

             I searched for DeepSpeech's VAD script and found it. The problem is with webrtcvad: it only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000, or 48000 Hz. So you need to convert the 32-bit frame (the PyAudio output frame) to 16-bit before passing it to webrtcvad.is_speech(). I changed it and it worked fine.
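             The 32-bit to 16-bit conversion can be sketched with the standard library alone. This is a minimal example; the function name frame32_to16 and the keep-the-high-16-bits strategy are assumptions for illustration, not the asker's exact code:

```python
import array

def frame32_to16(frame_bytes):
    # Interpret the buffer as signed 32-bit samples (e.g. PyAudio paInt32
    # output), then keep the high 16 bits of each sample to get 16-bit PCM.
    samples = array.array("i")   # "i" is 32-bit on common platforms
    samples.frombytes(frame_bytes)
    return array.array("h", (s >> 16 for s in samples)).tobytes()
```

             The resulting bytes can then be handed to webrtcvad's is_speech() together with one of its supported sample rates.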

            Source https://stackoverflow.com/questions/70838837

            QUESTION

            Can't set a hotword with deepspeech
            Asked 2022-Jan-23 at 23:41

             I tried to set my hotword for deepspeech on my Raspberry Pi and got a really long error when I ran this in the terminal:

            python3 /home/pi/DeepSpeech_RaspberryPi4_Hotword/mic_streaming.py --keywords jarvis

            Error

            I don't know how to fix this and didn't find anything anywhere else.

            ...

            ANSWER

            Answered 2021-Nov-25 at 03:10

             These errors are not related to DeepSpeech; they're related to ALSA, the sound subsystem for Linux. By the looks of the error, your system is having trouble accessing the microphone.

             I would recommend running several ALSA tests, such as:

            arecord -l

            This should give you a list of recording devices that are detected, such as:

            Source https://stackoverflow.com/questions/69816953

            QUESTION

            Subprocess call error while calling generate_lm.py of DeepSpeech
            Asked 2021-Dec-06 at 03:33

             I am trying to build a customised scorer (language model) for speech-to-text using DeepSpeech in Colab. While calling generate_lm.py I get this error:

            ...

            ANSWER

            Answered 2021-Dec-06 at 03:33

             I was able to find a solution to the above question. I successfully created the language model after reducing the value of top_k to 15000. My phrases file has only about 42000 entries. We have to adjust the top_k value based on the number of phrases in our collection: the top_k parameter determines how many of the most frequent phrases are kept, and anything less frequent is removed before processing.
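             The effect of top_k pruning can be illustrated in a few lines. This is a sketch of the idea only, not the actual generate_lm.py code; prune_top_k is a hypothetical helper:

```python
from collections import Counter

def prune_top_k(words, top_k):
    # Keep only occurrences of the top_k most frequent words;
    # everything rarer is dropped before the language model is built.
    keep = {w for w, _ in Counter(words).most_common(top_k)}
    return [w for w in words if w in keep]
```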

            Source https://stackoverflow.com/questions/70043586

            QUESTION

            (0) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24) During the Training in Mozilla Deepspeech
            Asked 2021-Sep-25 at 18:12

             I am using the below command to start training the deepspeech model:

            ...

            ANSWER

            Answered 2021-Sep-25 at 18:12

            Following worked for me

            Go to

            Source https://stackoverflow.com/questions/69328818
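             The error above means the acoustic model produced fewer output frames (24) than CTC needs for the transcript (28). A quick feasibility check you could run over a dataset before training is sketched below; ctc_feasible is a hypothetical helper, assuming standard CTC with a blank label:

```python
def ctc_feasible(n_frames, transcript):
    # CTC needs at least one frame per target symbol, plus one extra
    # (blank) frame between every pair of adjacent repeated symbols.
    required = len(transcript) + sum(
        a == b for a, b in zip(transcript, transcript[1:]))
    return n_frames >= required
```

             Samples that fail this check should be filtered out, or the feature stride reduced so the model emits more frames per second of audio.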

            QUESTION

            ['kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', 'lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1
            Asked 2021-Sep-25 at 14:09

             During the build of the LM binary to create a scorer for the DeepSpeech model, I was getting the following error again and again:

            ...

            ANSWER

            Answered 2021-Sep-25 at 14:09

             The following worked for me. Go to

            Source https://stackoverflow.com/questions/69326923

            QUESTION

            Error during training in deepspeech Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]
            Asked 2021-Sep-24 at 13:04

             Getting the following error when trying to execute:

            ...

            ANSWER

            Answered 2021-Sep-23 at 07:59

             If I try it as below, it works fine.

            Source https://stackoverflow.com/questions/69296114

            QUESTION

            The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'
            Asked 2021-Jun-13 at 13:29

            so a part of my code is

            ...

            ANSWER

            Answered 2021-Jun-13 at 13:29

             You have to install the path_provider package by running flutter pub add path_provider in your terminal. If you already installed it, check whether you are importing it into your file.

            Source https://stackoverflow.com/questions/67958607

            QUESTION

             While trying to train a DeepSpeech model on Google Colab, I'm getting an error saying that the .whl file is not supported
            Asked 2021-May-26 at 00:07

             Commands I used:

            ...

            ANSWER

            Answered 2021-May-26 at 00:07

            You are using wget to pull down a .whl file that was built for a different version of Python. You are pulling down

            ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl

            but are running Python 3.7. You need a different .whl file, such as:

            ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl

             This is available from the DeepSpeech releases page on GitHub.
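             The cpXY part of the filename encodes the CPython version the wheel was built for. A small helper showing the mapping (ctcdecoder_wheel is a hypothetical name; the trailing "m" ABI flag only applies to CPython 3.7 and earlier):

```python
def ctcdecoder_wheel(major, minor):
    # Build the expected wheel filename for a given CPython version.
    tag = f"cp{major}{minor}"
    abi = tag + ("m" if (major, minor) <= (3, 7) else "")
    return f"ds_ctcdecoder-0.9.3-{tag}-{abi}-manylinux1_x86_64.whl"
```

             sys.version_info gives the running interpreter's (major, minor), so you can check which wheel your environment actually needs.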

            Source https://stackoverflow.com/questions/67671706

            QUESTION

            How to change microphone sample rate to 16000 on linux?
            Asked 2021-May-18 at 13:17

             I am currently working on a project for which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio I create a stream that uses the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script no rate conversion is done, and the mismatch between the microphone's sample rate and the model's expected sample rate produces an Invalid Sample Rate error.

            The microphone info is listed like this by pyaudio:

            ...

            ANSWER

            Answered 2021-Jan-09 at 16:47

             So after some more testing I wound up editing the config file for PulseAudio. In this file you can uncomment entries that let you change the default and/or alternate sample rate. Changing the alternate sample rate from 48000 to 16000 is what solved my problem.

             The file is located at /etc/pulse/daemon.conf. We can open and edit this file on Raspbian using sudo vi daemon.conf. Then we need to uncomment the line ; alternate-sample-rate = 48000 by removing the ; and change the value from 48000 to 16000. Save the file and exit vim, then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.

            If you are unfamiliar with vim and Linux here is a more elaborate guide through the process of changing the sample rate.
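             If you would rather convert the rate in software instead of reconfiguring PulseAudio, a linear-interpolation resampler can be sketched in pure Python. This is an illustrative, unoptimized example; resample_linear is a hypothetical helper operating on mono integer samples:

```python
def resample_linear(samples, src_rate, dst_rate):
    # Linearly interpolate the input at each output sample position.
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(int(round(samples[j] + (nxt - samples[j]) * frac)))
    return out
```

             In production you would normally use a proper resampling library instead, since naive linear interpolation does not low-pass filter and can alias.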

            Source https://stackoverflow.com/questions/65599012

            QUESTION

            DeepSpeech failed to learn Persian language
            Asked 2021-May-15 at 08:12

             I’m training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its docs. The dataset is a Common Voice dataset for the Persian language.

            My configurations are as follows:

            1. Batch size = 2 (due to cuda OOM)
            2. Learning rate = 0.0001
            3. Num. neurons = 2048
            4. Num. epochs = 50
            5. Train set size = 7500
            6. Test and Dev sets size = 5000
             7. Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)

             Train and validation losses decrease through the training process, but after a few epochs the validation loss stops decreasing. Train loss is about 18 and validation loss is about 40.

            The predictions are all empty strings at the end of the process. Any ideas how to improve the model?

            ...

            ANSWER

            Answered 2021-May-11 at 14:02

             Maybe you need to decrease the learning rate or use a learning rate scheduler.
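             A reduce-on-plateau schedule, which the suggestion points toward, is easy to sketch. This is a minimal stand-alone version of the idea; in a PyTorch training loop torch.optim.lr_scheduler.ReduceLROnPlateau plays this role:

```python
class PlateauScheduler:
    # Shrink the learning rate when validation loss stops improving.
    def __init__(self, lr, factor=0.5, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

             Call step(val_loss) once per epoch and feed the returned rate to the optimizer.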

            Source https://stackoverflow.com/questions/67347479

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install deepspeech

            You can download it from GitHub.
             You can use deepspeech like any standard Python library. You will need a development environment with a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

             For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on Stack Overflow.
             Find more information at:

            CLONE
          • HTTPS

            https://github.com/MyrtleSoftware/deepspeech.git

          • CLI

            gh repo clone MyrtleSoftware/deepspeech

          • sshUrl

            git@github.com:MyrtleSoftware/deepspeech.git
