DeepSpeech | open source | Speech library

by mozilla · C++ · Version: 0.10.0a3 · License: MPL-2.0

kandi X-RAY | DeepSpeech Summary

DeepSpeech is a C++ library typically used in Artificial Intelligence, Speech, Deep Learning, and Tensorflow applications. DeepSpeech has no bugs and no reported vulnerabilities, it has a Weak Copyleft License, and it has medium support. You can download it from GitHub.
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

Support

DeepSpeech has a medium active ecosystem.
It has 21362 star(s) with 3670 fork(s). There are 656 watchers for this library.
It had no major release in the last 12 months.
There are 109 open issues and 1974 have been closed. On average, issues are closed in 95 days. There are 18 open pull requests and 0 closed requests.
It has a neutral sentiment in the developer community.
The latest version of DeepSpeech is 0.10.0a3.

Quality

DeepSpeech has 0 bugs and 0 code smells.

Security

DeepSpeech has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
DeepSpeech code analysis shows 0 unresolved vulnerabilities.
There are 0 security hotspots that need review.

License

DeepSpeech is licensed under the MPL-2.0 License. This license is Weak Copyleft.
Weak Copyleft licenses have some restrictions, but you can use them in commercial projects.

Reuse

DeepSpeech releases are available to install and integrate.
It has 9524 lines of code, 659 functions and 119 files.
It has high code complexity. Code complexity directly impacts the maintainability of the code.

DeepSpeech Key Features

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

DeepSpeech Examples and Code Snippets

No code snippets are available for DeepSpeech at this moment.

Community Discussions

Trending Discussions on DeepSpeech

Adafruit I2S MEMS microphone is not working with voice activity detection system
Can't set a hotword with deepspeech
Subprocess call error while calling generate_lm.py of DeepSpeech
(0) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24) During the Training in Mozilla Deepspeech
['kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', 'lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1
Error during training in deepspeech Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]
The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'
While I was trying to train a DeepSpeech model on Google Colab, I'm getting an error saying that the .whl file is not supported
How to change microphone sample rate to 16000 on linux?
DeepSpeech failed to learn Persian language

QUESTION

Adafruit I2S MEMS microphone is not working with voice activity detection system
Asked 2022-Jan-26 at 13:36

I am trying to make a speech-to-text system using a Raspberry Pi. There are many problems with VAD. I am using DeepSpeech's VAD script. The Adafruit I2S MEMS microphone accepts only 32-bit PCM audio, so I modified the script to record 32-bit audio and then convert it to 16-bit for DeepSpeech's processing. The frame generation and conversion parts are below:

for frame in frames:
    if frame is not None:
        if spinner: spinner.start()
        # Get frame generated by PyAudio and webrtcvad
        dp_frame = np.frombuffer(frame, np.int32)
        # Convert to 16-bit PCM
        dp_frame = (dp_frame >> 16).astype(np.int16)
        # Convert speech to text
        stream_context.feedAudioContent(dp_frame)

PyAudio configs are:

'format': paInt32,
'channels': 1,
'rate': 16000,

When VAD starts, it always generates non-empty frames, even if there is no voice around. But when I set a timer for every 5 seconds, it shows that the recording was done successfully. I think the problem is that the energy (voltage) adds some noise, and that's why the microphone cannot detect silence and end frame generation. How do I solve this problem?

ANSWER

Answered 2022-Jan-26 at 13:36

I searched for DeepSpeech's VAD script and found it. The problem is connected with webrtcvad. The webrtcvad VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. So you need to convert the 32-bit frame (I mean the PyAudio output frame) to 16-bit before calling webrtcvad.is_speech(). I changed it and it worked fine.
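
To make that concrete, here is a minimal sketch of the conversion, assuming a paInt32 PyAudio capture and the 10/20/30 ms frame sizes webrtcvad requires; the helper name is hypothetical:

import numpy as np
import webrtcvad

vad = webrtcvad.Vad(2)   # aggressiveness 0 (least) to 3 (most aggressive)
SAMPLE_RATE = 16000      # webrtcvad accepts 8000, 16000, 32000 or 48000 Hz

def frame_is_speech(frame_bytes_32bit):
    # frame_bytes_32bit: raw paInt32 bytes for one 10/20/30 ms frame
    samples = np.frombuffer(frame_bytes_32bit, dtype=np.int32)
    # Keep the top 16 bits of each sample, as in the question's script
    pcm16 = (samples >> 16).astype(np.int16)
    return vad.is_speech(pcm16.tobytes(), SAMPLE_RATE)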

Source https://stackoverflow.com/questions/70838837

QUESTION

Can't set a hotword with deepspeech
Asked 2022-Jan-23 at 23:41

I tried to set my hotword for deepspeech on my Raspberry Pi and got a really long error when I ran this in the terminal:

python3 /home/pi/DeepSpeech_RaspberryPi4_Hotword/mic_streaming.py --keywords jarvis

Error

I don't know how to fix this and didn't find anything anywhere else.

ANSWER

Answered 2021-Nov-25 at 03:10

These errors are not related to DeepSpeech; they're related to ALSA, which is the sound subsystem for Linux. By the looks of the error, your system is having trouble accessing the microphone.

I would recommend running several ALSA tests, such as:

arecord -l

This should give you a list of the recording devices that are detected, such as:

$ arecord -l

**** List of CAPTURE Hardware Devices ****
card 2: Generic_1 [HD-Audio Generic], device 0: ALC294 Analog [ALC294 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

If this is not what you expected, you can use the command alsamixer to select another sound card and/or microphone.
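
One extra check worth running here (an editor's sketch, not part of the original answer; plughw:1,0 is an example, so substitute the card and device numbers from your arecord -l output):

arecord -D plughw:1,0 -f S16_LE -c 1 -r 16000 -d 5 test.wav

This records five seconds of 16-bit, 16 kHz mono audio, the format DeepSpeech expects. If it fails or test.wav plays back silent, the problem sits below DeepSpeech in the audio stack.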

Source https://stackoverflow.com/questions/69816953

QUESTION

Subprocess call error while calling generate_lm.py of DeepSpeech
Asked 2021-Dec-06 at 03:33

I am trying to build a customised scorer (language model) for speech-to-text using DeepSpeech in Colab. While calling generate_lm.py, I get this error:

    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 126, in build_lm
    binary_path,
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/content/DeepSpeech/native_client/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/DeepSpeech/data/lm/lm_filtered.arpa', '/content/DeepSpeech/data/lm/lm.binary']' died with .

I am calling the script generate_lm.py like this:

! python3 generate_lm.py --input_txt hindi_tokens.txt --output_dir /content/DeepSpeech/data/lm --top_k 500000 --kenlm_bins /content/DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

ANSWER

Answered 2021-Dec-06 at 03:33

I was able to find a solution for the above question: the language model was created successfully after reducing the value of top_k to 15000. My phrases file has only about 42000 entries. We have to adjust the top_k value based on the number of phrases in our collection; top_k keeps only that many of the most frequent words, and the less frequent ones are removed before processing.
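
For reference, the working invocation is the question's command with only top_k changed (paths as in the question):

! python3 generate_lm.py --input_txt hindi_tokens.txt --output_dir /content/DeepSpeech/data/lm --top_k 15000 --kenlm_bins /content/DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie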

Source https://stackoverflow.com/questions/70043586

QUESTION

(0) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24) During the Training in Mozilla Deepspeech
Asked 2021-Sep-25 at 18:12

I am using the command below to start training the deepspeech model:

%cd /content/DeepSpeech
!python3 DeepSpeech.py \
--drop_source_layers 2 --scorer /content/DeepSpeech/data/lm/kenlm-nigerian.scorer \
--train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 5 \
--export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
--train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
--learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'he_model_5' \
--max_to_keep 3

I keep getting the following error again and again:

(0) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
(1) Invalid argument: Not enough time for target transition sequence (required: 28, available: 24). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

ANSWER

Answered 2021-Sep-25 at 18:12

The following worked for me.

Go to:

DeepSpeech/training/deepspeech_training/train.py

Now look for the following line (normally around lines 240-250):

total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)

Change it to the following, adding the flag named in the error message:

total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

Source https://stackoverflow.com/questions/69328818

QUESTION

['kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', 'lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1
Asked 2021-Sep-25 at 14:09

While building the LM binary to create a scorer for the deepspeech model, I kept getting the following error again and again:

subprocess.CalledProcessError: Command '['/content/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1.

The command I was using is below:

!python /content/DeepSpeech/data/lm/generate_lm.py \
--input_txt /content/transcripts.txt \
--output_dir /content/scorer/ \
--top_k 50000 \
--kenlm_bins /content/kenlm/build/bin/ \
--arpa_order 5 --max_arpa_memory "95%" --arpa_prune "0|0|1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie

ANSWER

Answered 2021-Sep-25 at 14:09

The following worked for me. Go to:

DeepSpeech -> data -> lm -> generate_lm.py

Now find the following block of code inside it:

subprocess.check_call(
        [
            os.path.join(args.kenlm_bins, "build_binary"),
            "-a",
            str(args.binary_a_bits),
            "-q",
            str(args.binary_q_bits),
            "-v",
            args.binary_type,
            filtered_path,
            binary_path,
        ]
)

Tweak the code by adding the "-s" flag to it, as below:

subprocess.check_call(
    [
        os.path.join(args.kenlm_bins, "build_binary"),
        "-a",
        str(args.binary_a_bits),
        "-q",
        str(args.binary_q_bits),
        "-v",
        args.binary_type,
        filtered_path,
        binary_path,
        "-s",
    ]
)

Now your command will run fine.

Source https://stackoverflow.com/questions/69326923

QUESTION

Error during training in deepspeech Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]
Asked 2021-Sep-24 at 13:04

I get the following error when trying to execute:

%cd /content/DeepSpeech
!python3 DeepSpeech.py --train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 20 \
  --export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
  --train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
  --learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'ft_model' \
   --augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
   --augment volume[p=0.2,dbfs=-10:-40] \
   --augment pitch[p=0.2,pitch=1~0.2] \
   --augment tempo[p=0.2,factor=1~0.5]

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 798, 64, 2048] [[{{node tower_0/cudnn_lstm/CudnnRNNV3}}]] [[tower_0/gradients/tower_0/BiasAdd_2_grad/BiasAddGrad/_87]]
(1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 798, 64, 2048] [[{{node tower_0/cudnn_lstm/CudnnRNNV3}}]]
0 successful operations. 0 derived errors ignored.

ANSWER

Answered 2021-Sep-23 at 07:59

When I tried it as below, it worked fine:

%cd /content/DeepSpeech
!python3 DeepSpeech.py --train_cudnn True --early_stop True --es_epochs 6 --n_hidden 2048 --epochs 20 \
  --export_dir /content/models/ --checkpoint_dir /content/model_checkpoints/ \
  --train_files /content/train.csv --dev_files /content/dev.csv --test_files /content/test.csv \
  --learning_rate 0.0001 --train_batch_size 64 --test_batch_size 32 --dev_batch_size 32 --export_file_name 'ft_model'
  # --augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
  # --augment volume[p=0.2,dbfs=-10:-40] \
  # --augment pitch[p=0.2,pitch=1~0.2] \
  # --augment tempo[p=0.2,factor=1~0.5]

Basically, the --augment options were doing something that broke our training partway through.

Source https://stackoverflow.com/questions/69296114

QUESTION

The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'
Asked 2021-Jun-13 at 13:29

Part of my code is:

Future _loadModel() async {
    final bytes =
        await rootBundle.load('assets/deepspeech-0.9.3-models.tflite');
    final directory = (await getApplicationDocumentsDirectory()).path;

And I keep getting the error:

The method 'getApplicationDocumentsDirectory' isn't defined for the type '_MyAppState'.
Try correcting the name to the name of an existing method, or defining a method named 'getApplicationDocumentsDirectory'.

What should I do? Help me, please!

ANSWER

Answered 2021-Jun-13 at 13:29

You have to install the path_provider package by running flutter pub add path_provider in your terminal. If you have already installed it, check whether you are importing it into your file.

Source https://stackoverflow.com/questions/67958607

QUESTION

While I was trying to train a DeepSpeech model on Google Colab, I'm getting an error saying that the .whl file is not supported
Asked 2021-May-26 at 00:07

The commands I used:

!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl

!pip install /content/~path~/ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl

This gives me an error:

ERROR: ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.

How can I solve this?

ANSWER

Answered 2021-May-26 at 00:07

You are using wget to pull down a .whl file that was built for a different version of Python. You are pulling down

ds_ctcdecoder-0.9.3-cp36-cp36m-manylinux1_x86_64.whl

but are running Python 3.7. You need a different .whl file, such as:

ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl

This is available from the DeepSpeech releases page on GitHub.
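
Put together, the fix would look something like this in Colab (a sketch assuming the cp37 wheel is published under the same v0.9.3 release as the cp36 one in the question; confirm the exact asset name on the releases page):

!python3 --version   # confirm the interpreter is Python 3.7.x
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl
!pip install ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl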

Source https://stackoverflow.com/questions/67671706

QUESTION

How to change microphone sample rate to 16000 on linux?
Asked 2021-May-18 at 13:17

I am currently working on a project for which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio, I create a stream that uses the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script, no rate conversion is done, and the mismatch between the microphone's sample rate and the model's expected sample rate produces an Invalid Sample Rate error.

The microphone info is listed like this by PyAudio:

{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}

The first thing I tried was setting the PyAudio stream's sample rate to 44100 and feeding the model that. But after testing, I found out that the model does not work well when it gets a rate different from its requested 16000.

I have been trying to find a way to have the microphone change its rate to 16000, or at least have its rate converted to 16000 when it is used in the Python script, but to no avail.

The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know if it is possible to change the microphone's rate to 16000 within this file. This is what the file currently looks like:

pcm.!default {
        type asymd
        playback.pcm
        {
                type plug
                slave.pcm "dmix"
        }
        capture.pcm
        {
                type plug
                slave.pcm "usb"
        }
}

ctl.!default {
        type hw
        card 0
}

pcm.usb {
        type hw
        card 1
        device 0
        rate 16000
}

The Python code I made works on Windows, which I guess is because Windows converts the rate of the input to the sample rate requested in the code. But Linux does not seem to do this.

tl;dr: the microphone rate is 44100, but it has to change to 16000 to be usable. How do you do this on Linux?

Edit 1:

I create the PyAudio stream like this:

self.paStream = self.pa.open(rate = self.model.sampleRate(), channels = 1, format= pyaudio.paInt16, input=True, input_device_index = 1, frames_per_buffer= self.model.beamWidth())

It uses the model's rate and beam width, and the microphone's channel count and device index.

I get the next audio frame, and to format it properly for the stream I create for the model, I do this:

def __get_next_audio_frame__(self):
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow= False)
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame

exception_on_overflow = False was used to test the model with an input rate of 44100; without this set to False, the same error as I currently deal with would occur. model.beamWidth is a variable that holds the number of chunks the model expects. I then read that number of chunks and reformat them before feeding them to the model's stream, which happens like this:

modelStream.feedAudioContent(self.__get_next_audio_frame__())
                                                                                  

ANSWER

Answered 2021-Jan-09 at 16:47

So after some more testing, I wound up editing the config file for PulseAudio. In this file you are able to uncomment entries which allow you to edit the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.

The file is located at /etc/pulse/daemon.conf. We can open and edit this file on Raspbian using sudo vi daemon.conf. Then we need to uncomment the line ; alternate-sample-rate = 48000, which is done by removing the ;, and change the value of 48000 to 16000. Save the file and exit vim. Then restart PulseAudio using pulseaudio -k to make sure it runs the changed file.

If you are unfamiliar with vim and Linux, here is a more elaborate guide through the process of changing the sample rate.
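
Concretely, the edit to /etc/pulse/daemon.conf described above looks like this (a semicolon marks a commented-out line in PulseAudio config files):

# before:
; alternate-sample-rate = 48000
# after:
alternate-sample-rate = 16000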

Source https://stackoverflow.com/questions/65599012

QUESTION

DeepSpeech failed to learn Persian language
Asked 2021-May-15 at 08:12

I'm training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its doc. The dataset is a Common Voice dataset for the Persian language.

My configurations are as follows:

1. Batch size = 2 (due to CUDA OOM)
2. Learning rate = 0.0001
3. Num. neurons = 2048
4. Num. epochs = 50
5. Train set size = 7500
6. Test and dev set sizes = 5000
7. Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)

Train and validation losses decrease through the training process, but after a few epochs the validation loss does not decrease anymore. Train loss is about 18 and validation loss is about 40.

The predictions are all empty strings at the end of the process. Any ideas how to improve the model?

ANSWER

Answered 2021-May-11 at 14:02

Maybe you need to decrease the learning rate or use a learning rate scheduler.
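
One concrete option, since the DeepSpeech training script ships plateau-based learning rate scheduling flags: a sketch assuming the 0.9.x flag set (reduce_lr_on_plateau and its companions; verify the names against your version's flags) and otherwise reusing the asker's hyperparameters:

!python3 DeepSpeech.py \
  --train_files train.csv --dev_files dev.csv --test_files test.csv \
  --n_hidden 2048 --train_batch_size 2 --epochs 50 \
  --learning_rate 0.0001 \
  --reduce_lr_on_plateau true --plateau_epochs 5 --plateau_reduction 0.1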

Source https://stackoverflow.com/questions/67347479

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install DeepSpeech

You can download it from GitHub.

Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on Stack Overflow.
Install

• PyPI

pip install deepspeech
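
Once installed, basic batch transcription looks roughly like this (a minimal sketch assuming deepspeech 0.9.x and a separately downloaded model and scorer; the file names are examples, and the WAV must be 16 kHz, 16-bit mono):

import wave

import numpy as np
from deepspeech import Model

# Example paths; the model and scorer are downloaded separately from the releases page.
model = Model('deepspeech-0.9.3-models.pbmm')
model.enableExternalScorer('deepspeech-0.9.3-models.scorer')

with wave.open('audio.wav', 'rb') as w:  # expects 16 kHz, 16-bit mono
    frames = w.readframes(w.getnframes())

audio = np.frombuffer(frames, dtype=np.int16)
print(model.stt(audio))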

• CLONE
• HTTPS

https://github.com/mozilla/DeepSpeech.git

• CLI

gh repo clone mozilla/DeepSpeech

• sshUrl

git@github.com:mozilla/DeepSpeech.git

Consider Popular Speech Libraries

DeepSpeech by mozilla
kaldi by kaldi-asr
zeal by zealdocs
leon by leon-ai

Try Top Libraries by mozilla

pdf.js by mozilla (JavaScript)
send by mozilla (JavaScript)
sops by mozilla (Go)
BrowserQuest by mozilla (JavaScript)
nunjucks by mozilla (JavaScript)
