How to convert speech from an audio file to text in Python?


by Dejaswarooba | Updated: Apr 6, 2023


Solution Kit

SpeechRecognition is a popular Python package for speech recognition. It offers a simple interface to several speech recognition engines and APIs, which makes it a common choice for programmers building speech-enabled applications. It can communicate with prominent online services such as Google Speech Recognition, IBM Watson, and Microsoft Bing Voice Recognition, so developers can switch between engines and select the one that best suits their use case.
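As a rough illustration of switching engines, the sketch below picks a recognition method by name. The helper name recognize_with is hypothetical and not part of the library. It assumes the Recognizer object exposes methods named recognize_google and recognize_sphinx, as SpeechRecognition 3.10 does; which engines are actually usable depends on the installed version, extra packages (recognize_sphinx needs pocketsphinx), and any API keys the service requires.

```python
def recognize_with(recognizer, audio, engine="google", **kwargs):
    """Call recognizer.recognize_<engine>(audio, **kwargs).

    `recognize_with` is a hypothetical helper for illustration only.
    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` the AudioData object returned by Recognizer.record().
    """
    method = getattr(recognizer, f"recognize_{engine}", None)
    if method is None:
        raise ValueError(f"This recognizer has no engine named {engine!r}")
    return method(audio, **kwargs)


# Example usage (assumes `r` is a Recognizer and `audio` was recorded earlier):
# text = recognize_with(r, audio, "google", language="en-US")  # online
# text = recognize_with(r, audio, "sphinx")                    # offline
```

Keeping the engine name in one argument means the rest of a script does not change when a different engine is tried.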


The library also includes an interface to offline speech recognition engines such as the CMU Sphinx toolkit. This is useful when internet access is restricted or privacy concerns call for local speech processing. Another benefit of the SpeechRecognition library is that it can read audio directly from files. Its AudioFile class accepts WAV (PCM), AIFF, and FLAC files; other formats such as MP3 and OGG need to be converted to one of these first. In this way, text can be generated from recordings that come from several audio sources.


We use two main classes of the SpeechRecognition library to convert speech into text.

  1. Recognizer() - The Recognizer class performs the speech recognition itself. It can transcribe speech from audio files or microphone input and offers a single interface to various speech recognition engines and APIs.
  2. AudioFile() - The AudioFile class represents an audio file on disk and acts as the audio source that a Recognizer reads from.

Fig. 1: Code depicting speech conversion.

Fig. 2: Expected output.


Code

  • r = sr.Recognizer() - Creates a new instance of the SpeechRecognition library's Recognizer class. The Recognizer object is used to detect speech in audio.

  • audio = 'test.wav' - Sets the variable audio to the filename of the audio file to be transcribed.

  • sr.AudioFile(audio) - Creates an AudioFile object from the SpeechRecognition library and opens the audio file as a source. The with statement ensures that the file is closed properly after use.

  • audio = r.record(source) - Uses the Recognizer object's record() method to read the contents of the source file into an audio data object. From this point on, audio holds the audio data rather than the filename.

  • text = r.recognize_google(audio) - Uses the Recognizer object's recognize_google() method to transcribe the audio and save the generated text in the variable text. This method sends the audio to Google's speech recognition service, so it needs an internet connection.
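Put together, the lines described above might look like the following sketch. It is a reconstruction from the walkthrough, not necessarily the exact snippet shown in the figure. The function name transcribe_file and the error handling are additions for illustration; UnknownValueError and RequestError are the exceptions SpeechRecognition raises for unintelligible audio and failed service requests.

```python
def transcribe_file(path, language="en-US"):
    """Return the text recognised in the audio file at `path`.

    Hypothetical helper: returns None when no speech could be understood.
    """
    # Imported inside the function so the helper can be defined even
    # before the package is installed (pip install SpeechRecognition).
    import speech_recognition as sr

    r = sr.Recognizer()

    # Open the file as an audio source; `with` closes it afterwards.
    with sr.AudioFile(path) as source:
        audio = r.record(source)  # read the whole file into memory

    try:
        # Sends the audio to Google's service, so this needs internet access.
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any speech in the audio.
        return None
    except sr.RequestError as exc:
        # Network problem or the service rejected the request.
        raise RuntimeError(f"Speech service request failed: {exc}") from exc


# Example usage, assuming test.wav sits next to the script:
# print(transcribe_file("test.wav"))
```

Returning None for unintelligible audio is one possible design; a script could just as well print a message or re-raise the exception.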

Follow these steps to run the code.

  • Install Visual Studio Code on your computer.
  • Install the required library with the following command: pip install SpeechRecognition.
  • If your system does not pick up the installation, try running the above command again from Windows PowerShell opened as administrator.
  • Save the required audio file in WAV format and put it in the same folder as the Python file.
  • Open the folder in the code editor, then copy and paste the above kandi code snippet into the Python file.
  • Make sure the path of the audio file is specified correctly in the line audio = 'test.wav'.
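Before running the transcription, it can help to confirm that the file exists and really is a readable WAV file. The sketch below uses only Python's standard wave module; the helper name describe_wav is hypothetical. It assumes an uncompressed PCM WAV file, which is what the wave module can open.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Raises FileNotFoundError if the path is wrong, and wave.Error if the
    file is not a WAV file that the standard library can read.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path}")

    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Example usage, assuming test.wav is in the current folder:
# print(describe_wav("test.wav"))
```

A FileNotFoundError here usually means the path in audio = 'test.wav' does not match where the file was saved.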


I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.


I found this code snippet by searching for "speech from audio file to text in Python" in kandi. You can try any such use case!

Dependent Libraries

speech_recognition by Uberi

Python | 7239 stars | Version: 3.10.0 | License: Permissive (BSD-3-Clause)

Speech recognition module for Python, supporting several engines and APIs, online and offline.

If you do not have SpeechRecognition, which is required to run this code, you can install it by clicking on the above link and copying the pip install command from the SpeechRecognition page in kandi.

You can search for any dependent library on kandi like SpeechRecognition.

Environment tested

1. This code has been tested using Python version 3.8.0.
2. SpeechRecognition version 3.10.0 has been used.
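To compare a local setup against the versions listed above, a small check like the following can be used. It is a sketch that relies on importlib.metadata from the standard library (available from Python 3.8); the helper name installed_version is hypothetical.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python:", sys.version.split()[0])
print("SpeechRecognition:", installed_version("SpeechRecognition") or "not installed")
```

If the second line reports "not installed", the pip install command was probably run for a different Python interpreter than the one executing the script.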


Support

1. For any support on kandi solution kits, please use the chat.
2. For further learning resources, visit the Open Weaver Community learning page.
