How to convert speech from an audio file to text in Python?

by Dejaswarooba Updated: Apr 6, 2023

Solution Kit

SpeechRecognition is a popular Python package for speech recognition. It offers a simple interface to a range of speech recognition engines and APIs, which makes it a common choice for developers building speech-enabled applications. Its key feature is support for several engines: it can call online services such as Google Speech Recognition, IBM Watson and Microsoft Bing Voice Recognition, so developers can switch between engines and pick the one that best suits their use case.
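Each engine is exposed as a `recognize_*()` method on the same Recognizer object, so switching engines can come down to choosing a method name. The sketch below assumes only two engines, `recognize_google()` (online) and `recognize_sphinx()` (offline, needs PocketSphinx). The `ENGINES` table and the `transcribe()` helper are hypothetical names for illustration, not part of the library.

```python
# Hypothetical table mapping a short engine name to a Recognizer method name.
# Both methods take the AudioData object as their first argument.
ENGINES = {
    "google": "recognize_google",  # online: Google Web Speech API
    "sphinx": "recognize_sphinx",  # offline: CMU Sphinx (requires pocketsphinx)
}


def transcribe(recognizer, audio, engine="google", **options):
    """Transcribe `audio` with the engine called `engine` (hypothetical helper).

    `recognizer` is expected to be a speech_recognition.Recognizer; extra
    keyword arguments such as language="en-US" are passed on unchanged.
    """
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; choose from {sorted(ENGINES)}")
    # Look up the bound method on the recognizer and call it with the audio
    method = getattr(recognizer, ENGINES[engine])
    return method(audio, **options)
```

With `r` and `audio` from the snippet below, `transcribe(r, audio, engine="sphinx")` would run the offline engine instead; switching engines then means changing one string rather than the call.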

The library also works with offline engines such as CMU Sphinx (through the PocketSphinx package). This is useful when internet access is limited or privacy requirements call for processing speech locally. For file input, the AudioFile class reads WAV (uncompressed PCM), AIFF/AIFF-C and FLAC files; recordings in other formats, such as MP3 or OGG, need to be converted to one of these first. This makes it possible to transcribe recordings from many different audio sources.
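Because AudioFile reads only WAV, AIFF/AIFF-C and FLAC, it can help to check a file before transcribing it. This sketch uses only the standard library; `needs_conversion()` and `wav_info()` are hypothetical helpers, and the extension set is an assumption (AudioFile looks at the file's contents, not its name). The `wave` module reads only uncompressed PCM WAV, which as far as I know is also what AudioFile relies on for WAV input, so a `wave.Error` is a good hint that the file should be converted.

```python
import wave
from pathlib import Path

# Assumed extensions for the formats AudioFile can read (WAV, AIFF/AIFF-C, FLAC).
READABLE_EXTENSIONS = {".wav", ".aif", ".aiff", ".aifc", ".flac"}


def needs_conversion(path):
    """True if the file extension suggests AudioFile cannot read it directly."""
    return Path(path).suffix.lower() not in READABLE_EXTENSIONS


def wav_info(path):
    """Return basic parameters of an uncompressed PCM WAV file."""
    # wave.open raises wave.Error for non-RIFF files and compressed WAV data.
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }
```

For example, `needs_conversion('song.mp3')` returns True, while `wav_info('test.wav')` reports the channel count, sample rate and length of a WAV file that should be readable.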

We use two main classes from the SpeechRecognition library to convert speech into text:

Recognizer() - The Recognizer class performs the speech recognition. It can transcribe speech from audio files or microphone input and provides a single interface to the various speech recognition engines and APIs.
AudioFile() - The AudioFile class wraps an audio file as an audio source, so that the Recognizer can read the recording from it and pass it to a speech recognition engine.

Figure 1: Code depicting speech conversion.

Figure 2: Expected output.

Code

r = sr.Recognizer() - Creates a new instance of the SpeechRecognition library's Recognizer class, which runs speech recognition on the audio.

audio = 'test.wav' - Sets the variable audio to the filename of the audio file to be transcribed.

with sr.AudioFile(audio) as source - Creates an AudioFile object for the file and opens it as an audio source. The with statement ensures that the file is closed properly after use.

audio = r.record(source) - Uses the Recognizer object's record() method to read the whole file from the source into an AudioData object. The variable audio is reused here, so from this point on it holds the recorded audio rather than the filename.

text = r.recognize_google(audio) - Uses the Recognizer object's recognize_google() method to send the audio to the Google Web Speech API and stores the returned transcription in the variable text. This step needs an internet connection.
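record() also accepts optional duration and offset arguments, both in seconds, so a long recording can be read and transcribed in pieces instead of in one request. The sketch below assumes 30-second chunks, an arbitrary size chosen for illustration rather than a documented API limit; `chunk_windows()` and `transcribe_in_chunks()` are hypothetical helper names. It relies on each record() call continuing from where the previous one stopped inside the same with block, and on the AudioFile source exposing its length as DURATION once opened.

```python
import math


def chunk_windows(total_seconds, chunk_seconds):
    """Return consecutive (offset, duration) pairs covering total_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


def transcribe_in_chunks(path, chunk_seconds=30):
    """Transcribe a long audio file chunk by chunk (hypothetical helper)."""
    # Imported here so chunk_windows() can be used without the package installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        # source.DURATION is the file length in seconds, set when the file is opened
        for _offset, duration in chunk_windows(source.DURATION, chunk_seconds):
            # Each record() call resumes where the previous one stopped,
            # so only the duration is passed, not the offset.
            audio = r.record(source, duration=duration)
            try:
                pieces.append(r.recognize_google(audio))
            except sr.UnknownValueError:
                pass  # no intelligible speech in this chunk
    return " ".join(pieces)
```

For a 65-second file, `chunk_windows(65, 30)` gives `[(0, 30), (30, 30), (60, 5)]`, so the last request only carries the remaining 5 seconds.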

How to convert speech to text in Python with input from an audio file

Python | Lines of code: 17 | License: Strong Copyleft (CC BY-SA 4.0)


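import speech_recognition as sr

# Create a recognizer that will run the transcription
r = sr.Recognizer()

# Audio file to transcribe (WAV, AIFF or FLAC)
audio = 'test.wav'

# Open the file as an audio source and read all of it into an AudioData object
with sr.AudioFile(audio) as source:
    audio = r.record(source)
    print('Done!')

try:
    # Send the audio to the Google Web Speech API (needs an internet connection)
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    # The API received the audio but could not understand any speech in it
    print('Could not understand the audio')
except sr.RequestError as e:
    # The API could not be reached or returned an error
    print(f'Request failed: {e}')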

Follow these steps to run the code:

Install Visual Studio Code (or another code editor) on your computer.
Install the required library with the following command: pip install SpeechRecognition.
If the installed package is not picked up, try running the same command again from Windows PowerShell opened as administrator.
Save the audio file in WAV format and put it in the same folder as the Python file.
Open the folder in the code editor, then copy and paste the kandi code snippet above into the Python file.
Make sure the path to the audio file is specified correctly in the line audio = 'test.wav'.
Run the Python file. It prints Done! once the audio has been read, followed by the transcribed text.
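A common cause of a FileNotFoundError at this point is running the script from a different working directory: a bare name like 'test.wav' is resolved against the current directory, not the script's folder. A minimal sketch, assuming the audio file sits next to the script as the steps describe; `resolve_audio_path()` is a hypothetical helper. It returns a plain string to be safe, since AudioFile expects a filename string or a file object.

```python
from pathlib import Path


def resolve_audio_path(name, base_dir=None):
    """Resolve `name` against base_dir (default: this script's folder)."""
    base = Path(base_dir) if base_dir is not None else Path(__file__).resolve().parent
    path = (base / name).resolve()
    # Fail early with the full path, which makes a wrong location easy to spot
    if not path.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")
    return str(path)
```

In the snippet, the line would then read `audio = resolve_audio_path('test.wav')`.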

I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.

I found this code snippet by searching for "speech from audio file to text in Python" in kandi. You can try any such use case!

Dependent Libraries

speech_recognition by Uberi

Python | 7239 | Version: 3.10.0 | License: Permissive (BSD-3-Clause)

Speech recognition module for Python, supporting several engines and APIs, online and offline.


If you do not have SpeechRecognition, which is required to run this code, you can install it by clicking the link above and copying the pip install command from the SpeechRecognition page on kandi.

You can search kandi for any dependent library, such as SpeechRecognition.

Environment tested

This code has been tested using Python 3.8.0.
SpeechRecognition version 3.10.0 has been used.
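To compare your own setup with the tested environment, the installed versions can be printed using only the standard library (importlib.metadata is available from Python 3.8). A small sketch; `environment_report()` is a hypothetical helper, and "SpeechRecognition" is the package's distribution name, the same one used in the pip install command.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report():
    """Return the Python version and installed SpeechRecognition version (or None)."""
    try:
        sr_version = version("SpeechRecognition")
    except PackageNotFoundError:
        sr_version = None  # package not installed in this interpreter
    return {"python": platform.python_version(), "SpeechRecognition": sr_version}


if __name__ == "__main__":
    print(environment_report())
```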

Support

For any support on kandi solution kits, please use the chat.
For further learning resources, visit the Open Weaver Community learning page.


Open Weaver – Develop Applications Faster with Open Source
