Speech summarization helps us generate the gist of a speech by combining two tasks: transcription and summarization. A speech summarizer can also be used to digest podcasts on a variety of topics.
Below are the steps involved in building a speech summarizer. The speech summarizer takes an audio file as input and produces text or audio as output.
- Transform audio to meet the following spec:
  a. '.wav' file format
  b. 16 kHz sample rate
  c. Mono channel
- Transcribe the transformed audio file.
- Process the transcribed text if necessary.
- Summarize the transcribed text using pre-trained, state-of-the-art models.
- Generate audio from the summarized text.
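The first step (producing a '.wav' file at 16 kHz, mono) can be sketched with only the Python standard library. This is a minimal illustration, not the kit's own code: it assumes the input is already a 16-bit PCM WAV file (decoding other formats such as MP3 needs an external decoder, which is not covered here), the function name `to_16k_mono_wav` is hypothetical, and the resampling uses plain linear interpolation without an anti-aliasing filter, so a dedicated resampling library would give better audio quality.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # 16 kHz sample rate required by the spec


def to_16k_mono_wav(src, dst, target_rate=TARGET_RATE):
    """Convert a 16-bit PCM WAV (path or binary file object) to 16 kHz mono 16-bit PCM."""
    with wave.open(src, "rb") as reader:
        n_channels = reader.getnchannels()
        rate = reader.getframerate()
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        raw = reader.readframes(reader.getnframes())

    samples = array.array("h")  # signed 16-bit samples, channels interleaved
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix to mono by averaging the channels of each frame.
    n_frames = len(samples) // n_channels
    mono = [
        sum(samples[i * n_channels:(i + 1) * n_channels]) / n_channels
        for i in range(n_frames)
    ]

    # Resample by linear interpolation (no anti-aliasing low-pass filter).
    out_len = int(n_frames * target_rate / rate)
    step = rate / target_rate
    out = array.array("h")
    for j in range(out_len):
        pos = j * step
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < n_frames else mono[i]
        value = mono[i] + (nxt - mono[i]) * frac
        out.append(max(-32768, min(32767, round(value))))  # clip to int16 range

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(target_rate)
        writer.writeframes(out.tobytes())
```

For example, `to_16k_mono_wav("podcast.wav", "podcast_16k.wav")` (hypothetical file names) writes a file that meets all three requirements of the spec.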
The speech summarizer built with this kit is described in this section. The entire solution is available as a downloadable package from the source code repository.
- Download the kit, extract the zip file, and double-click the kit installer file to install the kit. Make sure you extract the zip file before running the installer.
- After the kit installs successfully, press 'Y' to run the kit and execute the cells in the notebook.
- To run the kit manually, press 'N' and locate the zip file 'speech-summarizer.zip'.
- Extract the zip file and navigate to the directory 'speech-summarizer'.
- Open a command prompt in the extracted directory 'speech-summarizer' and run the command 'jupyter notebook'.
- Locate and open the 'Speech Summarizer.ipynb' notebook from the Jupyter Notebook browser window.
- Execute the cells in the notebook.
Download the solution and follow the deployment instructions to begin setup. This one-click kit includes all the dependencies and resources you need to build your Speech Summarizer app.
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode gives developers a typical IDE experience. Our development was done in Jupyter Notebook.
Exploratory Data Analysis
These libraries are used for extensive data analysis and exploration and for working with arrays. They also support scientific computation and data manipulation.
Libraries in this group are used to analyze and process unstructured natural language. The data is not used in its original form; it first has to go through a processing pipeline that makes it suitable for machine learning techniques and algorithms.
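As a minimal sketch of such a processing pipeline for transcribed speech, the function below strips filler words, collapses stray whitespace, and splits the text into sentences using only Python's `re` module. The function name `preprocess_transcript` and the `FILLERS` list are hypothetical, and the regex sentence splitter is deliberately naive; it assumes the transcript already contains sentence punctuation, which not every transcriber produces.

```python
import re

# Hypothetical filler-word list; adjust it to what your transcriber emits.
FILLERS = {"um", "uh", "erm", "hmm"}


def preprocess_transcript(text: str) -> list[str]:
    """Clean a raw transcript and split it into sentences."""
    # Remove whole-word fillers (case-insensitive), plus a trailing comma and spaces.
    pattern = r"\b(?:" + "|".join(sorted(FILLERS)) + r")\b,?\s*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if s]
```

Keeping sentences as a list suits the later summarization step, which typically scores or encodes text sentence by sentence.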
Transcription libraries convert speech to text.
The machine learning libraries and frameworks in this group are used to generate summaries with state-of-the-art models.
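The kit uses pre-trained, state-of-the-art models for this step; those require model downloads and are not reproduced here. To illustrate only the shape of the step (transcript in, shorter text out), here is a swapped-in, much simpler technique: a frequency-based extractive summarizer in pure Python. It is not the kit's method. The function name `summarize`, the small `STOP_WORDS` set, and the average-frequency scoring are all illustrative assumptions.

```python
import re
from collections import Counter

# Small, illustrative stop-word list (an assumption, not exhaustive).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "it", "this", "that", "we", "on", "for"}


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences of `text`, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOP_WORDS]
        # Average word frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

An abstractive pre-trained model would instead generate new sentences; the extractive approach only selects existing ones, which makes it a useful baseline for comparison.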
Request servicing via REST API
Web frameworks help expose the solution as a REST API. The resources needed to serve requests can be handled by containerising the service and hosting it on hyperscalers (large cloud providers).
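The section does not name a specific web framework, so as a dependency-free stand-in the sketch below serves a summarization endpoint with the standard library's `http.server`. Everything here is hypothetical: the `/summarize` route, the JSON request and response shapes (`{"text": ...}` in, `{"summary": ...}` out), and the placeholder `summarize` function, which simply returns the first sentence and would be replaced by the real summarizer.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def summarize(text: str) -> str:
    """Hypothetical placeholder: return the first sentence of the text."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


class SummarizeHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)  # always drain the request body
        if self.path != "/summarize":
            self.send_error(404, "Not Found")
            return
        try:
            payload = json.loads(raw or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("'text' must be a string")
        except (json.JSONDecodeError, KeyError, TypeError):
            self.send_error(400, "Expected JSON body with a 'text' field")
            return
        body = json.dumps({"summary": summarize(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a threaded HTTP server for the handler."""
    return ThreadingHTTPServer((host, port), SummarizeHandler)
```

Calling `make_server().serve_forever()` starts the service on port 8000, after which a POST to `http://127.0.0.1:8000/summarize` returns the summary as JSON. The Python documentation advises against using `http.server` in production, so a real deployment would use a web framework and a container, as described above.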