Emotion Detection and Recognition is related to Sentiment Analysis. Sentiment Analysis aims to detect positive, neutral, or negative feelings in text.
Emotion Analysis aims to detect and recognize specific types of feelings expressed in text, such as joy, anger, fear, and sadness.
In this kit, we build an AI-based Speech Emotion Detector using open-source libraries. The concepts covered in the kit are:
- Voice-to-text transcription - Speech can be captured in real time through the microphone or by uploading an audio file. It is then converted to text using state-of-the-art open-source AI models from OpenAI's Whisper library.
- Emotion detection - Emotion detection on the transcribed text is carried out using a fine-tuned XLM-RoBERTa model.
Whisper is a general-purpose speech recognition model released by OpenAI that can perform multilingual speech recognition as well as speech translation and language identification. Combined with an emotion detection model, this allows for detecting emotion directly from speech in multiple languages.
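As a minimal sketch of how Whisper's Python API supports this (the model size and audio path below are assumptions, not the kit's exact settings):

```python
def format_transcription(result):
    """Summarize a Whisper result dict ({"text": ..., "language": ...})
    as a single line tagged with the detected language."""
    return "[{}] {}".format(result["language"], result["text"].strip())

def transcribe_file(audio_path):
    """Transcribe one audio file with Whisper.

    Requires `pip install openai-whisper`; the import is kept local so the
    helper above stays usable without Whisper installed.
    """
    import whisper

    model = whisper.load_model("base")  # model size is an assumption
    result = model.transcribe(audio_path)  # also detects the spoken language
    return format_transcription(result)

# Example usage (path is a placeholder):
#   transcribe_file("sample.wav")
```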
XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data covering 100 languages and can be fine-tuned for specific tasks such as emotion classification or text completion. Combining the two models, the emotion detection pipeline can transcribe speech and detect different emotions to enable data-driven analysis.
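The two stages can be chained as sketched below. This is a hedged illustration: the checkpoint id is a placeholder for the kit's fine-tuned XLM-RoBERTa emotion classifier, and the Whisper model size is an assumption.

```python
def pick_top_emotion(scores):
    """Return the highest-scoring label from a transformers
    text-classification output, e.g. [{"label": "joy", "score": 0.93}]."""
    return max(scores, key=lambda s: s["score"])["label"]

def detect_emotion(audio_path, asr_model, classifier):
    """Transcribe an audio file, then classify the transcript's emotion."""
    text = asr_model.transcribe(audio_path)["text"]
    return text, pick_top_emotion(classifier(text))

def build_models():
    """Load the two models; requires whisper and transformers installed."""
    import whisper
    from transformers import pipeline

    asr = whisper.load_model("base")  # model size is an assumption
    # Placeholder id: substitute the fine-tuned XLM-RoBERTa emotion model.
    clf = pipeline("text-classification", model="<finetuned-xlm-roberta-emotion>")
    return asr, clf

# Example usage (audio path is a placeholder):
#   asr, clf = build_models()
#   text, emotion = detect_emotion("sample.wav", asr, clf)
```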
Deployment Information
This repository helps you build your own AI-based speech emotion detector with Whisper and a fine-tuned XLM-RoBERTa model.
For Windows OS,
- Download the zip file, extract it, and run the installer. Ensure the zip file is fully extracted before running it.
- After successful installation of the kit, press 'Y' to run the kit and execute cells in the notebook.
- To run the kit manually instead, press 'N' and follow the steps below. The same steps can be used to run the solution manually at any time after installation:
- Navigate to the 'speech-emotion-detection' folder located in C:\kandikits
- Open command prompt inside the extracted directory 'speech-emotion-detection'
- Run this command - "speech-emotion-detection-env\Scripts\activate.bat" to activate the virtual environment
- Run the command - "cd speech-emotion-detection"
- Run the command 'jupyter notebook' to start a Jupyter Notebook instance.
- Locate and open the 'Speech_emotion_detection.ipynb' notebook from the Jupyter Notebook browser window.
- Execute cells in the notebook.
For Linux distros and macOS,
- Follow the instructions to download and install Python 3.9 and pip for your Linux distro or macOS.
- Download the repository.
- Extract the zip file; this creates the directory 'speech-emotion-detection'.
- Open a terminal in the extracted directory 'speech-emotion-detection'
- Create and activate a virtual environment using this command: 'virtualenv venv && source ./venv/bin/activate'
- Install dependencies using the command 'pip3.9 install -r requirements.txt'
- Once the dependencies are installed, run the command 'jupyter notebook' to start Jupyter Notebook (please use --allow-root if you're running as root).
- Locate and open the 'Speech_emotion_detection.ipynb' notebook from the Jupyter Notebook browser window.
- Execute cells in the notebook.
Click the button below to download the solution and follow the deployment information to begin setup. This 1-click kit has all the required dependencies and resources to build your Speech Emotion Detector App.
Libraries used in this solution
Development Environment
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experimentation, whereas VSCode provides a typical IDE experience for developers. Jupyter Notebook is used for development in this kit.
jupyter by jupyter
Jupyter metapackage for installation, docs and chat
Python · 14404 stars · Version: Current · License: Permissive (BSD-3-Clause)
Machine Learning
The machine learning libraries and frameworks below provide the state-of-the-art models used in this solution.
pytorch by pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python · 67874 stars · Version: v2.0.1 · License: Others (Non-SPDX)
transformers by huggingface
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python · 104111 stars · Version: v4.30.2 · License: Permissive (Apache-2.0)
whisper by openai
Robust Speech Recognition via Large-Scale Weak Supervision
Python · 39256 stars · Version: v20230314 · License: Permissive (MIT)
Kit Solution Source
speech-emotion-detection by kandi1clickkits
Emotion detection from speech using OpenAI's Whisper and fine-tuned XLM-RoBERTa models
Jupyter Notebook · 0 stars · Version: v1.0.0 · License: Permissive (MIT)
App Interface
gradio by gradio-app
Create UIs for your machine learning model in Python in 3 minutes
Python · 18771 stars · Version: v3.35.2 · License: Permissive (Apache-2.0)
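As a hedged sketch of how the detector might be exposed through Gradio's v3 API (the component choices are assumptions, and the detector passed in below is a stub standing in for the Whisper + XLM-RoBERTa pipeline):

```python
def as_gradio_fn(detect_fn):
    """Adapt an (audio_path) -> (transcript, emotion) detector to the
    two-output signature the Gradio interface expects."""
    def run(audio_path):
        transcript, emotion = detect_fn(audio_path)
        return transcript, emotion
    return run

def launch_app(detect_fn):
    """Build and launch the UI; requires `pip install gradio` (v3 API assumed)."""
    import gradio as gr

    demo = gr.Interface(
        fn=as_gradio_fn(detect_fn),
        inputs=gr.Audio(source="microphone", type="filepath"),
        outputs=[gr.Textbox(label="Transcript"), gr.Textbox(label="Emotion")],
    )
    demo.launch()

# Example usage with a stub detector:
#   launch_app(lambda path: ("transcribed text", "joy"))
```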