DESCRIPTION
Aim:
The aim of this project is to classify the sentiment of a song as positive, negative or neutral based on its lyrics.
Procedure:
The song lyrics are first pre-processed with the NLTK library. This includes steps such as punctuation removal, tokenization and stop-word removal. A sentiment analyzer imported from NLTK is then used to detect the sentiment of each song's lyrics, and the resulting sentiments are stored as labels. The dataset is then split into training and testing data. The lyrics are converted to a bag-of-words representation, and a TF-IDF sparse matrix is computed from the bag-of-words vectors. Next, an SVM model from scikit-learn is trained (fitted) on the training TF-IDF matrix and used to predict labels for the testing TF-IDF matrix. Evaluating the model with a classification report gave an accuracy of 82%.
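The procedure above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the project notebook's code: the toy lyrics, the hypothetical helpers `preprocess`, `score_to_label` and `vader_label`, the +/-0.05 thresholds on VADER's compound score, and the linear SVM kernel are all assumptions. To keep the sketch runnable without downloading NLTK corpora, NLTK's tokenizer and stop-word list are replaced by a regular expression and scikit-learn's `ENGLISH_STOP_WORDS`, and hard-coded labels stand in for the analyzer output (the NLTK labeling call is shown in `vader_label` but not executed).

```python
import re

from sklearn.feature_extraction.text import (
    ENGLISH_STOP_WORDS,
    CountVectorizer,
    TfidfTransformer,
)
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def preprocess(lyrics):
    """Lower-case the lyrics, drop punctuation, tokenize and remove stop words."""
    # Stand-in for NLTK tokenization and nltk.corpus.stopwords (assumption).
    tokens = re.findall(r"[a-z]+", lyrics.lower())
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


def score_to_label(compound):
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    # The +/-0.05 cut-offs are a common VADER convention, assumed here.
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def vader_label(lyrics):
    """Label lyrics with NLTK's VADER analyzer (needs nltk.download("vader_lexicon"))."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    compound = SentimentIntensityAnalyzer().polarity_scores(lyrics)["compound"]
    return score_to_label(compound)


# Hypothetical toy lyrics. The hard-coded labels stand in for
# [vader_label(s) for s in lyrics], so the sketch runs without NLTK data.
lyrics = [
    "I feel so happy and alive, dancing in the sunshine",
    "Love is wonderful, you make my heart smile",
    "Great days ahead, joy and laughter everywhere",
    "Sweet dreams and beautiful nights with you",
    "I am broken and alone, crying in the rain",
    "Hate and pain tear my lonely heart apart",
    "Tears fall, everything is sad and lost",
    "Angry words and bitter lies hurt me",
    "The train leaves the station at nine",
    "We walk down the street to the store",
    "The clock on the wall says it is noon",
    "Cars drive past the old brick building",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

cleaned = [preprocess(s) for s in lyrics]
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=42, stratify=labels
)

# Bag-of-words counts, then TF-IDF weights; both are fitted on training data only.
bow = CountVectorizer()
tfidf = TfidfTransformer()
train_matrix = tfidf.fit_transform(bow.fit_transform(X_train))
test_matrix = tfidf.transform(bow.transform(X_test))

svm = SVC(kernel="linear")  # linear kernel is an assumption
svm.fit(train_matrix, y_train)
predictions = svm.predict(test_matrix)

report = classification_report(y_test, predictions, zero_division=0)
print(report)
```

Fitting the vectorizer and TF-IDF transformer on the training split only, and merely transforming the test split, keeps test-set vocabulary and document frequencies from leaking into training.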
Testing Method:
Two songs were taken, and the sentiment analysis library was used to label their reference sentiments from the lyrics.
The trained SVM model was then used to predict the sentiment of each song.
The two sets of labels were similar.
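The testing method can be sketched as below, assuming a scikit-learn `Pipeline` that chains the bag-of-words, TF-IDF and SVM steps. The toy training lyrics, the two "new" songs and their reference labels are hypothetical stand-ins; in the project, the reference labels came from the sentiment analyzer rather than being written by hand.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training lyrics; the labels stand in for the analyzer output.
train_lyrics = [
    "happy love joy smile sunshine",
    "wonderful beautiful sweet dreams joy",
    "sad tears pain broken heart",
    "lonely crying hate bitter lies",
    "train station clock street noon",
    "walk store car road building",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Bag-of-words -> TF-IDF -> SVM in one estimator (linear kernel is an assumption).
model = make_pipeline(
    CountVectorizer(stop_words="english"), TfidfTransformer(), SVC(kernel="linear")
)
model.fit(train_lyrics, train_labels)

# Two unseen songs with hypothetical reference labels from the analyzer.
new_songs = ["I smile with joy and love", "My heart is broken and I am crying"]
reference = ["positive", "negative"]

predicted = list(model.predict(new_songs))
agreement = sum(p == r for p, r in zip(predicted, reference)) / len(reference)
for song, ref, pred in zip(new_songs, reference, predicted):
    print(f"{song!r}: analyzer={ref}, svm={pred}")
print(f"agreement: {agreement:.0%}")
```

Wrapping the steps in a pipeline means new lyrics go through exactly the same vectorization that was fitted during training.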
Future Enhancements:
Improve the model's accuracy.
Try other classification models, such as the multinomial Naive Bayes classifier (MultinomialNB).
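As a sketch of the second enhancement, scikit-learn's `MultinomialNB` can be swapped in for the SVM on the same bag-of-words and TF-IDF features, and the two classifiers compared. The toy lyrics and held-out examples below are hypothetical, so the printed accuracies say nothing about the project's real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data for illustration only.
train_lyrics = [
    "happy love joy smile sunshine",
    "wonderful beautiful sweet dreams joy",
    "sad tears pain broken heart",
    "lonely crying hate bitter lies",
    "train station clock street noon",
    "walk store car road building",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_lyrics = ["joy and love in the sunshine", "tears of pain and hate", "the road to the station"]
test_labels = ["positive", "negative", "neutral"]

scores = {}
for name, clf in [("svm", SVC(kernel="linear")), ("multinomial_nb", MultinomialNB())]:
    # Identical feature steps, so only the classifier differs between runs.
    model = make_pipeline(CountVectorizer(stop_words="english"), TfidfTransformer(), clf)
    model.fit(train_lyrics, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(test_lyrics))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

MultinomialNB expects non-negative features, which TF-IDF weights satisfy, so it can reuse the existing feature pipeline unchanged.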
DEPENDENT LIBRARIES
scikit-learn by scikit-learn
scikit-learn: machine learning in Python
Language: Python, Version: 1.2.2, License: Permissive (BSD-3-Clause)
pandas by pandas-dev
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Language: Python, Version: 2.0.2, License: Permissive (BSD-3-Clause)
GITHUB REPOSITORY LINK
https://github.com/Bidura/OpenWeaver2.
This repository contains the '.ipynb' notebook as well as the '.csv' files that were used.
SCREENSHOTS
[Screenshots: code, output]