Sachet-Samachar: Hindi Fake News Detector

by Parul Mann1 Updated: Jan 28, 2022

सचेत-समाचार Hindi Fake News Detector

YouTube: https://youtu.be/5wdvW-OnuKU

This is the submission of team Hackstreet Girls under the topic 'Combating Disinformation'.

COMBATING MISINFORMATION IN HINDI

During the Covid-19 pandemic of the last few years, fake news has risen sharply. In a country such as India, where news circulates in more than 22 languages, keeping the spread of false news in check is a complex task. Many resources exist for checking the validity of news in English, but little to no work has been done for regional languages. In India, nearly 55 crore (550 million) people speak and understand Hindi, making it the primary language for the circulation of news. With so few checks on Hindi fake news, a great deal of misinformation circulates in our country, causing socio-political tensions among other issues. This is especially dangerous during the current pandemic, as false medical information can even cause loss of life.

To combat the spread of Hindi misinformation, we built a Hindi Fake News Detector using Machine Learning and Deep Learning algorithms. Being native Hindi speakers, we understood this problem well and used a dataset that we scraped and annotated ourselves. Our detector reaches an accuracy of up to about 82 percent. The project aims to identify, from its link, whether a Hindi news article is fake.

We scraped over 200 news articles from Hindi fact-checking websites like Aaj Tak, Fact Check and Alt News Fact Check, and manually annotated the entire dataset of about 2206 data points. We wrote the programs in Jupyter Notebook. For scraping URLs, we used libraries like urllib, BeautifulSoup, difflib, re and requests. For pre-processing and building the machine learning models, we used Pandas, NumPy, scikit-learn, Keras and TensorFlow.

After annotation, we performed basic pre-processing on the dataset. Pre-processing was one of the major challenges of the project: numerous tools are available for cleaning an English dataset, but few resources exist for Hindi. We therefore took a simple and direct approach: we removed the stop words and punctuation marks, then applied stemming and lemmatization. We then vectorized the entire dataset with the TF-IDF Vectorizer before using it to train the models. We used Seaborn to visualize the data, as shown on our webpage.

After pre-processing and hyperparameter tuning, we tested several Machine Learning models on our dataset: Logistic Regression, SVM, Random Forest, k-NN, and Gradient Boosting Classifier. We also tested a Deep Learning LSTM model trained for 10, 25, 50, and 100 epochs. On a benchmark dataset of 932 fake and 1274 not-fake news links, the models identified most of the fake news links. We achieved an accuracy of 82.35% with the Random Forest model on a 10% test split and 60.42% with the LSTM model on a 30% test split.

We have also built a simple website where a user can enter the URL of an article in the search box and check whether the article is fake. In the future, we intend to connect the website to the ML/DL models through a suitable backend. We will also work on improving the accuracy of the models.

Link to GitHub Repository for the project: https://github.com/A-nn-e/Sachet-Samachar
Link to YouTube video explaining the project: https://youtu.be/5wdvW-OnuKU
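The pre-processing and vectorization steps described above can be illustrated with a minimal sketch. It is not the project's code: the stop-word list is a tiny hypothetical sample, stemming and lemmatization are left out, and the TF-IDF weights are computed by hand with the textbook formula (term frequency times log of inverse document frequency) rather than with scikit-learn's TF-IDF Vectorizer, which applies smoothing and normalization by default. Punctuation is stripped with `str.translate` instead of a `\w` regular expression, since `\w` in Python's `re` module does not match Devanagari vowel signs and would split Hindi words.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
HINDI_STOP_WORDS = {"है", "के", "का", "की", "में", "और", "से", "को", "यह", "पर"}

# ASCII punctuation plus the Devanagari danda and double danda.
PUNCTUATION = string.punctuation + "।॥"
_PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def preprocess(text):
    """Replace punctuation with spaces, split on whitespace, drop stop words."""
    cleaned = text.translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in HINDI_STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Two made-up headlines used only to exercise the functions.
docs = ["यह खबर फर्जी है।", "यह खबर सच है!"]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, w)
```

A term that occurs in every document gets a weight of zero under this formula, so only the words that distinguish one headline from the other carry signal into the classifier.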

Kit Solution Source

Sachet-Samachar by A-nn-e

Language: Jupyter Notebook, Stars: 0, Version: Current

License: No License

Deployment Information

Repository for our Hindi Fake News Detector.

Follow the instructions below to run the solution:
- Run the Web-Scraping code, providing a URL as input. You will get the scraped links.
- Run the ML and DL code cell by cell to get the accuracy of the different models.
- To deploy the website, we suggest Heroku. Connect the local git repository to the Heroku app by adding the app's remote to the repository, then push the changes to that remote. The website will then be live on Heroku.
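As a rough illustration of what the link-scraping step produces, the sketch below collects article links from a page. It is an assumption-laden stand-in rather than the repository's notebook: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs, it parses an in-memory HTML string instead of fetching a live page, and the `example.com` URLs and the `/fact-check/` path pattern are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, pattern=r"/fact-check/"):
    """Return unique links matching `pattern`, in order of first appearance."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    matching = [link for link in parser.links if re.search(pattern, link)]
    return list(dict.fromkeys(matching))  # drop exact duplicates, keep order


# Hypothetical page content standing in for a fetched fact-checking index page.
sample_html = """
<a href="/fact-check/story-one">One</a>
<a href="/fact-check/story-one">One again</a>
<a href="https://example.com/fact-check/story-two">Two</a>
<a href="/about">About</a>
<a name="no-href">Anchor without a link</a>
"""

links = extract_links(sample_html, "https://example.com")
for link in links:
    print(link)
```

In a real run, the HTML string would come from an HTTP request to the input URL, and the pattern would be adapted to the URL scheme of the site being scraped.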

Deployment Environment

We have used Jupyter Notebook for the deployment.

jupyter by jupyter

Language: Python, Stars: 12379, Version: Current

License: Permissive (BSD-3-Clause)

Jupyter metapackage for installation, docs and chat

Exploratory Data Analysis

For extensive analysis and exploration of data, these libraries were used.

numpy by numpy

Language: Python, Stars: 20101, Version: v1.22.3

License: Permissive (BSD-3-Clause)

The fundamental package for scientific computing with Python.

pandas by pandas-dev

Language: Python, Stars: 33259, Version: v1.4.1

License: Permissive (BSD-3-Clause)

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Data Scraping and Cleaning

For data scraping and cleaning, these libraries were used.

urllib by node-modules

Language: JavaScript, Stars: 634, Version: 0.3.6

License: Permissive (MIT)

Request HTTP(s) URLs in a complex world.

web scrapping using bs4 by ashutoshdhondkar

Language: Python, Stars: 0, Version: Current

License: No License

Project submitted at NIIT

document-similarity by analyticsbot

Language: Python, Stars: 1, Version: Current

License: Permissive (Apache-2.0)

A scalable system (using multiprocessing in Python) to find similarity between thousands of documents using difflib SequenceMatcher / Levenshtein distance / cosine similarity / word embeddings generated by word2vec

requests by psf

Language: Python, Stars: 47177, Version: v2.27.1

License: Permissive (Apache-2.0)

A simple, yet elegant, HTTP library.

Machine Learning and Deep Learning

For applying Machine Learning and Deep Learning, these libraries were used.

Machine-learning-with-Sci-kit-Learn-and-Tensorflow-V- by PacktPublishing

Language: Jupyter Notebook, Stars: 7, Version: Current

License: Permissive (MIT)

Machine learning with Sci-kit Learn and Tensorflow (V)

keras by keras-team

Language: Python, Stars: 55007, Version: v2.9.0-rc2

License: Permissive (Apache-2.0)

Deep Learning for humans

tensorflow by tensorflow

Language: C++, Stars: 164372, Version: v2.9.0-rc1

License: Permissive (Apache-2.0)

An Open Source Machine Learning Framework for Everyone

© 2022 Open Weaver Inc.