Sachet-Samachar: Hindi Fake News Detector

by Parul Mann1 | Updated: Jan 28, 2022

सचेत-समाचार (Sachet-Samachar): Hindi Fake News Detector

YouTube: https://youtu.be/5wdvW-OnuKU

This is the submission from team Hackstreet Girls under the topic 'Combating Disinformation'.

COMBATING MISINFORMATION IN HINDI

Over the last few years of the Covid-19 pandemic, fake news has risen sharply. In a country such as India, where news circulates in more than 22 languages, keeping the spread of false news in check is a complex task. Although many resources exist for checking the validity of news in English, little to no work has been done in regional languages. Nearly 55 crore (550 million) people in India speak and understand Hindi, making it the primary language for news circulation. Because Hindi fake news goes largely unchecked, a great deal of misinformation circulates in the country, causing socio-political tensions among other issues. This is especially dangerous during the current pandemic, as false medical information can even cost lives.

To combat the spread of Hindi misinformation, we built a Hindi Fake News Detector using machine learning and deep learning algorithms. As native Hindi speakers, we understood this problem well and used a dataset that we scraped and annotated ourselves. Our detector achieves an accuracy of up to 82.32 percent.

The project aims to identify, from its link, whether a Hindi news article is fake. We scraped over 200 news articles from Hindi fact-checking websites such as Aaj Tak Fact Check and Alt News Fact Check, and manually annotated the entire dataset of about 2206 data points. We wrote the programs in Jupyter Notebook. For scraping URLs, we used urllib, BeautifulSoup, difflib, re, and requests. For pre-processing and building the machine learning models, we used Pandas, NumPy, scikit-learn, Keras, and TensorFlow.
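The link-scraping step can be sketched roughly as follows. This is a minimal illustration and not the project's notebook code: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages, and the names `LinkCollector` and `extract_article_links`, the `/fact-check/` path pattern, and the `news.example` URLs are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def extract_article_links(html, base_url, pattern=r"/fact-check/"):
    """Return unique same-site links whose path matches `pattern`.

    The default pattern is hypothetical; each site needs its own.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for link in parser.links:
        parsed = urlparse(link)
        if parsed.netloc != host or not re.search(pattern, parsed.path):
            continue
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Hypothetical listing page. A real run would download the page first,
# for example with requests.get(url, timeout=10).text.
SAMPLE_HTML = """
<a href="/fact-check/story-one">one</a>
<a href="/fact-check/story-one">duplicate</a>
<a href="/sports/match">unrelated</a>
<a href="https://other.example/fact-check/x">external</a>
"""

links = extract_article_links(SAMPLE_HTML, "https://news.example/hindi/")
print(links)
```

Filtering by host and path keeps only article pages from the fact-checking section, and the `seen` set drops links that appear more than once on the listing page.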
After annotation, we performed basic pre-processing on the dataset, which proved to be one of the major challenges of the project. Numerous tools exist for cleaning English datasets, but because our dataset is entirely in Hindi, few comparable resources were available. We therefore took a simple, direct approach: we removed stop words and punctuation marks, then applied stemming and lemmatization. We then vectorized the dataset with the TF-IDF Vectorizer before training the models. We used Seaborn to visualize the data, as shown on our webpage.

After pre-processing and hyperparameter tuning, we evaluated several machine learning models on the dataset: Logistic Regression, SVM, Random Forest, k-NN, and Gradient Boosting Classifier. We also trained a deep learning LSTM model for 10, 25, 50, and 100 epochs. On a benchmark dataset of 932 fake and 1274 not-fake news links, the models identified most of the fake news links: the Random Forest model reached an accuracy of 82.35% with 10% of the data held out for testing, and the LSTM model reached 60.42% with 30% held out.

We have also built a simple website where a user can enter an article's URL in the search box and check whether the article is fake. In the future, we intend to connect the website and the ML/DL models through a suitable backend, and to keep improving the accuracy of the models.

Link to GitHub repository for the project: https://github.com/A-nn-e/Sachet-Samachar
Link to YouTube video explaining the project: https://youtu.be/5wdvW-OnuKU
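The pre-processing and training steps described above might look something like the sketch below, assuming scikit-learn is installed. It is not the project's code: the stop-word list is a tiny illustrative sample, stemming and lemmatization are left out, the texts are synthetic placeholders rather than the annotated dataset, and the `tokenize` helper and all hyperparameters are assumptions.

```python
import string

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny illustrative stop-word list; NOT the list used in the project.
HINDI_STOP_WORDS = {"है", "के", "की", "का", "में", "और", "से", "को", "यह", "पर"}

# Map ASCII punctuation and the Devanagari danda marks to spaces.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation + "।॥"})


def tokenize(text):
    """Strip punctuation, split on whitespace, and drop stop words."""
    tokens = text.translate(PUNCT_TABLE).split()
    return [tok for tok in tokens if tok not in HINDI_STOP_WORDS]


# Synthetic placeholder data (1 = fake, 0 = not fake), for illustration only.
texts = (
    [f"चमत्कारी दवा से बीमारी तुरंत ठीक, दावा {i}" for i in range(10)]
    + [f"सरकार ने नई योजना की घोषणा की, रिपोर्ट {i}" for i in range(10)]
)
labels = [1] * 10 + [0] * 10

# Hold out 10% for testing, mirroring the split reported for Random Forest.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=0
)

# Passing a callable analyzer bypasses the default token pattern, which
# can split Devanagari words at vowel signs.
model = make_pipeline(
    TfidfVectorizer(analyzer=tokenize),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy on the toy data: {accuracy:.2f}")
```

Swapping `RandomForestClassifier` for `LogisticRegression`, `SVC`, `KNeighborsClassifier`, or `GradientBoostingClassifier` in the same pipeline is one way to compare the other models mentioned above on an identical split.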

Kit Solution Source

Sachet-Samachar by A-nn-e
Jupyter Notebook | Stars: 0 | Version: Current | License: No License

Deployment Information

Repository for our Hindi Fake News Detector.

Follow the instructions below to run the solution:
- Run the web-scraping code with a URL as input to get the scraped links.
- Run the ML and DL code cell by cell to get the accuracy of the different models.
- To deploy the website, we suggest Heroku: add the Heroku app's remote to the local git repository and push the changes to it. The website will then be live on Heroku.

Deployment Environment

We have used Jupyter Notebook for the deployment.

jupyter by jupyter
Python | Stars: 14397 | Version: Current | License: Permissive (BSD-3-Clause)
Jupyter metapackage for installation, docs and chat

Exploratory Data Analysis

For extensive analysis and exploration of data, these libraries were used.

numpy by numpy
Python | Stars: 23692 | Version: v1.25.0rc1 | License: Permissive (BSD-3-Clause)
The fundamental package for scientific computing with Python.

pandas by pandas-dev
Python | Stars: 38607 | Version: v2.0.2 | License: Permissive (BSD-3-Clause)
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Data Scraping and Cleaning

For data scraping and cleaning, these libraries were used.

urllib by node-modules
TypeScript | Stars: 699 | Version: v3.15.0 | License: Permissive (MIT)
Request HTTP(s) URLs in a complex world.

web scrapping using bs4 by ashutoshdhondkar
Python | Stars: 0 | Version: Current | License: No License
Project submitted at NIIT

document-similarity by analyticsbot
Python | Stars: 1 | Version: Current | License: Permissive (Apache-2.0)
A scalable system (using multiprocessing in Python) to find similarity between thousands of documents using difflib SequenceMatcher / Levenshtein distance / cosine similarity / word embeddings generated by word2vec

requests by psf
Python | Stars: 49766 | Version: v2.31.0 | License: Permissive (Apache-2.0)
A simple, yet elegant, HTTP library.

Machine Learning and Deep Learning

For applying Machine Learning and Deep Learning, these libraries were used.

Machine-learning-with-Sci-kit-Learn-and-Tensorflow-V- by PacktPublishing
Jupyter Notebook | Stars: 7 | Version: Current | License: Permissive (MIT)
Machine learning with Sci-kit Learn and Tensorflow (V)

keras by keras-team
Python | Stars: 58540 | Version: v2.13.1-rc0 | License: Permissive (Apache-2.0)
Deep Learning for humans

tensorflow by tensorflow
C++ | Stars: 175362 | Version: v2.13.0-rc1 | License: Permissive (Apache-2.0)
An Open Source Machine Learning Framework for Everyone