Asatya | Combatting Disinformation
by manan1 Updated: Jan 27, 2022
1. Social Innovation
• Problem: The greatest threat to the golden era of the internet age is fake news. With millions of sources and a growing corpus of data published on the web every day, combatting disinformation is the need of the hour. Beyond its obvious effects, disinformation on the web has drastic social consequences: when falsehoods spread under the guise of reliability, public acceptance of disinformation becomes a real danger. A prominent example is the COVID-19 pandemic, during which incorrect information about the disease, its ailments, cures and vaccines spread rapidly among netizens and swayed public opinion for the worse. Moreover, falsehoods can harm the communities they describe, often patronising or villainising them. In the information age, a single piece of wrong information can undermine legitimate efforts.
• Scope and Scale: The solution is to tackle the problem at its roots, which demands an application that works across a variety of sources and a variety of technological interfaces. Since fake news spreads through channels such as clickbait headlines, links shared on WhatsApp, Facebook and Twitter, and unregulated news sites, we built a single application that works across all of them. Our solution is a browser extension that is triggered either by right-clicking a link or by opening a news article. It uses machine learning models to report the reliability, bias and objectivity of the news item. It also finds the most relevant and credible articles related to the one you are reading, and summarises articles before you read them.
• Social Impact: The extension not only makes users more aware of which sources to rely on for news, but also lets them verify a shared news or data item in real time. A feedback loop adds community resolution: users can report through the app whether an assessment was accurate, which further curbs the spread of disinformation. Because it works across platforms and websites and has a simple UI/UX, it is easy to use and can be adopted widely. Tackling disinformation at its root helps stop its spread and makes citizens more aware of real issues.
2. Technology Innovation
• Machine Learning:
(i) NLP for News Metrics: We use a dataset of ~50k news items, each containing the news details and a binary True/Fake label. The text is first converted into vectors with a count vectoriser, which gives a bag-of-words (one-hot-like) count for each word. TF-IDF (term frequency-inverse document frequency) weighting then refines these vectors by down-weighting words that are common across all articles. A logistic regression model with the SAGA solver is trained on the dataset and achieves a validation accuracy of 96%. The trained model is then wrapped in a pipeline, which is saved in the Open Neural Network Exchange (ONNX) format for use in our Flask server.
(ii) Extractive Text Summarisation: Extractive methods summarise an article by identifying its most important sentences or phrases and stitching those portions of the original text together into a condensed version. Our summariser computes the cosine similarity between every pair of sentences, rates each sentence by these similarities, and adds the highest-rated sentences to the summary.
• Chrome Extension: Chrome extensions work through either page actions or browser actions. A page action is specific to certain pages, whereas a browser action is relevant wherever you are in the browser. We use page actions to generate a report of relevant information for the site being viewed. Our Chrome extension supports two set-ups:
(i) Right-Click Fact Check: The extension adds a “Fact Check Link” option to the right-click menu. When the user right-clicks a link and selects this option, the link is sent to the backend server, which returns the article summary, its percentage reliability, and its bias and objectivity. These results are shown to the user in a SweetAlert2-based popup. Application: browsing social media, checking links in your inbox, and summarising an article.
(ii) Current Article Fact Check: When the user opens an article and clicks the extension icon, the extension extracts information from the page, including the tab title and the URL. This is sent in a POST request to the backend, which returns the relevant details; these are displayed in a React.js-based frontend, together with other highly credible articles on the same topic for further reading. The user can then tell the model whether the report was accurate, and this feedback is sent to the server in a separate request on the ‘/feedback’ route. Application: When referring to an article, reading to
• Backend Server:
(i) REST API: The REST API is built with Flask, a lightweight WSGI (Web Server Gateway Interface) web application framework.
(ii) Web Scraping: When a user right-clicks an article link and runs our extension, we use Beautiful Soup to extract the article data. We also use the newspaper library to scrape the article's metadata.
(iii) Classification Pipeline: The classification model is loaded with the joblib library and run on the incoming samples.
(iv) Feedback Route: The ‘/feedback’ route receives user feedback and stores it in a TinyDB database, where it can later be used to retrain and improve the model.
(v) Predict Route: The ‘/predict’ route is a POST route that takes the URL of an article and runs our ML model on it. It also uses the newspaper library to generate a summary and extract keywords from the article. The keywords are then used in an API call that returns similar articles.
ASATYA BROWSER EXTENSION (hosted on GitHub)
DEVELOPMENT ENVIRONMENT
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides the typical IDE experience developers expect. cmder is a console emulator for Windows with Bash support.
notebook by jupyter
Jupyter Interactive Notebook
Language: Jupyter Notebook
Stars: 10143
Version: v7.0.0b3
License: Permissive (BSD-3-Clause)
EXPLORATORY DATA ANALYSIS
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
pandas by pandas-dev
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Language: Python
Stars: 38552
Version: v2.0.2
License: Permissive (BSD-3-Clause)
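The write-up does not show its exploratory analysis. A typical first pass over a labelled True/Fake news dataset with pandas might look like the minimal sketch below. The three-row DataFrame and its column names (`title`, `text`, `label`) are hypothetical stand-ins for the real ~50k-row dataset.

```python
import pandas as pd

# Hypothetical stand-in for the ~50k-row news dataset; the column names
# "title", "text" and "label" are assumptions, not the project's schema.
df = pd.DataFrame({
    "title": ["Vaccine trial results published", "Miracle cure revealed", None],
    "text": ["The ministry released data today.", "Doctors hate this trick!", "Rates unchanged."],
    "label": ["True", "Fake", "True"],
})

# Class balance: a heavily skewed True/Fake split would make accuracy misleading.
print(df["label"].value_counts(normalize=True))

# Missing values per column, which must be handled before vectorising text.
print(df.isna().sum())

# Article length in words, a quick sanity check on the text column.
df["n_words"] = df["text"].str.split().str.len()
print(df.groupby("label")["n_words"].mean())
```

Checking class balance matters here because the model is judged on validation accuracy, which is only meaningful when neither label dominates.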
WEB SCRAPING
Web scraping is an automated method of obtaining large amounts of data from websites. Beautiful Soup is a Python package for parsing HTML and XML documents. It builds a parse tree for each parsed page that can be used to extract data from the HTML, which makes it useful for web scraping. urllib3 is a powerful, user-friendly HTTP client for Python; much of the Python ecosystem already uses it.
beautifulsoup by waylan
Git Clone of Beautiful Soup (https://code.launchpad.net/~leonardr/beautifulsoup/bs4)
Language: Python
Stars: 138
Version: Current
License: Others (Non-SPDX)
urllib3 by urllib3
urllib3 is a user-friendly HTTP client library for Python
Language: Python
Stars: 3414
Version: 1.26.16
License: Permissive (MIT)
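The brief says Beautiful Soup extracts article data and lists urllib3 as the HTTP client, but it does not show how the two are combined. The minimal sketch below fetches a page with urllib3 and pulls out a title and body text with Beautiful Soup. It assumes that article text sits in `<p>` tags, which is a simplification because real news sites vary. `fetch_html` and `extract_article` are hypothetical helper names.

```python
import urllib3
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with urllib3 (requires network access)."""
    http = urllib3.PoolManager()
    response = http.request("GET", url, timeout=10.0)
    return response.data.decode("utf-8", errors="replace")


def extract_article(html: str) -> dict:
    """Pull a title and body paragraphs out of raw HTML with Beautiful Soup."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Assumption: the article body lives in <p> tags; empty paragraphs are dropped.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return {"title": title, "text": "\n".join(p for p in paragraphs if p)}
```

Keeping the download and the parsing in separate functions lets the parser be exercised on saved HTML without touching the network.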
REST API
A REST API (also known as a RESTful API) is an application programming interface (API or web API) that conforms to the constraints of the REST architectural style and allows interaction with RESTful web services. REST stands for representational state transfer and was created by computer scientist Roy Fielding. Flask is a web application framework written in Python, based on the Werkzeug WSGI toolkit and the Jinja2 template engine. flask-cors adds Cross-Origin Resource Sharing (CORS) support to Flask. TinyDB is a lightweight, document-oriented database written in pure Python with no external dependencies. It targets small apps for which an SQL database or an external database server would be overkill.
flask by pallets
The Python micro framework for building web applications.
Language: Python
Stars: 63166
Version: 2.2.5
License: Permissive (BSD-3-Clause)
flask-cors by corydolphin
Cross Origin Resource Sharing (CORS) support for Flask
Language: Python
Stars: 820
Version: 3.0.10
License: Permissive (MIT)
tinydb by msiemens
TinyDB is a lightweight document oriented database optimized for your happiness :)
Language: Python
Stars: 5891
Version: v4.7.1
License: Permissive (MIT)
SUMMARISER
Extractive methods summarise an article by identifying its most important sentences or phrases and stitching those portions of the original text together into a condensed version. newspaper is a pure-Python library for extracting full article text and metadata.
newspaper by saydulk
News, full-text, and article metadata extraction in Python 3
Language: Python
Stars: 0
Version: Current
License: Permissive (MIT License)
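The summariser is described as computing the cosine similarity between sentence pairs and keeping the highest-rated sentences. The standalone sketch below uses only the standard library. It assumes bag-of-words sentence vectors and rates each sentence by its summed similarity to all the others. This is one reasonable reading of "highest rated", not the newspaper library's own algorithm.

```python
import math
import re
from collections import Counter


def _vector(sentence):
    """Bag-of-words term counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarise(text, n_sentences=2):
    """Extractive summary: keep the sentences most similar to the rest of the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= n_sentences:
        return " ".join(sentences)
    vectors = [_vector(s) for s in sentences]
    # Rate each sentence by its summed cosine similarity to every other sentence.
    scores = [
        sum(_cosine(vectors[i], vectors[j]) for j in range(len(sentences)) if j != i)
        for i in range(len(sentences))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    # Re-emit the chosen sentences in their original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(top))
```

Sentences that share vocabulary with many others score highly, so off-topic sentences, which share no words with the rest, tend to be dropped.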
NEWS PARAMETER PREDICTION
News parameters are predicted by our pre-trained model, which was trained with the scikit-learn library.
scikit-learn by scikit-learn
scikit-learn: machine learning in Python
Language: Python
Stars: 54472
Version: 1.2.2
License: Permissive (BSD-3-Clause)
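The project brief describes the model as a count vectoriser followed by TF-IDF weighting and logistic regression with the SAGA solver. The minimal scikit-learn sketch below follows that description. It is not the project's training code:

* The four headlines stand in for the ~50k-item dataset.
* The label convention (1 = True, 0 = Fake) is assumed.
* `reliability_percent` is a hypothetical helper showing one way to turn the classifier's probability into a reliability percentage.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the ~50k labelled news items.
# Assumed label convention: 1 = True (reliable), 0 = Fake.
texts = [
    "Health ministry publishes vaccine trial results in peer reviewed journal",
    "Central bank announces interest rate decision after scheduled meeting",
    "Miracle cure doctors don't want you to know about shocks everyone",
    "You won't believe this secret trick that ends the pandemic overnight",
]
labels = [1, 1, 0, 0]

# Raw term counts -> TF-IDF weighting -> logistic regression with the SAGA solver.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(solver="saga", max_iter=1000)),
])
pipeline.fit(texts, labels)


def reliability_percent(model, text):
    """Hypothetical helper: probability of the 'True' class as a percentage."""
    # predict_proba columns follow model.classes_, so look up the column for label 1.
    true_index = list(model.classes_).index(1)
    return 100.0 * model.predict_proba([text])[0][true_index]


print(f"{reliability_percent(pipeline, 'Secret miracle cure shocks doctors'):.1f}%")
```

Keeping the vectoriser and the classifier in one `Pipeline` means raw article text can be passed straight to `predict` or `predict_proba` at serving time.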
MODEL PIPELINE
Joblib is a set of tools that provides lightweight pipelining in Python. We use it to persist our machine learning model pipeline, which the backend server loads at run time.
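The backend loads the classification model with joblib. The minimal sketch below saves a fitted scikit-learn pipeline to disk and loads it back. The training data, the label convention and the file name `news_model.joblib` are hypothetical. `TfidfVectorizer` is used as shorthand for a `CountVectorizer` followed by a `TfidfTransformer`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set; label 1 = True, 0 = Fake (assumed convention).
texts = ["official report confirms figures", "shocking secret cure revealed",
         "ministry publishes annual data", "miracle trick doctors hate"]
labels = [1, 0, 1, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Persist the whole fitted pipeline (vectoriser + classifier) as a single file...
path = os.path.join(tempfile.mkdtemp(), "news_model.joblib")
joblib.dump(model, path)

# ...and load it back, e.g. once when the server starts.
restored = joblib.load(path)
print(restored.predict(["secret miracle cure"]))
```

Loading the model once at start-up, rather than on every request, keeps the ‘/predict’ route fast.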
TEXT MINING
Text mining (also referred to as text analytics) is an artificial intelligence (AI) technology that uses natural language processing (NLP) to transform the free (unstructured) text in documents and databases into normalized, structured data suitable for analysis or to drive machine learning (ML) algorithms. The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English written in the Python programming language.
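The write-up lists NLTK but does not say which text-mining steps Asatya applies. A common preprocessing pass (tokenise, drop punctuation and stop words, stem) might look like the sketch below. The short hand-written stop-word list is an assumption made so that no NLTK corpus download is needed, and `normalise` is a hypothetical helper name.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Small hand-picked stop-word list so the sketch needs no corpus download;
# nltk.corpus.stopwords offers a fuller list after nltk.download("stopwords").
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "for"}
stemmer = PorterStemmer()


def normalise(text):
    """Turn free text into a list of lower-cased, stemmed content tokens."""
    tokens = wordpunct_tokenize(text.lower())
    # Keep alphabetic tokens only, drop stop words, then reduce each word to its stem.
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in STOP_WORDS]


print(normalise("The vaccines are being distributed in the cities!"))
```

Stemming maps inflected forms such as "running" and "runs" to a shared stem, which shrinks the vocabulary a count or TF-IDF vectoriser has to handle.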
FRONT-END DEVELOPMENT
Front-end development involves building websites and applications with web technologies (e.g., HTML, CSS, the DOM and JavaScript) that run on the Open Web Platform or act as compilation input for non-web environments. We used React as our framework and designed the UI from scratch, using some components from Material-UI. We used SweetAlert2 for popup dialogs and styled the interface with Bootstrap.
react by facebook
The library for web and native user interfaces
Language: JavaScript
Stars: 208526
Version: v18.2.0
License: Permissive (MIT)
create-react-app by facebook
Set up a modern web app by running one command.
Language: JavaScript
Stars: 99979
Version: v5.0.1
License: Permissive (MIT)
material-ui by mui-org
MUI (formerly Material-UI) is the React UI library you always wanted. Follow your own design system, or start with Material Design.
Language: JavaScript
Stars: 75241
Version: v5.4.0
License: Permissive (MIT)
sweetalert2 by sweetalert2
✨ A beautiful, responsive, highly customizable and accessible (WAI-ARIA) replacement for JavaScript's popup boxes. Zero dependencies. 🇺🇦
Language: JavaScript
Stars: 15936
Version: v11.7.10
License: Permissive (MIT)
bootstrap by twbs
The most popular HTML, CSS, and JavaScript framework for developing responsive, mobile first projects on the web.
Language: JavaScript
Stars: 164116
Version: v5.3.0
License: Permissive (MIT)
emotion by emotion-js
👩🎤 CSS-in-JS library designed for high performance style composition
Language: JavaScript
Stars: 16434
Version: @emotion/styled@11.11.0
License: Permissive (MIT)