Asatya | Combatting Disinformation
by manan1 Updated: Jan 28, 2022
Solution Kit
ASATYA: A Browser Extension to Combat Disinformation
Video Demo
Asatya is a browser extension that bundles a suite of tools to fight disinformation on the web. These tools inform the user of a news report's reliability, media bias and objectivity using sophisticated machine learning models. The extension also summarises the report for the user's convenience and suggests similar verified articles. Fighting disinformation: one right click at a time!
Features:
1. Machine Learning Predictor for News Reliability
2. Displaying Media Bias
3. Displaying Objectivity
4. Suggested Articles using Keyword Extraction
5. Article Summariser using an Extractive Summarisation Algorithm
Using the extension on an article you're reading!
SOCIAL INNOVATION
1. Problem:
The greatest threat to the golden era of the internet age is fake news. With millions of sources and a growing corpus of data being put on the web daily, combatting disinformation is the need of the hour. Beyond its obvious effects, disinformation on the web has drastic social consequences: when falsity spreads under the guise of reliability, public acceptance of disinformation becomes a real peril. A prominent example has been the COVID-19 pandemic, where incorrect claims about the disease, its symptoms, cures and vaccines spread rapidly among netizens and shifted public opinion for the worse. Moreover, the spread of falsity can harm the communities being described, often patronising or villainising them. In the age of information, wrong information can tarnish the right efforts.
2. Scope and Scale:
The obvious solution comes from tackling the roots of the issue, which demands an application that works across a variety of sources as well as a variety of technological interfaces. Since fake news spreads through several channels of communication on the internet, clickbait headlines, link sharing on WhatsApp, Facebook and Twitter, and unfettered news sites, we developed a single application that works on all of them. Our solution is a browser extension triggered by right-clicking a link or opening a news article. We use sophisticated machine learning algorithms to tell users about the reliability, bias and objectivity of the given news item. Additionally, our AI interface finds the most relevant and credible articles related to the one being read, and summarises articles in advance.
3. Social Impact:
The social impact is twofold: the user becomes more aware of which sources to trust, and can verify a shared news item or data point in real time. Moreover, with feedback loops in place, the spread of disinformation is further tackled through community resolution via the feedback mechanism in the app. Since the extension works across platforms and websites and has a simple UI/UX, it is easy to use and can be adopted by a large number of people. Tackling disinformation at its root lets us stop its spread early and helps make citizens more aware of real issues.
TECHNOLOGICAL INNOVATION
1. Machine Learning:
(i) NLP for News Metrics
A dataset of ~50k data points containing news details along with a binary True/Fake label is used. The textual data is transformed into vectors by first applying a count vectoriser, which gives a one-hot-like scheme for the words. TF-IDF (term frequency-inverse document frequency) is then used to re-weight these word vectors into a better representation. A logistic regression model with a SAGA solver is trained on the dataset, achieving a validation accuracy of 96%. The model is then wrapped in a pipeline, and the pipeline is saved in the Open Neural Network Exchange (ONNX) format for use in our Flask server.
(ii) Extractive Text Summarisation
Extractive methods attempt to summarise articles by identifying the important sentences or phrases in the original text and stitching together portions of the content to produce a condensed version. The extracted sentences form the summary. This works by computing the cosine similarity between each pair of sentences, ranking the sentences, and adding the highest-rated ones to the summary.
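As a rough illustration of the training flow in (i), the sketch below builds the count-vectoriser → TF-IDF → logistic-regression pipeline with scikit-learn and saves it for the backend. The dataset path and the `text`/`label` column names are assumptions, and the model is persisted with joblib here for brevity rather than exported to ONNX.

```python
# Sketch of the reliability classifier training described in (i).
# Dataset path and column names ("text", "label") are assumptions.
import joblib
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("news_dataset.csv")  # ~50k rows of article text with a True/Fake label

X_train, X_val, y_train, y_val = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("count", CountVectorizer()),           # one-hot-like term counts
    ("tfidf", TfidfTransformer()),          # re-weight counts with TF-IDF
    ("clf", LogisticRegression(solver="saga", max_iter=1000)),
])

pipeline.fit(X_train, y_train)
print("validation accuracy:", pipeline.score(X_val, y_val))

joblib.dump(pipeline, "reliability_pipeline.joblib")  # loaded later by the Flask server
```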
2. Browser Extension
Chrome extensions use either page actions or browser actions. A page action is specific to certain pages, whereas a browser action is relevant anywhere in the browser. We use page actions to generate a portfolio of relevant information for the site being considered. Our extension has two set-ups:
(i) Right-Click Fact Check: The user right-clicks on a specific link they want checked. Our extension adds a "Fact Check Link" option to the right-click menu. The link is sent to the backend server, which returns the article summary, the percentage reliability, and the bias and objectivity of the article. This is shown to the user in a seamless fashion using a SweetAlert2-based UI.
• Application: browsing through social media, checking links in your inbox, or quickly summarising an article.
(ii) Current Article Fact Check: The extension works when the user navigates to an article and clicks on the extension icon. The extension extracts information from the web page, including the tab title and the URL, and sends it as a POST request to the backend, which returns the relevant details; these are displayed using a React.js-based frontend. It also shows other relevant, high-credibility articles to read if you're interested in the topic. The user can then give feedback on whether the report received is accurate, and the feedback is sent to the server as another request on the '/feedback' route.
• Application: when referring to an article.
3. Backend Server
(i) REST API
Flask is a lightweight Web Server Gateway Interface (WSGI) web application framework.
• Feedback route: The '/feedback' route receives feedback from the user and stores it in a TinyDB database. This can later be used to retrain our model and improve its performance.
• Predict route: The '/predict' route is a POST route that takes in the URL of the article and runs our ML model on it. It also uses the newspaper library to generate a summary and extract keywords from the article. The keywords are then used to make an API call that returns similar articles.
(ii) Web Scraping
We use Beautiful Soup to extract article data when a user right-clicks on an article link and runs our extension. We also use the newspaper library to extract article metadata by scraping the article.
(iii) Classification Pipeline
The classification model is loaded using the joblib library and run on the samples received.
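A minimal sketch of the backend described above, with the '/predict' and '/feedback' routes. The payload keys, file names and port are assumptions rather than the kit's exact values, and the similar-articles API call is omitted.

```python
# Sketch of the backend server: /predict and /feedback routes (schemas are assumptions).
import joblib
from flask import Flask, jsonify, request
from flask_cors import CORS
from newspaper import Article
from tinydb import TinyDB

app = Flask(__name__)
CORS(app)  # let the browser extension call the API from another origin

model = joblib.load("reliability_pipeline.joblib")  # pipeline saved during training
feedback_db = TinyDB("feedback.json")               # lightweight store for user feedback

@app.route("/predict", methods=["POST"])
def predict():
    url = request.json["url"]
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()  # fills in article.summary and article.keywords (needs NLTK punkt data)
    reliability = float(model.predict_proba([article.text])[0][1])  # assumes class 1 == "true"
    return jsonify({
        "reliability": round(reliability * 100, 1),  # shown to the user as a percentage
        "summary": article.summary,
        "keywords": article.keywords,  # used for the similar-articles API call (omitted here)
    })

@app.route("/feedback", methods=["POST"])
def feedback():
    feedback_db.insert(request.json)  # kept for retraining the model later
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=5000)
```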
KIT SOLUTION SOURCE
ASATYA | A Browser Extension to Combat Disinformation
Asatya is a browser extension that bundles a suite of tools to fight disinformation on the web. These tools inform the user of a news report's reliability, media bias and objectivity, and even summarise the report for the user's convenience.
CombattingDisinformation by MananSuri27
JavaScript
Version: Current
License: Permissive (MIT)
DEVELOPMENT ENVIRONMENT
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode gives developers a typical IDE experience. cmder is a console emulator package for Windows that supports bash.
notebook by jupyter
Jupyter Interactive Notebook
Jupyter Notebook
Version: v7.0.0b2
License: Permissive (BSD-3-Clause)
EXPLORATORY DATA ANALYSIS
These libraries are used for extensive analysis and exploration of data, for handling arrays, and for scientific computation and data manipulation. pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool built on top of the Python programming language.
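A small exploratory sketch along these lines, assuming the same hypothetical CSV and column names as the training example above:

```python
# Quick look at the news dataset before training (file and column names are assumptions).
import pandas as pd

df = pd.read_csv("news_dataset.csv")

print(df.shape)                          # expect roughly 50k rows
print(df["label"].value_counts())        # balance of True vs Fake labels
print(df.isna().sum())                   # missing values per column
print(df["text"].str.len().describe())   # distribution of article lengths
```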
pandas by pandas-dev
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Python
Version: v2.0.2
License: Permissive (BSD-3-Clause)
WEB SCRAPING
Web scraping is an automatic method to obtain large amounts of data from websites. Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. urllib3 is a powerful, user-friendly HTTP client for Python.
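A minimal sketch of how the right-click path could pull article text with urllib3 and Beautiful Soup; the tags and fields extracted here are illustrative rather than the kit's exact selectors.

```python
# Illustrative scrape of a right-clicked article link with urllib3 + Beautiful Soup.
import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()

def scrape_article(url):
    response = http.request("GET", url)
    soup = BeautifulSoup(response.data, "html.parser")
    title = soup.title.string if soup.title else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "text": " ".join(paragraphs)}

print(scrape_article("https://example.com/some-news-article")["title"])
```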
beautifulsoup by waylan
Git Clone of Beautiful Soup (https://code.launchpad.net/~leonardr/beautifulsoup/bs4)
Python
Version: Current
License: Others (Non-SPDX)
urllib3 by urllib3
urllib3 is a user-friendly HTTP client library for Python
Python
Version: 2.0.2
License: Permissive (MIT)
TEXT MINING
Libraries in this group are used for the analysis and processing of unstructured natural language. The data is not used in its original form; it has to go through a processing pipeline to become suitable for machine learning techniques and algorithms. The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing of English, written in the Python programming language.
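As an example of the kind of preprocessing NLTK handles before vectorisation or summarisation, here is a short sketch (not the kit's exact pipeline) that splits text into sentences and removes English stop words:

```python
# Sketch: sentence tokenisation and stop-word removal with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(text):
    stop_words = set(stopwords.words("english"))
    cleaned = []
    for sentence in sent_tokenize(text):
        words = [w.lower() for w in word_tokenize(sentence)
                 if w.isalnum() and w.lower() not in stop_words]
        cleaned.append(words)
    return cleaned  # one list of content words per sentence
```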
REST API
A REST API (also known as a RESTful API) is an application programming interface (API or web API) that conforms to the constraints of the REST architectural style and allows interaction with RESTful web services. REST stands for representational state transfer. Flask is a web application framework written in Python, based on the Werkzeug WSGI toolkit and the Jinja2 template engine. flask-cors adds Cross Origin Resource Sharing (CORS) support to Flask. TinyDB is a lightweight document-oriented database written in pure Python with no external dependencies; it targets small apps that would be overwhelmed by a SQL database or an external database server.
flask by pallets
The Python micro framework for building web applications.
Python
Version: 2.2.5
License: Permissive (BSD-3-Clause)
flask-cors by corydolphin
Cross Origin Resource Sharing (CORS) support for Flask
Python
Version: 3.0.10
License: Permissive (MIT)
tinydb by msiemens
TinyDB is a lightweight document oriented database optimized for your happiness :)
Python
Version: v4.7.1
License: Permissive (MIT)
SUMMARISER
Extractive methods attempt to summarise articles by identifying the important sentences or phrases in the original text and stitching together portions of the content to produce a condensed version. Newspaper can seamlessly extract article text and metadata, detect languages, and run NLP routines such as summarisation and keyword extraction on the result.
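A compact sketch of the extractive idea described above: vectorise the sentences, score each one by its cosine similarity to the rest, and keep the top-ranked sentences in their original order. This is a simplification under assumed parameters, not the kit's exact summariser.

```python
# Sketch of extractive summarisation via sentence vectors and cosine similarity.
import nltk
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("punkt", quiet=True)

def summarise(text, n_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= n_sentences:
        return text
    vectors = TfidfVectorizer().fit_transform(sentences)
    similarity = cosine_similarity(vectors)   # pairwise sentence similarity matrix
    scores = similarity.sum(axis=1)           # rank sentences by overall similarity
    top = np.argsort(scores)[-n_sentences:]   # indices of the highest-rated sentences
    return " ".join(sentences[i] for i in sorted(top))
```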
newspaper by codelucas
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Python
Version: 0.0.9
License: Permissive (MIT)
NEWS PARAMETER PREDICTION
We predict the veracity of news and use machine learning methods to estimate parameters like objectivity, bias and reliability. scikit-learn includes simple and efficient tools for predictive data analysis; it is accessible to everybody and reusable in various contexts.
scikit-learn by scikit-learn
scikit-learn: machine learning in Python
Python
Version: 1.2.2
License: Permissive (BSD-3-Clause)
MODEL PIPELINING
Joblib is a set of tools to provide lightweight pipelining in Python.
FRONT-END DEVELOPMENT
Front-end web development, also known as client-side development, is the practice of producing HTML, CSS and JavaScript for a website or web application so that a user can see and interact with it directly. React is a free and open-source front-end JavaScript library for building user interfaces based on UI components. The seamless UI was developed from scratch using components from Material-UI. We used SweetAlert2 to add certain functionalities and styled it using Bootstrap.
react by facebook
The library for web and native user interfaces
JavaScript
Version: v18.2.0
License: Permissive (MIT)
material-ui by mui-org
MUI (formerly Material-UI) is the React UI library you always wanted. Follow your own design system, or start with Material Design.
JavaScript
Version: v5.4.0
License: Permissive (MIT)
sweetalert2 by sweetalert2
✨ A beautiful, responsive, highly customizable and accessible (WAI-ARIA) replacement for JavaScript's popup boxes. Zero dependencies. 🇺🇦
JavaScript
Version: v11.7.5
License: Permissive (MIT)
bootstrap by twbs
The most popular HTML, CSS, and JavaScript framework for developing responsive, mobile first projects on the web.
JavaScript
Version: v5.3.0-alpha3
License: Permissive (MIT)
emotion by emotion-js
👩‍🎤 CSS-in-JS library designed for high performance style composition
JavaScript
Version: @emotion/styled@11.11.0
License: Permissive (MIT)
Deployment Information
OUR TEAM
Aaryak Garg (github.com/Darthfire), Arsh Kohli (github.com/arshxyz), Manan Suri (github.com/MananSuri27)