
Fake News Detection using News Articles

by Divyanshu_Chourasiya Updated: Aug 6, 2022

Fake news detection on social media has recently attracted considerable attention. The basic countermeasure of comparing websites against a list of labeled fake news sources is inflexible, so a machine learning approach is desirable. This project uses Natural Language Processing to detect fake news directly from the text content of news articles, with the goal of developing a machine learning program that identifies when a news source may be producing fake news.

We use a corpus of labeled real and fake news articles to build a classifier that makes decisions based on the content of that corpus. The model focuses on identifying fake news sources, based on multiple articles originating from each source. Once a source is labeled as a producer of fake news, we can predict with high confidence that future articles from that source will also be fake news. Focusing on sources also widens our tolerance for misclassifying individual articles, because each source contributes multiple data points.

The intended application is applying visibility weights in social media. Using weights produced by this model, social networks can make stories that are highly likely to be fake news less visible.
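The source-level idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual implementation: the per-article probabilities, the example source names, the mean aggregation, the 0.5 threshold, and the function names `source_scores` and `visibility_weights` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def source_scores(predictions):
    """Average the per-article fake probabilities for each source.

    `predictions` is an iterable of (source, probability_fake) pairs,
    assumed to come from some article-level classifier.
    """
    by_source = defaultdict(list)
    for source, p_fake in predictions:
        by_source[source].append(p_fake)
    return {source: mean(ps) for source, ps in by_source.items()}


def visibility_weights(scores, threshold=0.5):
    """Map each source score to a visibility weight in [0, 1].

    Assumption: sources at or above the threshold are down-weighted
    by their score; all other sources keep full visibility.
    """
    return {
        source: (1.0 - score if score >= threshold else 1.0)
        for source, score in scores.items()
    }


# Hypothetical classifier outputs for two made-up sources.
article_predictions = [
    ("site-a.example", 0.9),
    ("site-a.example", 0.7),
    ("site-a.example", 0.2),  # a single misclassified article
    ("site-b.example", 0.1),
    ("site-b.example", 0.3),
]

scores = source_scores(article_predictions)
weights = visibility_weights(scores)
```

Here one low-scoring article from `site-a.example` does not change the source-level decision, which is the misclassification tolerance the text refers to.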

Group Name 1

Notebook: The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience. JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning, and its modular design invites extensions that expand and enrich functionality.

NumPy: NumPy (short for Numerical Python) is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices. It is an open source project and free to use.

Keras: Keras is a high-level deep learning API developed by Google for implementing neural networks. It is written in Python and is meant to make neural networks easy to implement. It provides a Python front end with a high level of abstraction and supports multiple back ends for the underlying computation, so you can switch between them. This abstraction can make Keras slower than lower-level deep learning frameworks, but it also makes it relatively easy to learn and very beginner-friendly.

notebook by jupyter

Language: Jupyter Notebook | Stars: 9702 | Version: v7.0.0a11

License: Others (Non-SPDX)

Jupyter Interactive Notebook

numpy by numpy

Language: Python | Stars: 22526 | Version: 1.24.1

License: Permissive (BSD-3-Clause)

The fundamental package for scientific computing with Python.

keras by keras-team

Language: Python | Stars: 57152 | Version: 2.11.0

License: Permissive (Apache-2.0)

Deep Learning for humans

Group Name 2

Matplotlib: Matplotlib is a cross-platform data visualization and plotting library for Python and its numerical extension NumPy. As such, it offers a viable open source alternative to MATLAB. Developers can also use Matplotlib’s APIs (Application Programming Interfaces) to embed plots in GUI applications. In most cases, a few lines of code are all that is required to generate a plot.

Pandas: Pandas is an open source library for working with relational or labeled data easily and intuitively. It provides data structures and operations for manipulating numerical data and time series, and it is built on top of NumPy. It is designed for high performance and user productivity.

Gensim: Gensim is an open source Python library for unsupervised topic modelling and natural language processing. It is designed to extract semantic topics from documents and can handle large text collections, which sets it apart from machine learning packages that target in-memory processing only. Gensim also provides efficient multicore implementations of various algorithms to increase processing speed, and it offers text processing facilities that more general packages such as Scikit-learn or R do not focus on.

matplotlib by matplotlib

Language: Python | Stars: 16757 | Version: 3.6.2

License: No License

matplotlib: plotting with Python

pandas by pandas-dev

Language: Python | Stars: 36647 | Version: 1.5.2

License: Permissive (BSD-3-Clause)

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

gensim by RaRe-Technologies

Language: Python | Stars: 13891 | Version: 4.3.0

License: Weak Copyleft (LGPL-2.1)

Topic Modelling for Humans

Group Name 3

NLTK: The Natural Language Toolkit (NLTK) is a platform for building Python programs that work with human language data, for use in statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, and semantic reasoning. It also includes graphical demonstrations and sample data sets, and it is accompanied by a cookbook and a book that explains the principles behind the language processing tasks NLTK supports.

Doc2Vec: In contrast to Word2Vec, which produces a vector for each word, the Doc2Vec model creates a vectorised representation of a group of words taken collectively as a single unit. The result is not just a simple average of the vectors of the words in the sentence.

Stopwords: Stopwords are commonly used words (such as “the”, “a”, “an”, “in”, “he”, “have”) that do not add much meaning to a sentence and can usually be ignored without sacrificing its meaning. Search engines are typically programmed to ignore them, both when indexing entries for searching and when retrieving them as the result of a search query. We do not want these words to take up space in our database or valuable processing time, so we remove them by checking each word against a stored list of words that we consider stop words. NLTK (Natural Language Toolkit) in Python ships stopword lists for 16 or more languages, depending on the version of its stopwords corpus.
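Stopword removal as described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the tiny `STOPWORDS` set, the regular-expression tokenizer, and the function name `remove_stopwords` are assumptions made for the example. In practice the list would more likely come from NLTK, for instance via `nltk.corpus.stopwords.words("english")` once the stopwords corpus has been downloaded.

```python
import re

# Small hand-picked stop list, for illustration only (an assumption,
# not NLTK's actual English list).
STOPWORDS = {"a", "an", "the", "in", "he", "have", "is", "of", "to", "and"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stopwords]


cleaned = remove_stopwords(
    "He said the report is an example of fake news in the media"
)
print(cleaned)
```

Keeping the stop list in a set makes each membership check constant time, which matters when the same filter runs over every token of a large corpus.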

nltk by nltk

Language: Python | Stars: 11409 | Version: 3.8.1

License: Permissive (Apache-2.0)

NLTK Source

doc2vec by jhlau

Language: Python | Stars: 556 | Version: Current

License: Permissive (Apache-2.0)

Python scripts for training/testing paragraph vectors

stopwords-iso by stopwords-iso

Language: JavaScript | Stars: 267 | Version: 1.1.0

License: Permissive (MIT)

All languages stopwords collection

Group Name 4

Scikit-learn: Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent Python interface. The library is largely written in Python and is built upon NumPy, SciPy, and Matplotlib. Rather than loading, manipulating, and summarising data, Scikit-learn focuses on modeling it.

TensorFlow: TensorFlow is a library for fast numerical computing, created and released by Google and used mainly through its Python API. It is a foundation library that can be used to create deep learning models directly, or through wrapper libraries built on top of TensorFlow that simplify the process. Unlike other numerical libraries intended for deep learning, such as Theano, TensorFlow was designed for use both in research and development and in production systems, including RankBrain in Google search and the DeepDream project. It can run on single-CPU systems and GPUs, as well as on mobile devices and large-scale distributed systems of hundreds of machines.

scikit-learn by scikit-learn

Language: Python | Stars: 52681 | Version: 1.2.0

License: Permissive (BSD-3-Clause)

scikit-learn: machine learning in Python

tensorflow by tensorflow

Language: C++ | Stars: 170686 | Version: 1.15.0

License: Permissive (Apache-2.0)

An Open Source Machine Learning Framework for Everyone

Deployment Information

You can visit this GitHub repository for the detailed problem statement and its precise, well-labelled solution: https://github.com/Divyanshu1509/Fake_News_Detection_Using_News_Articles

Dataset Description:-

train.csv: A full training dataset with the following attributes:
  id: unique id for a news article
  title: the title of a news article
  author: author of the news article
  text: the text of the article; could be incomplete
  label: a label that marks the article as potentially unreliable (1: unreliable, 0: reliable)

test.csv: A testing dataset with all the same attributes as train.csv, without the label.

Clone the repo to your local machine:-

> git clone https://github.com/Divyanshu1509/Fake_News_Detection_Using_News_Articles.git
> cd Fake_News_Detection_Using_News_Articles

Make sure you have all the dependencies installed:-

python 3.6+, numpy, tensorflow, gensim, pandas, keras, matplotlib, scikitplot, sklearn, nltk (scikitplot and sklearn are installed from the scikit-plot and scikit-learn packages, respectively)

For nltk, it's recommended to type python (python.exe on Windows) in your command line, which will take you to the Python interpreter. Then enter:-

> import nltk
> nltk.download()

You're good to go now:-

> python svm.py
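The train.csv layout described above can be read with Python's standard `csv` module. The following is a hedged sketch rather than the repository's loading code: the two sample rows are invented for illustration, and the function name `load_articles` is hypothetical. Only the column names (id, title, author, text, label) come from the dataset description.

```python
import csv
import io

# Invented sample in the documented train.csv layout (not real data).
SAMPLE_CSV = """id,title,author,text,label
0,Example headline,Jane Doe,"Body of a reliable article, with a comma.",0
1,Another headline,,Body of an unreliable article.,1
"""


def load_articles(handle):
    """Read rows from an open CSV file and convert id and label to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["id"] = int(row["id"])
        row["label"] = int(row["label"])  # 1: unreliable, 0: reliable
        rows.append(row)
    return rows


articles = load_articles(io.StringIO(SAMPLE_CSV))
unreliable_titles = [a["title"] for a in articles if a["label"] == 1]
print(unreliable_titles)
```

To read the real file instead of the in-memory sample, the same function could be called on `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory. test.csv has no label column, so it would need a variant that skips the label conversion.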
