2. Abstract
• In recent years, with the booming development of online social networks, fake news created for various commercial and political purposes has appeared in large numbers and spread widely in the online world.
• Because of its deceptive wording, online social network users are easily misled by such fake news, which has already had a tremendous effect on offline society.
• An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies, and algorithms for detecting fake news articles, creators, and subjects in online social networks, and evaluates the corresponding performance.
3. • Information accuracy on the Internet, especially on social media, is an increasingly important concern, but web-scale data hampers the ability to identify, evaluate, and correct such misinformation, or so-called "fake news," present on these platforms.
• In this paper, we propose a method for "fake news" detection and ways to apply it on
Facebook, one of the most popular online social media platforms.
• This method uses a Naive Bayes classification model to predict whether a post on Facebook will be labeled as real or fake (a minimal sketch follows below). The results may be improved by applying several techniques that are discussed in the paper.
• The results obtained suggest that the fake news detection problem can be addressed with machine learning methods.
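As a rough, minimal sketch of this approach (the example posts and labels are hypothetical placeholders, not data from the paper, and bag-of-words counts stand in for whatever features the paper uses):

```python
# Minimal Naive Bayes sketch for labeling posts as real or fake.
# The example texts and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

posts = [
    "Scientists confirm new vaccine passes clinical trials",
    "SHOCKING: celebrity secret the government hides from you",
    "Local council approves budget for road repairs",
    "You won't believe this one weird trick doctors hate",
]
labels = ["real", "fake", "real", "fake"]  # hypothetical labels

vectorizer = CountVectorizer()            # bag-of-words counts
X = vectorizer.fit_transform(posts)
clf = MultinomialNB().fit(X, labels)

new_post = ["Miracle cure discovered, click to learn more"]
print(clf.predict(vectorizer.transform(new_post)))  # e.g. ['fake']
```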
4. Objectives
• In this paper, we seek to produce a model that can accurately predict the likelihood that a given article is fake news. Facebook has been at the epicenter of much criticism following media attention.
• They have already implemented a feature to flag fake news on the site when a user sees it; they have also said publicly that they are working on distinguishing these articles in an automated way. Certainly, it is not an easy task.
• A given algorithm must be politically unbiased, since fake news exists on both ends of the spectrum, and must also give equal weight to legitimate news sources on either end. In addition, the question of legitimacy is a difficult one. However, in order to solve this problem, it is necessary to have an understanding of what fake news is.
5. Existing System
• With the advancement of technology, digital news is more widely exposed to users globally, which contributes to the increased spread of disinformation online. Fake news can be found on popular platforms such as social media and the wider Internet.
• There have been multiple solutions and efforts to detect fake news, including tool-assisted approaches. However, fake news is written to convince the reader to believe false information, which makes such articles difficult to recognize. Digital news is also produced at a large and rapid rate, every second of every day, so it is challenging for machine learning to detect fake news effectively.
6. Disadvantages
• One of the primary challenges is obtaining high-quality labeled data for training machine
learning models. Labeling news articles as fake or genuine requires significant human
effort and expertise.
• Moreover, the definition of "fake news" can be subjective and may vary depending on
cultural, political, or social contexts, making it challenging to create a universally
applicable training dataset.
• The datasets used for training machine learning models may themselves be biased,
reflecting existing biases in the sources from which the data was collected or in the
labeling process.
7. Proposed System
• The proposed system examines the problems and possible consequences associated with the spread of fake news. We will work with different fake news datasets, applying different machine learning algorithms to train on the data and test it in order to determine which news is real and which is fake.
• Fake news is a problem that heavily affects society and our perception not only of the media but of facts and opinions themselves. Using artificial intelligence and machine learning, the problem can be addressed, as we can mine patterns from the data to maximize well-defined objectives.
8. • So, our focus is to find which machine learning algorithm is best suited to which kind of text dataset, and which dataset yields better accuracy, since accuracy depends directly on the type and amount of data. The more data there is, the better the chance of obtaining reliable accuracy, because more data can be used to train and test the models (a sketch of measuring accuracy against training-set size follows below).
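As a hedged illustration of this point, scikit-learn's learning_curve utility can report test accuracy at increasing training-set sizes; the synthetic dataset below is only a stand-in for a real labeled news corpus:

```python
# Sketch: measure how accuracy changes with training-set size.
# Uses a synthetic dataset as a stand-in for a labeled news corpus.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n} training samples -> mean CV accuracy {score:.3f}")
```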
9. Advantages
• For much fake news, the goal is profiting through clickbait. Clickbait lures users and entices curiosity with flashy headlines or designs, encouraging clicks on links to increase advertising revenue.
• This exposition analyzes the prevalence of fake news in light of the advances in
communication made possible by the emergence of social networking sites.
• The purpose of the work is to come up with a solution that can be utilized by users
to detect and filter out sites containing false and misleading information.
• We use simple and carefully selected features of the title and post to accurately identify fake posts. The experimental results show 99.4% accuracy using a logistic classifier (an illustrative sketch of this idea follows below).
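The 99.4% figure is the cited experimental result; the sketch below only illustrates the general technique, with hypothetical hand-picked title features and toy data rather than the exact features used in that work:

```python
# Sketch: logistic regression on simple, hand-picked title features.
# The feature choices and example data are illustrative assumptions,
# not the exact features used in the cited work.
import numpy as np
from sklearn.linear_model import LogisticRegression

def title_features(title: str) -> list:
    words = title.split()
    return [
        len(words),                        # title length in words
        sum(w.isupper() for w in words),   # all-caps words
        title.count("!"),                  # exclamation marks
        int("you" in title.lower()),       # direct address to reader
    ]

titles = [
    "Council passes annual budget",
    "SHOCKING!! You won't believe what happened",
    "Report: unemployment falls slightly",
    "Doctors HATE this! Click now!!",
]
y = [0, 1, 0, 1]  # hypothetical labels: 0 = real, 1 = fake

X = np.array([title_features(t) for t in titles])
clf = LogisticRegression().fit(X, y)
print(clf.predict([title_features("AMAZING trick! You must see this!!")]))
```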
10. Introduction
• These days, fake news creates a range of issues, from sarcastic articles to fabricated stories and planned government propaganda in some outlets. Fake news and lack of trust in the media are growing problems with huge ramifications for our society. Obviously, a purposely misleading story is "fake news," but lately the discourse on social media has been changing the term's definition; some now use it to dismiss facts that run counter to their preferred viewpoints.
• The importance of disinformation within American political discourse was the subject of weighty attention, particularly following the American presidential election.
• The term "fake news" became common parlance for the issue, particularly to describe factually incorrect and misleading articles published mostly for the purpose of making money through page views. In this paper, we seek to produce a model that can accurately predict the likelihood that a given article is fake
11. • news. Facebook has been at the epicenter of much criticism following media attention. They have already implemented a feature to flag fake news on the site when a user sees it; they have also said publicly that they are working on distinguishing these articles in an automated way. Certainly, it is not an easy task.
• A given algorithm must be politically unbiased, since fake news exists on both ends of the spectrum, and must also give equal weight to legitimate news sources on either end. In addition, the question of legitimacy is a difficult one. However, in order to solve this problem, it is necessary to have an understanding of what fake news is.
12. Modules
• A. Data Use
• B. Preprocessing
• C. Feature Extraction
• D. Training the Classifier
13. A. Data Use
• In this project we use several packages; to load and read the dataset we use pandas. With pandas we can read the .csv file, display the shape of the dataset, and display the dataset itself in correct tabular form.
• We will be training and testing the data; using supervised learning means the data is labeled. With training and testing data and their labels we can run different machine learning algorithms, but before making predictions and computing accuracies the data needs preprocessing: null values, which are not readable, must be removed from the dataset, and the text must be converted into vectors by normalizing and tokenizing it so that it can be understood by the machine.
• The next step is to use this data to produce visual reports, which we obtain with Python's Matplotlib library and scikit-learn. These libraries help present the results as histograms, pie charts, or bar charts (a minimal loading-and-plotting sketch follows below).
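A minimal sketch of this step, assuming a hypothetical news.csv file with a "label" column (both names are placeholders):

```python
# Sketch: load the dataset with pandas and visualize label counts.
# "news.csv" and its "label" column are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("news.csv")
print(df.shape)       # (rows, columns) of the dataset
print(df.head())      # first rows in tabular form

# Bar chart of real vs. fake counts via Matplotlib
df["label"].value_counts().plot(kind="bar")
plt.title("Class distribution")
plt.xlabel("label")
plt.ylabel("count")
plt.show()
```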
14. B. Preprocessing
• The dataset used is split into a training set and a testing set: Dataset I contains 3256 training samples and 814 testing samples, and Dataset II contains 1882 training samples and 471 testing samples.
• Cleaning the data is always the first step. Here, uninformative words are removed from the dataset, which helps in mining the useful information. Data collected online often contains undesirable tokens such as stop words and digits, which hinder the detection task. Removing these language-independent entities and integrating this logic can improve the accuracy of the identification task (a cleaning sketch follows below).
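A hedged sketch of such cleaning followed by the train/test split (the stop-word list, example texts, and labels are illustrative assumptions):

```python
# Sketch: basic text cleaning (digits, punctuation, stop words)
# followed by a train/test split. Example data is hypothetical.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.model_selection import train_test_split

def clean(text: str) -> str:
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove digits
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    words = [w for w in text.split() if w not in ENGLISH_STOP_WORDS]
    return " ".join(words)

texts = ["3 SHOCKING facts the media won't tell you!",
         "City schedules roadworks for 2024.",
         "Miracle pill melts fat overnight!!",
         "Council publishes annual report."]
labels = [1, 0, 1, 0]  # hypothetical: 1 = fake, 0 = real

cleaned = [clean(t) for t in texts]
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=42)
```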
15. C. Feature Extraction
• Feature extraction is the process of selecting a subset of relevant features for use in model construction. Feature extraction methods help create an accurate predictive model by selecting features that yield better accuracy.
• When the input data to an algorithm is too large to be handled and is suspected to be redundant, it is transformed into a reduced representative set of features, also called feature vectors.
• The desired task is then performed using this reduced representation instead of the full-size input. Feature extraction is applied to the raw data before any machine learning algorithm runs on the transformed data in feature space (a TF-IDF sketch follows below).
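TF-IDF is one standard way to build such reduced feature vectors; a minimal sketch with placeholder documents (the deck does not commit to a specific vectorizer):

```python
# Sketch: turn cleaned text into reduced feature vectors with TF-IDF.
# max_features caps the vocabulary so the representation stays small.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["government announces new policy",
        "shocking secret they do not want you to know",
        "local team wins championship game"]

vectorizer = TfidfVectorizer(max_features=1000, stop_words="english")
X = vectorizer.fit_transform(docs)   # sparse matrix: docs x features
print(X.shape)
print(vectorizer.get_feature_names_out()[:10])
```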
16. D. Training the Classifier
• In this project I am using the scikit-learn machine learning library to implement the architecture. Scikit-learn is an open-source Python machine learning library that comes bundled with the Anaconda distribution. It only requires importing the packages, and a command can be run as soon as it is written.
• If a command doesn't run, we see the error immediately. I am using 4 different algorithms and have trained these 4 models, namely Naïve Bayes, Support Vector Machine, K-Nearest Neighbors, and Logistic Regression, which are very popular methods for document classification problems. Once the classifiers are trained, we can check the performance of the models on the test set: we extract the word-count vector for each article in the test set and predict its class with the trained models (a comparison sketch follows below).
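A self-contained sketch of this comparison on a tiny hypothetical corpus (LinearSVC stands in for the SVM, and the split sizes are arbitrary):

```python
# Sketch: train the four classifiers named above on word-count
# vectors and compare their test-set accuracy. The tiny corpus
# below is a hypothetical stand-in for the real dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

texts = ["mayor opens new library", "click here for a miracle cure",
         "study finds moderate exercise helps", "aliens run the banks, insiders say",
         "parliament debates tax reform", "this secret trick makes you rich",
         "museum exhibit draws record crowds", "celebrity clone spotted downtown"]
labels = [0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical: 0 = real, 1 = fake

X = CountVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          random_state=0, stratify=labels)

models = {"Naive Bayes": MultinomialNB(),
          "SVM": LinearSVC(),
          "KNN": KNeighborsClassifier(n_neighbors=3),
          "Logistic Regression": LogisticRegression(max_iter=1000)}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```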
17. Software Details
• A web application (web app) is an application program that is
stored on a remote server and delivered over the internet
through a browser interface. Web services are web apps by
definition and many, although not all, websites contain web
apps.
• Developers design web applications for a wide variety of uses and users, from organizations to individuals, for numerous reasons. Commonly used web applications include webmail, online calculators, and e-commerce shops. While some web apps can only be accessed through a specific browser, most are available no matter the browser (a minimal serving sketch follows below).
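As a hedged sketch of how the trained detector could be delivered as a web app (the deck names no framework; Flask and the tiny placeholder model below are illustrative assumptions):

```python
# Sketch: exposing a trained classifier through a minimal web app.
# Flask is an illustrative choice; the deck does not name a framework.
from flask import Flask, request, jsonify
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny placeholder model so the app is self-contained; in practice
# the classifier trained in section 16 would be loaded here instead.
vectorizer = CountVectorizer()
clf = MultinomialNB().fit(
    vectorizer.fit_transform(["city opens park", "miracle cure click now"]),
    ["real", "fake"])

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json().get("text", "")
    label = clf.predict(vectorizer.transform([text]))[0]
    return jsonify({"text": text, "label": label})

if __name__ == "__main__":
    app.run()  # e.g. POST {"text": "..."} to http://127.0.0.1:5000/predict
```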
18. Front End
• Hyper Text Markup Language (HTML) is the backbone of any website development process, without which a web page does not exist. Hypertext means that the text contains links, termed hyperlinks, embedded in it. When a user clicks on a word or a phrase that has a hyperlink, it brings up another web page. A markup language indicates how text can be turned into images, tables, links, and other representations.
• Cascading Style Sheets (CSS) controls the presentation aspect of the site and allows your site to have its own unique look. It does this by maintaining style sheets that sit on top of other style rules and are triggered based on other inputs, such as device screen size and resolution. CSS can be added externally, internally, or embedded in the HTML tags.
19. Database
• A Database Management System (DBMS) is a software system that is
designed to manage and organize data in a structured manner. It allows
users to create, modify, and query a database, as well as manage the
security and access controls for that database.
• A database is a collection of interrelated data which helps in the efficient
retrieval, insertion, and deletion of data from the database and organizes the
data in the form of tables, views, schemas, reports, etc.
• For example, a university database organizes the data about students, faculty, admin staff, etc., which helps in the efficient retrieval, insertion, and deletion of data from it (an illustrative sketch follows below).
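A minimal sketch of creating, inserting into, and querying such a table (SQLite is used purely for illustration; the deck does not specify which DBMS the system uses):

```python
# Sketch: create, insert into, and query a table for storing
# predictions. SQLite is purely illustrative here.
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("""CREATE TABLE predictions (
                   id INTEGER PRIMARY KEY,
                   headline TEXT,
                   label TEXT)""")
cur.execute("INSERT INTO predictions (headline, label) VALUES (?, ?)",
            ("Miracle cure discovered", "fake"))
conn.commit()
for row in cur.execute("SELECT headline, label FROM predictions"):
    print(row)
conn.close()
```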
20. Cloud Web Server
• Cloud-based application development plays a significant part in today's world. The cloud has enabled a generation of Internet-enabled applications. What started as one of the major technological advancements has revolutionized the entire development paradigm and made it simpler than ever to develop applications.
• Cloud-based application development has become a revolutionary trend in the IT industry today due to its various benefits over traditional software development models. It helps reduce the costs of software maintenance and deployment.
• It provides a flexible environment for developers, allowing them to work on projects from anywhere using any device with internet access.
23. Conclusion
• Many people consume news from social media instead of traditional news media.
However, social media has also been used to spread fake news, which has
negative impacts on individual people and society.
• In this paper, an innovative model for fake news detection using machine learning algorithms has been presented. This model takes news events as input and, based on Twitter reviews and classification algorithms, predicts the percentage likelihood that a news item is fake or real (a probability sketch follows below).
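As a hedged sketch of how such a percentage can be reported, most scikit-learn classifiers expose predict_proba; the model and data below are hypothetical placeholders:

```python
# Sketch: reporting a fake/real percentage with predict_proba.
# Model and example data are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["council approves roadworks", "shocking miracle cure revealed",
         "report on quarterly earnings", "secret trick banks hide from you"]
labels = ["real", "fake", "real", "fake"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(texts), labels)

probs = clf.predict_proba(vec.transform(["unbelievable secret cure"]))[0]
for label, p in zip(clf.classes_, probs):
    print(f"{label}: {p:.1%}")   # e.g. fake: 78.3%, real: 21.7%
```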
• The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the system will not be a burden to the company.
• For feasibility analysis, some understanding of the major requirements for the system is essential. This study is carried out to check the economic impact the system will have on the organization. The amount of funds the company can pour into the research and development of the system is limited.
• The expenditures must be justified. The developed system was kept well within budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.