Mahendra nath

A Presentation on
“Fake User Detection”
SUBMITTED BY
Mahendra Nath Dwivedi
Roll No.: 502202216004
Enroll No.: AA/3522
Department of Computer Science & Engineering
Central College of Engineering and Management
GUIDED BY
Mr. Abhishek Badholia
DEPT. OF COMPUTER SCIENCE & ENGINEERING

CONTENT
• INTRODUCTION
- INTRODUCTION OF PROJECT
- REVIEW ANALYSIS OF AMAZON.COM
- BEHAVIOR FEATURES OF SPAMMERS
• LITERATURE REVIEW
• PROBLEM IDENTIFICATION
• METHODOLOGY
• RESULT AND FUTURE SCOPE
• REFERENCES

INTRODUCTION
With the development of the Internet, people are increasingly likely to express
their views and opinions on the Web. They can write reviews or other opinions on
e-commerce sites, forums, and blogs. These reviews are also used by product
manufacturers to identify problems with their products and to gather competitive
intelligence about their competitors. Unfortunately, the importance of reviews
also creates a strong incentive for spam, which contains falsely positive or
maliciously negative opinions.

Review (Comment) Analysis of Website
The table shows some selected mobile phone reviews from the Amazon website. For
the mobile phone product topic, reviews 1 and 2 are relevant, and review 1 is the
most relevant of all the reviews. However, it is hard to judge how relevant
review 3 is to the topic. In addition, reviews 4 and 5 partly plagiarize review 3,
and review 6 is an advertisement. Only two of the six reviews are relevant to the
mobile phone product topic. Fake reviews not only increase the cost of decision
making but also reduce its accuracy.

BEHAVIOR FEATURES OF SPAMMERS
1) Star User
2) Deviation Rate
3) Bias Rate
4) Review Similarity Rate
5) Review Relevancy Rate
6) Content Length
7) Illustration
8) Burst Review
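As a rough illustration, three of the features listed above (deviation rate, content length, and review similarity rate) could be computed as in the following Python sketch. The slides do not give formulas, so every definition here is an assumption: deviation rate is taken as the mean absolute gap between a user's rating and the product's average rating, content length as the mean word count, and review similarity as the highest Jaccard word overlap between any two reviews by the same user. The names `jaccard` and `behavior_features` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Word-level Jaccard overlap between two review texts (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def behavior_features(user_reviews, product_mean):
    """Compute three assumed behavior features for one user.

    user_reviews: list of (product_id, rating, text) tuples for the user.
    product_mean: dict mapping product_id to its average rating over all users.
    """
    # Assumed deviation rate: mean absolute gap to the product's average rating.
    deviation = mean(abs(r - product_mean[p]) for p, r, _ in user_reviews)
    # Assumed content length: mean number of words per review.
    length = mean(len(t.split()) for _, _, t in user_reviews)
    # Assumed similarity rate: highest overlap between any two of the user's reviews.
    texts = [t for _, _, t in user_reviews]
    similarity = max(
        (jaccard(a, b) for a, b in combinations(texts, 2)), default=0.0
    )
    return {
        "deviation_rate": deviation,
        "content_length": length,
        "review_similarity": similarity,
    }
```

A user who posts the same short text under several products would score high on review similarity and low on content length under these definitions.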

Types of Review Spam
Basically, three types of review spam exist [6]. These are:
Type 1 (Untruthful Review Spam): Fictitious positive reviews are given to
products in order to promote them, and unreasonable negative reviews
are given to competing products to harm their reputation among
consumers. In this way, untruthful reviews mislead consumers into
believing the spam.
Type 2 (Reviews with brand mentions): These spam reviews have only brands as their
prime focus. They comment on the manufacturer, the seller, or the brand name
alone. These reviews are biased and can easily be identified, as they do not
talk about the product and only mention brand names.
Type 3 (Non-reviews): These reviews are either junk, that is, they have no relation
to the product, or are used purely for advertisement purposes. They take these
two forms:
i. marketing purposes, and
ii. irrelevant text or reviews containing random write-ups.

Rule Based Classification Of Spammers

METHODOLOGY
In this section we discuss the proposed framework in detail. The
proposed spam detection and blocking framework consists of the following
modules:
• Feature Discretization
• Negative Set Extraction
• Expectation Maximization Algorithm
• Blocking of Users
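Feature discretization, the first module, turns a continuous feature value into a small number of categories. The slides do not say which scheme is used, so the sketch below assumes equal-width binning into three bins (0 = low, 1 = medium, 2 = high). The function name `discretize` and the default bin count are hypothetical.

```python
def discretize(values, n_bins=3):
    """Map continuous feature values to bin indices 0..n_bins-1.

    Assumption: equal-width bins between the minimum and maximum value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they all fall into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Equal-frequency binning would be an equally plausible choice; it places the same number of users in each bin instead of splitting the value range evenly.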

Rating Deviation from Mean Agreement
Filter Mean Target Difference
Group Filter Mean Variance
Target Model Focus

The algorithm for negative set extraction is presented below.
Algorithm: Negative Set Extraction
Input: P → positive set of spammers
       U → unlabeled set of users
Output: RN → negative set
RN <- N initially
RN_Extract (P, U)
  For each feature do
    Calculate
  End for
  For each feature (in decreasing order) do
    Remove instances consists of from
    If Size(RN) is close enough to Size(P) then
      Return RN
    End if
  End for
End
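The pseudocode above omits the quantity calculated for each feature and the exact removal rule, so the following Python sketch fills those gaps with explicit assumptions rather than the author's definitions. It assumes that each user is a set of discretized feature tokens, that a feature's score is its relative frequency in P minus its relative frequency in U, that RN starts as the whole unlabeled set, and that "close enough" means the size of RN is at most `tolerance` times the size of P. The name `extract_reliable_negatives` and the `tolerance` parameter are hypothetical.

```python
from collections import Counter


def extract_reliable_negatives(positive, unlabeled, tolerance=1.2):
    """Return the ids of unlabeled users kept as the negative set RN.

    positive, unlabeled: dicts mapping user id to a set of feature tokens.
    """
    pos_freq = Counter(f for feats in positive.values() for f in feats)
    unl_freq = Counter(f for feats in unlabeled.values() for f in feats)
    features = set(pos_freq) | set(unl_freq)
    # Assumed score: how much more common a feature is among known spammers.
    score = {
        f: pos_freq[f] / max(len(positive), 1)
        - unl_freq[f] / max(len(unlabeled), 1)
        for f in features
    }
    rn = dict(unlabeled)  # assumption: RN starts as the full unlabeled set
    target = tolerance * len(positive)
    # Visit features from most to least spam-indicative (ties broken by name).
    for f in sorted(features, key=lambda name: (-score[name], name)):
        # Drop every remaining user that shows this spam-indicative feature.
        rn = {u: feats for u, feats in rn.items() if f not in feats}
        if len(rn) <= target:
            return set(rn)
    return set(rn)
```

Users who survive the removals share few characteristics with the known spammers, which is why they can serve as negative examples in a later learning step.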

BLOCK DIAGRAM OF GENERATING A LIST OF SPAMMERS

Literature Review
To detect spam reviews, some scholars have carried out related research using
the techniques of data mining and natural language processing. These works by
other researchers form the basis of this project.
REVIEW OF PAPERS:-
1. The paper entitled “FAKE REVIEW DETECTION FROM A PRODUCT REVIEW USING
MODIFIED METHOD OF ITERATIVE COMPUTATION FRAMEWORK” by Eka Dyar Wahyuni and Arif
Djunaidy was published by EDP Sciences (DOI: 10.1051/matecconf/20165803003).
The authors measure the honesty value of a review using text mining and opinion
mining techniques. The experimental results show that the proposed system has
better accuracy than the iterative computation framework (ICF) method in
identifying fake reviews. The drawback of this method is that some processes need
to be optimized so that it can detect a fake review in a short amount of time.

2. The paper entitled “Spammers Detection from Product Reviews: A Hybrid Model”
was published by IEEE in 2015 (DOI: 10.1109/ICDM.2015.73).
This paper focuses on detecting hidden spam users based on product reviews. In the
literature, many studies have suggested diverse methods for spammer detection.
This paper proposes a principled hybrid learning model called hPSD that combines
both user features and user-product relations for spammer detection. The three
essential components of hPSD, namely feature discretization, reliable negative
set extraction, and the hybrid learning scheme, are each elaborated.
3. The paper entitled “Mining the Peanut Gallery: Opinion Extraction and Semantic
Classification of Product Reviews” was published by Kushal Dave, NEC Laboratories
America.
The authors describe an opinion mining tool that would process a set of search
results for a given item, generating a list of product attributes (quality,
features, etc.) and aggregating opinions about each of them (poor, mixed, good).
They begin by identifying the unique properties of this problem and develop a
method for automatically distinguishing between positive and negative reviews. A
number of issues make this problem difficult: rating inconsistency, ambivalence
and comparison, sparse data, and skewed distribution.

PROBLEM IDENTIFICATION
The main problem with user reviews lies in identifying the spam reviews among
the genuine ones. A review posted by any user may or may not be spam. Consider
the example of a person, Alice. Alice constantly posts reviews of books from a
publisher “X”, which has published many books. Alice posts good content and
seemingly genuine reviews for publisher “X”, purchasing most of the books of “X”
and reviewing each of them. Looking at these posts, the algorithm can conclude
that Alice is a genuine user and that Alice's comments are genuine too.
But in fact, Alice is hired by publisher “X” to post reviews, and gives good
5-star ratings to the books of publisher “X”. The problem, then, is identifying
users who look genuine but actually are not.
Fig. 3.1. Alice's Behaviour of Reviewing Books

Percentage of Users Being Spammer and Ham
Lastly, the users are classified into spam and non-spam categories. The
probabilities of being categorized as spam and non-spam are presented in the
figure. In our dataset, 49% of the users are spam users and 51% are non-spam
users. The dataset is flooded with spam users. These users need to be blocked
so that they cannot further affect the reviews and comments.
Fig. shows the probability of spammers and non-spammers

Users Blocking
After the spam users are identified, they are blocked. The
blocking stage is depicted in the figure.
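The blocking step can be pictured with a small Python sketch. The slides do not describe how blocking is implemented, so this assumes that each user already has a spam probability, that users at or above a threshold of 0.5 are blocked, and that each review is a dict with a "user" key. The names `block_spammers` and `visible_reviews` are hypothetical.

```python
def block_spammers(spam_probability, threshold=0.5):
    """Return the set of user ids whose spam probability reaches the threshold."""
    return {user for user, p in spam_probability.items() if p >= threshold}


def visible_reviews(reviews, blocked):
    """Keep only the reviews written by users who are not blocked."""
    return [review for review in reviews if review["user"] not in blocked]
```

Raising the threshold blocks fewer users and lowers the risk of blocking a genuine reviewer, at the cost of letting more spammers through.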

The results produced by the EM (Expectation Maximization) algorithm with 6 features
are compared with those of the base paper, which uses a larger number of features.
The figure shows the comparison between the proposed and existing approaches.
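To make the role of Expectation Maximization concrete, here is a minimal Python sketch. It is not the project's implementation: it assumes binary (0/1) discretized features, a naive Bayes model with two classes, known spammers and extracted negatives whose labels stay fixed, and Laplace smoothing. The function name `em_spam_posterior` and all of its parameters are hypothetical.

```python
from math import prod


def em_spam_posterior(X, spam_idx, ham_idx, n_iter=20):
    """Estimate P(spam) for every user with semi-supervised naive Bayes EM.

    X: list of equal-length tuples of 0/1 features, one per user.
    spam_idx / ham_idx: indices of users with fixed spam / non-spam labels.
    """
    n, d = len(X), len(X[0])
    fixed = {i: 1.0 for i in spam_idx}
    fixed.update({i: 0.0 for i in ham_idx})
    # Unlabeled users start undecided at 0.5.
    resp = [fixed.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        # M-step: re-estimate class prior and per-feature rates (smoothed).
        w_spam = sum(resp)
        w_ham = n - w_spam
        prior = (w_spam + 1) / (n + 2)
        theta_spam = [
            (sum(r * x[j] for r, x in zip(resp, X)) + 1) / (w_spam + 2)
            for j in range(d)
        ]
        theta_ham = [
            (sum((1 - r) * x[j] for r, x in zip(resp, X)) + 1) / (w_ham + 2)
            for j in range(d)
        ]
        # E-step: update the spam probability of unlabeled users only.
        for i, x in enumerate(X):
            if i in fixed:
                continue
            like_spam = prior * prod(
                t if v else 1 - t for t, v in zip(theta_spam, x)
            )
            like_ham = (1 - prior) * prod(
                t if v else 1 - t for t, v in zip(theta_ham, x)
            )
            resp[i] = like_spam / (like_spam + like_ham)
    return resp
```

Each round alternates between fitting the model to the current soft labels and relabeling the unlabeled users with the fitted model, so users who resemble the known spammers drift toward a probability of 1.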

Future Scope of Work
In this project, the majority of the work concerns the spammer detection
technique. The major drawback of this work is that it uses only one dataset.
Future work might use multiple datasets to analyse attackers on other websites
too.

References
Nitin Jindal and Bing Liu, “Analyzing and Detecting Review Spam”, Seventh IEEE International Conference on Data
Mining, 2007.
Snehal Dixit and A. J. Agrawal, “Review Spam Detection”, International Journal of Computational
Linguistics and Natural Language Processing, Vol. 2, Issue 6, June 2013, ISSN 2279–0756.
Gera T., Thakur D. and Singh J. 2015. BILD Testing for Spotting Out Suspicious Reviews, Suspicious Reviewers and
Group Spammers, International Conference on Communication Systems and Network Technologies (CSNT.2015.138).
Liang D., Liu X. and Shen H. 2014. Detecting Spam Reviewers by Combing Reviewer Feature and Relationship,
International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS).
Mukherjee A., Kumar A., Liu B., Wang J. and Ghosh R. 2013. Spotting Opinion Spammers using Behavioral Footprints.
Mukherjee A., Glance N. and Liu B. 2012. Spotting Fake Reviewer Groups in Consumer Reviews.
Wang G., Xie S., Liu B. and Philip S. Yu 2011. Review Graph based Online Store Review Spammer Detection, IEEE
International Conference on Data Mining (ICDM).
Zhang X., Xiong G., Zhu F. and Dong X. 2016. A Method of SMS Spam Filtering Based on AdaBoost Algorithm, World
Congress on Intelligent Control and Automation (WCICA).
