IRJET- Detection of Ranking Fraud in Mobile Applications

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 2187
Detection of Ranking Fraud in Mobile Applications
Ms.Manasi Mhatre1, Ms. Surabhi Mhatre2, Ms. Dhikshashri Dhemre 3, Prof.Saroja.T.V4
123 B.E. Students, Department of Computer Engineering, Shivajirao S. Jondhale College of Engineering,
Dombivli (E)
4Associate Professor, Department of Computer Engineering, Shivajirao S. Jondhale College of Engineering,
Dombivli (E), Mumbai University, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The Mobile App is an extremely popular and
renowned idea as a result of the fast advancement of the
mobile technology. As a result of the massive range of mobile
Apps, ranking fraud is the key challenge leading of the mobile
App market. Ranking fraud refers to fraud or vulnerable
activities that have a purpose of bumping up the Apps within
the leading list. Whereas the importance and necessity of
preventing ranking fraud have been widely known. In this
aggregation methodology, we are proposing 2 enhancements.
Firstly, by using Approval of scores from the admin to spot the
precise reviews and rating scores. Secondly, thefauxfeedbacks
by the same person for pushing up that app on the
leaderboard are restricted.
Key Words: Mobile Apps, Ranking Fraud Detection,
Evidence Aggregation, HistoricalRankingRecords,Rating
and Review.
1. INTRODUCTION
Ranking fraud within the mobile app market refers to
deceitful or deceptive activities. Nowadays it has become
normal for app developers to use any means in order to
increase their app rating thus committing ranking fraud. In
this paper, we offer a comprehensive view of ranking fraud
and propose a ranking fraud detection system for mobile
apps. In order to find the ranking fraud, we need to perform
extraction on active periods, particularly leadingsessions,of
mobile Apps. Such leading sessions will be leveraged for
investigating the native anomaly and not the international
anomaly of app rankings.
Moreover, we tend to investigate 3 kindsofevidence,i.e.,
ranking based evidence, rating evidence, and review based
evidence, by modeling apps’ ranking, rating and review
behaviors through statistical hypotheses tests.
In Rating based Evidence, specifically, whenanApphasbeen
released, it will be rated by any user who has downloaded it.
An App that has higher rating could attract a lot of users.
Thus, rating manipulation isa crucial perspective of ranking
fraud. In Review based Evidence, most of the App stores
allow the users to write down some comments as App
reviews.
2. EXISTING SYSTEM
In the existing system, whereas there’s some
interrelated work, like internet ranking spam detection,
online review spam detection, and mobile app
recommendation. The issue of detection ranking fraud for
mobile apps remain under explored. Usually speaking, the
related works of this study are often classified into 3 classes.
The first class is regarding internet ranking spam detection.
The second class istargetedondetectiononlinereviewspam.
Finally, the third class includes the studies on mobile app
recommendation.
3. PROPOSED SYSTEM
In this project,we use effectivealgorithm tospotthe
leading sessions of everyAppdependonitshistoricalranking
records. Then, with the analysis of Apps’ ranking behaviors,
we discover that the fraud Apps usually have completely
differentranking patterns in everyleading sessioncompared
with normal Apps.
Fig 3.1: The Framework of ranking fraud detection
system for mobile Apps
3.1 Identifying Leading Sessions
We need to observe ranking fraud inside leading
sessions of mobile Apps then propose an easy but effective
algorithm to spot the leading sessions of every App depend
on its historical ranking records .After this we discover that
the deceitful Apps usually have completely different ranking
patterns in every leading session compared with traditional
Apps.

3.2 Ranking based Evidences
We need to analyze the fundamental characteristics
of leading events for extracting fraud evidences.Byanalyzing
the Apps’ historical ranking records, we observe that Apps’
ranking behaviors in an exceedingly leading event
continuously satisfy a particularranking pattern,thatconsist
of different ranking phases, namely,risingphase,maintaining
phase and recession phase. Specifically, in every leading
event, an App’s ranking first will increase to a peak position
within the leader board, then keeps such peak position for a
time being, and ﬁnally decreases until the end of the event.
3.3 Rating Based Evidences
User rating isone among the utmost vital featuresof
App promotion. An App that has higher rating could attract a
lot of users to download and can even become higher on the
leaderboard. Thus, rating manipulation is additionally a
crucial perspective of ranking fraud.
3.4 Review Based Evidences
The App stores permit users to write some
comments as reviews. Such reviews will project the personal
perceptions and usage experiences of existing users for
specific mobile Apps. Before downloading or buying a new
mobile App,users readitshistorical reviews&basedonlotof
positive can download it. Thus, imposters usually post faux
reviews within the leading sessions of a particular App so as
to inﬂate the App downloads.
3.5 Evidence Aggregation
The study describes a ranking fraud detection
process where there is some evidence considered and
integrated to get an aggregated result which is most reliable
in finding a fraudulent application in a mobile market[1].
Most generally the ranking fraud is happening in some
particular phase of many leading events [1]. A leading event
may occur due to an advertisement campaign or etc. This
study can be extended to get a recommender system to
enhance user experience
4. LITERATURE SURVEY
Agarwal ET. al. [2] examine sentiment analysis on Twitter
information. The authors experimented with 3 kinds of
models:theunigram model, a feature basedmodel,andatree
kernel-based model. They assumed the unigram model as a
baseline. They investigated 2typesofmodels:treekerneland
feature-based models and demonstrated that each these
models outperform the unigram baseline.
David F. Gleich et al. [3] has done a survey on Rank
Aggregation via Nuclear Norm minimization process of rank
aggregation isintimately tangledwith the structure of skew-
symmetricmatrices. Toprovide a replacement methodology
for ranking a collection of things. The essence of our plan is
that a rank aggregation describes a partly stuffed skew-
symmetric matrix.
Leif Azzopardi et al. [6] studied an investigation the link
between Language Model perplexity and IR precision Recall
Measures the perplexity of the language model includes a
systematic relationship with the accomplishable precision-
recall performance although it's not statisticallyimportant.A
latent variable unigram based mostly lm that has been
successful once applied to IR, is that the so-called
probabilistic latent semantic indexing (PLSI).
N. Jindal and B. Liu [7] conferred variety of police work
Product Review SpammersmistreatmentRatingBehaviorsto
discover users generatingspamreviewsorreviewspammers.
We tend to find many characteristic behaviors of review
spammers and model these behaviors therefore on discover
the spammers.
5. SENTIMENT ANALYSIS ALGORITHM
Sociologists have studied humansentimentforhalfa
century. In the pattern of interaction between people, the
meaning of vocabularieshas the centralroletoshowpeople’s
reaction to each other and also works and other actions
meant to evoke a sentimental response from between
vocabularies.
The increase of social media like blogs and social
networks has fueled interest in sentiment analysis. So as to
find the new opportunities and to manage the reputations,
business folks typically read the reviews/ ratings/
recommendations and different types of online opinion. this
enables to not solely realize the words that are indicative of
sentiment however additionally to seek out the relationships
between words in order that each word that modifies the
sentiment and what the sentiment is regarding will be
accurately identified.
Scaling system is employed to work out the
sentiment for the words having a positive, negative and
neutral sentiment. It also analyzes the following concepts to
know the words and the way they relate to the conception.
There are many sentiment analysis algorithms
available fordevelopers. Implementing sentimentanalysisin
your apps is an easy job. There are not any servers to set up,
or settings toconfigure. Sentiment Analysis analyzes the text
of news articles, social media posts like Tweets, Facebook,
and more. Social Sentiment Analysis is an algorithm that's
tuned to research the sentiment of social media content, like
tweets and status updates. The algorithm takes a string, and
returns the sentiment rating for the “positive,” “negative,”
and “neutral.” Additionally, this algorithm provides a
compound result that is an overall sentiment of the string.
For this purpose we tend to use Classifiers is
Sentiment Analysis is to see the subjective value of a text-
document, i.e. however positive or negative is that the

content of a text document. Regrettably, for this purpose,
these Classifiers fail todetermine a similar accuracy.Thiscan
be because of the subtleties ofhumanlanguage,irony,context
interpretation, use of slang, cultural variations and also the
other ways during which opinion will be expressed. In this
paper, we are using Naive Bayes classifiers.
5.1 Naive Bayes
NaiveBayes classifiers are studying the classificationtask
from a Statistical point of view. The starting point is that the
probability of a class C isgiven by the posterior probabilityP
(C|D) given a training document D. Here D refers to all of the
text in the entire training set. It is given by D= (d1, d2, .dn),
where di is the attribute (word) of document D .Using Bayes’
rule, this posterior probability can be rewritten as:
Since the marginal probability P (D) is equalforallclasses,
it can be disregarded and the equation becomes:
The document D belongs to the class C which maximizes
this probability, so:
Assuming conditional independence of the words di, this
equation simplifies to:
Here P (di | C) is the conditional probability that word i
belongs to class C. For the purpose of text classification, this
probability can simply be calculated by calculating the
frequency of word i in class C relative to the total number of
words in class C.
We need to multiply the class probability with all of the
prior-probabilities of the individual words belonging to that
class. In supervised machine learning algorithm: we can
estimate the prior-probabilities with a training set with
documents that are already labeled with their classes. With
this training set we can train the model and obtain valuesfor
the prior probabilities. This trained model can then be used
for classifying unlabeled documents.
This is relatively easy to understand with an example.
Let’s say we have counted the number of words in a set of
labeled training documents. In this set each text document
has been labeled as either Positive, Neutral or as Negative.
The result will then look like:
From this table we can already deduce each of the class
probabilities:
If we look at the sentence “This blog-post is awesome.”,
then the probabilities for this sentence belonging to a
specific class are:
This sentence can thus be classified in the positive
category.
6. SCREENSHOTS
Fig 6.1: Admin Panel (Manage user)

Fig 6.2: Admin Panel (Manage Provider)
Fig 6.3: Admin Panel (Manage Apps)
Fig 6.4: Provider Panel
Fig 6.5: User Panel
Fig 6.6: User Panel
7. CONCLUSIONS
A unique perspective of this approach is that each one the
evidence ismodel by statistical hypothesistests,thereforeit's
simple to be extended with alternativeevidencefromdomain
data to discover ranking fraud. The admin will notice the
ranking fraud formobile application.TheRevieworRatingor
Ranking given by users is accurately calculated.Hence,anew
user who needs to download an app for a few purposes will
get a clear viewof the present applications.Finallywetendto
validate the proposed system with in-depth experiments on
real-world App information collected from the App Store.
8. FUTURESCOPE
In the future, we will decide to study more practical fraud
evidence and analyze the latent relationship among rating,
review, and rankings. Moreover,we can add more servicesin
ranking fraud detection approach to enhance user
experience.
REFERENCES:
[1]. Hengshu Zhu, Hui Xiong,Yong Ge, and Enhong Chen,
“Discovery of Ranking Fraud for Mobile Apps”in Proc. IEEE
27th Int. Conf. Transactions on knowledge and data
engineering, 2015, pp. 74-87.
[2]. A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R.
Passonneau, “Sentiment analysis of twitter data,” in
Proceedings of the Workshop on Languages in Social Media.
Association for Computational Linguistics, 2011, pp. 30–38.
[3]. D. F. Gleich and L.-h. Lim, “Rank aggregation via nuclear
norm minimization,” in Proc. 17th ACM SIGKDD Int. Conf.
Knowl. Discovery Data Mining, 2011, pp. 60–68.
[4]. A. Klementiev, D. Roth, and K. Small, “An unsupervised
learning algorithm for rank aggregation,” in Proc. 18th Eur.
Conf. Mach. Learn., 2007, pp. 616–623.
[5]. J. Kivinen and M. K. Warmuth, “Additive versus
exponentiated gradient updates for linear prediction,” in
Proc. 27th Annu. ACM Symp.Theory Comput. 1995, pp. 209–
218.

[6]. L. Azzopardi, M. Girolami, and K. V. Risjbergen,
“Investigating the relationship between language model
perplexity and in precision- recall measures,” in Proc. 26th
Int. Conf. Res. Develop. Inform. Retrieval,2003, pp.369–370.
[7]. N. Jindal andB. Liu, “Opinion spam and analysis,” in Proc.
Int. Conf. Web Search Data Mining, 2008, pp. 219–230
[8]. T. L. Griffiths and M. Steyvers, “Finding scientific topics,”
Proc. Nat. Acad. Sci. USA, vol. 101, pp. 5228–5235, 2004.
[9].Y. Ge, H.Xiong, C. Liu, and Z.-H. Zhou, “Ataxi drivingfraud
detection system,” in Proc. IEEE 11th Int. Conf. Data Mining,
2011, pp. 181–190.
[10]. K. Shi and K. Ali. Getjar mobile application
recommendations with very sparse datasets. In Proceedings
of the 18th ACMSIGKDD international conference on
Knowledge discovery and data mining, KDD ’12, pages 204–
212, 2012.
[11]. G. Heinrich, “Parameter estimation for text analysis,”
Univ. Leipzig, Leipzig, Germany, Tech.
Rep.,http://faculty.cs.byu.edu/~ringger/
CS601R/papers/Heinrich-GibbsLDA.pdf, 2008.
[12]. B. Yan and G. Chen. Appjoy: personalized mobile
application discovery. In Proceedings of the9thinternational
conference on Mobile systems, applications, and services,
MobiSys ’11, pages 113–126, 2011.
[13]. S. Xie, G. Wang, S. Lin, and P. S. Yu. Review spam
detection via temporal pattern discovery. In Proceedings of
the 18th ACMSIGKDDinternationalconferenceonKnowledge
discovery and data mining, KDD ’12, pages 823–831, 2012.
[14]. Z. Wu, J. Wu, J. Cao, and D.Tao. Hysad: asemi-supervised
hybrid shilling attack detector for trustworthy product
recommendation. In Proceedings of the 18th ACM SIGKDD
international conference on Knowledge discovery and data
mining, KDD ’12, pages 985–993, 2012
[15]. M. N. Volkovs and R. S. Zemel. A flexible generative
model for preference aggregation. In Proceedings of the21st
international conference on World Wide Web, WWW ’12,
pages 479–488, 2012.
[16]. H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian,
“Mining personal context-aware preferences for mobile
users,” in Proc. IEEE 12th Int.Conf. Data Mining, 2012, pp.
1212–1217.
[17]. H. Zhu, H. Cao, E. Chen, H. Xiong, and J. Tian, “Exploiting
enriched contextual information for mobile app
classification,” in Proc.21stACMInt. Conf. Inform. Knowl.
Manage. 2012, pp. 1617–1621.
[18]. E.-P. Lim, V.-A. Nguyen, N. Jindal,B. Liu, and H. W. Lauw,
“Detecting product reviewspammersusingratingbehaviors,”
in Proc.19thACMInt. Conf. Inform. Knowl. Manage, 2010, pp.
939–948.
[19]. N. Spirin and J. Han, “Survey on web spam detection:
Principles and algorithms,” SIGKDD Explor. Newslett,vol.13,
no. 2, pp. 50-64, May 201.
[20]. A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly,
“Detecting spam web pages through content analysis,” in
Proc. 15th Int. Conf. World Wide Web, 2006, pp. 83-92.K.

IRJET- Detection of Ranking Fraud in Mobile Applications

More Related Content

What's hot (18)

Similar to IRJET- Detection of Ranking Fraud in Mobile Applications (20)

More from IRJET Journal (20)

Recently uploaded (20)

IRJET- Detection of Ranking Fraud in Mobile Applications