A Supervised Modeling Approach to Determine the Elite Status
of Yelp Members Using Decision Trees and Linear Regression
Chithroobeni Shankar
Carnegie Mellon University
chithroobeni.shankar@sv.cmu.edu
Darshana Sivakumar
Carnegie Mellon University
darshana.sivakumar@sv.cmu.edu
Jennifer Li
Carnegie Mellon University
jennifer.li@sv.cmu.edu
Julie Tram
Carnegie Mellon University
julie.tram@sv.cmu.edu
Moustafa Aly
Carnegie Mellon University
moustafa.aly@sv.cmu.edu
Neil Everette
Carnegie Mellon University
neil.everette@sv.cmu.edu
Ravindra Udipi
Carnegie Mellon University
ravindra.udipi@sv.cmu.edu
Sahil Kumar
Carnegie Mellon University
sahil.kumar@sv.cmu.edu
Abstract
Yelp, which was founded in 2004 by two PayPal executives,
is a crowd-sourced multinational company headquartered in
San Francisco, CA. Yelp’s goal is to connect people with
great local businesses. Yelp has over 77 million cumulative
reviews from yelpers around the world. Yelpers share their
everyday local business experiences, giving voice to con-
sumers and bringing word of mouth online. On a monthly average, approximately 142 million unique visitors used Yelp's website and approximately 79 million visited Yelp via a mobile device [1].
Embedded among all these business reviews and yelpers is a classification between Elite and Non-Elite yelpers. Yelp Elite is a way for Yelp to recognize and reward users who are active on Yelp. Elite-worthiness is based on a number of things, including well-written reviews, high-quality tips, a detailed personal profile, an active voting and complimenting record, and a history of playing well with others [2]. Elite status is earned every year and is determined by a committee. Elite yelpers have profiles with special badges and are invited to private events and parties.
For the data analytics course project, our team will at-
tempt to crack the code using a systematic algorithm to pre-
dict users’ Elite worthiness. We will use the Yelp academic
set and the associated user attributes to determine the most
accurate algorithm to predict elite status. Our goal for the
project is to predict with 95% accuracy if a user obtains elite
status for any particular year within the Yelp Academic set.
We should note that there are some inherent risks in using the Yelp academic data set. Our team has no insight into any additional or hidden indicators that may be used in determining Elite status beyond the data fields provided in the Yelp Academic set. The academic dataset contains only 12% of the reviews from its 370K users. Our algorithms and models are based solely on the data that exists in the academic data set.
1. Introduction
The Yelp Academic Dataset has been provided by Yelp to be
used for academic purposes. The dataset is a rich resource
of the interaction information between customers and busi-
nesses on the Yelp platform. Yelp’s academic dataset in-
cludes information about businesses near 30 different pre-
mium schools, including Carnegie Mellon University in
Pittsburgh, Pennsylvania. The academic dataset is in the
form of different JSON files for different objects, with nested
json structures and arrays in it. It consists of five objects re-
lated to Businesses, Customers, Reviews, Customer Check-
Ins and Customer tips. Business objects contain basic infor-
mation about local businesses. Review objects contain the
review text, the star rating, and information on votes Yelp
users have cast on the review. User objects contain aggre-
gate information about a single user across all of Yelp. Table
1 shows the number of records for each of these categories
and describes the Yelp objects [3].
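Before any modeling, each of these files must be parsed into records. Below is a minimal Python sketch assuming the dataset's usual one-JSON-object-per-line layout; the file names are assumptions about the distribution, not part of the paper.

```python
# Minimal sketch of loading the academic dataset; each file holds one JSON
# object per line. File names are assumptions about the distribution layout.
import json

def load_objects(path):
    """Parse one Yelp academic dataset file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

users = load_objects("yelp_academic_dataset_user.json")
reviews = load_objects("yelp_academic_dataset_review.json")
print(len(users), "users;", len(reviews), "reviews")
```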
2. Problem Statement
All Yelpers can nominate themselves or their friends to be an
elite member on the Yelp website. According to Yelp, there
isn’t a specific benchmark for a member to be selected as an elite member. Also, to be considered elite, a member needs to reapply every year [5].

Business (61,184 records): basic information about local businesses, including location, number of reviews, average star rating, and URL.
Review (1,569,264 records): the review text, the star rating, and information on votes Yelp users have cast on the review [4].
User (366,715 records): aggregate information about a single user across all of Yelp.
Check-ins (45,166 records): data on users' check-in patterns for businesses.
Customer tips (495,107 records): like reviews, tips carry a text column with quick tips about businesses.

Table 1. Various Objects Belonging to the Yelp Academic Dataset
Yelp's Elite Council's process for selecting elite members is a black box to the rest of the world. What if, using Yelp's historic data, we could create an automated process for determining whether a member is fit to be given elite status? This could ease the selection task for the Elite Council by automatically filtering out nominations that are predictably unfit for elite status, saving Yelp the overhead cost of preliminary filtering for the Elite Council.
2.1 Goal
Our goal is to create an algorithm to predict a user’s elite
status on Yelp. We want to predict a user’s elite status with
an accuracy of 95%.
3. Initial Data Investigation
There are five data objects provided in the Yelp academic dataset, comprising 1.6 million reviews and 500k tips by 366k users for 61k businesses in 10 cities and four countries [3]. Of the 366k Yelp members in the dataset, only 25k (6.8%) were elite members. For our initial investigation, we analyzed the 20 attributes of the user data object to find correlations that could distinguish elite from non-elite members.
3.1 Most Significant Attributes for 2015
The data set was initially reduced to user activity in the year 2015. The red outline in the box plot developed using Tableau, seen in Figure 1, identifies the non-elite members. Compared with non-elite members, elite members showed significant differences in the following attributes:
• Number of reviews written
• Number of user Fans
• Votes counted as Useful
• Votes counted as Cool
• Votes counted as Funny
Figure 1. Most Significant Attributes for 2015
These five attributes were initially flagged as attributes
for further analysis.
3.2 Review Count Past 10 Years, Elite vs Non-Elite
The box plot in Figure 2 depicts the most significant at-
tribute, Review Count.
When the user data attributes were expanded over a 10-year span, the findings from the 2015 data were confirmed. With small exceptions in 2005 (the first year of Elite qualification) and 2015 (an incomplete year), the attribute findings were consistent across the span.

From this initial analysis of the user attribute data alone, we concluded that the four attributes shown in Table 2 had a high correlation with Elite vs. Non-Elite membership. Additional manipulation of the data (merging the user data with the review data set) was required to test whether other conditions, such as previous years of Yelp Elite status, had any additional correlating effects.
Figure 2. Review Count Past 10 Years, Elite vs Non-Elite
Review Count     Elite        Non-Elite   Difference
75% Quartile     106 Reviews  11 Reviews  9.6x
Median           75 Reviews   5 Reviews   15x
25% Quartile     51 Reviews   2 Reviews   25x

Votes Useful     Elite        Non-Elite   Difference
75% Quartile     140 Votes    16 Votes    8.8x
Median           70 Votes     4 Votes     17.5x
25% Quartile     40 Votes     1 Vote      40x

Votes Cool       Elite        Non-Elite   Difference
75% Quartile     63 Votes     4 Votes     15.8x
Median           27 Votes     1 Vote      27x
25% Quartile     14 Votes     0 Votes     14x

Votes Funny      Elite        Non-Elite   Difference
75% Quartile     50 Votes     4 Votes     12.5x
Median           20 Votes     1 Vote      20x
25% Quartile     10 Votes     0 Votes     10x

Table 2. Initial Data Findings
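The quartile comparisons in Table 2 can be reproduced along these lines. This is a sketch assuming the user records sit in a pandas DataFrame with a boolean is_elite column (a name we introduce here) alongside the per-user activity counts.

```python
# A sketch reproducing the quartile comparison of Table 2 with pandas.
# `users` is assumed to be a DataFrame with a boolean is_elite column
# (a name we introduce) plus per-user activity counts.
import pandas as pd

def quartile_comparison(users: pd.DataFrame, attribute: str) -> pd.DataFrame:
    """25%/50%/75% quantiles of one attribute, split Elite vs. Non-Elite."""
    q = (users.groupby("is_elite")[attribute]
              .quantile([0.25, 0.50, 0.75])
              .unstack(level=0))            # rows: quantiles, cols: elite flag
    q.columns = ["Non-Elite", "Elite"]      # False sorts before True
    # Zero-valued Non-Elite quantiles make the ratio infinite.
    q["Difference"] = q["Elite"] / q["Non-Elite"]
    return q

# e.g. quartile_comparison(users, "review_count")
```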
Dataset Num. of Attributes
Users 23
Businesses 105
Table 3. Number of Attributes in Different Datasets
4. Feature Selection
Feature selection is a popular technique in data mining that helps reduce input data to a more manageable size for processing and analysis. It implies not only cardinality reduction, i.e., reducing the number of features based on a cutoff count, but also actively selecting features or attributes of a dataset based on their usefulness for analysis [4]. Some datasets contain too many attributes that are sparse in their information. This can lead to cumbersome fitting problems for a model and can even degrade the quality of the result by introducing noise into the analysis. For this reason we paid attention to feature selection and 'data massaging' early in our work.
As alluded to earlier, the raw Yelp datasets had a high number of attributes describing Users and Businesses, as seen in Table 3.
In our bid to create predictive models for Yelp Elite user selection, we found the models built on the raw dataset to be highly inaccurate. To determine the usefulness of the available attributes, we evaluated the correlation between a user's elite status and the other attributes describing the user's behavior on the Yelp platform, rendering correlation matrices for the available data. This helped us narrow down to the attribute groups of interest. Still cautious about discarding data, we decided to try combining related attributes. There were 10 different types of compliments and 10 attributes to represent them. Since they could all essentially be combined into one aggregated field representing overall compliments, we experimented with that, and noticed slightly higher accuracy in our predictive model. Encouraged by the change, we applied the same approach to other related attributes: three attributes represented the three different types of votes a user had received, so consolidating these three columns into one was the next step. Our model improved with this step too.
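The consolidation described above can be sketched as follows, assuming the user records were flattened into a DataFrame with dotted column names such as compliments.hot and votes.useful (the column-naming convention is an assumption about how the nested JSON was loaded).

```python
# A sketch of the attribute consolidation described above; dotted column
# names such as compliments.hot and votes.useful are assumptions.
import pandas as pd

def consolidate(users: pd.DataFrame) -> pd.DataFrame:
    compliment_cols = [c for c in users.columns if c.startswith("compliments.")]
    vote_cols = [c for c in users.columns if c.startswith("votes.")]
    out = users.copy()
    # Combine the ten compliment attributes into one aggregated field.
    out["aggregated_compliments"] = out[compliment_cols].sum(axis=1)
    # Consolidate the three vote attributes the same way.
    out["aggregated_votes"] = out[vote_cols].sum(axis=1)
    return out.drop(columns=compliment_cols + vote_cols)
```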
With a better-organized dataset to work with, we decided to trim it down further to highlight the patterns we were interested in and to use the more prominent correlations. We built new correlation matrices on the new dataset to filter down to the attributes with the highest impact in determining a user's elite status. Comparing the correlations of the newly generated aggregated attributes helped us find the areas to concentrate on in order to build an effective model, and we were able to improve our models by using this information to create appropriate rules that better exploited the correlations we had found.

Figure 3. Correlation Matrix

Figure 4. Correlation Matrix
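A sketch of the analysis behind Figures 3 and 4 is given below: compute the correlation matrix over the numeric attributes and rank them by their correlation with the elite flag. The is_elite column name is an assumption.

```python
# A sketch of the correlation analysis behind Figures 3 and 4; the
# is_elite column name is an assumption.
import pandas as pd

def elite_correlations(users: pd.DataFrame) -> pd.Series:
    numeric = users.select_dtypes("number").copy()
    numeric["is_elite"] = users["is_elite"].astype(int)
    corr = numeric.corr()                  # full attribute correlation matrix
    return corr["is_elite"].drop("is_elite").sort_values(ascending=False)

# e.g. elite_correlations(users).head(10) lists the strongest signals
```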
5. Algorithm Experiments
Our team decided to dive deeper into the Yelp user data set to gain better insights into it. As we focused on Yelp Elite member status, we began exploring different techniques to find correlations that would help establish our model. The team used supervised learning techniques to understand the criteria for Yelp Elite membership.
The criteria for selecting our classifier algorithms were:
• Rule-based classification: we are interested in generating a rules engine for evaluating Yelp Elite users.
• Reasonable computational complexity: the academic dataset is over 2 GB, with the review dataset alone over 1.4 GB. We need the best of both worlds: an algorithm fast enough to allow multiple experiments, yet able to produce a high-quality model.
Based on these criteria, the following initial set of algorithms was selected for experimentation; we evaluated their effectiveness over the project life cycle.
• Alternating decision tree
• kNN: k nearest neighbor classification
• Bayesian Algorithms
• Random Forest
• CART: Classification and regression tree
• Conjunctive Rule classification
Below, we briefly discuss our results with each type of classifier:
• Bayesian algorithms: We ran the data set against a number of Bayesian algorithms, and the results were a very weak true positive rate (62%). A quick look at the nature of our data and some visualizations explained why the Bayesian algorithms performed poorly: they assume strong independence among attributes, while in our data set we could see strong correlations between attributes, such as the star counts and the number of reviews. The statistical advantage of Bayesian algorithms was lost in our case.
• Regression models: We had a hunch that regression models were not our best option. The data is rich in attributes, and the percentile distribution of values in each attribute leads to multiple decision points. For example, if a user has fewer than 2 compliments, he or she is definitely not an elite member. We tried regression models anyway; the results were better than the Bayesian algorithms but far from our target: true positive rate (72%).
• Alternating decision trees: Based on our observations, we needed an algorithm that does not require attribute independence and is sensitive to different bands of data. Decision trees seemed a natural and logical progression, and we got better results: true positive rate (79%). We could not improve beyond this.
• Random forest: As it is a family of decision trees, the results were almost identical to the previous algorithm.
• kNN: k-nearest neighbors seemed a good choice, as it tends to do well as a binary classifier when k is odd. The results were very promising: true positive rate (84%). However, we could not improve further because the data is quite discrete, which degrades the algorithm's performance.

Figure 5. Dataset Relations
• CART: Classification and regression trees seemed to combine the best of both worlds: rules that can handle the percentile distribution, and regression that can identify the correlations between attributes and weight them. Indeed, we achieved the best results with the J48 tree and linear regression; our true positive rate was 94.2%.
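Our experiments were run in Weka. The sketch below is only a rough scikit-learn stand-in, where DecisionTreeClassifier approximates the J48 tree and LogisticRegression stands in for the regression classifier; feature names and tree parameters are assumptions, not the paper's exact configuration.

```python
# A rough scikit-learn stand-in for the Weka experiments; feature names
# and hyperparameters are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["review_count", "fans", "aggregated_votes", "aggregated_compliments"]

def evaluate(users: pd.DataFrame) -> None:
    X, y = users[FEATURES], users["is_elite"].astype(int)
    # test:train ratio of 1:2, matching the split used in the paper
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=0)
    for model in (DecisionTreeClassifier(min_samples_leaf=50, random_state=0),
                  LogisticRegression(max_iter=1000)):
        pred = model.fit(X_tr, y_tr).predict(X_te)
        # true positive rate on the Elite class = recall for label 1
        print(type(model).__name__, "Elite TP rate:",
              round(recall_score(y_te, pred), 3))
```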
6. Results and Analysis
Once we finalized our goal of creating a model to determine the elite-worthiness of a Yelp user, we focused on the User attributes and the Review attributes. During the experiment period we used this data in different combinations to arrive at our final model. In this section we describe the reasoning behind each combination and the results of our experiments on each data set.
6.1 Pennsylvania Data Set
Goal: We decided to focus on the data from one state, as it provides a balanced distribution of business types, users, and reviews and helps us understand the behavior of and correlations among these attributes. A smaller data set also makes it easier to try different algorithms.
Data Manipulation: We chose Pennsylvania because it contains the businesses around the CMU campus and ranked second on the restaurants-per-state metric in the academic dataset. The user data set carries no state information for users, so, assuming review activity is local, we picked the businesses in Pennsylvania, selected the reviews for those businesses, and then retrieved the users (and their attributes) behind those reviews. The relation between the Business, User, and Review objects is shown in Figure 5.
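The filtering chain just described can be sketched as three joins; the business_id, state, and user_id fields follow the dataset's JSON schema, while the DataFrame setup is an assumption.

```python
# A sketch of the filtering chain: Pennsylvania businesses, then their
# reviews, then the users behind those reviews. Field names follow the
# dataset's JSON schema; the DataFrame setup is an assumption.
import pandas as pd

def pennsylvania_users(businesses: pd.DataFrame,
                       reviews: pd.DataFrame,
                       users: pd.DataFrame) -> pd.DataFrame:
    pa_ids = businesses.loc[businesses["state"] == "PA", "business_id"]
    pa_reviews = reviews[reviews["business_id"].isin(pa_ids)]
    return users[users["user_id"].isin(pa_reviews["user_id"])]
```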
Datasets Used:
Data Size: 17,791
Elite:Non-Elite: 1:12
Attributes Used: review count, fans, votes.cool,
votes.funny, votes.useful, average stars, compliments.hot,
compliments.more, compliments.list
Results: The results obtained using the J48 Pruned Tree
and Regression Classifier are shown in Table 4.
Discussion: The Pennsylvania data was initially selected because it is smaller and takes less time when trying different algorithms, while still having a balanced distribution of business types, users, and reviews.
The data was divided into test and training data at a ratio of 1:2. After running the J48 graft pruned tree classifier, 95.40% of users were correctly classified, and the ROC area of 95.70% indicates the classification is quite accurate. However, the false positive rate on the Non-Elite class is quite high, meaning many users who could qualify as elite are falsely classified as Non-Elite. The goals for the next step are to expand the algorithm to a larger scale and to reduce the false positive rate.
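For reference, the per-class figures reported in Tables 4 through 8 can be derived from a confusion matrix. The sketch below shows the relationship, with label 1 for Elite and 0 for Non-Elite; it illustrates the metric definitions, not Weka's internals.

```python
# A sketch relating the reported per-class rates to a confusion matrix;
# y_true/y_pred use 1 for Elite, 0 for Non-Elite, and y_score is the
# classifier's probability for the Elite class.
from sklearn.metrics import confusion_matrix, roc_auc_score

def class_rates(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    elite_tp_rate = tp / (tp + fn)   # share of true elites found
    elite_fp_rate = fp / (fp + tn)   # share of non-elites labeled Elite
    roc_area = roc_auc_score(y_true, y_score)
    return elite_tp_rate, elite_fp_rate, roc_area
```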
6.2 Review Data Set
Goal: The academic data set has 1.6 million reviews spread across many users and businesses. The intention of this experiment is to predict elite Yelpers based on the review data alone.
Data Manipulation: To use the review data, we aggregated it by userId for each year. Elite status is granted to users on a yearly basis; being elite one year does not necessarily mean the status is kept the next year. The user data does not reflect this time sensitivity, since most attributes in the user dataset are aggregated across all years since the user joined Yelp. We therefore explored the Review dataset, which has timestamps of when each review was posted.
For a given userId we aggregated the star ratings (1, 2, 3, 4, and 5) the user gave and the votes (funny, cool, and useful) the user received. For each of these userIds we inserted an isElite flag based on the years the user was elite, which are available in the user data set. A sample record set aggregation is depicted in Figure 6.
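The aggregation in Figure 6 can be sketched with a pandas groupby. The column names (date, stars, and flattened votes.* fields) and the elite_years mapping (user_id to the set of years the user held elite status) are assumptions about how the JSON was loaded.

```python
# A sketch of the per-user, per-year aggregation shown in Figure 6.
# Column names and the elite_years mapping are assumptions.
import pandas as pd

def aggregate_reviews(reviews: pd.DataFrame, elite_years: dict) -> pd.DataFrame:
    reviews = reviews.assign(year=pd.to_datetime(reviews["date"]).dt.year)
    grouped = reviews.groupby(["user_id", "year"])
    votes = grouped.agg(funnyVoteCount=("votes.funny", "sum"),
                        usefulVoteCount=("votes.useful", "sum"),
                        coolVoteCount=("votes.cool", "sum"))
    # One NumberOfNStarReviews column per star rating (1 through 5).
    stars = grouped["stars"].value_counts().unstack(fill_value=0)
    stars.columns = [f"NumberOf{int(s)}StarReviews" for s in stars.columns]
    out = votes.join(stars).reset_index()
    out["isElite"] = [year in elite_years.get(uid, ())
                      for uid, year in zip(out["user_id"], out["year"])]
    return out
```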
Datasets Used:
Data Size: 500,967
Elite:Non-Elite: 1:13
Attributes Used: NumberOf5StarReviews, NumberOf4StarReviews, NumberOf3StarReviews, NumberOf2StarReviews, NumberOf1StarReviews, funnyVoteCount, usefulVoteCount, coolVoteCount
Results: The results obtained using the J48 Pruned Tree
and Regression Classifier are shown in Table 5.
Class          TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Elite          85.50%   3.50%    74%        85.50%  79.30%     95.70%
Non-Elite      96.50%   14.50%   98.30%     96.50%  97.40%     95.70%
Weighted Avg.  95.40%   13.40%   95.80%     95.40%  95.50%     95.70%

Table 4. Pennsylvania Data Set Results

Class          TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Elite          21.80%   1.00%    62.30%     21.80%  32.30%     69.30%
Non-Elite      99.00%   78.20%   94.20%     99.00%  96.50%     69.40%
Weighted Avg.  93.40%   72.60%   91.90%     93.40%  91.90%     69.40%

Table 5. Review Data Set Results

Figure 6. Data Transformation

Discussion: The data was divided into test and training data at a ratio of 1:2. The weighted average true positive rate is 93.40%, not a significant change from the last experiment. However, while the true positive rate for Non-Elite is 99% and the false positive rate for Elite is 1%, the true positive rate for Elite is only 21.80% and the false positive rate for Non-Elite is 78.20%. This means the classifier tends to classify any given user as Non-Elite rather than Elite: almost 80% of users who could be elite are falsely classified as non-elite. The ROC area also dropped significantly, to 69.40%, indicating poor classification accuracy.
There are two main reasons behind this result:
• Review attributes are not as strongly associated with the
elite status as user attributes.
• Data is highly skewed towards non-elite users.
To achieve more accurate results, we need to stay with user attributes while still taking advantage of the review attributes.
6.3 All User Data
Goal: The academic data set has data on 366K users, with 23 attributes each. The goal of this experiment is to predict elite Yelpers based on the user data alone.
Data Manipulation: We massaged the user-level attributes slightly to obtain parsable data elements. From the feature selection process and the attribute correlation matrix, we identified that the user data set has attributes such as review count, fans, votes.cool, and votes.useful that play a significant role in obtaining elite status. We further aggregated the friends list and the total numbers of votes and compliments into measurable numeric counts.
Datasets Used:
Data Size: 366,715
Elite:Non-Elite: 1:15
Attributes Used: review count, friends, fans, average stars, yelping.since.months, aggregated compliments, aggregated votes
Results: The results obtained using the J48 Pruned Tree
and Regression Classifier are shown in Table 6.
Discussion: After expanding the algorithm to all user data, the result is quite satisfying. The data was again divided into test and training data at a ratio of 1:2.
The weighted average true positive rate is 97%, with the true positive rate for Elite users as high as 98.70%. The ROC area is 94.70%, which means the classification is relatively accurate. However, the false positive rate for Elite is as high as 24.90%, meaning that users who should not be elite are classified into the elite group. We would like to reduce the false positive rate while maintaining the accuracy of the prediction.
This result showed that user attributes are much more strongly associated with elite status than review attributes. However, since elite status is granted on a yearly basis, user attributes still cannot capture the impact of the time factor on elite status. So the next goal is to merge the two datasets, leveraging the strengths of the user attributes and the time dimension of the review data.
Class          TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Elite          98.70%   24.90%   98.20%     98.70%  98.40%     94.70%
Non-Elite      75.10%   1.30%    80.50%     75.10%  77.70%     94.70%
Weighted Avg.  97.00%   23.20%   96.90%     97.00%  97.00%     94.70%

Table 6. All User Data Results

Class          TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Elite          79.40%   1.80%    76.30%     79.40%  77.80%     98.60%
Non-Elite      98.20%   20.60%   98.50%     98.20%  98.30%     98.60%
Weighted Avg.  96.90%   19.30%   96.90%     96.90%  96.90%     98.60%

Table 7. Merged Review and User Data Results

Class          TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Elite          94.20%   4.10%    90.20%     94.70%  92.40%     98.80%
Non-Elite      95.90%   5.80%    97.50%     95.10%  96.70%     98.80%
Weighted Avg.  95.30%   5.30%    95.90%     95.90%  94.90%     98.80%

Table 8. Balanced Merged Review and User Data Results
6.4 Merged Review and User Data
Goal: After independently analyzing the user-level attributes and the review-level attributes, we wanted to measure their combined impact on elite status. So we aggregated the review data and merged it with the user data to predict elite status.
Data Manipulation: We merged the user-level attributes into the review attributes so we could experiment with the combined data set. We converted the yelping since column into a measurable number-of-months field, a factor reflecting how long a user has been active on Yelp. Here reviewCount belongs to the user attributes, while starCount1 through starCount5 belong to the review attributes. The review data captures only about 12% of all reviews users have written, so the sum of all users' reviewCount does not equal the number of reviews in the dataset.
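The yelping-since conversion can be sketched as below; the raw field is a "YYYY-MM" string in the academic dataset, and the reference month used here is an assumption.

```python
# A sketch of the yelping_since conversion; the raw field is a "YYYY-MM"
# string, and the reference month is an assumption.
import pandas as pd

def yelp_months(users: pd.DataFrame, as_of: str = "2015-01") -> pd.Series:
    since = pd.to_datetime(users["yelping_since"], format="%Y-%m")
    ref = pd.Timestamp(as_of)
    # Whole months between each user's join month and the reference month.
    return (ref.year - since.dt.year) * 12 + (ref.month - since.dt.month)
```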
Datasets Used:
Data Size: 366,715
Elite:Non-Elite: 1:15
Attributes Used: yelpMonths, starCount5, starCount4, starCount3, starCount2, starCount1, averageStars, coolComplimentsCount, funnyComplimentsCount, useFulComplimentsCount, friendsCount, fanCount, reviewCount
Results: The results obtained using the J48 Pruned Tree
and Regression Classifier are shown in Table 7.
Discussion: The merged dataset yields better results than the review-only and user-only data. While the weighted average true positive rate stays as high as 96.90%, the average false positive rate dropped below 20%, and the ROC area rose to 98.60%, up from 94.70% on user data alone. Comparing the false positive rates of the Elite and Non-Elite classes, however, the classifier still tends to classify users as non-elite: 20.60% of users who should be elite are falsely classified as non-elite, so the false positive rates of the two classes remain very unbalanced.
This experiment showed that the combined attributes work better for classification. However, the skewed-data problem is not yet solved; the next step is to balance the dataset so that the false positive rate can be further reduced.
6.5 Balanced Merged Review and User Data
Goal: The goal here is to run our experiments on a balanced data set that is not skewed towards Non-Elite members.
Data Manipulation: In all of the datasets mentioned above, the proportion of elite users was very low, so the results were biased towards the non-elite class. To get the right balance, we drew from the merged dataset a subset with a more balanced mix (1:2) of elite vs. non-elite records.
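The rebalancing step amounts to keeping every elite record and downsampling the non-elite records to twice that count, as in the sketch below (the isElite flag follows the merged dataset's naming; the seed is an assumption).

```python
# A sketch of the rebalancing step: keep all elite records and downsample
# non-elite records to twice that count (the 1:2 mix described above).
import pandas as pd

def balance(merged: pd.DataFrame, ratio: int = 2, seed: int = 0) -> pd.DataFrame:
    elite = merged[merged["isElite"]]
    non_elite = merged[~merged["isElite"]].sample(n=ratio * len(elite),
                                                  random_state=seed)
    # Shuffle so train/test splits see both classes throughout.
    return pd.concat([elite, non_elite]).sample(frac=1, random_state=seed)
```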
Datasets Used:
Data Size: 82,000
Elite:Non-Elite: 1:2
Attributes Used: yelpMonths, starCount5, starCount4, starCount3, starCount2, starCount1, averageStars, coolComplimentsCount, funnyComplimentsCount, useFulComplimentsCount, friendsCount, fanCount, reviewCount
Results: The results obtained using the J48 Pruned Tree
and Regression Classifier are shown in Table 8.
Discussion: After balancing the dataset to 33% elite users and 67% non-elite users, we got the best result among all experiments. The data was divided into test and training data at a ratio of 1:2. The weighted average true positive rate is 95.3%, with both Elite and Non-Elite close to 95%. The average false positive rate is 5.3%, balanced between elite and non-elite, and the ROC area of 98.80% is higher than in all previous results.
Given this result, we can confidently conclude that our classifier classifies Yelp users with a weighted average accuracy of 95%.
7. Conclusion and Future Work
Our final model, developed using the J48 tree and linear regression, determines elite users with over 94% accuracy and gives an ROC area of 98.80%, supporting its correctness. However, this model was developed with the academic data set provided by Yelp, which is missing some attributes. With additional attributes, such as the device from which reviews were written, the time taken to write reviews after meals, the proximity from which the reviews were written, and user attributes divided by year, we believe the model could predict elite users with greater accuracy. The model also does not use natural language processing to analyze the content of reviews; applying NLP to this data may yield more conditions for determining elite status. Furthermore, the Yelp Elite Council does not disclose the factors it considers in awarding elite status; the developed model is based only on historic data.
In the future, we would like to try our models on Yelp's complete dataset and check whether they yield similar results; we may have to make some modifications to incorporate the new attributes to achieve similar accuracy. We also plan to submit our results to the 'Yelp Dataset Challenge' to evaluate our findings. Additionally, we will work with other qualitative factors, such as the content of reviews, in an effort to completely eliminate the manual process that Yelp uses to determine elite members.
References
[1] "Yelp Investor Relations." Web. 7 May 2015. http://goo.gl/Iz4ZEo.
[2] "What Is Yelp's Elite Squad?" Web. 7 May 2015. http://goo.gl/DcbkCX.
[3] "Yelp's Academic Dataset." Yelp. Accessed April 5, 2015. https://goo.gl/dHgVmn.
[4] "Feature Selection (Data Mining)." MSDN, Microsoft. 2015. Web. 7 May 2015. https://msdn.microsoft.com/en-us/ms175382.aspx.
[5] Stone, Madeline. "Elite Yelpers Hold Immense Power, And They Get Treated Like Kings By Bars And Restaurants Trying To Curry Favor." Business Insider. August 22, 2014. Accessed April 27, 2015. http://goo.gl/cZyOMN.
