ANALYZING THE BIG DATA OF YELP ACADEMIC DATASET
MOHAMMED ALHAMADI
YELP ACADEMIC DATASET
• 77,445 businesses
• 2,225,213 reviews
• 552,339 users
• 55,569 checkins
• 591,864 tips
[Diagram: Business, User, Checkin, Tip, and Review records, linked through Business_ID and User_ID]
MOST COMMON USES
• Users looking for the best business around
• Users looking for a menu of a restaurant
• Users want to see pictures of meals of a restaurant
• Users want to read reviews about a business
• Users…
• Users…
Focus is on the user
PROPOSED USES & FEATURES
• Users
– Suggest places to go to based on history.
– Suggest ways for users to improve their reviews to get more likes and attract more friends.
– Suggest friends based on common interests and other possible similarity measure(s).
– Suggest the best time to visit a business and warn about the times that are typically
• Businesses
– Suggest locations for businesses to open, based on customers going to similar places.
– Suggest improvements for businesses to make, based on customers' opinions and reviews.
– Inform businesses about the rush hours when customers visit the most.
– Inform businesses about their competitors, what customers like about those businesses, and how to improve in order to compete more strongly.
TECHNOLOGY USED
• Apache Spark™
– Spark 2.0.0, Hadoop 2.7
– Spark Python API (PySpark)
• Download Apache Spark™
– http://spark.apache.org/downloads.html
• Spark Python API Programming Guide
– https://spark.apache.org/docs/0.9.0/python-programming-guide.html
FEATURE #1
PLACES TO VISIT BASED ON HISTORY
• Data Processing Steps:
1. Get all businesses the user reviewed
2. Get the categories of these businesses
3. Get all the businesses rated 5 stars in the same categories in the same cities
4. Display businesses names, cities and ratings to the user
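The four steps above can be sketched in plain Python. This is only an illustration, not the PySpark code behind the slides: the field names (`business_id`, `user_id`, `categories`, `city`, `stars`, `name`) are assumed to match the dataset's JSON records, the sample records are made up, and skipping businesses the user has already reviewed is an added assumption.

```python
def suggest_places(user_id, reviews, businesses):
    """Suggest 5-star businesses in categories and cities the user already reviewed."""
    # Step 1: businesses the user reviewed
    visited = {r["business_id"] for r in reviews if r["user_id"] == user_id}

    # Step 2: flatten the categories (and collect the cities) of those businesses
    categories, cities = set(), set()
    for b in businesses:
        if b["business_id"] in visited:
            categories.update(b["categories"])
            cities.add(b["city"])

    # Steps 3-4: 5-star businesses sharing a category and a city
    return [
        (b["name"], b["city"], b["stars"])
        for b in businesses
        if b["stars"] == 5.0
        and b["city"] in cities
        and categories.intersection(b["categories"])
        and b["business_id"] not in visited  # assumption: skip known places
    ]


# Hypothetical sample records, for illustration only
businesses = [
    {"business_id": "b1", "name": "Cafe A", "city": "Phoenix",
     "stars": 3.5, "categories": ["Restaurants"]},
    {"business_id": "b2", "name": "Cafe B", "city": "Phoenix",
     "stars": 5.0, "categories": ["Restaurants", "Bars"]},
    {"business_id": "b3", "name": "Salon C", "city": "Phoenix",
     "stars": 5.0, "categories": ["Hair Salons"]},
    {"business_id": "b4", "name": "Cafe D", "city": "Las Vegas",
     "stars": 5.0, "categories": ["Restaurants"]},
]
reviews = [{"user_id": "u1", "business_id": "b1"}]

suggested = suggest_places("u1", reviews, businesses)
print(suggested)
```

With real data the same filters would more likely be expressed as joins over the review and business files.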
FEATURE #1
PLACES TO GO TO BASED ON HISTORY (EXAMPLE)
User ID: 'rpOyqD_893cqmDAtJLbdog'
Highly rated businesses visited by user (28 businesses)
Business ID               City         Stars
hW0Ne_HTHEAgGF1rAdmR-g    Phoenix      3.5
9Y3aQAVITkEJYe5vLZr13w    Scottsdale   3.5
cKiTluWCfMQTdmFZIugoiQ    Scottsdale   3.5
u9wjRhUjySkHPa_hG3kFOg    Las Vegas    4.0
…                         …            …
Categories (flattened)
'Hotels & Travel', 'Airports', 'Breakfast & Brunch', 'American (Traditional)', 'Restaurants', 'Taxis', 'Airport Shuttles', 'Transportation', 'Arts & Entertainment', 'Resorts', …
Cities
'Pittsburgh', 'Champaign', 'Las Vegas', 'Scottsdale', 'Phoenix'
Suggested Businesses
Business ID             Name              City         Stars
qAHfgkG-wIjx7Qd65...    Waxworks          Scottsdale   5.0
tunbozfPcMd84VO8O...    Last Wave Salon   Scottsdale   5.0
hzEFfuz2mOA3whGk8...    Hairdressers II   Pittsburgh   5.0
kDtr03NIjERTqpdfV...    Prop Shop         Pittsburgh   5.0
FEATURE #2
PLACES FOR BUSINESSES TO OPEN BASED ON
CUSTOMERS GOING TO SIMILAR PLACES
• Data Processing Steps:
1. Start with a random business
2. Get the users who wrote reviews for this business
3. Get all their other reviews (for other businesses)
4. Get the other businesses that they wrote reviews for
• These businesses must have high rating (>4)
• These businesses must be in the same categories, too
5. Get the latitudes and longitudes of these businesses
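A minimal plain-Python sketch of these steps follows. It is an illustration under assumptions rather than the original PySpark job: the field names (`latitude`, `longitude`, `categories`, `stars`) are assumed to match the dataset, the records are invented, and "same categories" is read as "shares at least one category".

```python
def candidate_locations(business_id, reviews, businesses, min_stars=4.0):
    """Coordinates of well-rated, same-category businesses visited by the same reviewers."""
    by_id = {b["business_id"]: b for b in businesses}
    target = by_id[business_id]

    # Step 2: users who reviewed the starting business
    users = {r["user_id"] for r in reviews if r["business_id"] == business_id}

    # Steps 3-4: the other businesses those users reviewed
    others = {
        r["business_id"]
        for r in reviews
        if r["user_id"] in users and r["business_id"] != business_id
    }

    # Step 4 filters, then step 5: collect the coordinates
    locations = []
    for bid in sorted(others):
        b = by_id[bid]
        shares_category = set(b["categories"]) & set(target["categories"])
        if b["stars"] > min_stars and shares_category:
            locations.append((b["latitude"], b["longitude"]))
    return locations


# Hypothetical sample records, for illustration only
businesses = [
    {"business_id": "x", "stars": 3.0, "categories": ["Bars"],
     "latitude": 43.07, "longitude": -89.44},
    {"business_id": "y", "stars": 4.5, "categories": ["Bars", "Nightlife"],
     "latitude": 43.09, "longitude": -89.34},
    {"business_id": "z", "stars": 4.5, "categories": ["Dentists"],
     "latitude": 43.06, "longitude": -89.49},
]
reviews = [
    {"user_id": "u1", "business_id": "x"},
    {"user_id": "u1", "business_id": "y"},
    {"user_id": "u1", "business_id": "z"},
    {"user_id": "u2", "business_id": "z"},
]

locations = candidate_locations("x", reviews, businesses)
print(locations)
```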
FEATURE #2
PLACES FOR BUSINESSES TO OPEN BASED ON CUSTOMERS
GOING TO SIMILAR PLACES (EXAMPLE)
Business ID: '_RkvdDlzEFSRnGMrTRkVYA'
Users who wrote reviews
'gqdCwtiDjOvRNcX81LQh-A', 'LMlBCXFVAHdPnSA94jc6PQ', 'zhVOlwBuEgdGlHjwgVf3Jg', 'UemY5i38Zb2hSS-Vwu4r2Q', …
Other businesses they wrote reviews for
'q_BKmbdlYfQJroJVHfYMUQ', 's_cKw6m0Fw9jZbobRH0YSg', 'vbruEqj8eSqsgGkEkKzkig', '3H2ttTM2aSIaZ6FTjHwDQQ', '5ambRqdTJt9vGwFzVI9HBw', '25cCnPfbVdYWNhbFLuwiYQ', 'UCaF3g4e1Diqpf0A06smag', 'DT6bZgApAKY0JE7McdUTyA', 'IPsG_71MD8pwB9i3TKOJYg', …
Businesses locations:
Latitude          Longitude
43.074955670726   -89.440530188092
43.0940334        -89.3413172
43.060838         -89.4966755
43.056081         -89.4976609
FEATURE #3
IMPROVEMENTS FOR BUSINESSES TO DO
BASED ON CUSTOMERS' OPINIONS AND REVIEWS
• Data Processing Steps:
1. Start with a random business that is rated 1 (the lowest)
2. Get all the reviews of this business
3. Define the clues that indicate an unhappy customer's review
4. Tokenize the reviews into sentences and look for the clues
5. Report the key customer opinions
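The clue search can be sketched as a sentence-level substring match. The clue list is the one shown on the example slide; the regex sentence splitter and the function names are assumptions for illustration, and a real tokenizer would likely do better.

```python
import re

# Clue list as shown on the example slide
CLUES = ['rude', 'poor', 'service', 'customer service',
         'supposed to open', 'the worst']


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unhappy_sentences(reviews, clues=CLUES):
    """Return every sentence that contains at least one clue (case-insensitive)."""
    hits = []
    for text in reviews:
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            if any(clue in lowered for clue in clues):
                hits.append(sentence)
    return hits


sample = [
    "This is the worst delivery service I have ever had. "
    "Five of the last six Sundays have been missed."
]
found = unhappy_sentences(sample)
print(found)
```

Because the match is on substrings, 'poor' also catches 'poorest'; matching whole words instead would need word tokenization.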
FEATURE #3
IMPROVEMENTS FOR BUSINESSES TO DO BASED ON
CUSTOMERS' OPINIONS AND REVIEWS (EXAMPLE)
Business ID: '4ghEtxHV0uhrpYRRWh7Whw'
All reviews of this business
I'll go ahead and throw my hater hat in the ring as well. We cancelled our subscription for three reasons:…
This is the worst delivery service I have ever had. Five of the last six Sundays have been missed. I keep…
Where to start with this RAG, "The Arizona Repulsive" ?? They are the worst example of an alleged newspaper…
The worst marketing company i have ever worked with. After paying over $3000 and receiving 2 calls I told them to…
Clues
clues = ['rude', 'poor', 'service', 'customer service', 'supposed to open', 'the worst']
Why are customers unhappy?
' The Accounting practices are the worst', 'the customer service rep',
'Awful customer service', 'Some of the worst customer service of any public company',
'One of the poorest examples of a newspaper in the US', 'This is the worst delivery service I have ever had'
FEATURE #4
RUSH HOURS THAT CUSTOMERS
VISIT THE PLACE THE MOST
• Data Processing Steps:
1. Start with a random business
2. Get customers’ checkin information
3. Parse the checkin times
4. Display the rush hours of the business
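A short sketch of the parsing step, using the check-in record from the example slide. Two assumptions are made here: keys are read as "hour-day" with day 0 meaning Sunday, and a rush hour is taken to be any slot with the highest check-in count. With those assumptions the sketch produces the same five slots as the slide.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]  # assumption: day 0 is Sunday


def rush_hours(checkin_info):
    """Return (day, hour) slots whose check-in count equals the maximum."""
    peak = max(checkin_info.values())
    slots = []
    for key, count in checkin_info.items():
        if count == peak:
            hour, day = key.split("-")  # key format assumed to be "hour-day"
            slots.append((DAYS[int(day)], int(hour)))
    return sorted(slots)


# Record from the example slide
record = {"checkin_info": {
    "16-0": 2, "15-0": 2, "15-3": 1, "15-2": 1, "15-4": 3, "18-0": 3,
    "18-1": 2, "18-4": 1, "14-4": 1, "14-0": 2, "14-1": 2, "14-2": 1,
    "14-3": 1, "17-1": 1, "11-5": 1, "13-2": 2, "11-1": 1, "13-5": 3,
    "13-4": 3, "12-4": 1, "12-5": 1, "12-2": 1, "12-0": 1, "12-1": 1,
    "9-1": 3, "9-0": 2, "9-3": 1, "9-2": 2, "9-5": 2, "9-4": 2,
    "10-4": 1, "7-4": 1, "16-3": 1, "17-5": 1, "16-4": 1, "17-0": 2,
    "10-1": 1, "10-3": 1, "8-0": 1, "8-1": 2, "8-3": 1},
    "type": "checkin", "business_id": "DZZQhoOWmTcJi2iSBscV-g"}

busy = rush_hours(record["checkin_info"])
print(busy)
```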
FEATURE #4
RUSH HOURS THAT CUSTOMERS
VISIT THE PLACE THE MOST (EXAMPLE)
Business ID: 'DZZQhoOWmTcJi2iSBscV-g'
Checkin Information
{"checkin_info": {"16-0": 2, "15-0": 2, "15-3": 1, "15-2": 1, "15-4": 3, "18-0": 3, "18-1": 2,
"18-4": 1, "14-4": 1, "14-0": 2, "14-1": 2, "14-2": 1, "14-3": 1, "17-1": 1, "11-5": 1,
"13-2": 2, "11-1": 1, "13-5": 3, "13-4": 3, "12-4": 1, "12-5": 1, "12-2": 1, "12-0": 1,
"12-1": 1, "9-1": 3, "9-0": 2, "9-3": 1, "9-2": 2, "9-5": 2, "9-4": 2, "10-4": 1, "7-4": 1,
"16-3": 1, "17-5": 1, "16-4": 1, "17-0": 2, "10-1": 1, "10-3": 1, "8-0": 1, "8-1": 2,
"8-3": 1}, "type": "checkin", "business_id": "DZZQhoOWmTcJi2iSBscV-g"}
Parsed checkin information
[('Friday', 13), ('Monday', 9), ('Sunday', 18), ('Thursday', 13), ('Thursday', 15)]
FEATURE #5
COMPETITORS AND WHAT CUSTOMERS LIKE ABOUT
THOSE BUSINESSES
• Data Processing Steps:
1. Define the business competitor:
• working in the same business category
• has equal or higher number of reviews
• has high rating (>4)
• is in the same city
2. Pick a business randomly
3. Select the competitors from the dataset
4. Get the reviews of these businesses
5. Parse the reviews to look for clues of why customers like these businesses and report the
results to the business owner
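The competitor definition in step 1 maps directly onto a filter. The sketch below is plain Python with made-up records; the field names (`review_count`, `stars`, `city`, `categories`) are assumed to match the dataset, and "same business category" is read as "shares at least one category".

```python
def competitors(target, businesses, min_stars=4.0):
    """Businesses in the same city and category with at least as many reviews and a high rating."""
    return [
        b for b in businesses
        if b["business_id"] != target["business_id"]
        and b["city"] == target["city"]
        and set(b["categories"]) & set(target["categories"])
        and b["review_count"] >= target["review_count"]
        and b["stars"] > min_stars
    ]


# Hypothetical sample records, for illustration only
target = {"business_id": "t", "city": "Edinburgh", "categories": ["Shopping"],
          "review_count": 7, "stars": 3.0}
businesses = [
    target,
    {"business_id": "c1", "city": "Edinburgh", "categories": ["Shopping"],
     "review_count": 9, "stars": 4.5},
    {"business_id": "c2", "city": "Edinburgh", "categories": ["Shopping"],
     "review_count": 3, "stars": 5.0},   # too few reviews
    {"business_id": "c3", "city": "Phoenix", "categories": ["Shopping"],
     "review_count": 50, "stars": 5.0},  # different city
    {"business_id": "c4", "city": "Edinburgh", "categories": ["Shopping", "Gifts"],
     "review_count": 17, "stars": 4.0},  # not strictly above 4
]

strict = [b["business_id"] for b in competitors(target, businesses)]
print(strict)
```

The rating cutoff is a parameter because the example slide lists competitors rated 4.0, which suggests the cutoff actually used may have been inclusive.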
FEATURE #5
COMPETITORS AND WHAT CUSTOMERS LIKE ABOUT
THOSE BUSINESSES (EXAMPLE)
Business ID: 'sfrqgVaaEs-7afZvGITTrA'
Competitors
Business ID           Business Name       Stars   Review Count
8uMP4kv2Je6rM9ZNz…    Theatre Royal Bar   4.0     17
71Jz93r-PIkXpAP9O…    The Antiquary       4.5     9
POabgnQCv-GgefKBD…    Toddle In           4.0     7
Parsed reviews
'candles and various other quirky knick knacks that are excellent for gifts',
'i love that i can buy pens with almost any colour of ink (must get an orange one!!)',
' There is an excellent selection of cards and stationary',
'a very cute little shop that's well presented with an excellent range of products'
Clues
clues = ['welcoming', 'the best', 'one of the best', 'comfortable', 'high quality', 'high-quality', 'professional', 'excellent', 'BEST']
FEATURE #6
BETTER WAYS FOR CUSTOMERS TO IMPROVE THEIR
REVIEWS TO GET MORE LIKES AND MORE FRIENDS
• Data Processing Steps:
1. Start with a random user who has no useful reviews
2. Pick any review written by this user
3. For the business that the user reviewed, get the review that is voted most helpful
4. Define the features that determine the review quality:
• text length
• number of places and names mentioned in the review
• presentation of the review to the reader
• existence of quoted text in the review, which can cite actual content on the menu or in the business
5. Parse both reviews and compare them based on the features we defined
6. Present suggestions to the user
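A rough sketch of the comparison in steps 4 to 6. It covers only three of the listed features (length, line breaks as a stand-in for presentation, and quoted passages); counting places and names would need named-entity recognition and is left out. The function names and the wording of the tips are illustrative assumptions.

```python
import re


def review_features(text):
    """Measure a few simple quality features of a review."""
    return {
        "length": len(text),
        "line_breaks": text.count("\n"),
        "quoted_passages": len(re.findall(r'"[^"]+"', text)),
    }


def suggestions(own_review, best_review):
    """Compare the user's review with the most helpful one and list tips."""
    mine, top = review_features(own_review), review_features(best_review)
    tips = []
    if mine["length"] < top["length"]:
        tips.append("Try writing a longer review with more detail.")
    if mine["line_breaks"] < top["line_breaks"]:
        tips.append("Try breaking the review into paragraphs.")
    if mine["quoted_passages"] < top["quoted_passages"]:
        tips.append("Try quoting items from the menu or the business.")
    return tips


own = "Simply wonderful ambiance and delicious food with very attentive servers!"
best = ("It was well worth the wait.\n\n"
        "To start I had oysters which came with shallot vinegar.\n\n"
        "The service was excellent throughout and the value is superb.")

tips = suggestions(own, best)
print(tips)
```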
FEATURE #6
BETTER WAYS FOR CUSTOMERS TO IMPROVE THEIR
REVIEWS TO GET MORE LIKES AND MORE FRIENDS
User ID: 'KQVa76T2-hBoejvnhuUB0w'
Not so useful review
'Simply wonderful ambiance and delicious food with very attentive servers! Would highly recommend it to anyone!!!'
Most useful review
"It has taken me a while to visit The Grain Store but it was well worth the wait. Boasting the best of Scottish produce, they offer a two course lunch for a generous £12.50 (or £15 for three courses).

To start I had oysters which came with shallot vinegar. Not a lot you can say cooking wise but they were plump and fresh as a daisy and the vinegar was a classic partner to them.

Sarah went for Stornaway black pudding with apple, which came well presented and was a good sized starter portion. The variations of apple complimented the wonderful black pudding and we were really enjoying the open brick surroundings.

For main i had duck confit with celeriac and spinach. The skin was nice and crisp while the meat was still moist. It was well seasoned and the celeriac puree was silky and velvety and brought an earthyness to the plate. The spinach was superbly executed and the rich sauce tied this wonderful dish all together perfectly.

I was tempted by the pork belly with chestnuts and kale and was glad Sarah ordered it so i could steal a taste! The pork was expertly cooked and seasoned and kale was soft but still had a slight bite to it. I'd have been equally as happy to devour that as I was with my duck dish.

I really love the interior of the restaurant and the service was excellent throughout.

At £12.50, this is superb value for money and certainly should not be missed."
Suggestions to the user to improve the reviews
1. Try writing longer text that gives more details about the reviewed business.
2. Try using more new-line characters, which improve the presentation to the reader. This is just one example of how to make your review appealing.
FEATURE #7
SUGGEST FRIENDS BASED ON COMMON INTERESTS
& OTHER SIMILARITY MEASURE(S)
• Data Processing Steps:
1. Define the similarity measures
• users going to the same businesses
• users whose visited places are all in the same cities
2. Pick a random user who has a good number of reviews (e.g., 20 reviews)
3. Get the businesses that the user reviewed
• Measure 1: suggest the users who went to the same places
• Measure 2: suggest the users whose cities all match those of the random user
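Both similarity measures can be sketched with set operations. This is a plain-Python illustration with invented records; `city_of` is a hypothetical lookup from business ID to city, and Measure 2 is interpreted as "every city the other user reviewed in is also one of the chosen user's cities".

```python
from collections import defaultdict


def suggest_friends(user_id, reviews, city_of):
    """Return (users sharing a business, users whose cities all match) for user_id."""
    visited = defaultdict(set)
    for r in reviews:
        visited[r["user_id"]].add(r["business_id"])

    mine = visited[user_id]
    my_cities = {city_of[b] for b in mine}

    same_places, same_cities = set(), set()
    for other, places in visited.items():
        if other == user_id:
            continue
        # Measure 1: at least one business in common
        if places & mine:
            same_places.add(other)
        # Measure 2: all of the other user's cities are among this user's cities
        if {city_of[b] for b in places} <= my_cities:
            same_cities.add(other)
    return same_places, same_cities


# Hypothetical sample records, for illustration only
city_of = {"b1": "Phoenix", "b2": "Scottsdale", "b3": "Phoenix", "b4": "Las Vegas"}
reviews = [
    {"user_id": "u1", "business_id": "b1"},
    {"user_id": "u1", "business_id": "b2"},
    {"user_id": "u2", "business_id": "b1"},
    {"user_id": "u3", "business_id": "b3"},
    {"user_id": "u4", "business_id": "b4"},
]

by_place, by_city = suggest_friends("u1", reviews, city_of)
print(sorted(by_place), sorted(by_city))
```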
FEATURE #7
SUGGEST FRIENDS BASED ON COMMON INTERESTS
& OTHER SIMILARITY MEASURE(S)
User ID: 'EH4jFgkwjPOQBFkSGLU1PQ'
Businesses that the user reviewed
['yTz8_GylkgCkuiBjSz8mIQ', '6AzXPSXxztBnGwkToG3jKg']
Friends suggested based on similarities
'GWJafGP6c-7Yw8WUzIsVKg', 'A854K5StAygBMr5QqUropg', 'iaBl0-BZDN1FLJw8-Zhndg',
'Syx29hfDK-asgfWlPAS6iw', '9DfAdrxmz9VO6N1W_mQOVQ', 'Jk1rtkcokDqeBWmVSuucUQ',
'QSdGIoISlC_xZSVFY0h9Ag', 'FE9npMEOxVy5RA27biWveA', 'SUtw2iUhu9gXLfSjxusVsA',
'eQUBtUDT6rLAAxPh-FWThg', 'ICAvenbPw1IkZNPauUdxcA', 'qQ98WXky-jy-6zxozri_mQ',
'3LZKYC0_P4gRfCp2e1AWRQ', 'vLiU7p3vdNbg_EY5Ms0ipg', '62stTQppHcH9u6E0sU1fPw',
'Nx6mxT2DsQtGNFR8-SECNg', 'EM5eOrzn2AmlNLLHOHdj2Q', …
FEATURE #8
BEST AND WORST TIMES TO VISIT A BUSINESS
• Data Processing Steps:
1. Start with a random business
2. Get the checkin times of the business
3. Parse the checkin times to get the busy hours and the free hours
4. Display the results to the user
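The slides do not say how busy and free hours are separated, so the sketch below makes its own assumptions: keys are "hour-day" with day 0 as Sunday, rush hours are the slots with the highest count, and good hours are the slots with the lowest recorded count. The input holds only the non-empty slots from the example slide for business 'zF2z6b8Hg0Yn7rnxcZGJWw', and under these assumptions the result agrees with that slide.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]  # assumption: day 0 is Sunday


def busy_and_free_hours(checkin_info):
    """Split recorded slots into rush hours (max count) and good hours (min count)."""
    counts = {k: v for k, v in checkin_info.items() if v is not None}
    peak, low = max(counts.values()), min(counts.values())
    rush, good = [], {}
    for key, count in counts.items():
        hour, day = key.split("-")  # key format assumed to be "hour-day"
        hour, day = int(hour), DAYS[int(day)]
        if count == peak:
            rush.append((day, hour))
        elif count == low:
            good.setdefault(day, []).append(hour)
    for hours in good.values():
        hours.sort()
    return sorted(rush), good


# Non-empty slots from the example slide (empty slots are None there)
checkins = {
    "0-2": 1, "0-3": 2, "0-4": 2, "0-6": 1, "1-0": 4, "1-3": 1, "1-4": 1,
    "1-5": 1, "1-6": 1, "13-1": 1, "13-6": 1, "16-3": 1, "17-4": 1,
    "19-2": 1, "2-0": 1, "2-3": 2, "20-5": 1, "20-6": 1, "21-1": 3,
    "21-3": 1, "21-5": 1, "21-6": 1, "22-1": 1, "22-2": 1, "22-3": 1,
    "22-6": 2, "23-1": 1, "23-3": 1, "23-4": 2, "23-5": 1, "3-0": 1,
    "10-0": None,
}

rush, good = busy_and_free_hours(checkins)
print(rush)
print(good)
```

Hours within each day come back sorted, so they may appear in a different order than on the slide.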
FEATURE #8
BEST AND WORST TIMES TO VISIT A BUSINESS
Business ID: 'zF2z6b8Hg0Yn7rnxcZGJWw'
Business Checkin Times
(0-0=None, 0-1=None, 0-2=1, 0-3=2, 0-4=2, 0-5=None, 0-6=1,
1-0=4, 1-1=None, 1-2=None, 1-3=1, 1-4=1, 1-5=1, 1-6=1,
10-0=None, 10-1=None, 10-2=None, 10-3=None, 10-4=None, 10-5=None, 10-6=None,
11-0=None, 11-1=None, 11-2=None, 11-3=None, 11-4=None, 11-5=None, 11-6=None,
12-0=None, 12-1=None, 12-2=None, 12-3=None, 12-4=None, 12-5=None, 12-6=None,
13-0=None, 13-1=1, 13-2=None, 13-3=None, 13-4=None, 13-5=None, 13-6=1,
14-0=None, 14-1=None, 14-2=None, 14-3=None, 14-4=None, 14-5=None, 14-6=None,
15-0=None, 15-1=None, 15-2=None, 15-3=None, 15-4=None, 15-5=None, 15-6=None,
16-0=None, 16-1=None, 16-2=None, 16-3=1, 16-4=None, 16-5=None, 16-6=None,
17-0=None, 17-1=None, 17-2=None, 17-3=None, 17-4=1, 17-5=None, 17-6=None,
18-0=None, 18-1=None, 18-2=None, 18-3=None, 18-4=None, 18-5=None, 18-6=None,
19-0=None, 19-1=None, 19-2=1, 19-3=None, 19-4=None, 19-5=None, 19-6=None,
2-0=1, 2-1=None, 2-2=None, 2-3=2, 2-4=None, 2-5=None, 2-6=None,
20-0=None, 20-1=None, 20-2=None, 20-3=None, 20-4=None, 20-5=1, 20-6=1,
21-0=None, 21-1=3, 21-2=None, 21-3=1, 21-4=None, 21-5=1, 21-6=1,
22-0=None, 22-1=1, 22-2=1, 22-3=1, 22-4=None, 22-5=None, 22-6=2,
23-0=None, 23-1=1, 23-2=None, 23-3=1, 23-4=2, 23-5=1, 23-6=None,
3-0=1, 3-1=None, 3-2=None, 3-3=None, 3-4=None, 3-5=None, 3-6=None,
4-0=None, 4-1=None, 4-2=None, 4-3=None, 4-4=None, 4-5=None, 4-6=None,
5-0=None, 5-1=None, 5-2=None, 5-3=None, 5-4=None, 5-5=None, 5-6=None,
6-0=None, 6-1=None, 6-2=None, 6-3=None, 6-4=None, 6-5=None, 6-6=None,
7-0=None, 7-1=None, 7-2=None, 7-3=None, 7-4=None, 7-5=None, 7-6=None,
8-0=None, 8-1=None, 8-2=None, 8-3=None, 8-4=None, 8-5=None, 8-6=None,
9-0=None, 9-1=None, 9-2=None, 9-3=None, 9-4=None, 9-5=None, 9-6=None)
Rush Hours
[('Sunday', 1)]
Good Hours
{'Monday': [22, 13, 23], 'Tuesday': [22, 0, 19], 'Friday': [20, 1, 21, 23],
'Wednesday': [22, 1, 21, 16, 23], 'Thursday': [17, 1], 'Sunday': [2, 3],
'Saturday': [20, 1, 13, 21, 0]}
QUESTIONS
THANK YOU
