Zomato Crawler & Recommender

CONTENT & LOCATION AWARE
RESTAURANT RECOMMENDATIONS
USING URBAN REVIEW NETWORKS
PROJECT REPORT
Submitted By
Jayant Jaiswal, Roll No-12600112104, Regn No-121260110042
Shoaib Khan, Roll No-12600112163, Regn No-121260110101
Rohan Agarwal, Roll No-12600112143, Regn No-121260110081
Under the Supervision of
Asst. Prof. Partha Basuchowdhuri
Computer Science & Engineering
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE & ENGINEERING
HERITAGE INSTITUTE OF TECHNOLOGY, KOLKATA
MAULANA ABUL KALAM AZAD
UNIVERSITY OF TECHNOLOGY

Acknowledgements
We would like to take this opportunity to thank Dr. P. Chaudhuri, Principal, Heritage
Institute of Technology, for giving us the opportunity to work on this project and for
providing us with all the necessary facilities and resources to complete it.
We are thankful to Asst. Prof. Partha Basuchowdhuri, our advisor and guide, for
his continuous support, advice and words of encouragement, without which we could
not have seen this project through to completion. He is not just an advisor but a
patient teacher who has always been there to resolve our doubts, no matter how trivial,
and to provide us with valuable insights that helped us in every way possible. We
also owe our sincere gratitude to Dr. Subhashis Majumder, the Head of the Department,
for his enriching discussions, novel ideas and valuable feedback.
We would also like to thank our teachers, faculty members and laboratory assistants
at the Heritage Institute of Technology for playing a pivotal role during the
development of the project. Last but not least, we thank all our friends for their
cooperation and encouragement.
Jayant Jaiswal
Shoaib Khan
Rohan Agarwal

HERITAGE INSTITUTE OF TECHNOLOGY
MAULANA ABUL KALAM AZAD UNIVERSITY OF TECHNOLOGY
BONAFIDE CERTIFICATE
Certified that this Project Report, "CONTENT & LOCATION AWARE
RESTAURANT RECOMMENDATIONS USING URBAN REVIEW NETWORKS",
is the bonafide work of Jayant Jaiswal, Shoaib Khan and Rohan
Agarwal, who carried out this project work under my supervision.
SIGNATURE
Dr. Subhashis Majumder
Head of the Department
Computer Science & Engineering
East Kolkata Township, Chowbaga Road, Anandapur,
West Bengal - 700107.

SIGNATURE
Asst. Prof. Partha Basuchowdhuri
Project Guide
Computer Science & Engineering
East Kolkata Township, Chowbaga Road, Anandapur,
West Bengal - 700107.

SIGNATURE
EXAMINER

Abstract
Restaurant recommendation is a popular service whose sophistication
keeps increasing every day. In this report we present a personalised
restaurant recommendation system that has two parts. The first part
recommends restaurants to users based on their restaurant review
history. The second part recommends to business owners the locations
best suited to opening a restaurant of a particular cuisine, where the
owner would get the best traffic for the restaurant. Using Zomato data,
we built a restaurant recommendation system for both individuals and
business owners. For each user in our data we find the cuisine
preferences and other restrictions such as services offered, ambience,
average rating, etc., and recommend restaurants accordingly. We propose
a metric that takes into account the popularity as well as the sentiment
of opinions about food items in user-generated reviews, as opposed to
other systems which consider only the features mentioned above to
recommend restaurants.

Contents
1 Introduction
1.1 Road Map
1.2 What are Recommendation Systems?
1.2.1 Collaborative Filtering
1.2.2 Content Based Filtering
1.2.3 Hybrid Approach
1.2.4 Applications
1.3 Motivation for Restaurant Recommendations
2 Literature Review
3 Problem Definition
4 Data Analysis
4.1 Data Collection
4.2 Data Handling
5 Methodology
5.1 Location Aware
5.2 Content Based
6 Conclusion
7 Future Works
8 References

List of Figures
5.1 Live Map of Kolkata sorted on the basis of ratings
5.2 The road network stored in PostgreSQL
5.3 Map of Kolkata showing the important intersections to set up a new restaurant
based upon a cuisine (North Indian)
5.4 The system taking user id as input to generate recommendations for that user
5.5 Top 5 restaurants recommended by the system to the user for each food item

Chapter 1
Introduction
1.1 Road Map
In Chapter 1, we give a broad description of the types of recommendation systems
and their applications in today's customer-centric e-commerce market, along with basic
background on recommendation systems. In Chapter 2 we give a brief overview of
prior work in the field of restaurant recommendation. Chapter 3 presents the problem
definition and the related terminology, such as content-based and location-based
recommendation. In Chapter 4 we discuss how the data was fetched and the preprocessing
done to suit the system and produce good recommendations. Chapter 5 discusses the
methodology and gives a detailed study of our system. The results of our system
on content-specific and location-specific recommendation are summarized in Chapter 6.
Scope for improvements and future ideas are given in Chapter 7 as future works.
1.2 What are Recommendation Systems?
Recommender systems have changed the way people find products, information, and even
other people. The goal of a recommender system is to generate meaningful recommendations
to a collection of users for items or products that might interest them. Such systems
have changed the way websites communicate with their users. Rather than providing a static
experience in which users search for and potentially buy products, recommender systems
increase interaction to provide a richer experience. They identify recommendations
autonomously for individual users based on past purchases and searches, and on other users'
behavior. They study patterns of behavior to predict what someone will prefer from among a
collection of things he or she has never experienced. The technology behind recommender
systems has evolved over the past 20 years into a rich collection of tools that enable the
practitioner or researcher to develop effective recommenders.
1.2.1 Collaborative Filtering
Collaborative filtering methods are based on collecting and analyzing a large amount
of information on users' behaviors, activities or preferences and predicting what users will
like based on their similarity to other users. A key advantage of the collaborative filtering
approach is that it does not rely on machine-analyzable content and is therefore capable of
accurately recommending complex items such as movies without requiring an understanding
of the item itself. Many algorithms have been used to measure user similarity or item
similarity in recommender systems, for example the k-nearest neighbor (k-NN) approach
and the Pearson correlation.
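As an illustration of the two measures named above, the following is a minimal sketch of
user-based collaborative filtering. It is not part of our system: the rating dictionary,
the user and item names, and the helper names pearson and predict are all hypothetical,
and the prediction is simply a similarity-weighted average over the k most similar users.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two users over the items both have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    neighbours = sorted(
        (
            (pearson(ratings[user], r), r[item])
            for other, r in ratings.items()
            if other != user and item in r
        ),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbours (an assumption of this sketch).
    neighbours = [(s, v) for s, v in neighbours if s > 0]
    total = sum(s for s, _ in neighbours)
    if not total:
        return None
    return sum(s * v for s, v in neighbours) / total


# Hypothetical ratings: user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 3, "D": 5},
    "u3": {"A": 1, "B": 5, "C": 2, "D": 1},
}
print(predict(ratings, "u1", "D"))
```

Here u1 agrees with u2 and disagrees with u3, so the prediction for item D follows u2's
rating of that item.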
1.2.2 Content Based Filtering
Content-based filtering methods are based on a description of the item and a profile of
the user's preferences. In a content-based recommendation system, keywords are used to
describe the items; in addition, a user profile is built to indicate the type of item this
user likes. In other words, these algorithms try to recommend items that are similar to
those that a user liked in the past (or is examining in the present). In particular, various
candidate items are compared with items previously rated by the user and the best-matching
items are recommended. This approach has its roots in information retrieval and information
filtering research.
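The comparison of candidate items against a keyword profile can be sketched as below.
This is only an illustrative example under stated assumptions: items are described by
plain keyword lists, the user profile is the keyword count over liked items, similarity is
cosine similarity, and the restaurant names, keywords and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def build_profile(liked_items, keywords):
    """User profile: keyword counts over all items the user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(keywords[item])
    return profile


def recommend(liked_items, keywords, top_n=3):
    """Rank items the user has not seen by similarity to the profile."""
    profile = build_profile(liked_items, keywords)
    scored = [
        (cosine(profile, Counter(words)), item)
        for item, words in keywords.items()
        if item not in liked_items
    ]
    return [item for score, item in sorted(scored, reverse=True)[:top_n] if score > 0]


# Hypothetical item descriptions
keywords = {
    "Restaurant A": ["north indian", "biryani", "ac"],
    "Restaurant B": ["north indian", "kebab", "ac"],
    "Restaurant C": ["chinese", "noodles"],
    "Restaurant D": ["north indian", "biryani", "ac", "bar"],
}
print(recommend(["Restaurant A"], keywords))
```

Items that share no keyword with the profile score zero and are left out of the list.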
1.2.3 Hybrid Approach
Recent research has demonstrated that a hybrid approach, combining collaborative
filtering and content-based filtering, can be more effective in some cases. Hybrid approaches
can be implemented in several ways: by making content-based and collaborative predictions
separately and then combining them, by adding content-based capabilities to a
collaborative approach (and vice versa), or by unifying the approaches into one model.
Several studies empirically compare the performance of hybrid methods with pure
collaborative and content-based methods and demonstrate that hybrid methods can provide
more accurate recommendations than pure approaches. These methods can also be used to
overcome some of the common problems in recommendation systems, such as cold start and
sparsity. Netflix is a good example of a hybrid system. It makes recommendations
by comparing the watching and searching habits of similar users (i.e. collaborative
filtering) as well as by offering movies that share characteristics with films that a user has
rated highly (content-based filtering).
1.2.4 Applications
1) Facebook uses a recommender system to suggest Facebook users you may know offline.
The system is trained on personal data such as mutual friends, where you went to school,
places of work and mutual networks (pages, groups, etc.) to learn who might be in your
online and offline network.
2) When you fill out your Taste Preferences or rate movies and TV shows, you're helping
Netflix to filter through the thousands of selections to get a better idea of what you might
like to watch. Factors that the Netflix algorithm uses to make such recommendations include:
a) The genre of movies and TV shows available.
b) Your streaming history and the previous ratings you've made.
c) The combined ratings of all Netflix members who have similar tastes in titles to you.
3) The "Jobs You May Be Interested In" feature shows jobs posted on LinkedIn that match
your profile in some way. These recommendations are shown based on the titles and
descriptions in your previous experience, and the skills other users have endorsed.
4) Amazon's algorithm crunches data on all of its millions of customer baskets to figure out
which items are frequently bought together. This can lead to huge returns: for example,
if you're buying an electrical item and see a recommendation for the cables or batteries it
requires beneath it, you're very likely to purchase both the core product and the accessories
from Amazon.
1.3 Motivation for Restaurant Recommendations
Obtaining recommendations from trusted sources is a critical component of the natural
process of human decision making. With burgeoning consumerism buoyed by the emergence
of the web, buyers are being presented with an increasing range of choices while sellers are
being faced with the challenge of personalizing their advertising efforts. In parallel, it has
become common for enterprises to collect large volumes of transactional data that allows
for deeper analysis of how a customer base interacts with the space of product offerings.
Recommender Systems have evolved to fulfill the natural dual need of buyers and sellers by
automating the generation of recommendations based on data analysis.
There are many recommendation systems available for problems like shopping, online
video entertainment, games etc. Restaurants & Dining is one area where there is a big
opportunity to recommend dining options to users based on their preferences as well as
historical data. Zomato is a very good source of such data with not only restaurant reviews,
but also user-level information on their preferred restaurants. This report describes the work
to recommend restaurants to a given Zomato user based on their history or their cuisine
preferences. It also recommends suitable, cuisine-specific locations to newcomers in the
restaurant business.

Chapter 2
Literature Review
In this chapter we highlight some previous work in the field of restaurant
recommendation. Recommender systems seek to predict the 'rating' or 'preference'
that a user would give to an item. They typically produce a list of recommendations
in one of two ways: through collaborative or content-based filtering.
Collaborative filtering approaches build a model from a user's past behavior (items
previously purchased or selected and/or numerical ratings given to those items) as well as
similar decisions made by other users. This model is then used to predict items (or ratings
for items) that the user may have an interest in. Content-based filtering approaches
utilize a series of discrete characteristics of an item in order to recommend additional items
with similar properties. These approaches are often combined to form hybrid recommender
systems.
Traditional recommendation systems use the user profile to analyse and find similar
users, and recommend restaurants to users based on the result of that analysis. However,
these systems do not consider user mobility and environment. Other systems provide
their service by finding restaurants and presenting restaurant information on a
web site; such a system is closer to a search system than to a recommendation system.
More recent research on context information uses the user's location to serve
advertisements, sales and event information. One such system analyses user preference
through the user profile and finds restaurants that satisfy the user's preference and are
close to the user's location. It consists of two parts, one which is active online and
one which processes data offline. When the user is in motion, i.e., his geo-position
changes notably, the system goes online and the recommendation module becomes active,
retrieving nearby restaurants and ranking them, based on their properties, according to
the scores generated offline. The offline part generally remains inactive when the user
is stationary. The work of the offline part is to generate a user interest profile using a
machine learning algorithm. The drawback of the offline part is that the interest profile is
generated from the user's check-ins to restaurants. It doesn't take into account the user's
tastes, habits and the cuisines he favors. Thus the offline recommendation can be considered
a shallow approach that lacks the user's detailed interaction with each restaurant, which
can be obtained in the form of reviews.

Chapter 3
Problem Definition
Create an innovative recommendation system that provides content-based
recommendations to restaurant goers and owners and location-based
recommendations to restaurant owners using the Zomato Restaurant
Review Network.
Suggest best-suited places to new entrants in the restaurant business
for setting up a new restaurant to fill the cuisine void and garner
high traffic.
Suggest restaurants to users based on their previous review activity on
Zomato by creating a recommendation system using all reviews from
all restaurants in a city.

Chapter 4
Data Analysis
4.1 Data Collection
Zomato and Yelp are two popular restaurant search, discovery and review services. While
both are popular globally, Zomato has an edge over Yelp in India. Since we are based in India,
we chose Zomato as our "Restaurant Review Network". Zomato also provides carefully
curated content that remains useful even outside its home country. Users can find
restaurants, leave reviews, rate a restaurant, and keep their own restaurant diary to share
with friends. Zomato has built a coherent and focused experience that puts the emphasis on
being a comprehensive network for food lovers. Very little on the site is superfluous. In a
survey, users were impressed with the attention to detail evident in the site's content and,
unlike on Yelp, several testers actually used Zomato's curated lists and suggestions to find
restaurants. Hence, Zomato was chosen as our "Restaurant Review Network" over other
popular services due to its detailed yet simple data.
4.2 Data Handling
We first crawled the restaurant data of Delhi from Zomato. The crawling was done using
crawlers built in Python specifically for restaurant data. The data comprised all the
features listed on Zomato, such as "Dine-in or Takeaway", "AC or Non-AC", etc. We then
switched to Kolkata as our sample city for data analysis for two reasons. Firstly, Kolkata
had around 2000 restaurants compared to Delhi's 10000+ restaurants. Secondly, we were
based in Kolkata and knew the city well, so we could analyze the results better.
After the first crawl of restaurant data we crawled the reviews for these 2000
restaurants. This crawl generated over two hundred thousand reviews. Each review record
comprised the restaurant name, the reviewer id and name, and the details of the review. All
the data was then stored in MongoDB, a NoSQL database, for easy fetching and
manipulation. This is the data used by our system.

Chapter 5
Methodology
5.1 Location Aware
The location-aware part of our project has the primary motive of making recommendations
to people who want to set up a new restaurant business, as explained earlier in our problem
definition. We assess the road map to identify concentrations of restaurants in a city. These
clusters can be defined as restaurant hotspots. We have generated a live map of our sample
city (Kolkata) with all the restaurants marked on it. Each node is given a color
according to the rating range in which it falls, and clicking on a node gives the details of
the restaurant that the node represents. This will help provide real-time recommendations
to users based on ratings and locations. Below is a snapshot of the map.
Figure 5.1: Live Map of Kolkata sorted on the basis of ratings
The road network of our sample city was generated using OpenStreetMap and PostgreSQL.
It is a graph with road intersections as nodes and roads as edges. We could not use
Google Maps for this as it came at a premium. We had the coordinates of all restaurants
in the city. We added the restaurants as pendant nodes to their nearest intersection by
using the k-NN (k-nearest neighbor) algorithm. Thus, we get a complete road network with
all restaurants and road intersections as nodes and the roads as edges.
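The attachment step described above can be sketched as follows. This is a minimal sketch
and not the project's actual code: it assumes intersections and restaurants are given as
(latitude, longitude) pairs, uses a brute-force nearest-neighbor search (the k = 1 case)
with the haversine distance, and the names haversine_km and attach_restaurants as well as
the coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def attach_restaurants(intersections, roads, restaurants):
    """Return an undirected graph with each restaurant added as a pendant
    node connected to its nearest intersection."""
    graph = {node: set(nbrs) for node, nbrs in roads.items()}
    for name, coord in restaurants.items():
        nearest = min(intersections, key=lambda n: haversine_km(intersections[n], coord))
        graph[name] = {nearest}
        graph.setdefault(nearest, set()).add(name)
    return graph


# Hypothetical data: two intersections joined by one road, two restaurants
intersections = {"I1": (22.5726, 88.3639), "I2": (22.5448, 88.3426)}
roads = {"I1": ["I2"], "I2": ["I1"]}
restaurants = {"R1": (22.5730, 88.3650), "R2": (22.5450, 88.3430)}

graph = attach_restaurants(intersections, roads, restaurants)
print(graph)
```

A brute-force search is quadratic in the worst case; for a whole city a spatial index in
the database would be the more usual choice.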
Figure 5.2: The road network stored in PostgreSQL
For every node in the road network that is an intersection, we store the distance to the
nearest restaurant of each cuisine at that node as node attributes. We then create a
vector for each road intersection which stores the X/Y ratios for the top 10 cuisines,
where
X = average rating of the nearest restaurant
Y = distance to the nearest restaurant
The lower this ratio (X/Y), the better the intersection is suited to opening a new
restaurant of that cuisine, since the nearest competitor is then either far away or poorly
rated. Running the PageRank algorithm on the graph also gives us the intersections with
more importance and traffic. Combining the PageRank probabilities with our ratio, we
define a new ratio R as follows:
\[ R = \frac{\text{PageRank probability}}{X/Y} \]
The intersection for which R is maximum is the best suited to setting up a new
restaurant of that cuisine.
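A small sketch of how R could be computed is given below. It is an illustration under
stated assumptions rather than our implementation: PageRank is computed by plain power
iteration on an undirected graph in which every node has at least one neighbor, the
damping factor of 0.85 is the conventional default, and the graph, the ratings, the
distances and the function names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph {node: set(neighbours)}.
    Assumes every node has at least one neighbour."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {}
        for node in graph:
            incoming = sum(rank[nbr] / len(graph[nbr]) for nbr in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def suitability(rank, nearest, cuisine):
    """R = PageRank probability / (X / Y) for each intersection, where
    X is the average rating and Y the distance of the nearest restaurant."""
    scores = {}
    for node, per_cuisine in nearest.items():
        x, y = per_cuisine[cuisine]
        if x <= 0 or y <= 0:
            continue  # ratio undefined; skipped in this sketch
        scores[node] = rank[node] / (x / y)
    return scores


# Hypothetical star-shaped road network with centre B
graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
# Hypothetical (average rating, distance in km) of the nearest restaurant
nearest = {
    "A": {"North Indian": (4.0, 0.2)},
    "B": {"North Indian": (3.0, 1.5)},
    "C": {"North Indian": (4.5, 0.5)},
    "D": {"North Indian": (3.5, 1.0)},
}

rank = pagerank(graph)
scores = suitability(rank, nearest, "North Indian")
print(max(scores, key=scores.get))
```

In this toy network the centre intersection has both the highest PageRank value and the
lowest X/Y ratio, so it receives the largest R.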

Figure 5.3: Map of Kolkata showing the important intersections to setup a new restaurant
based upon a cuisine (North Indian)
The map shows, in red, the most important intersections favourable to setting up a new
restaurant offering North Indian cuisine.
5.2 Content Based
We crawled the restaurants of our sample city (Kolkata) for their features (i.e. ratings,
cuisines, bar, AC/non-AC, veg/non-veg, etc.) and the reviews given by visiting
users. These reviews give us insight into a user's degree of liking for a particular
restaurant. The review data for each restaurant in Kolkata consisted of user id, user
name, restaurant id, restaurant name and the review. The reviews for each restaurant, taken
individually, were passed to the "Intellexer Sentiment Analyzer" module. Applying Intellexer
to the restaurant reviews we get
1) Opinion holders, which are food items.
2) Multiple opinions for each opinion holder (food item).
3) Sentiment values, which can be either positive or negative, for each opinion.
Intellexer Sentiment Analyzer is a solution that automatically extracts sentiments
(positivity/negativity), opinion objects and emotions (liking, anger, disgust, etc.) from
unstructured text. From these sentiment values we found the best food items available in
a restaurant by sorting the items in descending order of sentiment. The sentiment values
are calculated using the metric in equation 5.1. The calculation for finding the best food
item is done as follows. For each opinion holder we had a 3-tuple, giving a list of n
tuples. The values in each tuple were
1) Food tag
2) Average sentiment
3) Opinion count
The food tags are the best-suited spellings for particular food items of North Indian
cuisine. These tags are sufficient to identify North Indian dishes in the user reviews.
Examples of these North Indian food tags are biryani, kebab, qorma, tikka, etc. Since
users spell each food item in multiple ways, we replace all variants of a holder with one
best-suited name. For example, biryani appears as biriyani, beryani, beeryani, etc. We
clubbed the sentiments of similar food items with different spellings using fuzzywuzzy.
Fuzzy string matching, also called approximate string matching, is the process of finding
strings that approximately match a given pattern. The closeness of a match is often
measured in terms of edit distance, which is the number of primitive operations necessary
to convert the string into an exact match. We took each tag in our repository and computed
its partial ratio with the holders extracted by Intellexer using fuzz.partial_ratio(). We
used a threshold value of 67, i.e. if the ratio is greater than or equal to 67 we considered
the items to be similar. Using this we clustered all the similar items and computed, for each
tag in our repository, the average of the sentiments of the matching opinions from
Intellexer. Once we had the average sentiment and opinion count of all the tags, we used a
metric to rank the tags within a restaurant. The terms used in the metric are:
max cnt = maximum of the opinion count values of all food tags
min cnt = minimum of the opinion count values of all food tags
The metric is:
\[ \text{Sentiment of tag} = \text{Avg. sentiment of tag} \times \frac{\text{opinion count of tag} - \text{min cnt} + 1}{\text{max cnt} - \text{min cnt} + 1} \tag{5.1} \]
This metric is applied to each restaurant and its tags. It gives a normalized sentiment
for each tag, and sorting the tags on this value in descending order ranks them from
highest to lowest. The normalization is done within a particular restaurant and does not
depend on other restaurants.
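The clustering of spelling variants and the metric of equation 5.1 can be sketched
together as below. This is a hedged illustration, not our code: to stay within the
standard library, the helper partial_ratio is a simplified stand-in built on
difflib.SequenceMatcher and is not fuzzywuzzy's fuzz.partial_ratio, so its scores may
differ and the threshold of 67 is carried over only for illustration. The numeric
sentiment weights and the sample opinions are also assumptions.

```python
from difflib import SequenceMatcher


def partial_ratio(a, b):
    """Simplified stand-in for a partial ratio: best match (0-100) of the
    shorter string against every same-length window of the longer one."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 0
    best = max(
        SequenceMatcher(None, short, long_[i:i + len(short)]).ratio()
        for i in range(len(long_) - len(short) + 1)
    )
    return round(100 * best)


def cluster_opinions(opinions, tags, threshold=67):
    """Map each (holder, sentiment) to its closest tag and return
    {tag: (average sentiment, opinion count)}."""
    grouped = {}
    for holder, sentiment in opinions:
        best = max(tags, key=lambda t: partial_ratio(t, holder))
        if partial_ratio(best, holder) >= threshold:
            grouped.setdefault(best, []).append(sentiment)
    return {t: (sum(v) / len(v), len(v)) for t, v in grouped.items()}


def rank_tags(stats):
    """Apply equation 5.1 within one restaurant and sort tags, best first."""
    counts = [count for _, count in stats.values()]
    max_cnt, min_cnt = max(counts), min(counts)
    scores = {
        tag: avg * (count - min_cnt + 1) / (max_cnt - min_cnt + 1)
        for tag, (avg, count) in stats.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical opinions extracted for one restaurant: (holder, sentiment)
opinions = [
    ("biriyani", 1.0), ("beryani", 0.5), ("biryani", 0.9),
    ("kabab", 0.8), ("paneer tikka", -0.4), ("service", -1.0),
]
tags = ["biryani", "kebab", "tikka"]

stats = cluster_opinions(opinions, tags)
for tag, score in rank_tags(stats):
    print(tag, round(score, 3))
```

A tag mentioned only once is scaled down by the count term, so a single enthusiastic
opinion does not outrank a dish that is praised repeatedly.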
The data of all the restaurants was inserted into MongoDB along with the opinion
counts. Thus, for each food tag in our repository we found the restaurants that serve
that food. The max cnt and min cnt metric above is then applied again to each individual
food tag in our repository across all restaurants. Sorting on the result gives the top
restaurants for a particular food tag. This result is stored in the database for each tag.
Now for any user, his reviews are fed into Intellexer and his opinion holders (food
items) are generated. These opinion holders indicate his food preferences. The number of
opinion holders should be greater than one. Finally, for each of the user's tags, the
top 5 restaurants already stored in the database are recommended to the user. Thus, the
user gets recommendations based on his own reviews. The results are shown in the
screenshots below.
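The lookup described in this section can be sketched as follows. It is a minimal sketch
under stated assumptions: a plain dictionary stands in for the MongoDB collection, the
per-tag statistics and restaurant names are hypothetical, and the function names are
ours for illustration only.

```python
def top_restaurants_per_tag(tag_stats, top_n=5):
    """tag_stats: {tag: {restaurant: (average sentiment, opinion count)}}.
    Applies the max cnt / min cnt metric per tag across restaurants and
    keeps the top_n restaurants for each tag."""
    table = {}
    for tag, per_restaurant in tag_stats.items():
        counts = [count for _, count in per_restaurant.values()]
        max_cnt, min_cnt = max(counts), min(counts)
        scores = {
            name: avg * (count - min_cnt + 1) / (max_cnt - min_cnt + 1)
            for name, (avg, count) in per_restaurant.items()
        }
        table[tag] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return table


def recommend(user_tags, table):
    """Return the stored top restaurants for each of the user's food tags.
    Users with fewer than two opinion holders get no recommendation."""
    if len(user_tags) <= 1:
        return {}
    return {tag: table[tag] for tag in user_tags if tag in table}


# Hypothetical per-tag statistics across restaurants
tag_stats = {
    "biryani": {"R1": (0.9, 40), "R2": (0.7, 10), "R3": (0.95, 5)},
    "kebab": {"R2": (0.8, 20), "R4": (0.6, 25)},
}

table = top_restaurants_per_tag(tag_stats)
print(recommend(["biryani", "kebab"], table))
```

Because the table is computed once and stored, serving a user only requires extracting
his tags and reading the stored lists.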

Figure 5.4: The system taking user id as input to generate recommendations for that user.
Figure 5.5: Top 5 restaurants recommended by the system to the user for each food item

Chapter 6
Conclusion
Our results show the busiest roads in the city, which are ideal for setting up new
restaurants and a delight for food lovers to travel along. This can have a positive
influence on current businesses as well. Our results can reveal cuisines lacking in a
particular area, which current businesses can exploit. Previous work on location-based
recommendation focused on providing results to users only, whereas our system focuses on
restaurant owners.
Our system implements content-based filtering to provide restaurant recommendations
to users based on their previous reviews. The recommendations were largely accurate in
our tests. Our system can be easily extended to other cities and cuisines. It is
multipurpose, as it can come in handy for businesses as well as the
average user. Restaurant recommendation is still a relatively unexplored field,
and our system is a small step into it.

Chapter 7
Future Works
We plan to build our own sentiment analyzer specific to restaurants rather than
relying on the Intellexer Sentiment Analyzer module. This will help in getting
accurate sentiments for tags such as service, food and other features of the restaurants
and resolve our sentiment ambiguity. Sentiments are not purely positive or negative; in
fact there are various levels of sentiment, each with a different effect on the statement.
These can be incorporated in detail into the system for more accurate results.
We also plan to add collaborative filtering to our system. Such systems do
not use any information regarding the actual content of the items (as opposed to
content-based filtering). They are based on the usage or preference patterns of other users.
Selection (or filtering) of items is done in a way similar to individuals collaborating to
make recommendations for each other, i.e. if some tags are similar between users then the
users are termed similar and similar recommendations are provided.
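One possible way to decide that two users are similar from their tags is sketched below.
This is only an assumption about how the planned extension might look: it uses the
Jaccard index over each user's set of food tags, and the threshold of 0.5, the sample
users and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two tag collections: shared tags / all tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, user_tags, threshold=0.5):
    """Users whose tag sets overlap with the given user's by at least
    the threshold, in alphabetical order."""
    return sorted(
        other
        for other, tags in user_tags.items()
        if other != user and jaccard(user_tags[user], tags) >= threshold
    )


# Hypothetical food tags extracted from each user's reviews
user_tags = {
    "u1": {"biryani", "kebab", "tikka"},
    "u2": {"biryani", "kebab"},
    "u3": {"noodles"},
}
print(similar_users("u1", user_tags))
```

Restaurants liked by the similar users could then be offered to the user in addition to
the content-based list.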

Chapter 8
References
1. Anant Gupta and Kuldeep Singh. Location Based Personalized Restaurant
Recommendation System for Mobile Environments.
2. Sumit Negi. Single Document Keyphrase Extraction Using Label Information.
3. Liu, J., Shang, J., Wang, C., Ren, X., Han, J., 2015. Mining Quality Phrases from Massive
Text Corpora. In: Proceedings of the 2015 ACM SIGMOD International Conference on
Management of Data, ACM, pp. 1729-1744.
4. Mariana Romanyshyn. Rule-Based Sentiment Analysis of Ukrainian Reviews.
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No.
4, July 2013.
5. El-Kishky, A., Song, Y., Wang, C., Voss, C.R., Han, J., 2014. Scalable topical phrase
mining from text corpora. Proceedings of the VLDB Endowment 8, 305-316.
6. Burusothman Ahiladas, Paraneetharan Saravanaperumal, Sanjith Balachandran,
Thamayanthy Sripalan and Surangika Ranathunga. Ruchi: Rating Individual Food Items in
Restaurant Reviews.