Ranked-Restaurant Searching System
using Data Mining: Bob is my friend
Han Sang Jun / Yoon Seung Hyun
14 – 2 Capstone Design
Handong University
How can we find
the most popular restaurant
for Pasta in Upland?
Not users' direct feedback,
but data that already exists on the web,
such as blog reviews of restaurants.
Collective Intelligence
Step 1: Make a database of information on all restaurants in Korea
Daum: 19.89% share of web search in Korea
http://place.map.daum.net/15935749
<html>
<body>
<title>restaurant_name</title>
<div>
<span class="t_l">address</span>
<span class="t_l num">phone</span>
<span class="f_l">category > Korean</span>
</div>
<div>
<dl>
<dt class="list_info">card payment available</dt>
<dt class="list_info">reservation available</dt>
...
</dl>
</div>
</body>
</html>
Python library: BeautifulSoup4
Lets you navigate the HTML structure and find and extract information from it
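from urllib.request import urlopen  # Python 3 counterpart of the original urllib2.urlopen
from bs4 import BeautifulSoup

url = "http://place.map.daum.net/15935749"
handle = urlopen(url)
data = handle.read()
soup = BeautifulSoup(data, "html.parser")

# Name crawl: keep the part of the page title before the dash
name = soup.title.string
name = name.split("–")[0].strip()

# Address1 crawl: text of the first <dd class="desc"> element
address1 = soup.find_all("dd", "desc")[0].get_text()
address1 = address1.strip()

# Full crawl (outline): for every 8-digit place ID from 00000000 to 99999999,
# fetch http://place.map.daum.net/<place_id>, crawl it as above, and if its
# category is Restaurant, insert the fields into MySQL through MySQLdb.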
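The crawl-and-store loop above can be tried offline with a minimal sketch. It swaps in the standard library's `html.parser` for BeautifulSoup and `sqlite3` for MySQLdb. It parses two made-up pages shaped like the HTML outline shown earlier; the class names `t_l`, `t_l num` and `f_l` come from that outline. It also assumes a page is a restaurant when its top-level category reads `Restaurant`. The real Daum page structure and category labels may differ.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample pages shaped like the slide's HTML outline (values are made up)
RESTAURANT_PAGE = """<html><head><title>Sample Pasta – Daum</title></head><body>
<div><span class="t_l">Upland</span><span class="t_l num">000-0000</span>
<span class="f_l">Restaurant > Italian</span></div></body></html>"""
OTHER_PAGE = """<html><head><title>Sample Bank – Daum</title></head><body>
<div><span class="t_l">Upland</span><span class="t_l num">111-1111</span>
<span class="f_l">Finance > Bank</span></div></body></html>"""


class PlacePageParser(HTMLParser):
    """Stdlib stand-in for BeautifulSoup: records <title> text and <span> text by class."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._key = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._key = "title"
        elif tag == "span":
            self._key = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag in ("title", "span"):
            self._key = None

    def handle_data(self, data):
        if self._key and data.strip():
            self.fields[self._key] = data.strip()


def parse_place(html):
    parser = PlacePageParser()
    parser.feed(html)
    parser.close()
    f = parser.fields
    return {
        "name": f.get("title", "").split("–")[0].strip(),
        "address": f.get("t_l"),
        "phone": f.get("t_l num"),
        "category": f.get("f_l", ""),
    }


def store_if_restaurant(conn, place):
    # Assumption: restaurant pages have the top-level category "Restaurant"
    if place["category"].split(">")[0].strip() != "Restaurant":
        return False
    conn.execute(
        "INSERT INTO restaurant (name, address, phone, category) VALUES (?, ?, ?, ?)",
        (place["name"], place["address"], place["phone"], place["category"]),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurant (name TEXT, address TEXT, phone TEXT, category TEXT)")
for page in (RESTAURANT_PAGE, OTHER_PAGE):
    store_if_restaurant(conn, parse_place(page))
```

Only the first page ends up in the table. In the real crawl, the loop would run over the fetched pages instead of the two samples.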
Database for all restaurants in Korea
Columns: Name, Address, Phone, Category, Description, Longitude, Latitude
Longitude and latitude are filled in with Google Geocoding.
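Here is a minimal sketch of filling Longitude and Latitude from an address. It assumes the current Google Geocoding web service: a JSON endpoint that takes `address` and `key` parameters and returns coordinates under `results[0].geometry.location`. The sketch only builds the request URL and reads an already-parsed response, so no network call is made. The API key and the sample response are placeholders.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build a Geocoding request URL for one restaurant address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response):
    """Return (latitude, longitude) from a parsed JSON response, or None if no match."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Placeholder response with the same shape as a real one (values are made up)
sample = {"status": "OK", "results": [{"geometry": {"location": {"lat": 36.0, "lng": 129.3}}}]}
```

In the real pipeline, the response would come from fetching `geocode_url(...)` and decoding the body with `json.loads`. The API returns `lat` before `lng`, while the database lists Longitude before Latitude, so the insert must map the two values explicitly.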
Step 2: Collect blogs related to restaurant reviews
Naver: 76.89% share of web search in Korea
http://section.blog.naver.com/
A popular blog service
Example search: http://section.blog.naver.com/sub/SearchBlog.nhn?type=post&option.keyword=포항%20환여횟집
(keyword: 포항 환여횟집, a raw-fish restaurant in Pohang)
Get blog info with BeautifulSoup4
by following the href links in the results
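Here is a sketch of the two pieces of this step: building the query URL for a restaurant name, and collecting `href` values from a result page. It assumes the search URL format shown on the slide; that 2014-era endpoint may no longer respond the same way. The standard library's `html.parser` stands in for BeautifulSoup4, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlencode

SEARCH_URL = "http://section.blog.naver.com/sub/SearchBlog.nhn"


def blog_search_url(keyword):
    """Build the blog-post search URL for one restaurant keyword."""
    # quote_via=quote encodes spaces as %20, as in the slide's URL, instead of "+"
    query = urlencode({"type": "post", "option.keyword": keyword}, quote_via=quote)
    return SEARCH_URL + "?" + query


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, similar to BeautifulSoup's find_all("a")."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up fragment of a search result page
sample = ('<ul><li><a href="http://blog.naver.com/a/1">x</a></li>'
          '<li><a href="http://blog.naver.com/b/2">y</a></li></ul>')
```

Korean keywords are percent-encoded as UTF-8, so the generated URL is plain ASCII even though the slide displays the keyword unencoded.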
But does the search engine show us exactly the right results?
If not...
let the computer learn to collect the appropriate blog data
from the result previews.
What is machine learning?
vague, ambiguous
Make learning set (training set) → Training → Test set
Different Kinds of Learning
• Supervised Learning
  - Someone gives us examples and the right answer
• Unsupervised Learning
  - We see examples but get no feedback
  - We need to find patterns in the data
• Semi-supervised Learning
  - Given a small number of examples with the right answers, we need to find patterns in the data so that we can predict the right answer for unseen examples
• Reinforcement Learning
  - We take actions and get rewards
  - We have to learn how to get high rewards
Make a learning set from the result previews
to judge whether each blog is related to a restaurant review or not.
We need to classify them!
Classifier – Supervised learning
1. K-nearest neighbors algorithm
2. Perceptron
3. Bayesian approach
4. Decision trees
5. Neural Networks
6. Support Vector Machines
7. Ensembles
Learning Set: word counts from 3000 blog previews labeled by whether they relate to a restaurant review
Is_Restaurant                 Is_Not_Restaurant
word          count           word          count
atmosphere      200           politic         350
restaurant      250           hello           200
delicious       300           economy         150
good            250           atmosphere      300
(each category totals 1000 words)
Using a morpheme analyzer, inflected forms are reduced to their base form (eat, ate, eaten → eat).
“It was good to eat delicious pasta in this restaurant!”
Is this sentence related to a restaurant review, according to the table above?
Naïve Bayes
Compare the probability that a given sentence belongs to Is_Restaurant, P(Is_Restaurant | sentence),
with the probability that it belongs to Is_Not_Restaurant, P(Is_Not_Restaurant | sentence).
By Bayes' rule:
$$p(Is \mid s) = \frac{p(s \mid Is)\, p(Is)}{p(s)} \qquad p(IsN \mid s) = \frac{p(s \mid IsN)\, p(IsN)}{p(s)}$$
Both share the denominator p(s), so it is enough to compare the numerators.
p(Is): probability of choosing Is_Restaurant (assumed to be 0.5)
p(s | Is): within the Is_Restaurant set, the probability of the given sentence
$$p(s \mid Is) = p(w_1 \mid Is) \times p(w_2 \mid Is) \times \cdots \times p(w_n \mid Is)$$
We assume that the words do not affect each other, i.e. they are independent given the category (the "naïve" assumption).
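Is_Restaurant
“It was good to eat delicious pasta in this restaurant!”
$p(Is) = 0.5$
$p(restaurant \mid Is) = 250/1000$
$p(delicious \mid Is) = 300/1000$
$p(good \mid Is) = 250/1000$
$p(eat \mid Is) = 0/1000$, counted as $1/1000$ so that one unseen word does not make the whole product zero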
$p(s \mid Is)\, p(Is) = 0.5 \times 0.001 \times 0.3 \times 0.25 \times 0.25 = 9.375 \times 10^{-6}$
Is_Not_Restaurant
None of restaurant, delicious, good, eat appears in the Is_Not_Restaurant counts, so each is counted as 1/1000:
$p(s \mid IsN)\, p(IsN) = 0.5 \times 0.001^4 = 5 \times 10^{-13}$
The Is_Restaurant score is far larger, so according to the table above,
“It was good to eat delicious pasta in this restaurant!”
is related to a restaurant review!
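The worked example above can be reproduced in a few lines. This sketch uses the word counts from the learning-set table and a prior of 0.5 per category. It also follows the slide's rule of counting an unseen word as 1 instead of 0. That rule is a simplified stand-in for proper Laplace smoothing, which would add 1 to every count. Like the slide, the sketch scores only the four content words `good`, `eat`, `delicious` and `restaurant`.

```python
from math import prod

# Word counts from the learning-set table; each category sums to 1000
COUNTS = {
    "Is_Restaurant": {"atmosphere": 200, "restaurant": 250, "delicious": 300, "good": 250},
    "Is_Not_Restaurant": {"politic": 350, "hello": 200, "economy": 150, "atmosphere": 300},
}
PRIOR = {"Is_Restaurant": 0.5, "Is_Not_Restaurant": 0.5}


def word_prob(word, category):
    """p(word | category); an unseen word is counted as 1, as on the slide."""
    counts = COUNTS[category]
    return max(counts.get(word, 0), 1) / sum(counts.values())


def score(words, category):
    """p(s | category) * p(category); p(s) is dropped because both categories share it."""
    return PRIOR[category] * prod(word_prob(w, category) for w in words)


def classify(words):
    return max(PRIOR, key=lambda category: score(words, category))


WORDS = ["good", "eat", "delicious", "restaurant"]  # after morpheme analysis
```

Multiplying many small probabilities underflows for long texts. Summing log-probabilities instead gives the same ranking of categories and is the usual choice in practice.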
Support Vector Machine
We need to find the optimal hyperplane: the one that separates the two classes with the largest margin.
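The slides name SVM without showing an implementation. Below is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss, using made-up 2-D points. It is not the authors' classifier. The learning rate, regularization strength and epoch count are arbitrary choices for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Stochastic subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge subgradient plus regularizer
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y  # the bias is not regularized
            else:
                # Outside the margin: only the regularizer shrinks w
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2-D points (e.g. two hypothetical text features)
POINTS = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
LABELS = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(POINTS, LABELS)
```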
Step 3: Extract meaningful information from the collected blogs
Database for Blogs (columns: Title, Date, Content, Reply, URL)
For restaurant ranking:
Point for one blog review = (date × 0.5) + (number of replies × 0.5)
Point for one restaurant = average of its reviews' points
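Here is a sketch of the two scoring rules. It assumes the review date has already been turned into a number (`date_score`). The slides do not say how that conversion is done, so both the name and its scale are hypothetical.

```python
def review_point(date_score, reply_count):
    """Point for one blog review = (date * 0.5) + (number of replies * 0.5)."""
    return date_score * 0.5 + reply_count * 0.5


def restaurant_point(reviews):
    """Point for one restaurant = average of its reviews' points.

    `reviews` is a list of (date_score, reply_count) pairs. A restaurant
    with no reviews gets 0.0 here (an assumption; the slides do not say).
    """
    if not reviews:
        return 0.0
    points = [review_point(d, r) for d, r in reviews]
    return sum(points) / len(points)
```

For example, two reviews scored (10, 4) and (6, 2) give review points 7.0 and 4.0, and a restaurant point of 5.5. Reply counts are unbounded, so putting both terms on a comparable scale before the 0.5/0.5 weighting may matter in practice.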
Food_Dictionary and Area_Dictionary are built from a web encyclopedia.
A word is used only if it is represented in a dictionary over 100 times.
Extract info from blogs: Database for Blogs + Food_Dictionary + Area_Dictionary
Table left join with Restaurant_Info → Final output (extracted database set)
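Here is a sketch of the extraction and left-join step with `sqlite3`. It rests on several assumptions the slides do not state:

- Dictionary words are matched by simple whitespace tokenization. A real Korean pipeline would use the morpheme analyzer instead.
- The "over 100 times" rule is read as "more than 100 occurrences in one restaurant's blog posts".
- The table and column names (`restaurant_info`, `blog_info`, `food`, `area`) and all sample data are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical dictionary entries; the slides build these from a web encyclopedia
FOOD_DICTIONARY = {"pasta", "pizza", "sushi"}
AREA_DICTIONARY = {"upland", "pohang"}
THRESHOLD = 100  # "over 100 times", read here as a per-restaurant occurrence count


def extract_terms(blog_texts, dictionary, threshold=THRESHOLD):
    """Return dictionary words that occur more than `threshold` times in the blog texts."""
    counts = Counter(
        word for text in blog_texts for word in text.lower().split() if word in dictionary
    )
    return sorted(word for word, n in counts.items() if n > threshold)


def final_output(conn):
    # LEFT JOIN keeps every restaurant, even one with no extracted blog info (NULLs)
    return conn.execute(
        """SELECT r.name, b.food, b.area
           FROM restaurant_info AS r
           LEFT JOIN blog_info AS b ON b.restaurant_id = r.id
           ORDER BY r.id"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurant_info (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE blog_info (restaurant_id INTEGER, food TEXT, area TEXT)")
conn.executemany("INSERT INTO restaurant_info VALUES (?, ?)",
                 [(1, "Sample Pasta"), (2, "Sample Grill")])

# Hypothetical blog texts for restaurant 1 only
blogs = ["great pasta in upland"] * 120
foods = extract_terms(blogs, FOOD_DICTIONARY)
areas = extract_terms(blogs, AREA_DICTIONARY)
conn.execute("INSERT INTO blog_info VALUES (?, ?, ?)", (1, ",".join(foods), ",".join(areas)))
```

The left join is what keeps restaurants with no qualifying blog data in the final output, with NULL food and area columns, rather than dropping them.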
Step 4: Build a search engine for users' convenience
We use Apache Solr, the popular, fast, open-source enterprise search platform from the Apache Lucene project.
Related Apache projects: Nutch, Solr, Lucene, Hadoop
Configuration files: solrconfig.xml, data-config.xml
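Once Solr has indexed the final output, a front end can query it over HTTP. Here is a sketch that builds a standard Solr select request using the `q`, `sort`, `rows` and `wt` query parameters. The host, the core name `restaurants` and the field names `food`, `area` and `point` are hypothetical and depend on how the schema is actually configured.

```python
from urllib.parse import urlencode

# Hypothetical Solr host and core name
SOLR_SELECT = "http://localhost:8983/solr/restaurants/select"


def solr_query_url(food, area, rows=10):
    """Build a select request for restaurants matching a food and an area, best first."""
    params = {
        "q": f'food:"{food}" AND area:"{area}"',  # hypothetical indexed fields
        "sort": "point desc",  # hypothetical field holding the Step 3 restaurant point
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

`solr_query_url("pasta", "Upland")` asks for the ten best-ranked pasta restaurants in Upland, the question the slides open with.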
What we want to develop next:
• Improve the performance of blog filtering
• Build on the Ruby on Rails framework using Blacklight, a library for using Solr from Ruby on Rails
• Front-end design using HTML5, CSS3, JavaScript, jQuery, ...
New Ranking Algorithm
Point of Restaurant: based on Recency, Frequency, People
Points of Blogger: based on Recency, Frequency, Density of the blogger's actions
Implementation
Back-end computer: main language, DB, collective data (blogs)
Front-end computer
How can we find
the most popular restaurant
for Pasta in Upland?
Ask Bob!
Thank you!
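Performance: 100 blogs related to Is_Restaurant (10 blogs per keyword)
Restaurant names: 서가앤쿡 99% | 환여횟집 96% | 빕스 93% | 설빙 100% | 뚝배기 이탈리아 98%
Dish keywords ("맛집" = popular restaurant): 삼겹살 맛집 (pork belly) 100% | 냉면 맛집 (cold noodles) 100% | 샤브샤브 맛집 (shabu-shabu) 100% | 보리밥 맛집 (barley rice) 100% | 초밥 맛집 (sushi) 100%

Performance: 100 blogs related to Is_Not_Restaurant (10 blogs per keyword)
정치 (politics) 59% | 경제 (economy) 66% | 문화 (culture) 45% | 사회 (society) 65% | 한동대학교 (Handong University) 42%
소녀시대 (Girls' Generation) 81% | 중앙일보 (JoongAng Ilbo) 67% | 한겨레 (Hankyoreh) 62% | 성경 (Bible) 54% | 기독교 (Christianity) 58%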
