University Politehnica of Bucharest
Faculty of Automatic Control and Computers
Department of Computers
The Collection and Analysis of
Public Data
Case study - Bucharest
Costin-Gabriel CHIRU and Constantin Ciprian MIHAILA
costin.chiru@cs.pub.ro, cipri.mihaila@gmail.com
Purpose
• A method for collecting and analyzing data within
urban settlements – case study: Bucharest
• Purpose: collect important information about
streets, points of interest, urban planning
details, etc.
• Goals:
– facilitate a quick and correct evaluation of specific
areas (the proximity of different points of interest)
and
– identify suitable locations for adding new points of
interest (using heuristics and data mining techniques
such as clustering algorithms and association rules)
12.09.2014 RoeduNet 2014 2
Introduction
• Public data = information produced or held by a
person, institution or company that any citizen can
freely access, reuse and redistribute.
• Efficient use of this data may contribute to the
improvement of people's lives and to the
intelligent development of a city (e.g. reducing
pollution, recycling, optimal use of infrastructure,
traffic management, efficiency of public
transport, planning of new construction,
informing customers based on real data, etc.)
State-of-the-art
• Applications for obtaining directions / evaluating different
locations (Google Maps)
– Advantage: allows users to mark different locations on the
existing maps, offering information about their location (hotels,
bars, hospitals, shops, public transportation stations, etc.)
– Drawbacks: it has a relatively small number of annotations
(marks for different points of interest) and it does not
distinguish between the marked points → it does not
support searching for specific types of points of interest
• Applications for tourists
– Advantage: offer information about locations like restaurants,
bars and coffee shops (+ ratings), recommendations, maps,
itinerary plans and attractions
– Drawback: limited to the categories of points of interest
that are relevant to tourists
State-of-the-art
• Similar to Yelp, which allows searching for points
of interest from different categories: food,
nightlife, shopping, health & medical, etc.
– Drawback: suggestions only for the most popular
cities around the world
• To identify suitable locations for adding new points
of interest, we used the spatial data mining
framework of Chawla, Shekhar and Wu, which
predicts locations using map similarity metrics
Data Collection
• Points of Interest and Streets Data Collection
– Using a web crawler for http://strazi.rou.ro/ (data divided
into categories and subcategories - airports, agencies,
banks, churches, shops - with associated details -
longitude, latitude, city and street of each point)
– Google services (Google Places API) – allows four types of
search: nearby search, radar search, text search, details
search (e.g. information about 200 schools within the
Bucharest perimeter)
• Urban Planning Data
– Extracted images having spatial coordinates and legend (
http://www.melon.ro/maps/PUG_BUCURESTI_IE.html )
– This information was integrated in the current project by
adding a new layer on top of Google Maps (built from
these images)
– Extracted and saved the information about the legend
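A minimal sketch of how one collected record could be stored, assuming the fields named on this slide (category, subcategory, longitude, latitude, city, street). The class name, field names and sample values are hypothetical and are not taken from the authors' system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PointOfInterest:
    """One collected record; field names are illustrative, not the authors' schema."""
    name: str
    category: str      # e.g. "banks", "shops"
    subcategory: str
    latitude: float    # decimal degrees
    longitude: float   # decimal degrees
    city: str
    street: str


def group_by_category(points):
    """Index records by category so later searches can filter by type."""
    index = defaultdict(list)
    for point in points:
        index[point.category].append(point)
    return dict(index)


# Made-up sample records (names, streets and coordinates are placeholders).
SAMPLE = [
    PointOfInterest("Bank A", "banks", "branch", 44.4268, 26.1025, "Bucharest", "Street 1"),
    PointOfInterest("Shop A", "shops", "grocery", 44.4300, 26.1000, "Bucharest", "Street 2"),
    PointOfInterest("Shop B", "shops", "bakery", 44.4350, 26.1100, "Bucharest", "Street 3"),
]

index = group_by_category(SAMPLE)
```

Grouping by category up front keeps the later per-category searches simple, since each search only scans the records of the requested type.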
Evaluating Proximity of a Location
• Present the information in a useful manner by evaluating
the proximity of a given location
• Two ways of evaluation:
– Radius search: searching for points inside a circle whose radius
and center are selected by the user → results: list of points of
interest found within the selected area, along with their
details
• Scenario: an elderly person wants to buy a house and needs to see
how many points of interest are within walking distance (shops,
transportation, hospitals, etc.).
– Closest points of interest: searching for the points nearest to a
selected point. This method receives as parameters the current position and
one or more location types that the user is interested in (e.g.
schools, banks, shops, hospitals, etc.) and displays the
nearest point from each selected category (according to the
Haversine distance), along with its details.
• Scenario: someone needs to know where the closest pharmacy
or the closest doctor is
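The two evaluation modes above can be sketched as follows. This is an illustrative sketch only: it assumes points are plain dictionaries with hypothetical keys `lat`, `lon` and `category`, uses a mean Earth radius of 6371 km, and scans all points linearly. The slides do not say how the authors implemented or indexed the search, and the sample coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_search(points, center, radius_km):
    """Return (distance_km, point) pairs inside the circle, nearest first."""
    lat0, lon0 = center
    hits = []
    for p in points:
        d = haversine_km(lat0, lon0, p["lat"], p["lon"])
        if d <= radius_km:
            hits.append((d, p))
    hits.sort(key=lambda pair: pair[0])
    return hits


def nearest_per_category(points, position, categories):
    """Return the closest (distance_km, point) per requested category, or None."""
    lat0, lon0 = position
    best = {c: None for c in categories}
    for p in points:
        c = p["category"]
        if c not in best:
            continue
        d = haversine_km(lat0, lon0, p["lat"], p["lon"])
        if best[c] is None or d < best[c][0]:
            best[c] = (d, p)
    return best


# Made-up sample points around an arbitrary position in Bucharest.
POINTS = [
    {"name": "Shop A", "category": "shops", "lat": 44.4300, "lon": 26.1000},
    {"name": "Shop B", "category": "shops", "lat": 44.4700, "lon": 26.1500},
    {"name": "Hospital A", "category": "hospitals", "lat": 44.4350, "lon": 26.1100},
]
HERE = (44.4268, 26.1025)

within_walk = radius_search(POINTS, HERE, radius_km=1.5)
closest = nearest_per_category(POINTS, HERE, ["shops", "hospitals", "banks"])
```

A linear scan is enough for a sketch; with a city-wide dataset a spatial index would probably be preferable, but the slides give no detail on that.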
Evaluating Proximity of a Location
[Figure: radius search; closest points of interest]
Town Planning Analysis
• Additionally, we analyze the town planning
in the selected area (identify the main urban area types and the
% of the area each one covers)
• Works with the radius evaluation because, in this case, we
can estimate the evaluated area (which is not possible in
the case of the closest points of interest)
• Takes into account the tiles that have their center inside
the evaluation area (circle)
• Results: a sorted table that contains the average % of
different area types within the area, along with their
legend descriptions.
• Scenario: someone who wants to buy a house might be
interested in the type of area surrounding it, as this
is important information that influences the price of the
house (e.g. how central it is, whether there are public parks or
factories nearby).
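The tile-based analysis above can be sketched as follows, under stated assumptions: each tile is a dictionary with a center (`lat`, `lon`) and a hypothetical `shares` mapping from area type to the percentage of the tile it covers, and the result averages those shares over the tiles whose center falls inside the circle. The tile format, the names and the sample values are assumptions, not the authors' data model.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def area_type_shares(tiles, center, radius_km):
    """Average per-tile shares over tiles whose center lies inside the circle.

    Returns (area_type, average_percent) rows sorted by descending percentage.
    """
    lat0, lon0 = center
    totals = defaultdict(float)
    selected = 0
    for tile in tiles:
        if haversine_km(lat0, lon0, tile["lat"], tile["lon"]) <= radius_km:
            selected += 1
            for area_type, share in tile["shares"].items():
                totals[area_type] += share
    if selected == 0:
        return []
    table = [(area_type, total / selected) for area_type, total in totals.items()]
    table.sort(key=lambda row: row[1], reverse=True)
    return table


# Made-up tiles: two near the chosen center, one far away.
TILES = [
    {"lat": 44.4270, "lon": 26.1030, "shares": {"residential": 80.0, "green": 20.0}},
    {"lat": 44.4280, "lon": 26.1010, "shares": {"residential": 40.0, "industrial": 60.0}},
    {"lat": 44.5000, "lon": 26.2000, "shares": {"industrial": 100.0}},
]

table = area_type_shares(TILES, (44.4268, 26.1025), radius_km=1.0)
```

Selecting tiles by their center, as the slide describes, means tiles that only partly overlap the circle are either fully counted or fully ignored, so the percentages are an estimate.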
Location Prediction
• Identification of suitable locations for adding new
points of interest such as shops, banks, schools,
hospitals, etc.
• Highly dependent on the information collected
about each settlement, as each settlement
has its own specificity
• We worked on the data that we collected about
Bucharest, which consists of the locations of various
(categorized) points of interest and the city
plan (offering details about regulations and
local rules, urban area delimitation, traffic
network structure, type and height of buildings,
etc.)
Location Prediction
• Using data mining techniques:
– Clustering algorithms (Hierarchical Clustering, DBSCAN) – used for
analyzing the clusters built from the points of interest of the same
category (agencies, banks, schools, shops) → determine a clustering
coefficient for each type of point
– Association rules: the rules link the urban plan legends to
the points of interest → identify the points of interest that can be found
inside an urban planning area and the ones that cannot be found there.
• Using heuristics:
– based on the similarities and differences between different urban
planning areas → assumption: the categories of points of interest are
uniformly distributed in all areas of the same type
– evaluation of an area to ensure that, if we want to add a specific point
type in that area, such a point does not already exist → a circle is searched
whose radius is the clustering coefficient previously computed and
whose center is the center of the group of tiles from
that area
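The clustering step and the "does such a point already exist" check can be sketched as below. Two caveats apply. First, the slides do not define the clustering coefficient, so `typical_spacing_km` (mean distance to the nearest neighbour in the same cluster) is a stand-in chosen purely for illustration. Second, the DBSCAN here is a plain textbook version written from scratch, with made-up parameters and coordinates; it is not the authors' implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_pts):
    """Label each (lat, lon) point with a cluster id, or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(n) if haversine_km(*points[i], *points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def typical_spacing_km(points, labels):
    """Hypothetical 'clustering coefficient': mean nearest same-cluster distance."""
    nearest = []
    for i, label in enumerate(labels):
        if label == -1:
            continue
        same = [haversine_km(*points[i], *points[j])
                for j, other in enumerate(labels) if j != i and other == label]
        if same:
            nearest.append(min(same))
    return sum(nearest) / len(nearest) if nearest else 0.0


def is_vacant(center, existing_points, radius_km):
    """True if no existing point of the category lies within radius_km of center."""
    return all(haversine_km(*center, *p) > radius_km for p in existing_points)


# Made-up bank locations: three close together, one isolated.
BANKS = [(44.4300, 26.1000), (44.4310, 26.1000), (44.4320, 26.1000), (44.5000, 26.2000)]
labels = dbscan(BANKS, eps_km=0.15, min_pts=2)
radius = typical_spacing_km(BANKS, labels)
```

With a radius derived this way, `is_vacant(tile_group_center, BANKS, radius)` would flag an area as a candidate only when no bank already sits within the typical spacing of existing banks.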
Location Prediction
[Figure: Hierarchical Clustering; DBSCAN; suitable locations for bars in a specific urban area]
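The association rules that link urban plan legends to points of interest can be illustrated with single-antecedent rules of the form "zone type → category", scored by support and confidence. This is a sketch under assumptions: the slides do not name the rule-mining algorithm used, and the zone names, categories and observations below are made up.

```python
from collections import Counter


def zone_category_rules(observations):
    """Score rules 'zone type -> category' by support and confidence.

    observations: (zone_type, category) pairs, one per point of interest.
    Returns (zone_type, category, support, confidence) tuples, best confidence first.
    """
    observations = list(observations)
    total = len(observations)
    pair_counts = Counter(observations)
    zone_counts = Counter(zone for zone, _ in observations)
    rules = [
        (zone, category, count / total, count / zone_counts[zone])
        for (zone, category), count in pair_counts.items()
    ]
    rules.sort(key=lambda rule: (-rule[3], rule[0], rule[1]))
    return rules


def absent_categories(observations):
    """For each zone type, list the categories never observed inside it."""
    observations = list(observations)
    all_categories = {category for _, category in observations}
    seen = {}
    for zone, category in observations:
        seen.setdefault(zone, set()).add(category)
    return {zone: sorted(all_categories - cats) for zone, cats in seen.items()}


# Made-up observations: the zone type that contains each point of interest.
OBSERVATIONS = [
    ("residential", "shops"),
    ("residential", "shops"),
    ("residential", "schools"),
    ("industrial", "agencies"),
]

rules = zone_category_rules(OBSERVATIONS)
missing = absent_categories(OBSERVATIONS)
```

High-confidence rules suggest which categories tend to appear in a zone type, while `absent_categories` approximates the "cannot be found there" side; with little data an absent category may simply be unobserved rather than impossible.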
Conclusions
• Public data = important source of information that can be automatically
analyzed using data mining algorithms and techniques
• Bucharest case study → fast, efficient and correct town area
evaluation and identification of suitable locations for adding
new points of interest
• The evaluation part has medium complexity, but high utility
• The prediction part involves high-complexity algorithms that use a lot
of data
• Possible improvements:
– find new sources of data to be added to the system
– port the application to mobile devices
– identify better algorithms and heuristics for the prediction part
– take advantage of the ratings provided by different users
– can be easily adapted for other towns
Questions
Thank you very much!
