Pattern Recognition in Multiple Bikesharing Systems for Comparability
Presented by: Athiq Ahamed
Guided by: Prof. Dr. Dirk C. Mattfeld, M.Sc. Jan Brinkmann
Lecture title | Semester | Chapter x | Slide 2
© Prof. Dr. Dirk C. Mattfeld
Motivation for Bikesharing Systems (BSS)
• The proportion of people living in urban areas is increasing
• The United Nations predicts that by 2050, 86 % of the developed world will be urbanized
• The most important modes of transport are public and private
• Urban areas face several transport problems
• Traditional urban transportation does not solve these problems
BSS as a Promising Candidate (Shared Mobility)
http://bike-sharing.blogspot.de/
• BSS is a sustainable short-term bicycle rental service
• Cost-effective and flexible form of transportation (Customer, Subscriber)
• The first BSS program was deployed on July 28, 1965
Benefits
• Health
• Free rides (saves money)
• Pollution control
• Having fun!
Issues and Hypotheses
Issues
• Difficulties in operating and managing BSS
• Predicting usage (open data)
• Availability of bikes and free racks
User satisfaction!
Solution
• Planning station locations properly and predicting bike usage (activity patterns)
• Redistributing bikes in advance
Hypotheses
1. Rentals and returns depend on spatial and temporal factors.
2. The profile of users influences rentals and returns.
3. The profile of users influences business development in the neighborhood.
Knowledge Discovery in Databases
• Preprocessing (Data Cleansing)
• Data reduction / integration / transformation
• Data Mining (Knowledge Discovery)
• Clustering, Classification
• Visualization
• Geo BI
• Several other methods
• Post processing
• Decision support
Related Work
• Froehlich, Neumann and Oliver (2009): analyzed BSS usage to infer human mobility patterns in the city (Barcelona).
• Kaltenbrunner et al. (2010): developed a short-term statistical prediction model for station occupancy (Barcelona).
• Borgnat et al. (2010): developed a statistical model for predicting station occupancy (Lyon).
• O’Brien et al. (2013): analyzed 38 bikesharing systems around the world (Europe, the Middle East, Asia, Australasia and the Americas).
• Vogel et al. (2011) [69]: presented a complete analysis of operational data from Vienna’s BSS.
Problems with Existing Systems
• Existing studies analyzed the data with simple analysis techniques
• None of them geocoded the data with population, household, or employment data
• Most of them list comparability as future work
• No in-depth comparison of systems has been done
• None of them checks the comparability of two BSS with similar features
• Very few of them analyze cluster movements
Why Comparability? (Revisited)
• Comparability is crucial for gaining insight into activity patterns across multiple systems
• It helps to predict or anticipate bike activity in the future
• It supports designing a new BSS or extending an existing one
• Locations for a new system can be planned properly
• It serves as an input for several applications
• As a result, bikes and free racks are available at all times
Which Systems for Comparability?
• Population
• Weather
• Household ratio
• Economic aspects
• Tourism
Which Systems for Comparability?
• Citi Bike New York and Capital Bikeshare Washington, D.C.
• Open data from the Citi Bike and Capital Bikeshare websites (2014)
Attribute | Citi Bike New York | Capital Bikeshare Washington
Stations | 332 station IDs with 340 stations | 356 stations
Memberships | Annual (45 minutes free); 24-hour and 7-day (30 minutes free) | Annual, 30-day, 3-day, 1-day (free for 30 minutes)
Trips | approx. 8,081,216 per year, reduced by approx. 3 % after data cleansing | approx. 2,945,512 per year, reduced by approx. 3 % after data cleansing
Infrastructure | Thousands of bikes, kiosks, docking stations ... | Thousands of bikes, kiosks, docking stations ...
Goal
• Identify patterns in BSS
• Prove that the patterns are interesting using Data Mining and Geo BI
• The hypotheses are used to prove that the patterns are interesting
• When the patterns are interesting, the hypotheses are proved
• When the hypotheses are proved, the systems are comparable
Architecture of NyDc
[Figure: processing stages: Clustering and Classification, Visualization, Decision Support]
Tasks | Tools
DB | Postgres
Data Cleansing | Postgres, SAS
Clustering | RapidMiner
Visualization | Tableau
NyDc | Tableau
Overview of the Process
Trip record: Duration | Start time | Stop time | Start_id | Stop_id | Start name | Stop name | Start longitude | Stop longitude | Start latitude | Stop latitude | Bike_id | User type | Birth year | Gender
Station profile: Station ID | Rental 0-1 | ... | Rental 23-0 | Return 0-1 | ... | Return 23-0
Aggregation: ID | Start / Stop time | Average rentals / returns
Clustered profile: Station ID | Rental 0-1 | ... | Rental 23-0 | Return 0-1 | ... | Return 23-0 | Clusters
Data Cleansing (Selection / Reduction / Integration / Transformation)
• Clustering requires meaningful, cleaned attributes
• Only trips with a duration greater than 60 seconds are kept
• Only the summer months are used
• Data from multiple sources are integrated, and the average rentals and returns per station per hour are calculated
• After transformation, the input has 48 attributes and one ID (each hour is an attribute: 24 for rentals, 24 for returns)
[Figure: the data are split into Weekday and Weekend sets, each further split into Casual and Subscriber subsets]
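The cleansing and transformation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Postgres/SAS workflow used in the project: the dictionary keys, the June to August definition of "summer", the averaging over observed days, and the sample trips are all made up for the example. The weekday/weekend and user-type splits are left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

MIN_DURATION_S = 60          # keep only trips longer than 60 seconds
SUMMER_MONTHS = {6, 7, 8}    # assumption: "summer" means June to August


def hourly_profiles(trips):
    """Return {station_id: [48 values]}: 24 hourly rental averages, then 24 return averages."""
    counts = defaultdict(lambda: [0] * 48)
    days = set()
    for trip in trips:
        start = datetime.fromisoformat(trip["start_time"])
        stop = datetime.fromisoformat(trip["stop_time"])
        if trip["duration"] <= MIN_DURATION_S or start.month not in SUMMER_MONTHS:
            continue  # dropped by data cleansing
        days.add(start.date())
        counts[trip["start_id"]][start.hour] += 1       # rental at the start station
        counts[trip["stop_id"]][24 + stop.hour] += 1    # return at the stop station
    n_days = max(len(days), 1)  # average over the days that appear in the data
    return {sid: [c / n_days for c in row] for sid, row in counts.items()}


# Made-up sample trips (hypothetical values, for illustration only)
trips = [
    {"duration": 600, "start_time": "2014-07-01 08:05:00",
     "stop_time": "2014-07-01 08:15:00", "start_id": 519, "stop_id": 293},
    {"duration": 30, "start_time": "2014-07-01 09:00:00",
     "stop_time": "2014-07-01 09:00:30", "start_id": 519, "stop_id": 519},
    {"duration": 900, "start_time": "2014-07-02 08:30:00",
     "stop_time": "2014-07-02 08:45:00", "start_id": 519, "stop_id": 293},
    {"duration": 700, "start_time": "2014-12-01 10:00:00",
     "stop_time": "2014-12-01 10:12:00", "start_id": 293, "stop_id": 519},
]
profiles = hourly_profiles(trips)
```

Each station ends up with one row of 48 numbers, which matches the "48 attributes and one ID" layout used as clustering input.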
Citi Bike Weekday
Capital Bikeshare Weekday
Citi Bike Subscriber Weekday
Capital Bikeshare Subscriber Weekday
Citi Bike Customer Weekday
Capital Bikeshare Customer Weekday
Data Mining (Knowledge Discovery)
Clustering
• Unsupervised learning: the process of grouping similar objects
• The data contains no labels
• Objects are grouped when they are similar (in their members or attributes)
• The idea is to find some structure or pattern in a collection of unlabeled data
• It is learning by observation rather than by example (e.g. K-means and K-medoids)
• Goal: high intra-cluster similarity and low inter-cluster similarity
Areas
• Almost all research fields
• From market research to medicine
• From image processing to spatial data
Data Mining (Knowledge Discovery)
Classification
• Classification is a supervised learning technique
• It is the process of finding a model or function that distinguishes data classes by their class labels
• The given data is usually divided into training data (known class labels) and test data (unknown class labels); examples are K-NN and Naive Bayes
• Recall: the measure of completeness, TP / (TP + FN)
• Precision: the measure of exactness, TP / (TP + FP)
• Accuracy: the percentage of test set tuples that are correctly classified by the classifier
Prediction | Class “A” | Class “Not A”
Test says “A” | True Positive | False Positive
Test says “Not A” | False Negative | True Negative
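As a small worked example of the three measures, the following Python sketch computes them from the four cells of the table above. The function name and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)                      # completeness
    precision = tp / (tp + fp)                   # exactness
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of correct predictions
    return recall, precision, accuracy


# Hypothetical counts: 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives
recall, precision, accuracy = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"recall={recall:.3f} precision={precision:.3f} accuracy={accuracy:.3f}")
```

With these counts, recall is 40/45, precision is 40/50 and accuracy is 85/100.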
K-means: Clustering Algorithm
• A simple clustering algorithm that aims for high intra-cluster similarity and low inter-cluster similarity
Working
1) It begins by randomly selecting k data points as the initial centroids.
2) It creates k empty clusters.
3) It assigns exactly one centroid to each cluster.
4) It iterates over all instances and assigns each data point to the cluster with the nearest centroid (mean).
5) After each iteration, it recomputes the cluster centroids from the newly assigned data points.
6) It checks whether the clustering is good enough (no further change); otherwise it returns to (2).
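The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the RapidMiner operator used in the project; the function names, the squared Euclidean distance, the fixed seed and the toy points are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) after running K-means until assignments stop changing."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2-4: assign every point to the cluster with the nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no assignment changed
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data: two well-separated groups (made-up values)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, k=2)
```

In the setting of these slides, each point would be a station's 48-value hourly profile rather than a 2D coordinate.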
Complicated Questions
How many clusters?
• Davies–Bouldin index (DBI)
• Accuracy using classification
• Experience
Why K-means?
• The Davies–Bouldin index (DBI) shows a low value
• High accuracy, precision, and recall using classification algorithms
Pseudo code for NyDc
• Run clustering algorithms
• Get accuracy using classification algorithms (choose the best one)
• Evaluate using the Davies–Bouldin index
• Use Geo BI to validate the analysis and prove the hypotheses
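To make the "low DBI" criterion concrete, here is a hedged Python sketch of the Davies–Bouldin index in its common form: the average, over clusters, of the worst-case ratio between summed within-cluster scatter and centroid distance. The function name and toy data are made up; the project itself evaluated the index with its own tooling, and the sketch assumes at least two clusters with distinct centroids.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values mean compact, well-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid of each cluster (mean per dimension)
    centroids = {
        label: tuple(sum(dim) / len(members) for dim in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance of the members to their centroid
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)


# Toy data (made-up values): a sensible labeling versus a poor one
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = davies_bouldin(points, [0, 0, 0, 1, 1, 1])
bad = davies_bouldin(points, [0, 1, 0, 1, 0, 1])
```

Comparing the index across different values of k, or across algorithms, is one way to answer the "how many clusters" question, alongside classification accuracy and experience.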
Clustering Accuracy Evaluation
Recall (%)
Algorithm | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3
K-means | 83.33 | 87.5 | 99.17 | 95.08
K-medoids | 93.17 | 85.07 | 84.62 | 80
EM | 87.6 | 85.71 | 97.85 | 85.71
Precision (%)
Algorithm | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3
K-means | 94.59 | 100 | 96.77 | 95.08
K-medoids | 92.5 | 93.44 | 82.5 | 75.36
EM | 94.64 | 85.71 | 86.67 | 92.31
Accuracy (%)
Algorithm | Naive Bayes | K-NN
K-means | 91.46 | 96.32
K-medoids | 87.33 | 87.92
EM | 91.83 | 89.56
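The accuracy figures treat the cluster labels as class labels and ask how well a classifier can recover them. A minimal Python sketch of that idea with K-NN is shown below. The slides do not state the evaluation protocol, so leave-one-out evaluation, k = 3, the function names and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training items nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Percentage of items whose label is recovered from all other items."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # hold out the item being predicted
        hits += knn_predict(rest, features, k) == label
    return 100.0 * hits / len(data)


# Toy data (made-up values): features paired with cluster labels
data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
        ((10.0, 10.0), 1), ((10.0, 11.0), 1), ((11.0, 10.0), 1)]
accuracy = leave_one_out_accuracy(data, k=3)
```

A clustering whose labels are easy to recover in this way has clusters that are well separated in feature space, which is the property the accuracy table is used to compare across K-means, K-medoids and EM.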
Clustering Validation
For clarity, the clusters are named:
• Commuter cluster (active daytime rentals and returns)
• Tourist or mixed cluster (late afternoon and evening)
• Leisure and utility cluster (active at night and in the early morning)
• Residential or outer-city cluster (low activity at all times)
Proof for Hypothesis 1
• Sub-hypothesis 1 (temporal factors): the time of day plays an important role
• Sub-hypothesis 2 (spatial factors): location plays an important role
Sub-hypothesis 1: Temporal Validation (Citi Bike)
Sub-hypothesis 1: Temporal Validation (Capital Bikeshare)
Examples for Validation
Commuter
• 519: Grand Central Terminal (railroad terminal)
• Dupont station: Dupont Circle
Tourist
• 2006: Central Park
• Smithsonian: National Mall, Washington
Leisure
• 293: Lafayette Street
• U St and 13 St NW: U Street
Residential
• Brooklyn
• Arlington County
Sub-hypothesis 2: Spatial validation
Citi Bike Weekday
Sub-hypothesis 2: Spatial validation
Capital Bikeshare Weekday
Proof for Hypothesis 2
• Subscribers are regular commuters
• Subscribers tend to be well educated and affluent
• Their similar pattern of morning pickups and evening returns (workers) is validated against the white-collar job map
• 41 % of subscribers hold a master’s degree and 63 % are under 35
• Subscribers spend more money on BSS than customers
Proof for Hypothesis 2 (continued)
• Nine in ten survey respondents were employed
• US census reports show that only about seven in ten adults in Washington, D.C., are employed
• Customers are tourists or shoppers visiting the neighborhood
• Customers show lower average activity on weekdays and higher activity on weekends
• Late pickups and active afternoons indicate that they are tourists or shoppers, or are riding for household activities
Proof for Hypothesis 3
• Riders visit spending locations more frequently
• High average activity on weekends points to tourists or leisure users
• Cyclists visit a supermarket 3.2 times per week
• Motorists visit 2.5 times per week and spend more money
• If there are more customers or tourists, business in the neighborhood is likely to be better
• If there are more subscribers, bike usage is likely to be high
NyDc
• Since the hypotheses are proved, the patterns are interesting
• Since the interesting patterns are similar, the systems are comparable
• The final comparability model (NyDc) can be used for various applications
• It could serve as a benchmark for comparability studies
• It offers several useful features
• A prediction model prototype was developed using NyDc
• This model can be mapped to a new location
[Figure: attribute table (Population, Location, Temporal, Hot spot, Household, ..., Cluster average) feeding into location decisions]
Conclusion
• In-depth analysis is done by separating the data
• Bike activity patterns are obtained for future prediction
• The hypotheses (patterns are interesting) are proved, which helps to solve the issues of BSS
• Yes, the two systems are comparable to a great extent
• The model can be mapped to other cities with similar attributes, for BSS design or prediction
• Applications include business development, city dynamics, and location-based services
Future Work
• Developing an analogy-based bikesharing information system
• Using NyDc to develop a new artificial intelligence recommendation system
• Developing better prediction algorithms (taking more features into account)
• Personalized data capture for a recommender system (automatic path calculation with temporal information)
• Comparing NyDc with another city in Asia
• Studying human dynamics in different parts of the world