Eduard Lazar - CitySprint
A geospatial and time series analysis of the CitySprint fleet
Blue signals a pick-up
Red signals a drop-off
Sample of what one driver’s journey looks like
Used for:
• Viewing the base unit of analysis
Demand heat map
Heat map of pickup location density
Used for:
• Optimising resource allocation
• Identifying areas for potential expansion
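
A density heat map of this kind can be approximated by binning pickup coordinates into a regular grid and counting the pickups in each cell. The sketch below is illustrative only: the deck does not say how its heat map was computed, and the function name `density_grid`, the 0.1-degree cell size, and the sample coordinates are all assumptions.

```python
from collections import Counter
import math

def density_grid(points, cell_deg=0.1):
    """Count pickups per grid cell.

    points: iterable of (lat, lon) pairs in decimal degrees.
    cell_deg: cell size in degrees (an assumed, arbitrary default).
    Returns a Counter keyed by (row, col) cell index.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Made-up sample pickups: two close together, one further north.
pickups = [(51.507, -0.128), (51.509, -0.121), (51.903, -0.202)]
grid = density_grid(pickups)
hottest_cell, hottest_count = grid.most_common(1)[0]
```

Cells with the highest counts are the "hot" areas; a mapping tool would then colour each cell by its count.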
K-means clustering analysis – 40 centres
Employed the K-means algorithm to identify clusters
of pickup points
Used for:
• Validating against current service centres map
• Identifying areas for potential expansion
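
For readers unfamiliar with K-means, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the deck's implementation, which is not shown. As a simplifying assumption it treats latitude and longitude as planar coordinates with Euclidean distance, and the function name, parameters, and sample points are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise from k of the data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2,
            )
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centres, labels

# Made-up pickups forming two well-separated groups.
pickups = [(51.50, -0.12), (51.51, -0.13), (51.52, -0.11),
           (53.47, -2.24), (53.48, -2.25), (53.49, -2.23)]
centres, labels = kmeans(pickups, k=2)
```

The number of centres `k` has to be chosen up front (40 on this slide), and the resulting centres can then be compared with the existing service centre locations.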
K-means 100 centres
Higher granularity clustering
Used for:
• Assessing the frequency of pickups for micro-clusters (e.g. villages, neighbourhoods)
• Directing drivers to hotter waiting areas
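
One way to turn micro-clusters into a driver suggestion is to pick the nearest cluster centre whose pickup count exceeds some threshold. The sketch below assumes great-circle (haversine) distance; the cluster records, the `min_pickups` threshold, and the function names are hypothetical and do not come from the deck.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def nearest_hot_cluster(driver, clusters, min_pickups=50):
    """Return the name of the closest cluster with at least min_pickups, or None."""
    hot = [c for c in clusters if c["pickups"] >= min_pickups]
    if not hot:
        return None
    return min(hot, key=lambda c: haversine_km(driver, c["centre"]))["name"]

# Made-up clusters; "B" is closest to the driver but too quiet to count as hot.
clusters = [
    {"name": "A", "centre": (51.90, -0.20), "pickups": 120},
    {"name": "B", "centre": (51.75, -0.34), "pickups": 30},
    {"name": "C", "centre": (52.20, 0.12), "pickups": 200},
]
suggestion = nearest_hot_cluster((51.76, -0.33), clusters)
```

A real dispatcher would probably weigh distance against expected demand rather than use a hard threshold; the threshold simply keeps the example short.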
Geographical supply & demand
Pickup locations shown against driver routes
Used for:
• Improving likelihood of parcel pickup while on-route
Demand variation across time
[Chart: Expected parcels allocated to cluster 41 (Stevenage). X-axis: time of day (hours 0 to 23). Y-axis: expected parcels (0.0 to 18.0).]
Used for:
• Positioning couriers in the right place at the right time
For each demand cluster we calculated the
frequency of pickups per hour
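
The per-cluster hourly frequency can be sketched as a simple group-and-count. The deck does not define "expected parcels" precisely, so this example assumes it means the average number of pickups in a given hour across the days observed for that cluster. The function name and the sample records, including their dates, are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

def expected_parcels_per_hour(pickups):
    """Average pickups per hour of day for each cluster.

    pickups: iterable of (cluster_id, datetime) pairs.
    Returns {cluster_id: {hour: mean pickups per observed day}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    days = defaultdict(set)
    for cluster_id, ts in pickups:
        counts[cluster_id][ts.hour] += 1
        days[cluster_id].add(ts.date())
    return {
        cid: {hour: n / len(days[cid]) for hour, n in sorted(hours.items())}
        for cid, hours in counts.items()
    }

# Made-up records for cluster 41 spanning two days.
records = [
    (41, datetime(2016, 3, 1, 9, 5)),
    (41, datetime(2016, 3, 1, 9, 40)),
    (41, datetime(2016, 3, 2, 9, 15)),
    (41, datetime(2016, 3, 2, 14, 30)),
]
table = expected_parcels_per_hour(records)
```

Here `table[41]` maps each hour of the day to the average number of pickups in that hour, which is the kind of series plotted in the chart for cluster 41.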
The solution outline
• Data science capabilities of Spark, easy to use with SQL knowledge
• Map plotting on ArcGIS – heat mapping, zoom in/out capabilities, real-time
• High performance due to the in-memory processing capabilities of Spark
• Can work with large data sets due to high-performance disk-based data access
in the Hadoop Distributed File System (HDFS)
• Can import data from EDW
Why Bigstep?
• Easy to use - Easy to deploy, redeploy, erase and rewind. Easy to experiment with
• Big Data Focus – Infrastructure, orchestration, and software ecosystem deliver
performance & ease of use for big data
• Domain Experts – Extensive hands-on experience in delivering complex big data
solutions for multiple verticals & use cases
• Consultative Approach – Direct contact and support from experienced big data, devops,
and infrastructure specialists
• Best In Class Infrastructure – The world’s highest performance cloud


CitySprint Fleetmapper use case -Big Data Bootcamp


Editor's Notes

  • #2: Objectives: take geospatial and time series data and make it easily manageable and usable by business users; discover new business insights to optimize operations; run real-time analysis on 22,626,119 records; test whether Spark and Hadoop are suitable data analysis tools for CitySprint; design a flexible, versatile environment for analyzing fleet data; implement a solution with enough performance that real-time data exploration is possible on the full dataset.
  • #3: Follows a random driver on a typical day through pickup and drop-off points. The map can zoom in and out.
  • #4: Shows the pickup hot spots across the UK. A good overview of the overall dataset.
  • #5: Compared against our service centre locations, it shows a few differences. A clustering algorithm identifies ‘clusters’ of elements on its own. K-means needs to be told how many clusters to look for.
  • #6: This is what happens if we tell k-means to split the dataset into 100 hot locations.
  • #7: The blue dots are actual GPS positions of en-route drivers. Shows typical routes, but only some routes go through hot areas.
  • #8: A ‘cluster’ timetable is used to predict demand at a particular cluster at a particular time. Useful for instructing drivers whether to stay or to continue to their destination. This can help uberize the business.
  • #9: Used a combination of technologies, mostly Spark on Hadoop on Bigstep. Imported data from the production Postgres DB via Sqoop into Avro, and from there via Spark into various CSV files rendered by Esri ArcGIS. The Postgres DB consolidates information from mobile devices.