Crowd Detector
Reza Asad
Insight Data Engineering June 2015
Motivation
● Avoid waiting time in crowded areas.
Data
● Let's imagine we had data about people's locations.
● This could be collected from people's cell phones.
● How can we use such data?
Naive Approach
Demo
Data
● But such data is not available to me ...
● Solution: Engineer the data!
● Take data from Yelp
● Perform a random walk
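
The slides do not say how the random walk is generated, so the following is only a minimal sketch under assumptions: each simulated person starts at a known location (for example, a business taken from the Yelp data) and moves one fixed-size step north, south, east, or west per tick. The function name `random_walk`, the step size, and the starting coordinates are all hypothetical.

```python
import random

def random_walk(start_lat, start_lon, steps, step_size=0.0005, seed=None):
    """Return a list of (lat, lon) positions from a simple 2-D random walk."""
    rng = random.Random(seed)
    lat, lon = start_lat, start_lon
    path = [(lat, lon)]
    for _ in range(steps):
        # Move one step north, south, east, or west with equal probability.
        dlat, dlon = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        lat += dlat * step_size
        lon += dlon * step_size
        path.append((lat, lon))
    return path

# Hypothetical starting point somewhere in San Francisco.
path = random_walk(37.7749, -122.4194, steps=5, seed=42)
for lat, lon in path:
    print(f"{lat:.4f}, {lon:.4f}")
```

Each position in the returned path could then be sent as one location message into the pipeline.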
Pipeline
Data
Engineering Challenges
● Choosing K?
Engineering Challenges
● The area of SF: 46.87 mi²
● For the purpose of this project each cluster is 0.09 mi²
● This means k is roughly 500
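
The estimate of k follows from dividing the city's area by the area of one cluster. A short check of that arithmetic, using only the two figures quoted above (the variable names are illustrative):

```python
SF_AREA_SQ_MI = 46.87       # area of San Francisco quoted above
CLUSTER_AREA_SQ_MI = 0.09   # target area per cluster quoted above

# Number of clusters needed to tile the city at this granularity.
k_estimate = SF_AREA_SQ_MI / CLUSTER_AREA_SQ_MI
print(round(k_estimate))  # 521
```

The result is about 521, which the slide rounds to roughly 500.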
Engineering Challenges
● Parameters to tune:
– Time it takes to produce the messages
– Processing time for k-means in Spark Streaming
– The update interval for a fixed data point in the database
Goal
● Tune the parameters in order to have a stable system
● The total delay after processing each batch must be constant and comparable to the batch interval.
● You can check this in the Spark API
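
One way to make the stability goal concrete is a small check over observed per-batch total delays. This is a heuristic sketch, not part of Spark: the function `is_stable`, the `tolerance` factor, and the 10% growth threshold are assumptions chosen for illustration, and the delay values would have to be read off Spark's monitoring output by hand.

```python
def is_stable(total_delays, batch_interval, tolerance=1.5):
    """Heuristic: delays stay near the batch interval and do not keep growing.

    total_delays   -- observed total delay per batch, in seconds
    batch_interval -- configured batch interval, in seconds
    tolerance      -- assumed factor for "comparable to the batch interval"
    """
    if len(total_delays) < 2:
        raise ValueError("need at least two batches")
    half = len(total_delays) // 2
    first_avg = sum(total_delays[:half]) / half
    second_avg = sum(total_delays[half:]) / (len(total_delays) - half)
    # Delay must stay within a small multiple of the batch interval ...
    bounded = max(total_delays) <= tolerance * batch_interval
    # ... and the later batches must not be noticeably slower than the earlier ones.
    not_growing = second_avg <= first_avg * 1.1
    return bounded and not_growing

print(is_stable([1.9, 2.1, 2.0, 2.0], batch_interval=2.0))  # True
print(is_stable([2.0, 3.0, 4.0, 6.0], batch_interval=2.0))  # False
```

A steadily growing delay means batches are arriving faster than they are processed, so the input rate or the batch interval needs to change.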
Tackling Challenges
● Having multiple producers and consumers ✔
● Kafka sends messages quickly and is not the bottleneck
● Establishing some safe limits:
– Using spark.streaming.receiver.maxRate to control the input rate ✔
– Understanding the complexity of the process in Spark Streaming ✔
– Choosing the right batch interval ✔
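
The safe limits above interact: `spark.streaming.receiver.maxRate` caps the records per second that each receiver accepts, so together with the number of receivers and the batch interval it bounds the size of a batch. The sketch below works through that bound and the resulting number of point-to-center comparisons for one k-means assignment pass. The helper names and the example numbers (1000 records/s, 4 receivers, a 2-second batch) are hypothetical, not values from the project.

```python
def max_records_per_batch(max_rate_per_receiver, num_receivers, batch_interval_s):
    """Upper bound on records in one batch when each receiver is rate-limited."""
    return max_rate_per_receiver * num_receivers * batch_interval_s

def distance_computations(records, k):
    """One k-means assignment pass compares every record with every center."""
    return records * k

# Hypothetical settings, chosen only to illustrate the arithmetic.
records = max_records_per_batch(max_rate_per_receiver=1000,
                                num_receivers=4,
                                batch_interval_s=2)
print(records)                              # 8000
print(distance_computations(records, 500))  # 4000000
```

If the work implied by these numbers cannot finish within one batch interval, either the rate cap has to come down or the batch interval has to go up.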
Raw Data
Data Process
● Data filtering in Spark Streaming
Data Process
About Me
● A long time ago - B.S. in pure math, University of Toronto
● More recently - M.S. in applied math, University of British Columbia
● The exciting now - A data engineer who wants to go camping with other data engineers
