Insight Recent Demo

Crowd Detector
Reza Asad
Insight Data Engineering June 2015

Motivation
● Avoid waiting time in crowded areas.

Data
● Let's imagine we had data about people's locations.
● This could be collected from people's cell phones.
● How can we use such data?

Data
● But such data is not available to me ...
● Solution: engineer the data!
● Take data from Yelp
● Perform a random walk
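
The slides do not say how the random walk was implemented. A minimal sketch, assuming each simulated person starts at a known coordinate (for example a Yelp business location) and moves by a small uniform random offset per step, might look like the following. The function name, step size, and starting point are illustrative assumptions, not taken from the project.

```python
import random


def random_walk(start_lat, start_lon, steps, step_size=0.0005, seed=None):
    """Return a list of (lat, lon) points produced by a simple 2D random walk.

    step_size is the maximum offset in degrees per step (an assumed value).
    """
    rng = random.Random(seed)  # private generator so a seed gives a repeatable walk
    lat, lon = start_lat, start_lon
    path = [(lat, lon)]
    for _ in range(steps):
        # Move independently in latitude and longitude by a bounded random offset.
        lat += rng.uniform(-step_size, step_size)
        lon += rng.uniform(-step_size, step_size)
        path.append((lat, lon))
    return path


# Hypothetical starting point in San Francisco.
walk = random_walk(37.7749, -122.4194, steps=5, seed=42)
for point in walk:
    print(point)
```

Each point in the walk could then be published as one location message for the streaming pipeline.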

Engineering Challenges
● Choosing k

● The area of SF: 46.87 mi²
● For the purpose of this project each cluster is 0.09 mi²
● This means k is roughly 500 (46.87 / 0.09 ≈ 521)
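
The estimate of k follows from dividing the city area by the target cluster area. A short sketch of that arithmetic, using only the two figures from the slide (the variable names are illustrative):

```python
import math

SF_AREA_SQ_MI = 46.87       # area of San Francisco, from the slide
CLUSTER_AREA_SQ_MI = 0.09   # target area per cluster, from the slide

# Number of equal-area clusters needed to cover the city.
k_estimate = SF_AREA_SQ_MI / CLUSTER_AREA_SQ_MI

# Side length of a square with the target cluster area.
side_mi = math.sqrt(CLUSTER_AREA_SQ_MI)

print(round(k_estimate))   # 521
print(round(side_mi, 2))   # 0.3
```

So each cluster covers roughly a 0.3 mi by 0.3 mi patch, and rounding 521 down gives the k of about 500 used here.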

● Parameters to tune:
– Time it takes to produce the messages
– Processing time for k-means in Spark Streaming
– The update interval for a fixed data point in the database

Goal
● Tune the parameters in order to have a stable system
● The total delay after processing each batch must be constant and comparable to the batch interval.
● You can check this in the Spark API
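
The stability criterion above can be turned into a rough check over a series of observed per-batch total delays. The sketch below is only a heuristic under stated assumptions: the function name, the 1.5x slack factor, and the 10% growth allowance are illustrative choices, and the delay values are made up for the example rather than measured.

```python
def is_stable(total_delays, batch_interval, slack=1.5, growth=1.1):
    """Heuristic: delays stay near the batch interval and do not keep growing.

    total_delays: per-batch total delay in seconds, oldest first.
    batch_interval: streaming batch interval in seconds.
    """
    if len(total_delays) < 2:
        raise ValueError("need at least two batches to judge stability")
    half = len(total_delays) // 2
    early = sum(total_delays[:half]) / half
    late = sum(total_delays[half:]) / (len(total_delays) - half)
    bounded = max(total_delays) <= slack * batch_interval  # comparable to the interval
    not_growing = late <= growth * early                   # roughly constant over time
    return bounded and not_growing


# Hypothetical delay series for a 2-second batch interval.
print(is_stable([1.8, 2.1, 1.9, 2.0], batch_interval=2.0))  # True
print(is_stable([2.0, 3.0, 4.5, 6.0], batch_interval=2.0))  # False
```

A steadily increasing delay, as in the second series, means batches are queuing up faster than they are processed.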

Tackling Challenges
● Having multiple producers and consumers ✔
● Kafka sends messages quickly and is not the bottleneck
● Establishing some safe limits:
– Using spark.streaming.receiver.maxRate to control the input rate ✔
– Understanding the complexity of the process in Spark Streaming ✔
– Choosing the right batch interval ✔
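
Since spark.streaming.receiver.maxRate caps the number of records per second that each receiver accepts, it also bounds how many records a single batch can contain. A small sketch of that bound follows; the rate, batch interval, and receiver count are hypothetical values, not the ones used in this project.

```python
def max_records_per_batch(max_rate, batch_interval_s, num_receivers):
    """Upper bound on records in one batch.

    max_rate: value of spark.streaming.receiver.maxRate (records/second/receiver).
    batch_interval_s: batch interval in seconds.
    num_receivers: number of receivers feeding the stream.
    """
    return max_rate * batch_interval_s * num_receivers


# Hypothetical settings: 1000 records/s per receiver, 2 s batches, 4 receivers.
settings = {"spark.streaming.receiver.maxRate": "1000"}
limit = max_records_per_batch(
    int(settings["spark.streaming.receiver.maxRate"]),
    batch_interval_s=2,
    num_receivers=4,
)
print(limit)  # 8000
```

Knowing this bound makes it easier to pick a batch interval that the k-means step can finish within.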

Data Process
● Data filtering in Spark Streaming

About Me
● A long time ago - B.S. in pure math, University of Toronto
● More recently - M.S. in applied math, University of British Columbia
● The exciting now - A data engineer who wants to go camping with other data engineers
