Threading Needles in a Haystack:
Sessionizing Uber Trips in Real Time
Amey Chaugule
Uber Technologies, Inc.
Agenda
● The Marketplace Org @ Uber
● Sessionization use cases at Uber
● Some unique challenges about Sessions @ Uber
● Anatomy of an Uber Session
● Our Sessions DSL
● Sessions in Production
● Scale
● Q & A
The Uber Marketplace
In our marketplace, there are models that describe the world and the
decision engines that act upon them.
Marketplace @ Uber
Real-time events in the physical world drive marketplace dynamics
which then affect the algorithmic engines in the Marketplace which in
turn influence the events in the real world in a continuous feedback
loop.
Some examples of marketplace dynamics:
Supply
Demand
Forecast
Trips
Need for a sessionized view of the Uber experience
Given the scale and complexity of our systems, events are distributed
across multiple disparate real-time streams over the twin dimensions of
time and space.
How do we contextualize these event streams so they can be logically
grouped together and quickly surface useful information to downstream
decision engines?
Real-time Use cases
For instance, some algorithms need to adjust to spikes in demand to
balance supply in near real time,
while some machine-learning models need information about
pre-request activities in real time.
To understand rider behaviour we need to understand the full user
experience from start to finish.
[Figure: session funnel. Stages: Session Start, Get to Request Screen, Request, Dispatched, Completed Trip. Exits: Exit pre-request screen, Exit at request screen, Pre-Dispatch Rider Cancel, Unfulfilled, Driver Cancel, Post-Dispatch Rider Cancel. Ratios shown: R/S, C/R, C/S.]
The definition of this experience, a SESSION, is critical to understanding our
internal business operations.
Some unique challenges
Given the ride-sharing marketplace Uber operates, our Sessions state
machine needs to model interactions between riders, driver partners,
and back-end systems.
The Anatomy of an Uber Session
The Sessions State Machine
The Sessions State Machine - The shopping state
The Sessions State Machine - Requesting a ride
The Sessions State Machine - Cancellation (the sad path ☹ )
The Sessions State Machine - On Trip (the happy path)
The Sessions State Machine - Session End (the happy path)
The Sessions State Machine - On Trip (the happy path)
Implementation
Pickup Experience at Super Bowl 2017
Sessions DSL
type Condition = SessionInput => Boolean

/**
 * This class defines a single transition. It takes in two functions:
 *
 * @param condition Function of type (SessionInput => Boolean) representing conditions that trigger this transition.
 * @param nextState Function of type (State, SessionInput) => State, which results in the next state.
 */
class Transition(condition: Condition, nextState: (State, SessionInput) => State) {
  def isConditionValid(event: SessionInput) = condition(event)
  def transition(event: SessionInput, fromState: State): State = nextState(fromState, event)
}

e.g.
val requestRideAndRiderLookingTransition = new Transition(isRequestEvent and isRiderLooking, RequestRideState.transitionTo)
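The example above combines two conditions with an infix `and`, which plain Scala functions do not provide. The slides do not show how it is defined; one plausible way, sketched here under that assumption, is an implicit class over `Condition`. The `SessionInput` fields, `ConditionOps`, and the string values are hypothetical stand-ins for illustration only.

```scala
object ConditionDsl {
  // Hypothetical, simplified input; the real SessionInput is not shown in the slides.
  final case class SessionInput(eventType: String, riderStatus: String)

  type Condition = SessionInput => Boolean

  // Assumed helper (not from the slides): adds infix `and` / `or` to any Condition.
  implicit class ConditionOps(self: Condition) {
    def and(other: Condition): Condition = in => self(in) && other(in)
    def or(other: Condition): Condition = in => self(in) || other(in)
  }

  val isRequestEvent: Condition = _.eventType == "request"
  val isRiderLooking: Condition = _.riderStatus == "looking"

  // Reads like the slide's example: both predicates must hold for the same input.
  val requestAndLooking: Condition = isRequestEvent and isRiderLooking
}
```

Keeping conditions as plain functions means a combinator like this stays a few lines long, and transitions can be declared by composing small, individually testable predicates.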
Sessions DSL
sealed trait State {
  val ts: Long
  val name: RiderSessionStateName.Value
  val transitionReason: Option[String]
  val jobUuid: Option[String]
  val originLocation: Option[GeoLocation]
  val destinationLocation: Option[GeoLocation]

  // List of Transitions out of this state. They are evaluated in the order of precedence they appear in this list.
  val transitions: List[Transition]

  /**
   * @param event Current session input.
   * @return The next state resulting from the input event.
   */
  def withEvent(event: SessionInput): State = {
    // Find the first transition whose condition holds for this event.
    val transition = transitions.find(t => t.isConditionValid(event))
    transition map(_.transition(event, this)) getOrElse this
  }
}
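Because `withEvent` returns the next state (or the same state when no transition matches), replaying a rider's events reduces to a left fold. The following is a minimal, self-contained sketch of that idea; the three states, the event type strings, and the `replay` helper are simplified assumptions, not the production definitions.

```scala
object StateMachineSketch {
  // Hypothetical, trimmed-down input type.
  final case class SessionInput(ts: Long, eventType: String)

  final class Transition(condition: SessionInput => Boolean,
                         nextState: (State, SessionInput) => State) {
    def isConditionValid(event: SessionInput): Boolean = condition(event)
    def transition(event: SessionInput, fromState: State): State = nextState(fromState, event)
  }

  sealed trait State {
    def ts: Long
    def transitions: List[Transition]
    // First matching transition wins; unmatched events leave the state unchanged.
    def withEvent(event: SessionInput): State =
      transitions.find(_.isConditionValid(event)).map(_.transition(event, this)).getOrElse(this)
  }

  final case class Shopping(ts: Long) extends State {
    def transitions: List[Transition] = List(
      new Transition(_.eventType == "request", (_, e) => RequestRide(e.ts)))
  }

  final case class RequestRide(ts: Long) extends State {
    def transitions: List[Transition] = List(
      new Transition(_.eventType == "pickup", (_, e) => OnTrip(e.ts)),
      new Transition(_.eventType == "cancel", (_, e) => Shopping(e.ts)))
  }

  final case class OnTrip(ts: Long) extends State {
    def transitions: List[Transition] = Nil
  }

  // Replays events in order, threading the state through withEvent.
  def replay(initial: State, events: Seq[SessionInput]): State =
    events.foldLeft(initial)((state, event) => state.withEvent(event))
}
```

For example, replaying a `request`, an unrelated event, and a `pickup` from `Shopping` ends in `OnTrip`, with the unrelated event ignored.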
Sessions DSL - Putting it all together
case class RequestRideState(ts: Long,
                            override val vehicleViewId: VehicleView,
                            transitionReason: Option[String] = None,
                            jobUuid: Option[String] = None,
                            originLocation: Option[GeoLocation] = None,
                            destinationLocation: Option[GeoLocation] = None) extends State with AssignedVehicleView {
  override val name: SessionState.Value = SessionState.RequestRide
  override val transitions: List[Transition] = List(Transition.onTripTransition,
    Transition.shoppingStateFromRequestTransition,
    Transition.requestRideSelfTransition,
    Transition.shoppingStateOnRequestExpiration)
}
Sessions - Moving from Spark to Flink
unionedStreams
  .assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[SessionInput](Time.seconds(30)) {
      override def extractTimestamp(event: SessionInput): Long = event.timestamp
    })
  .keyBy(_.riderUuid)
  // As shown on the slide. In Flink's API, withDynamicGap expects a
  // SessionWindowTimeGapExtractor; a fixed gap is passed via withGap(Time).
  .window(EventTimeSessionWindows.withDynamicGap(Time.minutes(30)))
  .evictor(DeltaEvictor.of(30000.0D, FlinkSessionsPipeline.deltaFunction, true))
  .process(new ProcessWindowFunction[SessionInput, List[RiderSessionObject], String, TimeWindow]() {
    …
  })
Spark’s stateFunc abstraction fits nicely into Flink’s ProcessWindowFunction.
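To make the session-window idea concrete without a Flink cluster, here is a small plain-Scala sketch of gap-based sessionization for a single rider: events are sorted by event time and a new session starts whenever two consecutive events are more than the gap apart. The `sessionize` function is a hypothetical illustration of the concept, not how Flink implements session windows internally.

```scala
object GapSessions {
  // Splits event timestamps (epoch millis) into sessions wherever two
  // consecutive events are more than `gapMs` apart.
  def sessionize(timestamps: Seq[Long], gapMs: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      // The head of `current` is the most recent event of the open session.
      case (current :: done, ts) if ts - current.head <= gapMs =>
        (ts :: current) :: done
      // Otherwise the gap was exceeded (or this is the first event): open a new session.
      case (sessions, ts) =>
        List(ts) :: sessions
    }.map(_.reverse).reverse
}
```

With a 30-minute gap, events at minutes 0, 10, 50, and 55 would fall into two sessions: the first two events in one and the last two in another.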
Sessions in Production
“Time is relative… and clocks are hard.” - I. Brodsky
Our state machine models interactions between riders, drivers, and
internal back-end systems, each with their own notion of time!
We essentially need to keep track of per-key watermarks.
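One way to picture a per-key watermark is as the progress of the slowest source seen for a given rider. The sketch below is an assumption about how such bookkeeping could look, not the pipeline's actual mechanism: it keeps the maximum event time per rider and per source, and derives the rider's watermark as the minimum of those minus an allowed lateness. The names `Seen`, `observe`, `watermark`, and the source labels are all hypothetical.

```scala
object PerKeyWatermarks {
  // Max event time seen so far, per rider key and per source stream.
  type Seen = Map[String, Map[String, Long]]

  // Records an event; a late event never moves a source's progress backwards.
  def observe(seen: Seen, key: String, source: String, eventTs: Long): Seen = {
    val perSource = seen.getOrElse(key, Map.empty[String, Long])
    val newest = math.max(perSource.getOrElse(source, Long.MinValue), eventTs)
    seen.updated(key, perSource.updated(source, newest))
  }

  // Watermark for a key: the slowest source's progress minus the allowed lateness.
  // None until the key has been observed at least once.
  def watermark(seen: Seen, key: String, allowedLatenessMs: Long): Option[Long] =
    seen.get(key).filter(_.nonEmpty).map(_.values.min - allowedLatenessMs)
}
```

Taking the minimum across sources is a conservative choice: a session for a rider is only considered complete up to the point that every contributing clock has passed.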
Checkpointing
Checkpointing to HDFS can be unreliable.
What levels of backpressure can your downstream applications
tolerate?
At times, it’s just easier to store per-key state to Kafka and restart a
new pipeline, letting backfill take care of any gaps.
Backfills
We rely on upserts into Elasticsearch and backfilling can introduce subtle
bugs by being “more correct.”
In our implementation each session is indexed by an anonymized rider
UUID and its start time, i.e. event time of the first event to kick off a
session.
Re-ordering event consumption in a backfill can easily give you two
nearly identical sessions with slightly different start times.
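The duplicate-session problem can be reproduced with a toy index. In this sketch the document ID is assumed to be the rider UUID joined with the session start time, and the in-memory `Map` stands in for Elasticsearch; `docId`, `startOf`, and `upsert` are hypothetical helpers. The scenario assumed here is that the live pipeline missed an early event that the backfill later consumes in event-time order.

```scala
object SessionIds {
  final case class Session(riderUuid: String, startTs: Long)

  // Assumed document-ID scheme: anonymized rider UUID plus session start time.
  def docId(s: Session): String = s"${s.riderUuid}:${s.startTs}"

  // Start time is the event time of the first event consumed for the session,
  // so a different consumption order can yield a different start time.
  def startOf(riderUuid: String, consumedOrder: Seq[Long]): Session =
    Session(riderUuid, consumedOrder.head)

  // Toy stand-in for an upsert: same ID overwrites, new ID adds a document.
  def upsert(index: Map[String, Session], s: Session): Map[String, Session] =
    index.updated(docId(s), s)
}
```

If the live run starts the session at 1000 and the backfill starts it at 900, the two IDs differ, so the upsert adds a second document instead of overwriting the first.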
Schema Evolution
Uber’s ride-sharing products are constantly evolving and our pipeline
needs to keep in sync.
A new pipeline without state needs to warm up its state before we
can trust it to reliably write to our production indices.
Production handoff between the old and the new pipelines needs to be
carefully choreographed.
Observability
Keep track of any and every metric you can possibly think of for data
reliability as well as processing metrics such as Flink & JVM stats.
We use M3, our open source metrics platform for Prometheus.
Upstream data can and will change on you.
Imputing State
Uber’s first and foremost responsibility is to ensure a reliable, safe ride
for our users from request/dispatch all the way to a completed trip.
Mobile logging is buffered and best-effort, especially when running
on low-tech phones and in areas with poor network connectivity.
Our sessionization pipelines need to be resilient to dropped events.
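One possible way to stay resilient to dropped events is to impute the states that must have been passed through. The sketch below assumes a simplified linear happy path and fills in any skipped states with the timestamp of the event that was actually observed, flagging them as imputed. The state names, `Visit`, and `advance` are hypothetical; the slides do not describe the actual imputation rules.

```scala
object ImputeSketch {
  // Happy-path order, simplified from the state machine slides.
  val happyPath: Vector[String] = Vector("Shopping", "RequestRide", "OnTrip", "SessionEnd")

  final case class Visit(state: String, ts: Long, imputed: Boolean)

  // Moves from `current` to `observed`; any happy-path states skipped in
  // between are filled in as imputed visits stamped with the observed time.
  // Unknown states and backwards moves produce no visits in this sketch.
  def advance(current: String, observed: String, ts: Long): List[Visit] = {
    val from = happyPath.indexOf(current)
    val to = happyPath.indexOf(observed)
    if (from < 0 || to <= from) Nil
    else {
      val skipped = happyPath.slice(from + 1, to).map(Visit(_, ts, imputed = true))
      (skipped :+ Visit(observed, ts, imputed = false)).toList
    }
  }
}
```

Marking imputed visits explicitly lets downstream consumers decide whether to trust them, for example by excluding them from latency metrics.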
Sessions in Numbers
Scale
We ingest tens of billions of events daily.
We generate tens of millions of sessions.
Currently the production pipeline is running in Spark Streaming and
we’re comparing it against a Flink successor.
Thank you.
Learn more: uber.com/marketplace

Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: Sessionizing the Uber firehose in realtime"
