Spark Meetup:DataScience@Concur - Reacting to RT events to control throughput

Task Flow Rate Management
via Spark Streaming
Steve Hastings & Anikate Singh
DataScience@Concur

Problem Domain
[Diagram labels: Task Queue, Flag Tasks, Service Center]
• Dynamically flag transactions while controlling their throughput based on various parameters
• Decouple the task creation process from the task flagging process
• Real-time visibility into task flow rates and the ability to perform analytics

Data Pipeline
docker-compose build
docker-compose up
[Diagram labels: Task Producer, nodejs, Kafka, Spark Java Consumer, Dashboard, nodejs, Server-Sent Events; event labels: TASK FLAGGED, OVERLOAD]

Total vs. Flagged Volume in RT

Spark Streaming Agenda
1. Flagging Strategies
   1. Random Sampling (dynamic sampling rates)
   2. Transaction analysis (dynamic analysis)
2. Maintaining State
   1. In Spark or elsewhere
3. Getting data in/out

Flagging - Random Sampling
• Random txn flagging
– Each txn individually sampled
• Two types of rate change
– "Panic Button" - instant drop in rate
– Re-equilibrate - slow rise in rate
Rate Change Formulae
R_0 = T
R_{t+1} = R_t + 0.25 × (T − R_t)
or
R_{t+1} = 0.9 × R_t
[Diagram label: PANIC]
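The two update rules above can be sketched in plain Scala. This is a minimal sketch under assumptions the slide does not confirm: the rate is treated as a per-transaction sampling probability, T is the target rate, and the 0.9 multiplier is taken to be the "Panic Button" rule. The object and method names are hypothetical.

```scala
import scala.util.Random

object RateControl {
  // Re-equilibrate: move a quarter of the remaining distance toward the target T.
  def reequilibrate(rate: Double, target: Double): Double =
    rate + 0.25 * (target - rate)

  // Assumed panic rule: scale the current rate by 0.9.
  def panic(rate: Double): Double = 0.9 * rate

  // Each transaction is sampled on its own with probability `rate`.
  def shouldFlag(rate: Double, rng: Random): Boolean =
    rng.nextDouble() < rate
}
```

Starting from R_0 = T, repeated calls to `reequilibrate` leave the rate at T; after a `panic` call they pull it back toward T step by step.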

Flagging - Transaction Analysis
• Example: outliers
– Flag txns with some value > 95th percentile
– But with streaming updates to percentiles
• Using Twitter Algebird QTree
[Figure: two graphics joined by "+", followed by "= ?"]
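Streaming percentile updates depend on a summary that can be merged batch by batch. The sketch below uses a fixed-width histogram as a hypothetical stand-in to show that idea in plain Scala. It is not Algebird's QTree and does not use its API; `Histogram` and its methods are illustrative names only.

```scala
// Hypothetical mergeable summary; a stand-in for a structure such as QTree.
final case class Histogram(bucketWidth: Double, counts: Map[Long, Long]) {
  // Record one observation in its bucket.
  def add(value: Double): Histogram = {
    val b = math.floor(value / bucketWidth).toLong
    copy(counts = counts.updated(b, counts.getOrElse(b, 0L) + 1L))
  }

  // Merging adds bucket counts, so per-batch summaries combine in any order.
  def merge(other: Histogram): Histogram = {
    require(bucketWidth == other.bucketWidth, "bucket widths must match")
    val merged = other.counts.foldLeft(counts) { case (acc, (b, c)) =>
      acc.updated(b, acc.getOrElse(b, 0L) + c)
    }
    copy(counts = merged)
  }

  // Upper edge of the bucket that contains the p-quantile, if any data was seen.
  def quantileUpperBound(p: Double): Option[Double] = {
    val total = counts.values.sum
    if (total == 0L) None
    else {
      val target = math.max(1L, math.ceil(p * total).toLong)
      val sorted = counts.toSeq.sortBy(_._1)
      val cumulative = sorted.scanLeft(0L)(_ + _._2).tail
      sorted.zip(cumulative).collectFirst {
        case ((b, _), cum) if cum >= target => (b + 1) * bucketWidth
      }
    }
  }

  // Flag a value that lies above the current percentile bound.
  def isOutlier(value: Double, p: Double = 0.95): Boolean =
    quantileUpperBound(p).exists(value > _)
}

object Histogram {
  def empty(bucketWidth: Double): Histogram = Histogram(bucketWidth, Map.empty)
}
```

The bound is only as precise as the bucket width. QTree makes a comparable trade between memory and accuracy.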

Maintaining State
[Diagram: an Update Func combines each Data batch (Data 0, Data 1, ...) with the previous State to produce the next (State 0, State 1, State 2, ...)]

Maintaining State (cont)
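def updateState(values: Seq[(Int, Int)],
                state: Option[Double]): Option[Double] = {
  // values: new records seen for this key in the current batch
  // state:  previous state for this key, None the first time the key appears
  val newState: Double = state match {
    case Some(old) => old // placeholder: do stuff with old and values
    case None      => 0.0 // placeholder: do other stuff to seed the state
  }
  Some(newState)
}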
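To make the skeleton concrete, here is a sketch with the placeholders filled in, plus a fold that mimics how state is threaded from one batch to the next for a single key. The meaning of the `(Int, Int)` pairs is an assumption: they are read here as (total, flagged) counts, which the slide does not state. The smoothing rule and the `StateDemo` name are also illustrative. In Spark Streaming, a function of this shape is what `updateStateByKey` takes.

```scala
object StateDemo {
  // Assumption: each value is a (total, flagged) count pair.
  def updateState(values: Seq[(Int, Int)],
                  state: Option[Double]): Option[Double] = {
    val total = values.map(_._1).sum
    val flagged = values.map(_._2).sum
    if (total == 0) state // no new data in this batch: carry the state forward
    else {
      val batchFraction = flagged.toDouble / total
      val newState = state match {
        case Some(old) => old + 0.25 * (batchFraction - old) // smooth toward this batch
        case None      => batchFraction                      // first batch seeds the state
      }
      Some(newState)
    }
  }

  // Plain-Scala imitation of state flowing through successive batches for one key.
  def run(batches: Seq[Seq[(Int, Int)]]): Option[Double] =
    batches.foldLeft(Option.empty[Double])((s, batch) => updateState(batch, s))
}
```

Returning `None` from such a function would drop the key's state, so the empty-batch case here returns the old state unchanged.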

Getting Data In/Out
• Kafka Input
– Fairly easy to get up and going
• Kafka (or any) output
– Create your own connections
– Watch where your code is running
[Diagram: Source → Spark Streaming (Executor, Executor, Executor) → Destination]
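"Create your own connections" and "watch where your code is running" usually come down to opening the output connection inside the code that runs per partition on an executor, rather than once on the driver. The sketch below models that in plain Scala without Spark. `Connection` is a hypothetical stand-in for a Kafka producer or any other client. In Spark, the body of `writePartition` would typically sit inside `rdd.foreachPartition`, called from `dstream.foreachRDD`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sink standing in for a Kafka producer or any other client.
final class Connection(log: ArrayBuffer[String]) {
  def send(record: String): Unit = log += record
  def close(): Unit = ()
}

object Output {
  // One connection per partition: opened where the records live,
  // reused for every record, and closed even if a send fails.
  def writePartition(records: Iterator[String], open: () => Connection): Unit = {
    val conn = open()
    try records.foreach(conn.send)
    finally conn.close()
  }
}
```

Passing `open` as a function rather than a ready-made connection reflects the constraint that client objects generally cannot be serialized and shipped from the driver to executors.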
