Task Flow Rate Management via Spark Streaming
Steve Hastings & Anikate Singh
DataScience@Concur
Problem Domain
[Diagram: Task Queue, Flag Tasks, Service Center]
• Dynamically flag transactions while controlling their throughput based on various parameters
• Decouple the task creation process from the task flagging process
• Real-time visibility into task flow rates, with the ability to perform analytics
Data Pipeline
[Diagram: demo pipeline built and started with docker-compose build / docker-compose up; components Task Producer, Kafka, Spark, Consumer and Dashboard, implemented in Java and nodejs, with the Dashboard fed by Server-Sent Events; status labels "TASK FLAGGED" and "OVERLOAD"]
[Chart: Total vs. Flagged Volume in RT]
Spark Streaming Agenda
1. Flagging Strategies
   1. Random sampling (dynamic sampling rates)
   2. Transaction analysis (dynamic analysis)
2. Maintaining State
   1. In Spark or elsewhere
3. Getting data in/out
Flagging - Random Sampling
• Random txn flagging
– Each txn individually sampled
• Two types of rate change
– "Panic Button" - instant drop in rate
– Re-equilibrate - slow rise in rate
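A minimal sketch of per-transaction random flagging, assuming each transaction is flagged by an independent Bernoulli trial whose success probability is the current sampling rate. Txn, flag and the rate supplier are hypothetical names. In a Spark job the rate would presumably be refreshed per micro-batch from wherever the rate state lives; this sketch does not show that part.

```scala
import scala.util.Random

object RandomSamplingSketch {
  // Hypothetical transaction record; the slides do not give a schema.
  final case class Txn(id: Long, amount: Double)

  /** Flags each transaction independently with probability rate().
    * The rate is re-read for every transaction, so a rate change
    * (e.g. a "panic") applies to the very next transaction. */
  def flag(txns: Iterator[Txn], rate: () => Double, rng: Random): Iterator[(Txn, Boolean)] =
    txns.map(t => (t, rng.nextDouble() < rate()))

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L) // fixed seed so runs are reproducible
    val txns = Iterator.tabulate(10000)(i => Txn(i.toLong, 100.0))
    val flaggedCount = flag(txns, () => 0.1, rng).count { case (_, flagged) => flagged }
    println(s"flagged $flaggedCount of 10000 transactions at rate 0.1")
  }
}
```

Because nextDouble() is uniform on [0, 1), a rate of 0 never flags and a rate of 1 always flags; in between, the flagged fraction converges to the rate.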
Rate Change Formulae
$R_0 = T$
$R_{t+1} = R_t + 0.25\,(T - R_t)$ (re-equilibrate)
or
$R_{t+1} = 0.9\,R_t$ (PANIC)
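Reading T as the target sampling rate, the re-equilibrate rule closes a quarter of the remaining gap each step: $T - R_{t+1} = T - R_t - 0.25\,(T - R_t) = 0.75\,(T - R_t)$, so after n steps the gap has shrunk to $0.75^n$ of its starting value. A panic step instead removes 10% of the current rate. For example, with T = 0.10 one panic gives R = 0.09, and one re-equilibrate step then gives 0.09 + 0.25 × 0.01 = 0.0925. The sketch below encodes both rules; the function names and the values in main are illustrative assumptions.

```scala
object RateControlSketch {
  /** Re-equilibrate: R(t+1) = R(t) + 0.25 * (T - R(t)), a slow rise toward the target T. */
  def reEquilibrate(rate: Double, target: Double): Double =
    rate + 0.25 * (target - rate)

  /** Panic: R(t+1) = 0.9 * R(t), an immediate 10% cut. */
  def panic(rate: Double): Double = 0.9 * rate

  def main(args: Array[String]): Unit = {
    val target = 0.10 // illustrative target rate; R0 = T
    // Three consecutive panic steps, then let the rate recover toward T.
    val afterPanics = (1 to 3).foldLeft(target)((r, _) => panic(r))
    val recovery = Iterator.iterate(afterPanics)(r => reEquilibrate(r, target)).take(6).toList
    recovery.foreach(r => println(f"$r%.5f"))
  }
}
```

The asymmetry is the design point: cutting is multiplicative and immediate, while recovery is geometric and never overshoots T.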
Flagging - Transaction Analysis
• Example: outliers
– Flag txns with some value above the 95th percentile
– But with percentiles updated as the stream arrives
• Using Twitter Algebird QTree
[Figure: QTree + QTree = ?]
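Algebird's QTree is a mergeable quantile summary: summaries built on separate batches or partitions can be added together, and it reports quantiles as bounds rather than exact values. The sketch below swaps in a much simpler mergeable structure, a fixed-width histogram, to show the same "summary + summary" idea without the Algebird dependency. It is not QTree's algorithm. The bucket width and the Histogram and isOutlier names are assumptions.

```scala
object StreamingPercentileSketch {
  /** Fixed-width histogram used as a simplified, mergeable stand-in for
    * Algebird's QTree (NOT QTree's algorithm). counts maps bucket index -> count. */
  final case class Histogram(bucketWidth: Double, counts: Map[Long, Long]) {
    require(bucketWidth > 0, "bucketWidth must be positive")

    /** Adds one observation to the bucket [k * width, (k + 1) * width). */
    def add(x: Double): Histogram = {
      val k = math.floor(x / bucketWidth).toLong
      copy(counts = counts.updated(k, counts.getOrElse(k, 0L) + 1L))
    }

    /** Merges two summaries, e.g. from different micro-batches or partitions. */
    def +(that: Histogram): Histogram = {
      require(bucketWidth == that.bucketWidth, "bucket widths must match")
      val merged = that.counts.foldLeft(counts) { case (acc, (k, c)) =>
        acc.updated(k, acc.getOrElse(k, 0L) + c)
      }
      Histogram(bucketWidth, merged)
    }

    def total: Long = counts.values.sum

    /** Upper edge of the bucket containing the p-quantile, i.e. an upper bound on it. */
    def quantileUpperBound(p: Double): Double = {
      require(p > 0 && p <= 1, "p must be in (0, 1]")
      require(total > 0, "histogram is empty")
      val rank = math.ceil(p * total).toLong
      val sorted = counts.toSeq.sortBy(_._1)
      val cumulative = sorted.map(_._2).scanLeft(0L)(_ + _).tail
      val idx = cumulative.indexWhere(_ >= rank)
      (sorted(idx)._1 + 1) * bucketWidth
    }
  }

  object Histogram {
    def empty(bucketWidth: Double): Histogram = Histogram(bucketWidth, Map.empty)
  }

  /** Flags a value above the (approximate) 95th percentile of everything seen so far. */
  def isOutlier(summary: Histogram, x: Double): Boolean =
    x > summary.quantileUpperBound(0.95)

  def main(args: Array[String]): Unit = {
    // Two hypothetical micro-batches of transaction amounts.
    val batches = Seq(Seq(12.0, 40.0, 55.0, 61.0, 70.0), Seq(33.0, 48.0, 90.0, 250.0))
    val summary = batches
      .map(b => b.foldLeft(Histogram.empty(10.0))((h, x) => h.add(x)))
      .reduce(_ + _)
    Seq(45.0, 300.0).foreach(x => println(s"$x outlier? ${isOutlier(summary, x)}"))
  }
}
```

Using an upper bound errs toward flagging fewer transactions; a real QTree gives tighter, adaptive bounds at a comparable memory cost.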
Maintaining State
[Diagram: Data 0 produces State 0; each later batch of Data is combined with the previous State by the Update Func to produce State 1, State 2, …]
Maintaining State (cont)
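// Update function for DStream.updateStateByKey: receives this batch's values
// for one key plus that key's previous state, and returns the new state.
def updateState(values: Seq[(Int, Int)],
                state: Option[Double]): Option[Double] = {
  val newState: Double = state match {
    case Some(old) => ??? // do stuff: combine `old` with `values`
    case None      => ??? // do other stuff: first batch seen for this key
  }
  Some(newState)
}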
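One way the placeholders might be filled in, purely as an illustration: assume each value is a hypothetical (flaggedCount, totalCount) pair for a key, and the state is an exponentially weighted flagged fraction. The function keeps the (Seq[V], Option[S]) => Option[S] shape that updateStateByKey expects, so it can be tested without Spark; wiring it in would look like pairs.updateStateByKey(updateState _), and Spark requires a checkpoint directory (ssc.checkpoint(...)) for this kind of stateful operation. Returning None would drop the key from the state.

```scala
object UpdateStateSketch {
  // Hypothetical smoothing factor for the moving average.
  val Alpha = 0.25

  /** Same shape as the update function passed to DStream.updateStateByKey.
    * Assumed (hypothetical) meaning: each value is (flaggedCount, totalCount)
    * and the state is an exponentially weighted flagged fraction for the key. */
  def updateState(values: Seq[(Int, Int)], state: Option[Double]): Option[Double] = {
    val flagged = values.map(_._1).sum
    val total = values.map(_._2).sum
    if (total == 0) state // nothing new for this key in this batch: keep the old state
    else {
      val batchFraction = flagged.toDouble / total
      val newState = state match {
        case Some(old) => old + Alpha * (batchFraction - old) // blend with history
        case None      => batchFraction                       // first batch for this key
      }
      Some(newState)
    }
  }

  def main(args: Array[String]): Unit = {
    val s1 = updateState(Seq((10, 100)), None)
    val s2 = updateState(Seq((30, 100), (10, 100)), s1)
    val s3 = updateState(Seq.empty, s2)
    println(s"$s1 -> $s2 -> $s3")
  }
}
```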
Getting Data In/Out
• Kafka Input
– Fairly easy to get up and going
• Kafka (or any) output
– Create your own connections
– Watch where your code is running (driver vs. executors)
Spark Streaming
[Diagram: Source → Spark Streaming (Executor, Executor, Executor) → Destination]
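The usual Spark Streaming pattern for output is to open the connection inside foreachPartition, so it is created on the executor that processes the data. Opening it on the driver would force the connection to be serialized, and opening one per record is expensive. The sketch below mimics dstream.foreachRDD { rdd => rdd.foreachPartition { ... } } with plain Scala collections standing in for partitions, so it runs without Spark. FakeConnection and writePerPartition are hypothetical stand-ins for, say, a Kafka producer.

```scala
import scala.collection.mutable.ArrayBuffer

object PerPartitionOutputSketch {
  /** Hypothetical sink connection standing in for e.g. a Kafka producer. */
  final class FakeConnection {
    private val buffer = ArrayBuffer.empty[String]
    private var closed = false
    def send(record: String): Unit = {
      require(!closed, "connection already closed")
      buffer += record
    }
    def sent: List[String] = buffer.toList
    def isOpen: Boolean = !closed
    def close(): Unit = { closed = true }
  }

  /** Mirrors, without Spark:
    *   dstream.foreachRDD { rdd =>
    *     rdd.foreachPartition { part => val c = open(); part.foreach(c.send); c.close() }
    *   }
    * One connection per partition, opened where that partition is processed.
    * Returns what each connection delivered, for inspection. */
  def writePerPartition(partitions: Seq[Seq[String]], open: () => FakeConnection): Seq[List[String]] =
    partitions.map { part =>
      val conn = open()
      try {
        part.foreach(conn.send)
        conn.sent
      } finally conn.close()
    }

  def main(args: Array[String]): Unit = {
    var opened = 0
    val delivered = writePerPartition(
      Seq(Seq("txn-1", "txn-2"), Seq("txn-3"), Seq.empty),
      () => { opened += 1; new FakeConnection }
    )
    println(s"connections opened: $opened, delivered: $delivered")
  }
}
```

In a real job, a pooled or lazily created producer per executor is a common refinement, since opening a connection for every partition of every batch still has a cost.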
Q & A

Spark Meetup:DataScience@Concur - Reacting to RT events to control throughput