LivePerson Case Study: Real Time Data Streaming
March 20th 2014
Ran Silberman
About me
● Technical Leader of the Data Platform at LivePerson
● Bird watcher and amateur bird photographer
Pharaoh Eagle-Owl / Bubo ascalaphus
This is what the people on the previous slide were looking at…
Amir Silberman
Agenda
● Why we chose Kafka + Storm
● How the implementation was done
● Measures of success
● Two example use cases
● Tips from our experience
Data in LivePerson
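Diagram: visitors on customer sites, the chat window and the agent console all interact with the LivePerson SaaS server (login, monitor, rules/intelligence/decision, chat, invite). Each of these interactions generates data, and together it adds up to big data.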
Legacy Data flow in LivePerson
Diagram: real-time servers -> ETL (sessionize, modeling, schema view) -> BI DWH (Oracle); the diagram separates real-time data from historical data.
Why Kafka + Storm?
● Need to scale out and plan for future scale
○ The limit on scale should not be the technology
○ Let the limit be the cost of (commodity) hardware
● Which data platforms can be implemented quickly?
○ Open source: fast-evolving, with an active community
○ Micro-services: each does only what it ought to do!
● Are there risks in this choice?
○ Yes! The technology is not mature enough
○ But no other mature technology can address our needs!
Long-eared Owl / Asio otus
Amir Silberman
Legacy Data flow in LivePerson
Diagram: real-time servers -> ETL (sessionize, modeling, schema view) -> BI DWH (Oracle) -> customers
1. Move to Hadoop
Diagram: real-time servers -> HDFS -> Hadoop ETL (sessionize, modeling, schema view) -> MR job transfers data to the BI DWH (Vertica) -> customers
2. Move to Kafka
Diagram: real-time servers -> Kafka (Topic-1) -> HDFS -> Hadoop MR job transfers data to the BI DWH (Vertica) -> customers
3. Integrate with new producers
Diagram: same flow as step 2, plus new real-time servers that produce to a second Kafka topic (Topic-2) alongside Topic-1
4. Add Real-time BI
Diagram: same flow as step 3, plus a Storm topology that reads from Kafka and writes to an analytics DB serving customers
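A minimal sketch of why step 4 can add a Storm consumer without disturbing the existing Hadoop path: in Kafka, each consumer group keeps its own offset into a topic's log, so every group reads the whole stream independently. The `ToyTopic` class and the group names below are hypothetical and model the idea in memory; they are not the Kafka client API.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory model: an append-only log with one read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, event):
        self.log.append(event)

    def poll(self, group, max_events=10):
        start = self.offsets[group]
        batch = self.log[start:start + max_events]
        self.offsets[group] = start + len(batch)  # "commit" what this group has read
        return batch


topic = ToyTopic()
for i in range(5):
    topic.produce({"visitor_id": f"v{i % 2}", "seq": i})

# Each group sees the full stream, regardless of how far the other group has read.
hadoop_batch = topic.poll("hdfs-loader")
storm_batch = topic.poll("storm-visitor-list", max_events=3)
print(len(hadoop_batch), len(storm_batch))  # 5 3
```

The same property is what lets several topologies each read all events, at the cost of multiplying read traffic by the number of consumer groups.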
Architecture
Diagram: real-time servers -> Kafka -> Storm (real-time processing) -> Cassandra/Couchbase -> dashboards
Some Numbers: Cyber Monday 2013
● Flow rate into Kafka: 33 MB/sec
● Flow rate out of Kafka: 20 MB/sec
● Total daily data in Kafka: 17 billion events
● 4 topologies reading all events
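Back-of-the-envelope arithmetic on the Cyber Monday 2013 figures, as a sketch. The event count and the 33 MB/sec ingress rate come from the slide; treating 33 MB/sec as sustained for the whole day is an assumption, so if it was a peak rate the byte-based results below are upper bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 17e9       # from the slide
ingress_mb_per_sec = 33.0   # from the slide; assumed sustained all day

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
daily_ingress_tb = ingress_mb_per_sec * SECONDS_PER_DAY / 1e6   # MB -> TB (decimal units)
avg_event_bytes = ingress_mb_per_sec * 1e6 / avg_events_per_sec  # bytes per event

print(f"{avg_events_per_sec:,.0f} events/sec on average")         # 196,759 events/sec on average
print(f"{daily_ingress_tb:.2f} TB/day if 33 MB/sec is sustained")  # 2.85 TB/day ...
print(f"~{avg_event_bytes:.0f} bytes per event")                   # ~168 bytes per event
```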
Eurasian Wryneck / Jynx torquilla
Amir Silberman
Two use cases
1. Visitor list
2. Agent State
1st Storm Use Case: “Visitors List”
Use case:
● Show the list of visitors in the “Agent Console”
● Collect data about each visitor in real time
● Visitor stickiness in the streaming process: all events of a given visitor are handled by the same task
Visitors List Topology
Selected Analytics DB: Couchbase
1st Storm Use Case: “Visitors List”
● Document store: suited to complex documents
● Searchable: documents can be searched by different attributes
● High read and write throughput
First Storm Topology: Visitor Feed
“Visitor List” Topology (analytics DB: Couchbase, a document store)
Kafka events stream -> Kafka Spout -> (emit) Parse Avro into tuple -> (emit) Analyze relevant events -> (emit) Write event to visitor document -> add/update in Couchbase
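The chain above can be sketched as plain functions, one per bolt. This is a simulation, not Storm code: JSON stands in for Avro so the example needs no extra libraries, a dict stands in for Couchbase, and the event types treated as “relevant” and the field names (`visitor_id`, `type`, `ts`) are hypothetical.

```python
import json

RELEVANT_TYPES = {"page_view", "chat_request"}  # hypothetical event types


def parse(raw: bytes) -> dict:
    """Stand-in for 'Parse Avro into tuple' (JSON here, to stay dependency-free)."""
    return json.loads(raw)


def is_relevant(event: dict) -> bool:
    """Stand-in for 'Analyze relevant events'."""
    return event.get("type") in RELEVANT_TYPES


def write_to_visitor_doc(store: dict, event: dict) -> None:
    """Stand-in for 'Write event to visitor document': add the document or update it (upsert)."""
    vid = event["visitor_id"]
    doc = store.setdefault(vid, {"visitor_id": vid, "events": []})
    doc["events"].append({"type": event["type"], "ts": event["ts"]})
    doc["last_seen"] = max(doc.get("last_seen", 0), event["ts"])


def run_pipeline(raw_stream, store):
    for raw in raw_stream:                      # Kafka spout: emits raw messages
        event = parse(raw)                      # bolt 1: emits parsed tuples
        if is_relevant(event):                  # bolt 2: emits only relevant events
            write_to_visitor_doc(store, event)  # bolt 3: upserts the visitor document


couchbase_stand_in = {}
stream = [
    json.dumps({"visitor_id": "v1", "type": "page_view", "ts": 100}).encode(),
    json.dumps({"visitor_id": "v1", "type": "heartbeat", "ts": 101}).encode(),
    json.dumps({"visitor_id": "v1", "type": "chat_request", "ts": 105}).encode(),
]
run_pipeline(stream, couchbase_stand_in)
print(couchbase_stand_in["v1"]["last_seen"], len(couchbase_stand_in["v1"]["events"]))  # 105 2
```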
Visitors List - Storm considerations
● Complex calculations are done before sending to the DB
○ Ignore delayed events
○ Reorder events before storing
● Documents are cached in memory
● Fields grouping to the bolt that writes to Couchbase
● High parallelism in the bolt that writes to Couchbase
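A sketch of these considerations under stated assumptions. `choose_task` sends every event of a visitor to the same task, in the spirit of a Storm fields grouping (Storm's own hash differs; CRC32 is used here because Python's built-in `hash` of strings changes between runs). `VisitorWriterTask` is a hypothetical writer bolt that caches documents in memory, reorders buffered events by timestamp before storing them, and ignores delayed events older than what it already stored. The explicit `flush` call is a simplification of whatever batching policy a real topology would use.

```python
import zlib
from collections import defaultdict


def choose_task(visitor_id: str, num_tasks: int) -> int:
    """Same visitor id -> same task index, in the spirit of a fields grouping."""
    return zlib.crc32(visitor_id.encode("utf-8")) % num_tasks


class VisitorWriterTask:
    """Hypothetical writer-bolt task: in-memory cache, reordering, dropping late events."""

    def __init__(self):
        self.pending = defaultdict(list)  # visitor_id -> buffered (ts, event_type)
        self.flushed_up_to = {}           # visitor_id -> newest ts already stored
        self.docs = defaultdict(list)     # in-memory document cache (stand-in for Couchbase)

    def on_event(self, visitor_id: str, ts: int, event_type: str) -> None:
        if ts <= self.flushed_up_to.get(visitor_id, -1):
            return  # delayed event: older than what was already stored, so ignore it
        self.pending[visitor_id].append((ts, event_type))

    def flush(self, visitor_id: str) -> None:
        batch = sorted(self.pending.pop(visitor_id, []))  # reorder by timestamp
        if batch:
            self.docs[visitor_id].extend(batch)
            self.flushed_up_to[visitor_id] = batch[-1][0]


NUM_TASKS = 4  # "high parallelism" would mean a larger value in practice
tasks = [VisitorWriterTask() for _ in range(NUM_TASKS)]

events = [("v7", 20, "chat"), ("v7", 10, "page_view"), ("v7", 5, "late_click")]
for visitor_id, ts, kind in events[:2]:
    tasks[choose_task(visitor_id, NUM_TASKS)].on_event(visitor_id, ts, kind)

owner = tasks[choose_task("v7", NUM_TASKS)]
owner.flush("v7")
owner.on_event(*events[2])  # arrives after the flush with ts 5 < 20, so it is dropped
print(owner.docs["v7"])     # [(10, 'page_view'), (20, 'chat')]
```

Because all of a visitor's events land on one task, that task's in-memory cache is the only copy it needs to consult, which is what makes caching safe while the writer bolt keeps high parallelism.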
Visitors List Topology
European Roller / Coracias garrulus
Amir Silberman
2nd Storm Use Case: “Agent State”
Use case:
● Show agent activity in the “Agent Console”
● Count agent statistics
● Display graphs
Agent Status Topology
Selected Analytics DB - Cassandra
2nd Storm Use Case: “Agent State”
● Wide-column store DB
● Highly available, with no single point of failure
● High throughput
● Optimized for counters
“Agent Status” Topology
Analytics DB: Cassandra (wide-column store)
Kafka events stream -> Kafka Spout -> (emit) Parse Avro into tuple -> (emit) Analyze relevant events -> (emit) Send events -> add to Cassandra
Data visualization using Highcharts
Agent Status - Storm considerations
● Counters are stored by the topology
● Calculations are done after reading from the DB
● Delayed events should not be ignored
● Order of events does not matter
● Highcharts is used for data visualization
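Because counters are stored by the topology and derived numbers are computed after reading, the write path only needs increments, and increments commute: arrival order and late arrivals do not change the totals. The sketch below assumes a hypothetical key of (agent, state, one-minute bucket) and uses `collections.Counter` in place of Cassandra; in Cassandra the same idea would typically map to a counter column updated with `UPDATE ... SET n = n + 1`.

```python
import random
from collections import Counter

BUCKET_SECONDS = 60  # hypothetical aggregation granularity (one minute)


def increments(events):
    """Turn (agent_id, state, ts) events into counter increments keyed by (agent, state, bucket)."""
    counts = Counter()
    for agent_id, state, ts in events:
        counts[(agent_id, state, ts - ts % BUCKET_SECONDS)] += 1
    return counts


def online_event_ratio(counts, agent_id):
    """Read-time calculation: a derived metric computed from the stored counters."""
    online = sum(n for (a, s, _), n in counts.items() if a == agent_id and s == "online")
    total = sum(n for (a, _, _), n in counts.items() if a == agent_id)
    return online / total if total else 0.0


events = [("a1", "online", 0), ("a1", "away", 30), ("a1", "online", 65), ("a1", "online", 70)]
shuffled = events[:]
random.shuffle(shuffled)  # simulate out-of-order and late arrival

# Increments commute, so the stored counters are identical either way.
assert increments(events) == increments(shuffled)
print(online_event_ratio(increments(shuffled), "a1"))  # 0.75
```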
Spur-winged Lapwing / Vanellus spinosus
Amir Silberman
3rd Storm Use Case: Data Auditing
Use case:
● Need to be able to tell whether events arrived
○ Were there any missing events?
○ Were there any duplicated events?
○ How long did it take for events to arrive?
● The event data itself is not important, only the count of events
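Since only counts matter, one way to answer these questions is to compare, per source and time bucket, how many events a producer reports sending with how many actually arrived, and to derive latency from send and receive timestamps. This is a sketch under assumptions: the header fields `source`, `sent_ts` and `received_ts`, the one-minute bucket and the report format are all hypothetical.

```python
from collections import Counter
from statistics import median


def audit(sent_counts, received_headers, bucket_seconds=60):
    """Compare producer-reported counts with arrivals, per (source, bucket_start).

    sent_counts: {(source, bucket_start): count} reported by producers (hypothetical).
    received_headers: dicts with hypothetical header fields
        'source', 'sent_ts' and 'received_ts', in seconds.
    """
    received = Counter()
    latencies = []
    for h in received_headers:
        bucket = h["sent_ts"] - h["sent_ts"] % bucket_seconds
        received[(h["source"], bucket)] += 1
        latencies.append(h["received_ts"] - h["sent_ts"])

    report = {}
    for key in set(sent_counts) | set(received):
        diff = received[key] - sent_counts.get(key, 0)
        if diff == 0:
            report[key] = "ok"
        elif diff < 0:
            report[key] = f"missing {-diff}"
        else:
            report[key] = f"duplicated {diff}"
    median_latency = median(latencies) if latencies else None
    return report, median_latency


sent = {("web-1", 0): 3, ("web-2", 0): 1}
arrived = [
    {"source": "web-1", "sent_ts": 5, "received_ts": 7},
    {"source": "web-1", "sent_ts": 20, "received_ts": 21},
    {"source": "web-2", "sent_ts": 30, "received_ts": 34},
    {"source": "web-2", "sent_ts": 30, "received_ts": 35},
]
report, median_latency = audit(sent, arrived)
print(sorted(report.items()), median_latency)
# [(('web-1', 0), 'missing 1'), (('web-2', 0), 'duplicated 1')] 3.0
```

A pure count comparison cannot tell one lost event plus one duplicate in the same bucket from a clean bucket; catching that case would need per-event IDs.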
3rd Storm Use Case: Data Auditing
Diagram with four numbered steps connecting: realtime server, Kafka topics, auditing topic, Storm sync topology, audit-loader topology, MySQL, Hadoop HDFS audit job, and the Auditor.
“Sync Audit” Topology: syncs messages between two topics
Kafka events stream -> Kafka Spout -> (emit) Parse Avro into tuple -> (emit) Analyze relevant events -> (emit) Send events -> add to Kafka audit topic
“Load Audit” Topology
Analytics DB: MySQL (RDBMS)
Kafka audit topic -> Kafka Spout -> (emit) Parse Avro into tuple -> (emit) Analyze relevant events -> (emit) Send events -> add to MySQL -> auditing report
“Load Audit” Topology:
● Stores event-count statistics
● Uses a SQL database
● Used for auditing and other statistics
● Requires metadata in the event header
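A sketch of the load step, with SQLite (from the Python standard library) standing in for MySQL so it runs anywhere. The table layout, topic name and header fields are hypothetical. Counts are aggregated per batch before writing, which keeps the number of SQL writes proportional to the number of buckets rather than the number of events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute(
    """CREATE TABLE audit_counts (
           topic TEXT NOT NULL,
           source TEXT NOT NULL,
           bucket_start INTEGER NOT NULL,
           event_count INTEGER NOT NULL,
           PRIMARY KEY (topic, source, bucket_start))"""
)


def load_batch(conn, topic, headers, bucket_seconds=60):
    """Aggregate a batch of (hypothetical) event headers and add the counts to the audit table."""
    batch = {}
    for h in headers:
        key = (topic, h["source"], h["sent_ts"] - h["sent_ts"] % bucket_seconds)
        batch[key] = batch.get(key, 0) + 1
    with conn:  # one transaction per batch; commits on success
        for (t, source, bucket), n in batch.items():
            # Create the row if needed, then increment it.
            conn.execute(
                "INSERT OR IGNORE INTO audit_counts VALUES (?, ?, ?, 0)", (t, source, bucket)
            )
            conn.execute(
                "UPDATE audit_counts SET event_count = event_count + ? "
                "WHERE topic = ? AND source = ? AND bucket_start = ?",
                (n, t, source, bucket),
            )


load_batch(conn, "visitor-events", [{"source": "web-1", "sent_ts": 5}, {"source": "web-1", "sent_ts": 59}])
load_batch(conn, "visitor-events", [{"source": "web-1", "sent_ts": 61}, {"source": "web-1", "sent_ts": 10}])
rows = conn.execute(
    "SELECT source, bucket_start, event_count FROM audit_counts ORDER BY bucket_start"
).fetchall()
print(rows)  # [('web-1', 0, 3), ('web-1', 60, 1)]
```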
Challenges:
● High network traffic
● Writing to Kafka is faster than reading
● All topologies read all events
● Avoiding resource starvation in Storm
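When writing to Kafka is faster than reading, the gap shows up as consumer lag: per partition, the log-end offset minus the group's committed offset. The sketch below computes that lag and a catch-up estimate from hypothetical offsets and rates; it does not call Kafka, and in practice the offsets would come from Kafka's own tooling or monitoring.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition: messages written but not yet consumed by the group."""
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


def seconds_to_catch_up(total_lag, produce_rate, consume_rate):
    """Returns None when the backlog only grows (consumers read slower than producers write)."""
    if consume_rate <= produce_rate:
        return None
    return total_lag / (consume_rate - produce_rate)


# Hypothetical offsets for a three-partition topic.
end = {0: 1_000_500, 1: 998_000, 2: 1_002_000}
committed = {0: 1_000_000, 1: 990_000, 2: 1_002_000}

lag = partition_lag(end, committed)
print(lag)  # {0: 500, 1: 8000, 2: 0}
print(seconds_to_catch_up(sum(lag.values()), produce_rate=50_000, consume_rate=45_000))  # None
print(seconds_to_catch_up(sum(lag.values()), produce_rate=50_000, consume_rate=58_500))  # 1.0
```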
Subalpine Warbler / Sylvia cantillans
Amir Silberman
Optimizations of Kafka
● Increase the Kafka consumption rate by adding partitions
● Run on physical machines with RAID
● Set retention to match actual needs
● Monitor data flow!
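Setting retention to the actual need is mostly a sizing calculation. The sketch below estimates per-broker disk for a given retention window; only the 33 MB/sec figure comes from this deck, while the replication factor, broker count and headroom factor are assumptions. The chosen window would then go into Kafka's `log.retention.hours` (broker default) or `retention.ms` (per topic) setting.

```python
def retention_disk_per_broker_tb(ingress_mb_per_sec, retention_hours,
                                 replication_factor, num_brokers, headroom=1.3):
    """Rough disk (decimal TB) each broker needs to hold `retention_hours` of data.

    Assumes the ingress rate is sustained and data is spread evenly across brokers.
    `headroom` (hypothetical) covers index files, uneven partitions and growth.
    """
    total_mb = ingress_mb_per_sec * retention_hours * 3600 * replication_factor
    return total_mb * headroom / num_brokers / 1e6  # MB -> TB


# 33 MB/sec is the Cyber Monday 2013 ingress figure; the other inputs are hypothetical.
for hours in (24, 72, 168):
    tb = retention_disk_per_broker_tb(33, hours, replication_factor=2, num_brokers=4)
    print(f"{hours:>3} h retention -> ~{tb:.1f} TB per broker")
```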
Optimizations of Storm
● Number of Kafka spout instances = total number of partitions
● Set “isolation mode” for important topologies
● Verify that the network cards can carry the network traffic
● Run the Storm cluster on high-CPU machines
● Monitor server CPU and memory (Graphite)
● Assess the minimum number of cores each topology needs
○ Use the “load” figure in “top” to find the server load
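Why match spout instances to partitions: each Kafka partition is read by a single spout task, so extra tasks sit idle and too few tasks make some of them read several partitions. The round-robin assignment below is a simplified model for illustration, not necessarily the exact algorithm of any particular Kafka spout implementation.

```python
def assign_partitions(num_partitions, num_spout_tasks):
    """Round-robin model: task i reads partitions i, i + n, i + 2n, ... (n = number of tasks)."""
    return {
        task: list(range(task, num_partitions, num_spout_tasks))
        for task in range(num_spout_tasks)
    }


def summarize(num_partitions, num_spout_tasks):
    """Return (idle task count, partitions read by the busiest task)."""
    assignment = assign_partitions(num_partitions, num_spout_tasks)
    idle = sum(1 for parts in assignment.values() if not parts)
    busiest = max(len(parts) for parts in assignment.values())
    return idle, busiest


for tasks in (4, 12, 16):
    idle, busiest = summarize(num_partitions=12, num_spout_tasks=tasks)
    print(f"{tasks:>2} spout tasks: {idle} idle, busiest task reads {busiest} partition(s)")
```

With 12 partitions, 12 tasks gives one partition per task and no idle tasks, which is the balance the slide recommends.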
Demo
● Agent Console - https://z1.le.liveperson.net/
71394613 / rans@liveperson.com
● My Site - http://birds-of-israel.weebly.com/
Questions?
Little Owl / Athene noctua
Amir Silberman
Thank you!
Ruff / Philomachus pugnax
Amir Silberman
