Interactive Visualization of Data Powered by Spark
Streaming Data @ Zoomdata
Visualizations react to new data as it is delivered
Users start, stop, and pause the stream
Users select a rolling window or pin a start time to capture cumulative metrics
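
The difference between a rolling window and a pinned start time can be sketched in a few lines of plain Python. This is only an illustration, not Zoomdata's implementation: the `StreamMetric` class, its field names, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class StreamMetric:
    """Running sum over a rolling window or from a pinned start time.

    Hypothetical helper; assumes events arrive in timestamp order.
    """
    window_seconds: Optional[float] = None  # set for a rolling window
    pinned_start: Optional[float] = None    # set for a pinned start time
    _events: Deque[Tuple[float, float]] = field(default_factory=deque)
    _total: float = 0.0

    def add(self, timestamp: float, value: float) -> float:
        if self.window_seconds is None:
            # Pinned (cumulative) mode: count everything at or after the pin.
            if self.pinned_start is None or timestamp >= self.pinned_start:
                self._total += value
            return self._total
        # Rolling mode: keep only events inside (timestamp - window, timestamp].
        self._events.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, expired = self._events.popleft()
            self._total -= expired
        return self._total


events = [(0, 1), (5, 2), (12, 3), (20, 4)]

rolling = StreamMetric(window_seconds=10)
rolling_totals = [rolling.add(t, v) for t, v in events]

pinned = StreamMetric(pinned_start=5)
pinned_totals = [pinned.add(t, v) for t, v in events]
```

The rolling metric forgets old events as the window slides forward, while the pinned metric keeps growing from the chosen start time.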
Drivers for Streaming Data
● Data Freshness
● Time to Analytic
● Business Context
Challenges
● Time
● Frequency
● Retention
● Synchronization
● Order
● Updates
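
As one illustration of the "Order" challenge, the sketch below buffers events and releases them in event-time order once a watermark has passed. It is a generic technique, not something the slides describe: the `ReorderBuffer` class and the `allowed_lateness` parameter are hypothetical names.

```python
import heapq
from typing import Any, List, Tuple


class ReorderBuffer:
    """Releases events in event-time order behind a watermark (hypothetical)."""

    def __init__(self, allowed_lateness: float) -> None:
        self.allowed_lateness = allowed_lateness
        self._heap: List[Tuple[float, int, Any]] = []
        self._seq = 0  # tie-breaker so payloads are never compared
        self._max_seen = float("-inf")

    def push(self, event_time: float, payload: Any) -> List[Tuple[float, Any]]:
        """Add one event; return the events that are now safe to emit."""
        heapq.heappush(self._heap, (event_time, self._seq, payload))
        self._seq += 1
        self._max_seen = max(self._max_seen, event_time)
        # Anything at or before the watermark is assumed complete.
        watermark = self._max_seen - self.allowed_lateness
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            t, _, p = heapq.heappop(self._heap)
            ready.append((t, p))
        return ready

    def flush(self) -> List[Tuple[float, Any]]:
        """Emit everything still buffered, in event-time order."""
        ready = []
        while self._heap:
            t, _, p = heapq.heappop(self._heap)
            ready.append((t, p))
        return ready


buf = ReorderBuffer(allowed_lateness=5)
first = buf.push(10, "a")   # watermark is 5, nothing released yet
second = buf.push(8, "b")   # arrives out of order, still buffered
third = buf.push(16, "c")   # watermark moves to 11, releases b then a
rest = buf.flush()
```

An event that arrives after the watermark has already passed its timestamp is emitted immediately and therefore out of order; a real system would need an explicit policy for such late data.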
Addressing the Problem @ Zoomdata
                      Historical              Revised
Receive Data          JMS                     Kafka
Manipulate Stream     Single JVM in Memory    Spark Streaming
Hold Data in Buffer   MongoDB                 Pluggable
Interact with Data    Custom Code             Pluggable
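
What "Pluggable" could mean for the buffer can be sketched as a small interface that any landing store implements. This is a hypothetical design for illustration only: `BufferStore`, `InMemoryStore`, `land_batch`, and the `ts` field are assumed names, not Zoomdata APIs.

```python
from typing import Any, Dict, Iterable, List, Protocol

Record = Dict[str, Any]


class BufferStore(Protocol):
    """Minimal contract a pluggable landing store would satisfy (assumed)."""

    def write(self, records: Iterable[Record]) -> None: ...

    def read(self, since: float) -> List[Record]: ...


class InMemoryStore:
    """Stand-in store; a MemSQL- or Solr-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> None:
        self._rows.extend(records)

    def read(self, since: float) -> List[Record]:
        # Return records whose timestamp is at or after `since`.
        return [r for r in self._rows if r["ts"] >= since]


def land_batch(store: BufferStore, batch: Iterable[Record]) -> int:
    """Write one processed micro-batch to whichever store is plugged in."""
    rows = list(batch)
    store.write(rows)
    return len(rows)


store = InMemoryStore()
written = land_batch(store, [{"ts": 1, "v": 10}, {"ts": 5, "v": 20}])
recent = store.read(since=3)
```

Because the stream processor depends only on the interface, swapping the store does not require changes to the processing code.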
Technology Cast
● The Stream - Kafka, Kinesis, JMS
● Processing Fabric - Spark Streaming
● Landing Area - MemSQL, Solr, Kudu, Others
How it looks
With the rest of the app
Scale Out
Benefits
● Contextual Expressiveness with Streaming Data
● Independent scalability (scale-up, scale-around)
● Expressiveness powered by Spark -- using Windowing (dataframe API with stream)
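
The idea behind windowing can be shown without a Spark cluster. The sketch below groups events into fixed (tumbling) windows and counts them per key in plain Python; the function name and the sample events are hypothetical. In Spark, a comparable grouping is expressed with the `window` function from `pyspark.sql.functions` inside a `groupBy`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, key) using fixed-size windows."""
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Floor the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


sample = [(1, "a"), (9, "a"), (10, "b"), (11, "a")]
counts = tumbling_window_counts(sample, window_seconds=10)
```

Each event lands in exactly one window here; a sliding window would instead assign an event to every window that overlaps its timestamp.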
Side Benefits
● Separation of concerns
● Disaster Recovery, COOP, other data management concerns
● Restatements
● Options!
Demo
● Twitter Producer
● Spark Streaming
● MemSQL & Solr Sinks
Future Work
● Cross Stream Synchronization & Fusion
● On-demand scale out and resource management via Mesos
● Schema Evolution
● Storage Tiering
Thanks
For more information contact:
ruhollah@zoomdata.com
quan@zoomdata.com

Spark meetup - Zoomdata Streaming