Insights Without Tradeoffs Using Structured Streaming
Michael Armbrust - @michaelarmbrust
Spark Summit East 2017
Parallelism
Split up the problem to harness many machines for computation
Complexity
Handle cross-machine communication and failures
Developer Productivity
Quickly and concisely express common computations
Efficiency
Hand-tune code to minimize overheads and process the most data per cycle
SQL
Throughput
Process large historical repositories quickly
Latency
Up-to-date answers as new data arrives
Structured Streaming
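The Throughput and Latency slides frame the tradeoff Structured Streaming aims to remove: one computation should both answer over a large historical repository and stay current as new data arrives. The following is only a plain-Python sketch of that idea, not Spark's API; `batch_count`, `IncrementalCount`, and the sample log levels are hypothetical. It shows a count kept up to date over arriving micro-batches ending with the same answer as a batch query over all of the data.

```python
from collections import Counter
from typing import Dict, Iterable, List


def batch_count(events: Iterable[str]) -> Dict[str, int]:
    """Batch query (hypothetical): count events by key over the full dataset."""
    return dict(Counter(events))


class IncrementalCount:
    """Hypothetical incremental version of the same query.

    Running state is updated as each micro-batch arrives, so an
    up-to-date answer is available after every batch.
    """

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process(self, micro_batch: List[str]) -> Dict[str, int]:
        # Fold the new events into the running counts.
        self.state.update(micro_batch)
        return dict(self.state)


# Illustrative data only: log levels from a historical repository,
# followed by micro-batches that "arrive" later.
history = ["error", "info", "error", "warn"]
arrivals = [["info"], ["error", "error"], ["warn"]]

query = IncrementalCount()
latest = query.process(history)  # catch up on historical data first
for batch in arrivals:
    latest = query.process(batch)  # refresh the answer as new data arrives

all_events = history + [event for batch in arrivals for event in batch]
print(latest)  # {'error': 4, 'info': 2, 'warn': 2}
print(latest == batch_count(all_events))  # True
```

The point of the design is that the incremental query and the batch query express the same computation; only how the input is fed to it differs.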
Production Use Cases
Streaming at
Collect logs and metrics from a variety of sources to ensure the security, availability and performance of our cloud platform.
Engineer Office Hours
MY HOURS
• Spark Core
• R
• Data Science
• ML
• GraphFrames, Deep Learning
• Databricks
• Spark SQL
• Structured Streaming
• Spark SQL
• Structured Streaming
• Databricks
TODAY 4:30 – 5:15
OTHER ENGINEERS
TODAY 1:45 – 5:15
THURS 10:30 – 2:30
@ Databricks Booth
Thank you!
All code available at databricks.com/blog and @michaelarmbrust

Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armbrust
