Streaming Data Analytics with ksqlDB and Superset
w/ Robert Stolz
Email: robert@preset.io
GitHub: garden-of-delete
Find me on the Superset Slack!
Who am I?
● Data Engineer and Developer Advocate @ Preset
● Background in scientific research, computational biology, mathematics, open-source software
● Data architecture and best practices nerd
● New(ish) to Kafka
Agenda
• The history and anatomy of Apache Superset
• What Superset offers a streaming data architecture
• Streaming analytics w/ Kafka: paths and challenges
Feel free to ask questions as they come up
Keep an eye out for this series on the Preset Blog!
Apache Superset
Timeline (figure): 2015, 2019, 2021. Milestone label: Version 1.0, ASF incubator graduation.
● Dynamic Dashboards: Dashboard filters and Jinja templating enable end users to drill deeper into data
● No Code Exploration: Create beautiful, complex charts from your data without having to write any code
● SQL Lab: State-of-the-art SQL IDE with a rich metadata browser for deeper analysis
● Rich Visualizations: Beautiful array of interactive visualizations, including geospatial
● Granular Permissions: Row-level security, configurable data policies
● Semantic Layer: Support for virtual columns, virtual tables, view creation, and more
● Caching: Reduce load on the database; faster queries, faster results
● Modern Datastack Support: Connect to any SQL-speaking database, including popular cloud data warehouses and SQL engines
● Alerts & Reports: Get notified via Slack or email when dips or spikes happen in your data
● Custom Viz Plugins: Build your own custom visualization plug-in or connect to popular 3rd-party plug-ins
Superset speaks SQL via SQLAlchemy
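Database connections in Superset are described by SQLAlchemy connection URIs of the general form dialect+driver://user:password@host:port/database. As a minimal sketch, using only the Python standard library rather than SQLAlchemy itself, the parts of such a URI can be pulled apart as below. The helper name, host, credentials, and database name are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_sqlalchemy_uri(uri: str) -> dict:
    """Split a SQLAlchemy-style URI into its named parts (illustrative helper)."""
    parts = urlsplit(uri)
    # The scheme is "dialect" or "dialect+driver", e.g. "postgresql+psycopg2".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


# Hypothetical connection string; none of these values come from the talk.
info = describe_sqlalchemy_uri(
    "postgresql+psycopg2://analyst:secret@warehouse.example.com:5432/events"
)
print(info["dialect"], info["driver"], info["database"])
```

The dialect part is what matters for the rest of the talk: a datastore is reachable from Superset only if a SQLAlchemy dialect exists for it.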
Who uses Apache Superset?
and hundreds more...
Value proposition of open-source BI
● Extensibility: custom analytics, embedding, piecemeal
● Control: avoid vendor lock-in
● Cost: free to use and modify, but can be expensive to maintain an enterprise deployment
● Quality: open-source is a better process for making software
Superset’s lightweight semantic layer
Diagram labels: SQL-speaking datasources; React front-end; Python back-end + semantic layer
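One way to picture a lightweight semantic layer: a virtual dataset is a saved SQL query, and a virtual column or metric is a named SQL expression, and both are folded into the query that is finally sent to the database. The sketch below imitates that idea with the standard-library sqlite3 module. The table, the column names, and the build_chart_query helper are hypothetical, and this is not Superset's actual query-building code.

```python
import sqlite3

# A "virtual dataset": just a saved SELECT statement (hypothetical schema).
VIRTUAL_DATASET_SQL = "SELECT region, amount_cents FROM orders WHERE status = 'paid'"

# "Virtual metrics": named SQL expressions evaluated at query time.
VIRTUAL_METRICS = {"revenue_usd": "SUM(amount_cents) / 100.0"}


def build_chart_query(dataset_sql: str, group_by: str, metric: str) -> str:
    """Wrap the dataset as a subquery and aggregate with a named expression."""
    expr = VIRTUAL_METRICS[metric]
    return (
        f"SELECT {group_by}, {expr} AS {metric} "
        f"FROM ({dataset_sql}) AS virtual_table "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", 1250, "paid"),
        ("east", 750, "paid"),
        ("west", 500, "paid"),
        ("west", 900, "refunded"),
    ],
)
query = build_chart_query(VIRTUAL_DATASET_SQL, "region", "revenue_usd")
rows = conn.execute(query).fetchall()
print(rows)
```

Nothing is materialized here: the definitions live as text and the database does all the work, which is why the layer can stay lightweight.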
Explore
Explore: in-chart analytics
SQL Lab
Dashboard
Dashboard: Native Filters
Dashboard: Drag and Drop Editing
Dashboard: Alerts and Reports
Why connect streaming data to the BI layer?
● BI is one of the primary sensory organs of modern organizations
● Faster, well-informed decision-making is generally desirable
● Many more specific business use cases require fast response to external events
○ Anomaly detection
○ Location and time-sensitive services
○ Extreme event monitoring
○ Visualizing and analyzing a real-world process that is constantly evolving
The Question
Want to understand: what paths exist for getting streaming data from Kafka into Superset (and, more generally, into the BI/analytics layer)?
This is distinct from wanting to analyze metadata from a Kafka deployment.
Best practice: Intermediate datastore
Direct connection
- Connect Kafka directly to Superset
- The most naive approach
- Superset would need to consume data from Kafka topics directly
- Undesirable to have data live in the BI/Analytics layer
Streaming Analytics w/ Superset + ksqlDB
- ksqlDB provides a SQL-speaking interface for data in Kafka topics
- Powered by Kafka’s stream processing framework
- No SQLAlchemy dialect for ksqlDB (as of today)
- Probably undesirable to have historical data, complex aggregates, etc. accessible only through Kafka’s stream-processing framework
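Without a SQLAlchemy dialect, a client has to reach ksqlDB through ksqlDB's own interfaces, such as its REST API. The sketch below only builds a request for the /query endpoint and does not send it. The server address is the usual local default, and `pageviews` is a hypothetical stream name.

```python
import json
import urllib.request

KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"


def build_ksqldb_query_request(
    sql: str, server: str = "http://localhost:8088"
) -> urllib.request.Request:
    """Build (but do not send) a POST to ksqlDB's /query endpoint."""
    body = json.dumps({"ksql": sql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server}/query",
        data=body,
        headers={"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE},
        method="POST",
    )


# `pageviews` is a hypothetical stream name used only for illustration.
request = build_ksqldb_query_request("SELECT * FROM pageviews EMIT CHANGES LIMIT 5;")
print(request.get_method(), request.full_url)
# Against a live server this could be sent with urllib.request.urlopen(request).
```

This works for ad-hoc inspection, but it sits outside the SQLAlchemy path that Superset relies on, which is the gap the slide points at.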
Best practice: Intermediate datastore
- Desirable properties: high write volume, robust support for event and time-series data, low read-after-write latency, integrated Kafka consumer
- Druid, ClickHouse, Rockset, Pinot, Cassandra, etc.
How to choose the right datastore?
Path 1: Integrated consumer
- Integrated consumers ingest event data directly from Kafka topics
- Transformation can be handled by the datastore or by Kafka Streams
- Best performance, limited flexibility in choice of datastore
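As an illustration of what an integrated consumer looks like in practice, Apache Druid ingests from Kafka via a supervisor spec submitted to the cluster. The sketch below assembles such a spec as a plain Python dict. The field names follow my reading of Druid's Kafka ingestion documentation and should be checked against the version in use; the datasource, topic, broker address, and event fields are all hypothetical.

```python
import json


def kafka_supervisor_spec(datasource: str, topic: str, brokers: str) -> dict:
    """Assemble a Druid-style Kafka ingestion (supervisor) spec as a dict."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Hypothetical event attributes to index as dimensions.
                "dimensionsSpec": {"dimensions": ["user_id", "page"]},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": brokers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = kafka_supervisor_spec("pageviews", "pageviews", "localhost:9092")
print(json.dumps(spec, indent=2))
```

The point of the example is that no separate consumer process is written or deployed: the datastore itself subscribes to the topic.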
Path 2: ksqlDB connection
- Some transformation tasks are handled by ksqlDB (Kafka Streams)
- Expands the list of possible intermediate datastores
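A rough sketch of the kind of transformation ksqlDB could take on in this path: declare a stream over a topic, then maintain a windowed aggregate whose results land in a new Kafka topic that a sink connector or consumer can deliver to the datastore. The statements are wrapped in Python only to build the JSON body for ksqlDB's /ksql endpoint. Stream, topic, and column names are hypothetical.

```python
import json

# Hypothetical stream, topic, and column names; adjust to the real schema.
STATEMENTS = [
    """CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
       WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');""",
    """CREATE TABLE pageviews_per_minute AS
       SELECT page, COUNT(*) AS views
       FROM pageviews
       WINDOW TUMBLING (SIZE 1 MINUTE)
       GROUP BY page
       EMIT CHANGES;""",
]


def ksql_statement_payload(statements: list) -> str:
    """Join statements into a JSON body for ksqlDB's /ksql endpoint."""
    # Collapse whitespace so each statement is a single line in the payload.
    ksql = " ".join(" ".join(s.split()) for s in statements)
    return json.dumps({"ksql": ksql, "streamsProperties": {}})


payload = ksql_statement_payload(STATEMENTS)
print(payload)
```

Because the aggregate is just another topic, the downstream datastore only needs some way to read from Kafka, which is what widens the choice of datastores.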
Path 3: Ad-hoc consumers
- Maximum flexibility around choice of datastore
- Comes at the expense of performance
- Can be harder to maintain
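An ad-hoc consumer is, at its core, a loop that reads messages, decodes them, and writes batches to the datastore. The sketch below shows that shape with stand-ins so it runs anywhere: a list of byte strings plays the role of messages polled from a Kafka topic, and an in-memory SQLite database plays the role of the datastore. The event fields and function names are hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Iterator, List


def batched(messages: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Group an incoming message stream into lists of at most `size` items."""
    batch: List[bytes] = []
    for message in messages:
        batch.append(message)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_events(
    messages: Iterable[bytes], conn: sqlite3.Connection, batch_size: int = 2
) -> int:
    """Decode JSON events and insert them batch by batch; returns rows written."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, page TEXT)")
    written = 0
    for batch in batched(messages, batch_size):
        rows = [(e["user_id"], e["page"]) for e in map(json.loads, batch)]
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        written += len(rows)
    return written


# Stand-in for messages polled from a Kafka topic (hypothetical payloads).
fake_topic = [
    json.dumps({"user_id": f"u{i}", "page": "/home"}).encode() for i in range(5)
]
conn = sqlite3.connect(":memory:")
written = load_events(fake_topic, conn)
print(written)
```

Everything a managed integration would handle (offset tracking, retries, schema changes, scaling) is left to this code, which is where the performance and maintenance costs on the slide come from.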
Superset fits into batch and streaming data architectures
Src: Designing Cloud Data Platforms by Danil Zburivsky and Lynda Partner
Three ways to run Superset
Manual Setup
• Complex set-up
• Maximum control over configuration
• Good for enterprise deployments
• Advanced features require additional set-up (Async Queries, Query Caching, Prophet integration, Dashboard thumbnails, Alerts and Reports)
Docker-compose
• Easiest set-up
• Great for trying out Superset and local development
• Some features are part of the stack by default (caching) and some aren’t (alerts and reports, Prophet integration)
Preset Cloud
• No set-up
• Good for individual evaluation all the way up to enterprise needs
• All advanced Superset features available
• Still FREE for small teams!
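To give a sense of the additional set-up a manual install involves, query caching is configured in superset_config.py through Flask-Caching style dictionaries. The fragment below is a sketch: the Redis URL, key prefixes, and timeout are placeholder values, not recommendations from the talk.

```python
# Sketch of a superset_config.py fragment; URL and timeout are placeholders.
REDIS_URL = "redis://localhost:6379/0"  # assumed local Redis instance

# Cache for Superset metadata.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": REDIS_URL,
}

# Cache for chart query results, reusing the same Redis instance.
DATA_CACHE_CONFIG = {**CACHE_CONFIG, "CACHE_KEY_PREFIX": "superset_data_"}
```

With streaming sources the timeout is a trade-off: a long timeout reduces load on the datastore but delays when newly arrived events show up in a chart.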
