Writing an interactive interface for SQL on Flink
How and why we created SQLStreamBuilder—and the lessons learned along the way
Kenny Gorman
Co-Founder and CEO
www.eventador.io
2019 Flink Forward Berlin
Background and motivations
● Eventador.io has offered a managed Flink runtime for a few years now. We
started to see some customer patterns emerge.
● The state of the art today is to write Flink jobs in Java or Scala using the
DataStream/DataSet APIs and/or the Table API.
● While powerful, the time and expertise needed aren't trivial. Adoption and time to
market lag.
● Teams are busy writing code. Completely swamped to be precise.
Why SQL anyway?
● SQL is > 30 years old. It’s massively useful for inspecting and reasoning about
data. Everyone knows SQL.
● It's declarative: just ask for what you want to see.
● It's been extended to accommodate streaming constructs like windows
(Flink/Calcite) - a minimal sketch follows this list.
● Streaming SQL never completes; it's a query on boundless data.
● It’s an amazing way to interact with streaming data.
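As a concrete illustration of a windowed streaming SQL statement, here is a minimal, self-contained Java sketch using Flink's Table API. The sensor stream, field names, and window size are invented for illustration, and the factory methods track the Flink 1.8-era API.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class WindowedSqlSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flink 1.8-era factory; newer versions use StreamTableEnvironment.create(...)
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // A toy stream standing in for a real source such as Kafka.
        DataStream<Tuple2<String, Double>> sensors = env.fromElements(
            Tuple2.of("sensor-1", 20.5), Tuple2.of("sensor-1", 21.0), Tuple2.of("sensor-2", 19.2));

        // Register it as a table with a processing-time attribute so it can be windowed.
        tableEnv.registerDataStream("sensors", sensors, "sensorid, temp, proctime.proctime");

        // The query never completes: it keeps emitting per-window aggregates as events arrive.
        Table result = tableEnv.sqlQuery(
            "SELECT sensorid, MAX(temp) AS max_temp FROM sensors " +
            "GROUP BY sensorid, TUMBLE(proctime, INTERVAL '10' SECOND)");

        tableEnv.toAppendStream(result, Row.class).print();
        env.execute("windowed-sql-sketch");
    }
}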
80% of workloads could be represented with SQL, and we plan to grow that.
20% require more complex logic best represented in Java/Scala.
What if we could go beyond simply building processors in SQL - do it interactively,
manage schemas, and make it all easy?
Could building logic on streams be as productive
and intuitive as using a database yet as scalable
and powerful as Flink?
Eventador SQLStreamBuilder
● Interactive SQL editor - create and submit any Flink-compatible SQL
● Virtual Table Registry - source/sink + schema definition
● Query Parser - Gives instant feedback
● Job payload management - Builds job payloads
● Flink runner - Takes the payload and runs the job
● Delivered as a cloud service - in your AWS account
[Screenshot of the SQL console, with callouts:]
● Feedback on SQL execution
● Where do I send results?
● Where to run the job
● The SQL statement
● Sampling rather than a result-set - a sample of results in the browser
Schema management - Virtual Table Registry
● SQL requires a schema of typed columns - streams don't have to have this.
● It's common to use AVRO (easy to solve for - see the sketch after this list) but also
free-form JSON.
● Free-form means a total F**ing mess.
● Sources - Kafka / Kinesis (soon)
● Sinks - Kafka, S3, JDBC, ELK (soon)
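To make the "easy to solve for" point concrete, here is a minimal sketch with the Avro Java library: an Avro schema already carries the typed columns a SQL engine needs. The Sensor record and its fields are invented for illustration.

import org.apache.avro.Schema;

public class AvroSchemaSketch {
    public static void main(String[] args) {
        // Hypothetical Avro schema for a sensor topic.
        String avroSchemaJson =
            "{\"type\":\"record\",\"name\":\"Sensor\",\"fields\":["
            + "{\"name\":\"sensorid\",\"type\":\"string\"},"
            + "{\"name\":\"temp\",\"type\":\"double\"}]}";

        // Parse the schema and print each field as a typed column.
        Schema schema = new Schema.Parser().parse(avroSchemaJson);
        for (Schema.Field field : schema.getFields()) {
            System.out.println(field.name() + " -> " + field.schema().getType());
        }
    }
}

Free-form JSON offers no such contract, which is why the Virtual Table Registry exists: attach a schema to the stream once, then query it like a table.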
SQLStreamBuilder Components
● Interactive SQL interface
○ Handles query creation and submission.
○ Handles feedback from SQLIO
○ Interface to build queries, sources and sinks
○ Python + Vue.js
○ Results are sampled back to interface
● SQL engine (SQLIO)
○ Parse incoming statements
○ Map data sources/sinks
○ Parse schema (Schema Registry + AVRO / JSON)
○ Build POJOs
○ Submit payload to runner (Flink)
○ Java
● Virtual Table Registry
○ Creation of schema for streams
○ AVRO + JSON
○ Python
SQLStreamBuilder (cont'd)
● Job Management Interface
○ Stop/Start/Edit/etc.
○ Python + Vue.js
○ Uses Flink APIs (see the REST sketch after this list)
● Builder
○ Handles creation of assets via K8s
○ Python
○ PostgreSQL backend
○ Kubernetes orchestration
● Flink runner
○ Runs jobs on Flink 1.8.2
○ Kubernetes orchestration
○ Any Flink-compatible SQL statement
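The Job Management Interface relies on Flink APIs; Flink's JobManager exposes a REST API that can list and cancel jobs. A minimal sketch, assuming a JobManager at localhost:8081 - not necessarily how SQLStreamBuilder's interface is wired, just the underlying API it could use:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FlinkJobsSketch {
    public static void main(String[] args) throws Exception {
        // List running and finished jobs via the Flink REST API.
        URL url = new URL("http://localhost:8081/jobs/overview");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // JSON overview of jobs
            }
        }
        // Cancelling a job is a PATCH against /jobs/<jobid>?mode=cancel on the same API.
    }
}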
Query Lifecycle - Parse
The SQL console sends an HTTPS POST to SQLIO:
{
  "schema": { },
  "topic": "myTopic",
  "virtualTable": "myTable",
  "sqlStmt": "SELECT ..."
}
SQLIO then:
- Validates the SQL against Flink/Calcite (a minimal parse check is sketched after this slide)
- Converts JSON -> AVRO and derives the AVRO schema
- Generates classes (POJOs)
and assembles the job fragment, roughly:
final StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
...
Table parse = mytopic
    .select()
    .sort()
    ...
If validation fails, an error result goes back to the SQL console:
{ "errorMsg": "unable to find table myTable" }
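For the "validate SQL against Flink/Calcite" step, here is a minimal sketch of a pure Calcite parse check in Java. The real SQLIO validation also resolves tables and schemas (which is where errors like "unable to find table myTable" come from), so this only illustrates syntax-level feedback; the statement is a placeholder.

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class ParseCheckSketch {
    public static void main(String[] args) {
        String sqlStmt = "SELECT sensorid, temp FROM sensors WHERE temp > 100";
        try {
            // Parse only: catches syntax errors instantly, before any job is built.
            SqlNode node = SqlParser.create(sqlStmt).parseQuery();
            System.out.println("parsed OK: " + node.getKind());
        } catch (SqlParseException e) {
            // The kind of message that could be sent back as { "errorMsg": ... }
            System.out.println("errorMsg: " + e.getMessage());
        }
    }
}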
Query Lifecycle - Execute
SQLIO builds and submits the job:
- If the class exists: class, method, params
- Enhanced schema typing
- Enhanced feedback/logging
- Sends a base64-encoded payload to Flink
The job wires sources and sinks with the connector descriptor API, roughly (a fuller sketch follows this slide):
.connect(
    new Kafka()
        .version("0.11")
        .topic("...")
        .sinkPartitionerXX
result.writeToSink(..);
env.execute(..);
Results are sampled back to the user: the running job emits to Apache Kafka / Socket.io, and the SQL console renders the sample as a column/value grid in the browser.
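The .connect(new Kafka()...) fragment above comes from Flink's table connector descriptor API (Flink 1.8 era). Here is a fuller sketch of wiring a Kafka topic as a table source; the topic, broker address, and field names are placeholders, and the actual SQLIO wiring may differ.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Kafka;
import org.apache.flink.table.descriptors.Schema;

public class KafkaConnectSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // Register a Kafka topic as a table source with a typed schema.
        tableEnv.connect(new Kafka()
                .version("0.11")
                .topic("myTopic")
                .property("bootstrap.servers", "localhost:9092"))
            .withFormat(new Json().deriveSchema()) // JSON payloads, format derived from the table schema
            .withSchema(new Schema()
                .field("sensorid", Types.STRING())
                .field("temp", Types.DOUBLE()))
            .inAppendMode()
            .registerTableSource("sensors");

        // A SQL statement against "sensors" can now be turned into a Table, written to a
        // sink, and submitted with env.execute(..) as on this slide.
    }
}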
● SQL join streams from multiple clusters/types
● Write to multiple types of sinks, building complex processing pipelines
● Aggregate data before pushing to expensive/slow database endpoints
● Conditionally write to multiple S3 buckets
Building Processing Environments
Example queries (from the diagram):
SELECT * FROM sensors JOIN account_info ON ...
SELECT sensorid, max(temp) FROM stream GROUP BY sensorid, tumble(..)
SELECT sensorid, region FROM stream WHERE region IN [...]
SELECT * FROM table WHERE user_selected_thing = 'foo';
SELECT sensorid, message FROM stream WHERE is_alert = 't'
Example destinations (from the diagram): S3 buckets (s3://xxx/yyy), SMS, Data Science / ML team(s), SnowFlakeDB or other data warehouse.
Javascript User Functions - Introduced Today
function ICAO_lookup(icao) {
  try {
    // JS on the JVM can call Java classes directly: open an HTTP connection for the lookup.
    var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection();
    c.requestMethod = 'GET';
    var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream));
    return reader.readLine();
  } catch (err) {
    return "Unknown: " + err;
  }
}
// Invoke the function with its input parameter.
ICAO_lookup($p0);
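The function above is plain JavaScript that reaches into Java classes, which is what Nashorn-style JS on the JVM allows. As a hedged illustration of how such a function could be evaluated from Java (not necessarily how SQLStreamBuilder runs it), here is a sketch using the standard javax.script API with a stubbed-out function body:

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class JsUdfSketch {
    public static void main(String[] args) throws Exception {
        // Nashorn ships with Java 8; look it up through the standard scripting API.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");

        // Stand-in body for the HTTP lookup above, kept local so the sketch runs offline.
        engine.eval("function ICAO_lookup(icao) { return 'looked up: ' + icao; }");

        Object result = ((Invocable) engine).invokeFunction("ICAO_lookup", "ABC123");
        System.out.println(result); // looked up: ABC123
    }
}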
What's next?
We are hiring
A number of changes are coming for community evaluation
What's Next?
● Read Consistent (Snapshot) Sink
● Auto-detect and type schema
● SQL API
● Intelligent Auto-scale
● AWS Spot instance support/management
Streams Are The Database
Thank You
hello@eventador.io