Continuous SQL with Apache Streaming (FLaNK and FLiP)
Timothy Spann
Developer Advocate
https://github.com/tspannhw/SpeakerProfile
2
https://github.com/tspannhw https://www.datainmotion.dev/
3
Speaker Bio
DZone Zone Leader and Big Data MVB;
@PaasDev
https://github.com/tspannhw https://www.datainmotion.dev/
https://github.com/tspannhw/SpeakerProfile
https://dev.to/tspannhw
https://sessionize.com/tspann/
https://www.slideshare.net/bunkertor
Developer Advocate
4
I Can Haz Data?
Today’s Data. REST and Websocket JSON “stonks”
{"symbol":"CLDR",
"uuid":"10640832-f139-4b82-8780-e3ad37b3d0ce",
"ts":1618529574078,
"dt":1612098900000,
"datetime":"2021/01/31 08:15:00",
"open":"12.24500",
"close":"12.25500",
"high":"12.25500",
"volume":"12353",
"low":"12.24500"}
5
End to End Streaming Demo Pipeline
Enterprise
sources
Weather
Errors
Aggregates
Alerts
Stocks
ETL
Analytics
Streaming SQL
Clickstream Market data
Machine logs Social
https://github.com/tspannhw/CloudDemo2021
7
WHAT IS APACHE NIFI?
Apache NiFi is a scalable, real-time streaming data
platform that collects, curates, and analyzes data so
customers gain key insights for immediate
actionable intelligence.
8
APACHE NIFI
Enable easy ingestion, routing, management and delivery of
any data anywhere (edge, cloud, data center) to any
downstream system with built-in end-to-end security and
provenance
ACQUIRE PROCESS DELIVER
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full data provenance from acquisition to delivery
• Diverse, Non-Traditional Sources
• Ecosystem integration
Advanced tooling to industrialize flow development
(Flow Development Life Cycle)
Sources and formats: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
Processing steps: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE, EXECUTE
© 2021 Cloudera, Inc. All rights reserved. 9
No More Spaghetti Flows
https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
● Reduce, Reuse, Recycle. Use Parameters to reuse common modules.
● Put flows and reusable chunks into separate Process Groups.
● Write custom processors if you need new or specialized features.
● Use Cloudera-supported NiFi Processors.
● Use Record Processors everywhere.
10
WHAT IS APACHE PULSAR?
Apache Pulsar is an open source, cloud-native
distributed messaging and streaming platform.
EVENTS
11
APACHE PULSAR
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
12
Flink SQL
https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
13
Flink SQL
-- specify Kafka partition key on output
SELECT foo AS _eventKey FROM sensors

-- use event time timestamp from kafka
-- exactly once compatible
SELECT eventTimestamp FROM sensors

-- nested structures access
SELECT foo.'bar' FROM table; -- must quote nested column

-- timestamps
SELECT * FROM payments
WHERE eventTimestamp > CURRENT_TIMESTAMP - interval '10' second;

-- unnest
SELECT b.*, u.*
FROM bgp_avro b,
     UNNEST(b.path) AS u(pathitem)

-- aggregations and windows
SELECT card,
       MAX(amount) as theamount,
       TUMBLE_END(eventTimestamp, interval '5' minute) as ts
FROM payments
WHERE lat IS NOT NULL
  AND lon IS NOT NULL
GROUP BY card,
         TUMBLE(eventTimestamp, interval '5' minute)
HAVING COUNT(*) > 4 -- >4==fraud

-- try to do this ksql!
SELECT us_west.user_score+ap_south.user_score
FROM kafka_in_zone_us_west us_west
FULL OUTER JOIN kafka_in_zone_ap_south ap_south
ON us_west.user_id = ap_south.user_id;
Key Takeaway: Rich SQL grammar with advanced time and aggregation tools
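To make the tumbling-window query on this slide concrete, here is a small Python sketch of what `TUMBLE`, `TUMBLE_END`, and the `HAVING COUNT(*) > 4` filter compute over a finite list of rows. It is an illustration only, not how Flink executes the query: the function names `tumble_end` and `flag_cards`, the sample payments, and the choice to align windows to the Unix epoch are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def tumble_end(ts, size):
    """Exclusive end of the fixed, non-overlapping window that contains ts."""
    window_start = ts - (ts - EPOCH) % size
    return window_start + size

def flag_cards(payments, size=timedelta(minutes=5), threshold=4):
    """Max amount per (card, window end), kept only for groups with more than `threshold` rows."""
    groups = defaultdict(list)
    for p in payments:
        if p["lat"] is None or p["lon"] is None:   # WHERE lat/lon IS NOT NULL
            continue
        key = (p["card"], tumble_end(p["ts"], size))  # GROUP BY card, TUMBLE(...)
        groups[key].append(p["amount"])
    # HAVING COUNT(*) > threshold, SELECT MAX(amount)
    return {k: max(v) for k, v in groups.items() if len(v) > threshold}

# Made-up sample data: five located payments for card A inside one window,
# one payment without a latitude, and a single payment for card B.
base = datetime(2021, 4, 15, 12, 0, tzinfo=timezone.utc)
payments = [
    {"card": "A", "amount": 10.0 * (i + 1), "lat": 40.0, "lon": -74.0,
     "ts": base + timedelta(minutes=i, seconds=10)}
    for i in range(5)
]
payments.append({"card": "A", "amount": 999.0, "lat": None, "lon": -74.0,
                 "ts": base + timedelta(minutes=1)})
payments.append({"card": "B", "amount": 75.0, "lat": 40.0, "lon": -74.0,
                 "ts": base + timedelta(minutes=2)})

flagged = flag_cards(payments)
print(flagged)
```

Only card A in the 12:00 to 12:05 window survives the filter. The 999.0 payment is dropped by the null check before grouping, so it does not affect the maximum.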
14
Flink SQL
SELECT location, station_id, latitude, longitude, observation_time, weather,
       temperature_string, relative_humidity, wind_string, wind_dir, wind_degrees,
       wind_mph, pressure_in, dewpoint_string, dewpoint_f, dewpoint_c
FROM weather2
WHERE location is not null
  and location <> 'null'
  and trim(location) <> ''
  and location like '%NJ'

SELECT HOP_END(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND) as windowEnd,
       count("close") as closeCount,
       sum(cast("close" as float)) as closeSum,
       avg(cast("close" as float)) as closeAverage,
       min("close") as closeMin,
       max("close") as closeMax,
       sum(case when "close" > 14 then 1 else 0 end) as stockGreaterThan14
FROM stocksraw
GROUP BY HOP(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND)
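The `HOP` query above uses a 30-second window that slides every second, so each row is counted in 30 overlapping windows. The Python sketch below illustrates that assignment and the per-window aggregates over a finite list. The names `hop_ends` and `hop_aggregate`, the sample rows, and the epoch alignment are assumptions. Unlike the SQL, the sketch casts `close` to a float before every aggregate, including min, max, and the `> 14` comparison.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def hop_ends(ts, slide, size):
    """Exclusive ends of every sliding window [end - size, end) that contains ts."""
    start = ts - (ts - EPOCH) % slide   # latest slide-aligned start that is <= ts
    ends = []
    while start > ts - size:            # the window still covers ts
        ends.append(start + size)
        start -= slide
    return ends

def hop_aggregate(rows, slide=timedelta(seconds=1), size=timedelta(seconds=30)):
    """Per-window count, sum, average, min, max and count of closes above 14."""
    windows = defaultdict(list)
    for row in rows:
        close = float(row["close"])     # cast("close" as float)
        for end in hop_ends(row["eventTimestamp"], slide, size):
            windows[end].append(close)
    return {
        end: {
            "closeCount": len(v),
            "closeSum": sum(v),
            "closeAverage": sum(v) / len(v),
            "closeMin": min(v),
            "closeMax": max(v),
            "stockGreaterThan14": sum(1 for c in v if c > 14),
        }
        for end, v in sorted(windows.items())
    }

# Made-up sample rows, ten seconds apart.
t0 = datetime(2021, 4, 15, 12, 0, tzinfo=timezone.utc)
rows = [
    {"eventTimestamp": t0, "close": "12.25"},
    {"eventTimestamp": t0 + timedelta(seconds=10), "close": "14.5"},
]
result = hop_aggregate(rows)
print(len(result), result[t0 + timedelta(seconds=30)])
```

With these two rows, 40 distinct windows receive data, and the 20 windows ending between 12:00:11 and 12:00:30 contain both rows.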
15
Upcoming - Flink + Pulsar (FLiP)
https://flink.apache.org/2019/05/03/pulsar-flink.html
https://github.com/streamnative/pulsar-flink
https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
16
LET’S CONNECT!
@PaasDev
