Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming

Streaming Data Platforms for
Cloud-Native Event-driven Applications

Apache Pulsar +
Apache NiFi +
Apache Flink +
Cloudera
Tim Spann

Agenda
1. Welcome
2. Introduction to Apache Pulsar
• Basics of Pulsar
• Use Cases
3. Apache NiFi <-> Apache Pulsar
4. Apache Flink <-> Apache Pulsar
5. Let’s Build an App!
• Demo
4. Resources
5. Q&A

Tim Spann
Developer Advocate
Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big
Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience
at both global conferences and through individual conversations.

FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft

Proprietary & Confidential |
Apache Pulsar made easy.

Apache Pulsar is a Cloud-Native
Messaging and Event-Streaming Platform.

CREATED
Originally
developed inside
Yahoo! as Cloud
Messaging
Service
GROWTH
10x Contributors
10MM+ Downloads
Ecosystem Expands
Kafka on Pulsar
AMQ on Pulsar
Functions
. . .
2012 2016 2018 TODAY
APACHE TLP
Pulsar
becomes
Apache top
level project.
OPEN SOURCE
Pulsar
committed
to open source.
Apache Pulsar Timeline

Pulsar Has a Built-in Super Set of OSS
Features
Durability
Scalability Geo-Replication
Multi-Tenancy
Uniﬁed Messaging
Model
Reduced Vendor Dependency
Functions
Open-Source Features

Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2
(value=Avro/Protobuf/JSON)
schema-3
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Integrated Schema Registry

Ideal for app and data tiers
Less sprawl and better
utilization
Cloud-native scalability
Build globally without
the complexity
Cost effective long-term
storage
Pulsar across the
organization

Joining Streams in SQL
Perform in Real-Time
Ordering and Arrival
Concurrent Consumers
Change Data Capture
Data Streaming

Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging

Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming

Messages - the basic unit of Pulsar
Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although
message data can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for
things like topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer
name, the default name is used. Message De-Duplication.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of
the message is its order in that sequence. Message De-Duplication.

Connectivity
• Libraries - (Java, Python, Go, NodeJS,
WebSockets, C++, C#, Scala, Rust,...)
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Ofﬂoaders - Tiered Storage - (S3)
hub.streamnative.io

Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers

Apache Pulsar has a vibrant community
560+
Contributors
10,000+
Commits
7,000+
Slack Members
1,000+
Organizations
Using Pulsar

Uniﬁed
Messaging
Platform
Guaranteed
Message
Delivery
Resiliency Inﬁnite
Scalability
The Basics

● Serverless computing framework.
● Unbounded storage, multi-tiered
architecture, and tiered-storage.
● Streaming & Pub/Sub messaging
semantics.
● Multi-protocol support.
● Open Source
● Cloud-Native
● Multi-Tenant
Features

Message Queuing
Data Streaming
Unified Messaging & Streaming Platform

Apache Pulsar features
Cloud native with decoupled
storage and compute layers.
Built-in compatibility with your
existing code and messaging
infrastructure.
Geographic redundancy and high
availability included.
Centralized cluster management
and oversight.
Elastic horizontal and vertical
scalability.
Seamless and instant partitioning
rebalancing with no downtime.
Flexible subscription model
supports a wide array of use cases.
Compatible with the tools you use
to store, analyze, and process data.

● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Pulsar Cluster
Metadata
Storage
Pulsar Cluster

Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Cluster
Tenant - Namespaces - Topics

Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Messages

Flexible Pub/Sub API for Pulsar - Shared
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("work-q-1")
.subscriptionType(SubType.Shared)
.subscribe();

Flexible Pub/Sub API for Pulsar - Failover
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("stream-1")
.subscriptionType(SubType.Failover)
.subscribe();

Reader Interface
byte[] msgIdBytes = // Some byte
array
MessageId id =
MessageId.fromByteArray(msgIdBytes);
Reader<byte[]> reader =
pulsarClient.newReader()
.topic(topic)
.startMessageId(id)
.create();
Create a reader that will read from
some message between earliest and
latest.
Reader

Built-in
Back
Pressure
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("hellotopic")
.blockIfQueueFull(true) // enable blocking
// when queue is full
.maxPendingMessages(10) // max queue size
.create();
// During Back Pressure: the sendAsync call blocks
// with no room in queues
producer.newMessage()
.key("mykey")
.value("myvalue")
.sendAsync(); // can be a blocking call

• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Oﬄoaders - Tiered Storage - (S3)
Tenant - Namespaces - Topics

Pulsar Functions
● Lightweight computation
similar to AWS Lambda.
● Speciﬁcally designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
A serverless event streaming
framework

● Consume messages from one
or more Pulsar topics.
● Apply user-supplied
processing logic to each
message.
● Publish the results of the
computation to another topic.
● Support multiple
programming languages (Java,
Python, Go)
● Can leverage 3rd-party
libraries to support the
execution of ML models on
the edge.
Pulsar Functions

Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move
data in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
Source Sink

Apache NiFi Pulsar Connector
https://github.com/streamnative/pulsar-nifi-bundle

https://github.com/david-streamlio/pulsar-nifi-bundle

https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html

https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0

Events <->
Streaming FLiPS Apps
StreamNative Hub
StreamNative Cloud
Uniﬁed Batch and Stream COMPUTING
Batch
(Batch + Stream)
Uniﬁed Batch and Stream STORAGE
Offload
(Queuing + Streaming)
Tiered Storage
Pulsar
---
KoP
---
MoP
---
Websocket
Pulsar
Sink
Streaming
Edge Gateway
Protocols
<-> Events <->
CDC
Apps

Building Real-Time Requires a Team

Building An App
Code Along With Tim
<<DEMO>>

RESOURCES
Here are resources to continue your journey
with Apache Pulsar

Links
● https://github.com/tspannhw/Flip-iot
● https://github.com/streamnative/examples/tree/master/cloud/go
● https://pulsar.apache.org/docs/en/client-libraries-go/
● https://github.com/apache/pulsar-client-go
● https://github.com/tspannhw/Meetup-YourFirstEventDrivenApp
● https://github.com/tspannhw/SpeakerProfile
● https://github.com/tspannhw/pulsar-flinksql-1.13.2
● https://github.com/tspannhw/pulsar-pychat-function
● https://www.datainmotion.dev/
● https://www.meetup.com/SF-Bay-Area-Apache-Pulsar-Meetup/events/283083105/
● https://medium.com/@tspann
● https://dzone.com/users/297029/bunkertor.html
● https://dev.to/tspannhw
● https://linktr.ee/tspannhw

Scan the QR code
to learn more about
Apache Pulsar and
StreamNative.

Scan the QR code
to build your own
apps today.

Deploying AI With an
Event-Driven
Platform
https://dzone.com/trendreports/enterprise-ai-1

Let’s Keep
in Touch!
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
@TimPulsar@hachyderm.io
https://hachyderm.io/@TimPulsar

Data Aggregation
Concentrate data for multiple clusters into a single one where main processing
happens

Active - Active
Processing happens identical in both clusters.

Active - Passive
Only active cluster processes data - Passive cluster is for failover

● Lightweight computation similar
to AWS Lambda.
● Speciﬁcally designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
● Java Functions
A serverless event
streaming framework
Pulsar Functions

Integrated with pulsar-admin CLI

● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions

Simple Function with Pulsar SDK

from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json
class Chat(Function):
def __init__(self):
pass
def process(self, input, context):
fields = json.loads(input)
sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(fields["comment"])
row = { }
row['id'] = str(msg_id)
if ss['compound'] < 0.00:
row['sentiment'] = 'Negative'
else:
row['sentiment'] = 'Positive'
row['comment'] = str(fields["comment"])
json_string = json.dumps(row)
return json_string
Entire Function
Pulsar Python NLP Function

https://www.inﬂuxdata.com/integration/mqtt-monitoring/
https://www.influxdata.com/integration/mqtt-monitoring/
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over a 300 components
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
Apache NiFi Basics

Apache NiFi - Apache Pulsar Connector

Apache NiFi - Producing to Pulsar

● Uniﬁed computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink

Apache Flink Job Running Against Apache Pulsar

● Java, Scala, Python Support
● Strong ETL/ELT
● Diverse ML support
● Scalable Distributed compute
● Apache Zeppelin and Jupyter Notebooks
● Fast connector for Apache Pulsar
Apache Spark

val dfPulsar = spark.readStream.format("
pulsar")
.option("
service.url", "pulsar://pulsar1:6650")
.option("
admin.url", "http://pulsar1:8080
")
.option("
topic", "persistent://public/default/airquality").load()
val pQuery = dfPulsar.selectExpr("*")
.writeStream.format("
console")
.option("truncate", false).start()
____ __
/ __/__ ___ _____/ /__
_ / _ / _ `/ __/ '_/
/___/ .__/_,_/_/ /_/_ version 3.2.0
/_/
Using Scala version 2.12.15
(OpenJDK 64-Bit Server VM, Java 11.0.11)
Apache Spark + Apache Pulsar

Apache Spark Running Against Apache Pulsar

Fanout, Queueing, or Streaming
In Pulsar, you have ﬂexibility to use different subscription modes.
If you want to... Do this... And use these subscription
modes...
achieve fanout messaging
among consumers
specify a unique subscription name
for each consumer
exclusive (or failover)
achieve message queuing
among consumers
share the same subscription name
among multiple consumers
shared
allow for ordered consumption
(streaming)
distribute work among any number of
consumers
exclusive, failover or key shared

84
Uniﬁed Messaging Model
Simplify your data infrastructure
and enable new use cases with
queuing and streaming capabilities
in one platform.

Ideal for app and data tiers
Less sprawl and better utilization
Cloud-native scalability
Build globally without the
complexity
Cost effective long-term storage
Pulsar across the
organization

Starting a Function - Distributed Cluster
Once compiled into a JAR, start a Pulsar Function in a distributed cluster:

Customers on StreamNative Cloud

Why a Table Abstraction?
● In many use cases, applications want to consume data from a
Pulsar Topic as if it were a database table, where each new
message is an “update” to the table.
● Up until now, applications used Pulsar consumers or readers to
fetch all the updates from a topic and construct a map with the
latest value of each key for received messages.
● The Table Abstraction provides a standard implementation of this
message consumption pattern for any given keyed topic.

TableView
● New Consumer type added in Pulsar 2.10 that provides a continuously
updated key-value map view of compacted topic data.
● An abstraction of a changelog stream from a primary-keyed table, where
each record in the changelog stream is an update on the primary-keyed table
with the record key as the primary key.
● READ ONLY DATA STRUCTURE!

How does it work?
● When you create a TableView,
and additional compacted
topic is created.
● In a compacted topic, only the
most recent value associated
with each key is retained.
● A background reader
consumes from the compacted
topic and updates the map
when new messages arrive.

Event-Based Microservices
• When your microservices use a message bus to communicate, it is
referred to as an “event-driven” architecture.
• In an event-driven architecture, when a service performs some
piece of work that other services might be interested in, that
service produces an event. Other services consume those events
so that they can perform their own tasks.
• Messages are stored in intermediate topics to prevent message
loss. If a service fails, the message remains in the topic.

Event-Based Microservices Application
• A basic order entry use case
for a food delivery service
will involve several different
microservices.
• They communicate via
messages sent to Pulsar
topics.
• A service can also subscribe
to the events published by
another service.

Founded by the original creators of Apache Pulsar.
StreamNative has more experience designing,
deploying, and running large-scale Apache Pulsar
instances than any team in the world.
StreamNative employs more than 50% of the
active core committers to Apache Pulsar.

Pulsar and other
Ecosystems
Real-Time
Processor
Long-Term
Storage
Subscription B Group 0
Subscription B Group 20
...
Pulsar will feed data into
many different systems and
data format is crucial.
Pulsar feeds real-time
engines and data format
makes this integration easier.
Data is offloaded from Pulsar
to S3/HDFS and beneﬁts
from a correct data format.
Any improvement in size
reduction is multiplied by the
number of consumers.

Apache Avro
Avro is a data serialization system used across Big Data ecosystem.
Avro provides:
● Rich data structures
● A compact, fast, binary data format
● A container ﬁle, to store persistent data
● Remote procedure call (RPC)
● Simple integration with dynamic languages
For more information: https://avro.apache.org/docs/current/

Messaging
versus
Streaming
Messaging Use Cases Streaming Use Cases
Service x commands service y to
make some change.
Example: order service removing
item from inventory service
Moving large amounts of data to
another service (real-time ETL).
Example: logs to elasticsearch
Distributing messages that
represent work among n
workers.
Example: order processing not in
main “thread”
Periodic jobs moving large amounts of
data and aggregating to more
traditional stores.
Example: logs to s3
Sending “scheduled” messages.
Example: notiﬁcation service for
marketing emails or push
notiﬁcations
Computing a near real-time aggregate
of a message stream, split among n
workers, with order being important.
Example: real-time analytics over page
views

Differences in consumption
Messaging Use Case Streaming Use Case
Retention The amount of data retained is
relatively small - typically only a day
or two of data at most.
Large amounts of data are retained,
with higher ingest volumes and
longer retention periods.
Throughput Messaging systems are not designed
to manage big “catch-up” reads.
Streaming systems are designed to
scale and can handle use cases
such as catch-up reads.

{
"title" : "BUS 76 - Oct 17, 2022 04:38:41 PM",
"description" : "Bus route 76 is operating on a detour in Hasbrouck Heights. Terrace Avenue
is closed from Paterson Avenue to Madison Avenue for utility work.Buses will use Paterson",
"link" : "https://www.njtransit.com/node/1540960",
"guid" : "https://www.njtransit.com/node/1540960",
"advisoryAlert" : 0,
"pubDate" : "Oct 17, 2022 04:38:41 PM",
"ts" : "1666039443600",
"companyname" : "newjersey",
"uuid" : "3b19dd32-db1e-4320-ba9a-f6bfd4a87bb9",
"servicename" : "bus"
}
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation><temp_c>20.0</temp_c>
<pressure_string>1019.1 mb</pressure_string>
<pressure_mb>1019.1</pressure_mb>
<pressure_in>30.17</pressure_in>
<dewpoint_string>26.6 F</dewpoint_string>
</current_observation>
Data
Information?
'aircraft': [{'hex': 'ae6d7a',
'alt_baro': 25000, 'mlat': [],
'tisb': [], 'messages': 177,
'seen': 0.1, 'rssi': -22.7}

Trains, Planes and Automobiles +++
Information Needed Data Feed(s)
Local weather conditions ● XML, JSON, RSS
Mass transit status & alerts ● XML, JSON, RSS
Regional highways & tunnels ● GeoRSS, XML, ProtoBuf, JSON
Local social media ● JSON
ADS-B Plane Data ● JSON
Local air quality ● JSON

https://streamnative.io/blog/engineering/2022-04-14-what-the-ﬂip-is-the-ﬂip-stack/

TRANSCOM
Traffic
Sensors
Aggregates
Status
SQL
Analytics
APIs
REST

https://streamnative.io/apache-niﬁ-connector/

Apache NiFi - Data Lineage / Provenance

TRANSIT:
Infinite Message Bus with Apache Pulsar

Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
Uniﬁed Messaging
Model

Cloud native with decoupled
storage and compute layers.
Built-in compatibility with your
existing code and messaging
infrastructure.
Geographic redundancy and high
availability included.
Centralized cluster management
and oversight.
Elastic horizontal and vertical
scalability.
Seamless and instant partitioning
rebalancing with no downtime.
Flexible subscription model
supports a wide array of use cases.
Compatible with the tools you use
to store, analyze, and process data.
Apache Pulsar features

Use Pulsar to Stream from Lakehouses

Use Pulsar to Stream to Lakehouses

ETL:
Streaming with Apache Spark

Building Spark SQL View
val dfPulsar = spark.readStream.format("pulsar")
.option("service.url", "pulsar://pulsar1:6650")
.option("admin.url", "http://pulsar1:8080")
.option("topic", "persistent://public/default/weather")
.load()
dfPulsar.printSchema()
val pQuery = dfPulsar.selectExpr("*")
.writeStream.format("console")
.option("truncate", false)
.start()

Example Spark Code
val dfPulsar = spark.readStream.format("pulsar")
.option("service.url", "pulsar://pulsar1:6650")
.option("admin.url", "http://pulsar1:8080")
.option("topic", "persistent://public/default/airquality").load()
val pQuery = dfPulsar.selectExpr("*").writeStream.format("parquet")
.option("truncate", false).start()

ENRICH, ML & ROUTE:
Pulsar Functions

Why Pulsar Functions for Microservices?
Desired Characteristic Pulsar Functions…
Highly maintainable and testable ● Small pieces of code in Java, Python, or Go.
● Easily maintained in source control repositories and tested
with existing frameworks automatically.
Loosely coupled with other services ● Not directly linked to one another and communicate via
messages.
Independently deployable ● Designed to be deployed independently
Can be developed by a small team ● Often developed by a single developer.
Inter-service Communication ● Support all message patterns using Pulsar as the underlying
message bus.
Deployment & Composition ● Can run as individual threads, processes, or K8s pods.
● Function Mesh allows you to deploy multiple Pulsar
Functions as a single unit.

Pulsar Functions
● Route
● Enrich
● Convert
● Lookups
● Run
Machine Learning
● Logging
● Auditing
● Parse
● Split
● Convert

Pulsar Functions
● Consume messages from one
or more Pulsar topics.
● Apply user-supplied
processing logic to each
message.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party
libraries to support the
execution of ML models on the
edge.

from pulsar import Function
import json
class Chat(Function):
def __init__(self):
pass
def process(self, input, context):
logger = context.get_logger()
logger.info("Message Content: {0}".format(input))
msg_id = context.get_message_id()
row = { }
row['id'] = str(msg_id)
json_string = json.dumps(row)
return json_string
Python Pulsar Function

CONTINUOUS ANALYTICS:
Apache Flink SQL

SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea from
airquality;
select max(aqi) as MaxAQI, parameterName, reportingArea from
airquality group by parameterName, reportingArea;
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as
AvgAQI, count(aqi) as RowCount, parameterName, reportingArea
from airquality group by parameterName, reportingArea;

Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming

More Related Content

Similar to Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming (20)

More from Timothy Spann (20)

Recently uploaded (20)

Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming