Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Multi-Region Clusters, Audit Logs, and more)

Confluent Platform 5.4 -
RBAC, Multi-Region Clusters and more
March 25th 2020
Kai Waehner
Technology Evangelist
kai.waehner@confluent.io
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de

Ways to Deploy
Confluent Platform (Self Managed Software)
The enterprise distribution of Apache Kafka
• Deploy on any platform (e.g. VM), on-prem or cloud
• Also via Marketplace
Confluent Cloud (Fully Managed Software)
Apache Kafka re-engineered for the Cloud
• Available on the leading public clouds
• Also via Marketplace

Confluent Platform
FREEDOM OF CHOICE: Self Managed Software | Fully Managed Cloud Service
COMMITTER-DRIVEN EXPERTISE: Training | Partners | Professional Services | Enterprise Support
Apache Kafka
UNRESTRICTED DEVELOPER PRODUCTIVITY (Developer)
• SQL-based Stream Processing: KSQL (ksqlDB)
• Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
• Multi-language Development: non-Java clients | REST Proxy
EFFICIENT OPERATIONS AT SCALE (Operator)
• GUI-driven Mgmt & Monitoring: Control Center
• Flexible DevOps Automation: Operator | Ansible
• Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
PRODUCTION-STAGE PREREQUISITES (Architect)
• Enterprise-grade Security: RBAC | Secrets | Audit logs
• Data Compatibility: Schema Registry | Schema Validation
• Global Resilience: Multi-Region Clusters | Replicator
PARTNERSHIP FOR BUSINESS SUCCESS (Executive Buyer)
• Complete Engagement Model
• Revenue / Cost / Risk Impact
• TCO / ROI
Legend: Open Source | Community licensed

Rapid Pace of Innovation to Enable Enterprises
January 2020
CP 5.4 (based on AK 2.4)
Security
● Role-Based Access Control
● Structured Audit Logs
Resilience
● Multi-Region Clusters
Data Compatibility
● Schema Validation
Management & Monitoring
● Control Center
○ RBAC management
○ Replicator monitoring
Performance & Elasticity
● Tiered Storage (preview)
Stream Processing
● ksqlDB features (preview)
April 2019
Developers
● Free single-broker
developer license
● librdkafka and clients 1.0
KSQL
● New query expressions
● GUI enhancements
Replicator
● Schema migration to
CCloud
Control Center
● Dynamic broker
configuration
● Schema Registry
management
● Multi-cluster Connect &
KSQL
● Enhanced scalability
July 2018
Security
● AD/LDAP Authorizer
Replicator
● Automatic offset translation
Control Center
● Consumer lag
● View broker configuration
● View topics
● KSQL editor
Ecosystem
● MQTT Proxy
July 2019
Security
● Role-Based Access Control
(preview)
● Secret Protection
DevOps automation
● Kubernetes Operator
● Ansible Playbooks
● Control Center redesigned
user interface
● New CLI

Enterprise Grade
Security
• Architecting with security is a
design priority
• Avoiding unnecessary complexity is
key
• As usage of event streaming
spreads, native tools (e.g. Kafka
Access Control Lists) for managing
authorization can become complex
• Problem is exacerbated when
failing to standardize security across
the platform
Why do you need better
authorization?

Role-Based Access Control
Provides platform-wide security
with fine-tuned granularity
• Granular control of access
permissions, including:
• Clusters, topics, consumer
groups, connectors
• Efficient management at large scale
• Delegate authorization
management to true resource
owners
• Platform-wide standardization
• Enforced via GUI, CLI and APIs
• Enforced across all CP
components: Connect, KSQL,
Schema Registry, REST Proxy,
Control Center and MQTT Proxy
[Diagram: RBAC authorization. A role binding combines users/groups, roles and resource scoping; bindings are managed via CLI, GUI and API]
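How users/groups, roles, resource scoping and role bindings fit together can be sketched in plain Java. This is a minimal in-memory model under stated assumptions, not Confluent's RBAC API: the class `RbacSketch`, the `RoleBinding` type, the role names `TopicOwner` and `TopicReader`, the operation names, and the prefix-based scoping are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RbacSketch {

    /** Binds a principal to a role, scoped to resources of one type sharing a name prefix. */
    static final class RoleBinding {
        final String principal;      // e.g. "User:alice" or "Group:analytics"
        final String role;           // hypothetical role name
        final String resourceType;   // e.g. "Topic"
        final String resourcePrefix; // scope: resource names starting with this prefix

        RoleBinding(String principal, String role, String resourceType, String resourcePrefix) {
            this.principal = principal;
            this.role = role;
            this.resourceType = resourceType;
            this.resourcePrefix = resourcePrefix;
        }
    }

    private final List<RoleBinding> bindings = new ArrayList<>();

    void bind(RoleBinding binding) {
        bindings.add(binding);
    }

    /** Hypothetical mapping from role name to the operations it permits. */
    static Set<String> operationsFor(String role) {
        switch (role) {
            case "TopicOwner":
                return new HashSet<>(Arrays.asList("Read", "Write", "Manage"));
            case "TopicReader":
                return Collections.singleton("Read");
            default:
                return Collections.emptySet();
        }
    }

    /** Allowed if any binding for one of the caller's principals covers the resource and operation. */
    boolean isAllowed(Set<String> principals, String operation, String resourceType, String resourceName) {
        for (RoleBinding b : bindings) {
            if (principals.contains(b.principal)
                    && b.resourceType.equals(resourceType)
                    && resourceName.startsWith(b.resourcePrefix)
                    && operationsFor(b.role).contains(operation)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RbacSketch rbac = new RbacSketch();
        rbac.bind(new RoleBinding("User:alice", "TopicOwner", "Topic", "payments."));
        rbac.bind(new RoleBinding("Group:analytics", "TopicReader", "Topic", "payments."));

        // bob has no binding of his own but inherits read access from his group.
        Set<String> bob = new HashSet<>(Arrays.asList("User:bob", "Group:analytics"));
        System.out.println(rbac.isAllowed(bob, "Read", "Topic", "payments.orders"));  // prints true
        System.out.println(rbac.isAllowed(bob, "Write", "Topic", "payments.orders")); // prints false
    }
}
```

Scoping a binding to a prefix is what lets a resource owner delegate access to a whole family of topics with one binding instead of one ACL per topic.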

Enterprise Grade
Security
• Lack of visibility into actions taken
by users/applications
• Difficult to perform forensics to
detect anomalies and identify bad
actors
• Failure to comply with regulatory
requirements
Why do you need better
visibility?

Structured Audit
Logs
Enable security traceability and
regulatory compliance
• Detection of abnormal behavior and
potential security threats
• Capture authorization logs in a set of
dedicated Kafka topics
• Process and analyze with KSQL, or
offload to external systems (e.g.
Splunk, S3)
• Industry Standardization
• Uses CloudEvents specification to
define the syntax of the logs
Sample Audit Logs
Event | Description | Category | Captured by default
Authorize | An RBAC authorization is being requested. | MANAGEMENT | Yes
CreateTopics | A topic is being created. | MANAGEMENT | Yes
Produce | A Kafka producer is writing a batch of records to a topic. | PRODUCE | No
FetchConsumer | A Kafka consumer is reading a batch of records from a topic. | CONSUME | No
LeaderAndIsr | Controller is sending leader and ISR state to a broker. | INTERBROKER | No
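The default capture behavior in the sample table can be expressed as a small lookup. The following is a sketch only: it mirrors the five sample rows from the slide, and the class `AuditLogDefaultsSketch`, the `EventRule` type, and the choice to drop unknown event types are assumptions made for illustration, not the broker's actual audit log configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AuditLogDefaultsSketch {

    /** Category and default capture flag for one event type, as listed in the sample table. */
    static final class EventRule {
        final String category;
        final boolean capturedByDefault;

        EventRule(String category, boolean capturedByDefault) {
            this.category = category;
            this.capturedByDefault = capturedByDefault;
        }
    }

    static final Map<String, EventRule> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Authorize", new EventRule("MANAGEMENT", true));
        RULES.put("CreateTopics", new EventRule("MANAGEMENT", true));
        RULES.put("Produce", new EventRule("PRODUCE", false));
        RULES.put("FetchConsumer", new EventRule("CONSUME", false));
        RULES.put("LeaderAndIsr", new EventRule("INTERBROKER", false));
    }

    /** Keeps only the event types captured by default; unknown types are dropped (an assumption). */
    static List<String> capturedByDefault(List<String> eventTypes) {
        List<String> captured = new ArrayList<>();
        for (String type : eventTypes) {
            EventRule rule = RULES.get(type);
            if (rule != null && rule.capturedByDefault) {
                captured.add(type);
            }
        }
        return captured;
    }

    public static void main(String[] args) {
        System.out.println(capturedByDefault(Arrays.asList("Produce", "CreateTopics", "Authorize")));
        // prints [CreateTopics, Authorize]
    }
}
```

High-volume categories such as PRODUCE and CONSUME are off by default in the table, which keeps the audit topics small unless those events are explicitly needed.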

Global Resilience
• Modern companies have high
expectations for durability,
availability, and latency
• Replication based on Kafka Connect
(e.g. Replicator or MirrorMaker 2)
comes with operational complexity
and requires downtime
• Stretch cluster architectures
historically came with a tradeoff:
availability vs. performance
Why do you need better
disaster recovery?

Multi-Region Clusters
Change the game for disaster recovery for Kafka
• Zero downtime and zero data loss
for critical Kafka Topics
• Automated client failover
• Streamlined DR operations
• Leverages Kafka’s internal
replication
• No separate Connect clusters
• Single multi-region cluster with
high write throughput
• Asynchronous replication using
“Observer” replicas
• Low bandwidth costs and high read
throughput
• Remote consumers read data
locally, directly from Observers
[Diagram: a single Kafka cluster stretched across us-west-1, us-central-1 and us-east-1, with Brokers 1-6, ZooKeeper nodes ZK1-ZK3, a “tie-breaker” datacenter, Observer replicas and a failover site. On a site failure, Clients A and B go through automated client failover.]
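The idea that remote consumers read locally, directly from Observers, can be sketched as a replica selection rule. This is a simplified model under assumptions, not broker code: the class `ReplicaSelectorSketch`, the `Replica` type, and the rule "prefer the leader if it is local, else any replica in the client's region, else the leader" are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ReplicaSelectorSketch {

    /** One replica of a partition; observers replicate asynchronously. */
    static final class Replica {
        final int brokerId;
        final String region;
        final boolean leader;
        final boolean observer;

        Replica(int brokerId, String region, boolean leader, boolean observer) {
            this.brokerId = brokerId;
            this.region = region;
            this.leader = leader;
            this.observer = observer;
        }
    }

    /** Picks the replica a consumer in clientRegion should fetch from. */
    static Replica selectForFetch(List<Replica> replicas, String clientRegion) {
        Replica leader = null;
        Replica local = null;
        for (Replica r : replicas) {
            if (r.leader) {
                leader = r;
            }
            if (local == null && r.region.equals(clientRegion)) {
                local = r;
            }
        }
        // A local leader is always preferred; otherwise read from a local replica if one exists.
        if (leader != null && leader.region.equals(clientRegion)) {
            return leader;
        }
        return local != null ? local : leader;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica(1, "us-west-1", true, false),
                new Replica(2, "us-west-1", false, false),
                new Replica(5, "us-east-1", false, true),
                new Replica(6, "us-east-1", false, true));

        System.out.println(selectForFetch(replicas, "us-east-1").brokerId);    // prints 5
        System.out.println(selectForFetch(replicas, "us-west-1").brokerId);    // prints 1
        System.out.println(selectForFetch(replicas, "us-central-1").brokerId); // prints 1
    }
}
```

Reading from a replica in the same region avoids cross-region fetch traffic, which is where the low bandwidth cost on the slide comes from.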

Data Compatibility
• Confluent Schema Registry increases
data compatibility through client-
level “agreements”, but Kafka is
unaware
• No programmatic way of enforcing
that producers talk to Schema
Registry before publishing
messages to Kafka
• Leads to risk and uncertainty
regarding data quality for large
organizations
Why do you need
enhanced validation
of data quality?

Schema Validation
Provides a centralized way of
controlling data compatibility
• Certainty and peace of mind at scale
regarding data quality
• Automated broker-side schema
validation and enforcement
• Direct interface from the broker
to Confluent Schema Registry
• Granular control over schema
validation
• Enabled at the topic level
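[Diagram: (1) a producer sends a message with an invalid schema to the broker, which checks it against Schema Registry; (2) the broker returns an error message to the producer]
Topic-level setting: confluent.value.schema.validation=true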
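The broker-side check can be illustrated with a small in-memory simulation. It assumes the Confluent serializer wire format (a zero magic byte followed by a 4-byte big-endian schema ID) and a subject named `<topic>-value`; the class `SchemaValidationSketch`, its in-memory registry, and the decision to reject null or short values are hypothetical simplifications rather than the broker's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SchemaValidationSketch {

    // Hypothetical stand-in for Schema Registry: subject name -> registered schema IDs.
    private final Map<String, Set<Integer>> idsBySubject = new HashMap<>();

    void register(String subject, int schemaId) {
        idsBySubject.computeIfAbsent(subject, s -> new HashSet<>()).add(schemaId);
    }

    /** Accepts a record value only if it carries a schema ID registered for the topic's value subject. */
    boolean accepts(String topic, byte[] value) {
        if (value == null || value.length < 5 || value[0] != 0) {
            return false; // not framed with magic byte + schema ID
        }
        int schemaId = ByteBuffer.wrap(value, 1, 4).getInt(); // big-endian by default
        Set<Integer> ids = idsBySubject.get(topic + "-value");
        return ids != null && ids.contains(schemaId);
    }

    /** Frames a payload the way the sketch expects: magic byte 0, schema ID, then the payload. */
    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(5 + payload.length)
                .put((byte) 0)
                .putInt(schemaId)
                .put(payload)
                .array();
    }

    public static void main(String[] args) {
        SchemaValidationSketch broker = new SchemaValidationSketch();
        broker.register("orders-value", 42);

        byte[] payload = "order".getBytes(StandardCharsets.UTF_8);
        System.out.println(broker.accepts("orders", frame(42, payload))); // prints true
        System.out.println(broker.accepts("orders", frame(7, payload)));  // prints false
        System.out.println(broker.accepts("orders", payload));            // prints false
    }
}
```

The point of doing this on the broker is that a producer that bypasses the Schema Registry client is rejected at write time instead of corrupting the topic for downstream consumers.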

GUI-Driven Management & Monitoring

GUI-Driven Management
& Monitoring
• Control Center is rapidly becoming
the de facto user interface for many
Confluent Platform users
• We must ensure that Control Center
can manage and monitor Confluent
Platform comprehensively
• Need for a wide variety of use cases
and supported scale
Why these
improvements to
Control Center?

GUI-driven mgmt for
new CP 5.4 features
• Role-Based Access Control
• View own permissions, and
manage subordinate role
bindings
• Multi-Region Clusters
• Track Observer replica
placement in each topic view
• Schema Validation
• Enable at the topic level when
creating or editing topics
[Screenshot: RBAC management]

Confluent Replicator
integration
• Simplified monitoring for multi-site
replication with the GUI
• Track key metrics such as
throughput and lag
[Screenshot: Replicator monitoring]
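Lag, one of the key metrics named above, is commonly computed per partition as the distance between the newest offset in the log and the last offset the reader has committed. The sketch below shows that arithmetic only; the class `LagSketch`, the map-based inputs, and treating a partition with no committed offset as starting at 0 are assumptions, not how Control Center or Replicator collect the metric.

```java
import java.util.HashMap;
import java.util.Map;

final class LagSketch {

    /** Sums per-partition lag: log end offset minus committed offset, never below zero. */
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Assumption: a partition without a committed offset counts from offset 0.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += Math.max(0L, entry.getValue() - committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEndOffsets = new HashMap<>();
        logEndOffsets.put(0, 100L);
        logEndOffsets.put(1, 50L);

        Map<Integer, Long> committedOffsets = new HashMap<>();
        committedOffsets.put(0, 90L);

        System.out.println(totalLag(logEndOffsets, committedOffsets)); // prints 60
    }
}
```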

New aggregate views
• Simplified monitoring and
troubleshooting for Kafka clusters
• Cluster Overview: shows overall
status of the Kafka cluster,
including brokers, replicas,
partitions and topics
• Metrics Dashboard: aggregates
all Kafka cluster metrics into a
single page
[Screenshots: Cluster Overview, Metrics Dashboard]

Dynamic Performance & Elasticity

Dynamic Performance
& Elasticity
• As event streaming spreads, the
platform is required to store larger
amounts of data for longer periods
of time
• Kafka’s tight coupling between
compute and storage makes the
platform difficult to scale
• Longer data retention leads to high
storage costs
Why do you need
enhanced scalability
and data efficiency?

Tiered Storage (preview)
Enable infinite retention in Kafka, cost-effectively
• Infinite retention
• Older data is offloaded to
inexpensive object storage,
accessible at any time
• Reduced storage costs
• Storage limitations, like capacity and
duration, are effectively uncapped
• Elastic scalability
• “Lighter” Kafka brokers enable
instantaneous load balancing when
scaling up
[Diagram: clients talk to the broker. Broker compute handles transactions, auth, quota enforcement, compaction, ...; broker storage is split into local storage and remote object storage]
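The split between local and remote storage can be sketched as an age-based rule for log segments. This is a deliberately simplified model: the class `TieredStorageSketch`, the `Tier` enum, and the single `localRetentionMs` threshold are hypothetical, and the real feature's tiering and retention settings are not shown here.

```java
import java.util.concurrent.TimeUnit;

final class TieredStorageSketch {

    enum Tier { LOCAL, REMOTE }

    /** Segments older than the local retention threshold are assumed to live only in object storage. */
    static Tier tierFor(long segmentLastModifiedMs, long nowMs, long localRetentionMs) {
        long ageMs = nowMs - segmentLastModifiedMs;
        return ageMs > localRetentionMs ? Tier.REMOTE : Tier.LOCAL;
    }

    public static void main(String[] args) {
        long now = TimeUnit.HOURS.toMillis(10);
        long localRetention = TimeUnit.HOURS.toMillis(1);

        // Written 30 minutes ago: still on the broker's local disk.
        System.out.println(tierFor(now - TimeUnit.MINUTES.toMillis(30), now, localRetention)); // prints LOCAL
        // Written 8 hours ago: served from object storage.
        System.out.println(tierFor(now - TimeUnit.HOURS.toMillis(8), now, localRetention));    // prints REMOTE
    }
}
```

Because only recent segments stay on the broker in this model, adding or replacing a broker means moving a small, bounded amount of local data, which is the "lighter brokers" effect described above.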

Confluent Server
Enables enterprise features
Required to enable:
● Operator
● RBAC
● Structured Audit Logs
● Multi-Region Clusters
● Schema Validation
● Tiered Storage (preview)
Optional software package
● Deploy CP with Confluent
Server or Apache Kafka
● In-place migration between
Confluent Server and Kafka
[Diagram: Confluent Platform. Confluent Server (Apache Kafka plus enterprise capabilities) alongside KSQL, Schema Registry, REST Proxy, Control Center, Replicator and MQTT Proxy]

The 3 stream processing modalities with Confluent
• Kafka producer/consumer
• Kafka Streams
• ksqlDB

The 3 stream processing modalities differ in
flexibility and ease of use
Kafka producer/consumer:
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
// stateStore, StateStoreException, RetryUtils and MAX_RETRIES are
// application-level helpers assumed by the slide, not Kafka APIs.
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
// Sum the values per key for this batch.
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    counts.merge(record.key(), record.value(), Integer::sum);
}
// Add each batch total to the externally stored state, retrying on failure.
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
Kafka Streams:
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
ksqlDB:
SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;

Using external processing systems leads to
complicated architectures
[Diagram: databases (DB) and applications (APP) wired to a separate stream processing system through multiple connectors]

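We can put it back together in a simpler way
[Diagram: ksqlDB combines connectors, stream processing and state stores in one system between databases (DB) and applications (APP), with PULL and PUSH access for applications]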

Connect integration and pull queries enable end-to-end
streaming in just a few SQL statements
Capture data
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  …);
Perform continuous transformations
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid,
    TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
  FROM pageviews;
Create materialized views
CREATE TABLE orders_by_country AS
  SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
  FROM purchases
  LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY country
  EMIT CHANGES;
Serve lookups against materialized views
SELECT * FROM orders_by_country WHERE country='usa';
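What a 5-minute tumbling window does to each event can be shown with a little arithmetic: every timestamp falls into exactly one fixed-size, non-overlapping window whose start is the timestamp rounded down to a multiple of the window size. The sketch below is plain Java for illustration, not ksqlDB internals; the class `TumblingWindowSketch`, its method names, and the `country@windowStart` key format are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindowSketch {

    /** Start of the tumbling window containing the timestamp (non-negative epoch milliseconds). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Counts events per country and window, keyed as "country@windowStart". */
    static Map<String, Long> countByCountryAndWindow(String[] countries, long[] timestampsMs, long sizeMs) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < countries.length; i++) {
            String key = countries[i] + "@" + windowStart(timestampsMs[i], sizeMs);
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        String[] countries = {"usa", "usa", "usa"};
        long[] timestamps = {60_000L, 240_000L, 420_000L}; // minutes 1, 4 and 7

        System.out.println(countByCountryAndWindow(countries, timestamps, fiveMinutes));
        // prints {usa@0=2, usa@300000=1}
    }
}
```

The first two events share the window starting at 0, while the event at minute 7 opens a new window at 300000 ms, so a windowed table holds one row per key and window rather than one row per key.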

Confluent Platform
and Confluent Cloud
are always built on the
latest version of
Apache Kafka
If you want to learn what’s included in
Apache Kafka 2.4, we have resources
available for you:
• Technical Blog:
https://www.confluent.io/blog/apache-kafka-2-4-latest-version-updates
• Overview Video:
https://youtu.be/Ipzc--mbvzg
Apache Kafka 2.4

All events here:
https://events.confluent.io/confluentkitchenseries2020

