Beyond the Brokers: A Tour of the Kafka Environment | Florent Ramière

1
Beyond the brokers, a tour of the Kafka environment
Florent Ramière @framiere
Technical Account Manager/SE
Confluent
PARIS - 11 OCTOBER 2018

3
Massive volumes of new data generated every day
Mobile | Cloud | Microservices | Internet of Things | Machine Learning
Distributed across apps, devices, datacenters, clouds
Structured, unstructured, polymorphic
What

6
Store & ETL Process
Publish & Subscribe
In short

9
… with great properties
• Scalability
• Replication
• Security
• Resiliency
• Throughput
• Ordering
• Exactly-once semantics
• Transactions
• Idempotency
• Immutability
• Performance
• …

11
… spawned a full platform
Database Changes | Log Events | IoT Data | Web Events | other events
DATA INTEGRATION: Hadoop | Database | Data Warehouse | CRM | other
REAL-TIME APPLICATIONS: Transformations | Custom Apps | Analytics | Monitoring | other
CONFLUENT PLATFORM
Apache Kafka®: Core | Connect API | Streams API
Stream Processing & Compatibility: KSQL | Schema Registry
Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Administration & Monitoring: Control Center | Security
Connectivity: Clients | Connectors | REST Proxy
OPEN SOURCE FEATURES | COMMERCIAL FEATURES
CUSTOMER SELF-MANAGED: Datacenter | Public Cloud
CONFLUENT FULLY-MANAGED: Confluent Cloud

13
Apache Kafka Connect API: Import and Export Data In & Out of Kafka
[Diagram: JDBC, Mongo, MySQL, Elastic, Cassandra, HDFS; Sources → Connectors → Kafka Pipeline (Kafka Connect API) → Connectors → Sinks]
Fault tolerant
Manage hundreds of data sources and sinks
Preserves data schema
Integrated within Confluent Control Center
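As a rough illustration of how a source connector is declared, the sketch below builds the JSON payload that would be sent to the Kafka Connect REST API (POST /connectors). The connector name, database URL, column name, topic prefix and the localhost:8083 address are assumptions made up for this example; the configuration keys are those of Confluent's JDBC source connector.

```python
import json
import urllib.request

# Hypothetical values: name, JDBC URL, column and prefix are illustrative only.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://localhost:3306/shop",
        "mode": "incrementing",              # detect new rows via a growing id
        "incrementing.column.name": "id",
        "topic.prefix": "mysql-",            # topic name = prefix + table name
        "tasks.max": "1",
    },
}


def build_create_request(connect_url, definition):
    """Build (but do not send) the POST /connectors request."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("http://localhost:8083", connector)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
```

The request is only built, not sent, so the sketch can be read and run without a Connect cluster.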

14
Connectors: Connect Kafka Easily with Data Sources and Sinks
Databases Datastore/File Store
Analytics Applications / Other
Orange Logo denotes Connectors developed and fully supported by Confluent

15
Kafka Connect API, Part of the Apache Kafka™ Project
Connect any source to any target system
Integrated
• 100% compatible with Kafka v0.9 and higher
• Integrated with Confluent’s Schema Registry
• Easy to manage with Confluent Control Center
Flexible
• 40+ open source connectors available
• Easy to develop additional connectors
• Flexible support for data types and formats
Compatible
• Maintains critical metadata
• Preserves schema information
• Supports schema evolution
Reliable
• Automated failover
• Exactly-once guarantees
• Balances workload between nodes

16
Confluent Hub - The Kafka App Store

18
Clients: Communicate with Kafka in a Broad Variety of Languages
Apache Kafka
Confluent Platform Community Supported
Proxy http/REST
stdin/stdout
Confluent Platform Clients developed and fully supported by Confluent

19
REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
[Diagram: Non-Java Applications, Native Kafka Java Applications, REST Proxy, Schema Registry, REST / HTTP]
Simplifies administrative actions
Simplifies message creation and consumption
Provides a RESTful interface to a Kafka cluster
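To make the "RESTful interface" concrete, here is a minimal sketch of what producing JSON records through the REST Proxy v2 API could look like from a non-Java application. The proxy address (localhost:8082), the topic name and the record content are assumptions for illustration; the request is built but not sent.

```python
import json
import urllib.request


def build_produce_request(proxy_url, topic, values):
    """Build a REST Proxy v2 produce request carrying JSON-encoded values."""
    body = {"records": [{"value": value} for value in values]}
    return urllib.request.Request(
        url=f"{proxy_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


# Hypothetical proxy address and topic name.
request = build_produce_request(
    "http://localhost:8082", "web-events", [{"page": "/home"}]
)
# urllib.request.urlopen(request) would send it to a running REST Proxy.
```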

21
MQTT Proxy: Streamline IoT Data Integration with Kafka
Connect all IoT data sources with the streaming platform - leverages all of your infrastructure investments
Reduce operational cost and complexity by eliminating third-party MQTT brokers and their intermediate storage and lag
Ensure IoT data delivery at all QoS levels (QoS0, QoS1 and QoS2) of the MQTT protocol
[Diagram: Devices → MQTT → Gateways → MQTT → MQTT Proxy → Kafka Brokers]

22
Frictionless MQTT Connectivity with Confluent Platform
Approach 1: Integrate 3rd Party MQTT Broker(s) with Kafka Connect:
Devices / Gateways → MQTT → MQTT Broker → Connect w/ MQTT connector → Kafka Brokers
Approach 2: Integrate MQTT clients directly via MQTT Proxy (CP 5.x and later):
Devices / Gateways → MQTT → MQTT Proxy → Kafka Brokers

24
Stream Processing by Analogy
Kafka Cluster
Connect API | Stream Processing | Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
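The same analogy can be sketched in Python with generators: each stage consumes records one at a time and passes them on, much like the shell pipeline. The function names and sample lines are hypothetical, and this is only an in-memory illustration, not Kafka code.

```python
def source(lines):
    """Plays the role of `cat in.txt` (the source side of Connect)."""
    yield from lines


def keep_matching(records, needle):
    """Plays the role of `grep`: a stateless filter."""
    return (record for record in records if needle in record)


def to_upper(records):
    """Plays the role of `tr a-z A-Z`: a stateless per-record transformation."""
    return (record.upper() for record in records)


def sink(records):
    """Plays the role of `> out.txt` (the sink side of Connect)."""
    return list(records)


result = sink(to_upper(keep_matching(source(["ksql rocks", "hello", "try ksql"]), "ksql")))
```

Note that `str.upper()` also uppercases non-ASCII letters, whereas `tr a-z A-Z` only touches ASCII ones.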

25
Consumer, Producer
• subscribe()
• poll()
• send()
• flush()
Kafka Streams
• mapValues()
• filter()
• punctuate()
KSQL
• Select…from…
• Join…where…
• Group by..
Trade-offs: Flexibility vs. Simplicity

26
KSQL: Enable Stream Processing using SQL-like Semantics
Example Use Cases
• Streaming ETL
• Anomaly detection
• Event monitoring
Leverage Kafka Streams API without any coding required
Use any programming language
Connect via CLI or Control Center user interface
[Diagram: CLI, Clients and Confluent Control Center GUI → REST API → KSQL server, Engine (runs queries) → Kafka Cluster]

27
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
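To show what the tumbling window in this query computes, here is a small batch simulation in Python. It is only a sketch of the semantics: KSQL evaluates the query continuously over a stream, while this function processes a finished list. The `(timestamp_ms, card_number)` tuple layout and the sample data are assumptions.

```python
from collections import Counter

WINDOW_SIZE_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)


def possible_fraud(attempts, threshold=3):
    """attempts: iterable of (timestamp_ms, card_number).

    Returns {(card_number, window_start_ms): count} for every card with
    more than `threshold` attempts inside one 5-second tumbling window.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_SIZE_MS)
        counts[(card_number, window_start)] += 1           # GROUP BY card_number
    return {key: n for key, n in counts.items() if n > threshold}  # HAVING count(*) > 3


attempts = [
    (0, "A"), (1000, "A"), (2000, "A"), (4999, "A"),  # four attempts in window [0, 5000)
    (5000, "A"),                                      # falls in the next window
    (100, "B"), (6000, "B"),
]
flagged = possible_fraud(attempts)
```

Only card "A" in the first window is flagged; its fifth attempt at 5000 ms belongs to the next window and does not add to the count.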

29
The Challenge of Data Compatibility at Scale
[Diagram: App 1, App 2, App 3; incompatibly formatted message]
Many sources without a policy cause mayhem in a centralized data pipeline
Ensuring downstream systems can use the data is key to an operational stream pipeline
Example: Date formats
Even within a single application, different formats can be presented

30
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards-incompatible changes
● Support multi-data center environments
[Diagram: App 1 and App 2, each with a Serializer; Schema Registry; Kafka Topic; example consumers: Elastic, Cassandra, HDFS]
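The idea of preventing backwards-incompatible changes can be illustrated with a deliberately simplified check. This is not the Schema Registry's actual algorithm (Avro schema resolution also covers type promotion, unions and aliases); it only encodes one rule: a new schema can read old data if every field it adds has a default and no existing field changes type. The schema layout and field names are assumptions.

```python
def is_backward_compatible(old_schema, new_schema):
    """Simplified rule: can a reader using new_schema read data written with old_schema?"""
    old_fields = {field["name"]: field for field in old_schema["fields"]}
    for field in new_schema["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None:
            # Added field: old records lack it, so a default is required.
            if "default" not in field:
                return False
        elif previous["type"] != field["type"]:
            # Changed type: treated as incompatible in this sketch.
            return False
    return True  # Removed fields are simply ignored by the new reader.


v1 = {"fields": [{"name": "id", "type": "long"}]}
v2_ok = {"fields": [{"name": "id", "type": "long"},
                    {"name": "created", "type": "string", "default": ""}]}
v2_bad = {"fields": [{"name": "id", "type": "long"},
                     {"name": "created", "type": "string"}]}
```

With these sample schemas, `v2_ok` passes the check and `v2_bad` fails it because the added field has no default.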

32
Multiple options!
• Zip
• Yum/apt
• Ansible
• Docker
• DC/OS
• Helm charts
• Confluent Operator
• ... Cloud!

33
Operator: Achieve End to End Automation on Kubernetes
Confluent Operator operationalizes years of experience delivering a fully-managed service - Confluent Cloud - on the leading public clouds
Accelerate time to value with automated zero-touch provisioning
Reduce OpEx and boost DevOps agility with rolling updates, elastic scaling and auto data balancing
Increase resiliency via SLA monitoring through Control Center or Prometheus
[Diagram: Confluent Operator; Confluent Platform Docker Images; Confluent Cloud Docker Images; Public Cloud: AWS, Azure, GCP; On-Premises: Pivotal, Mesosphere, Red Hat]

35
Auto Data Balancer: Achieve Enterprise-level Performance for Kafka
[Diagram: partition distribution Before and After a Rebalance]
Dynamically move partitions to optimize resource utilization and reliability
Enable elastic scaling by easily adding and removing nodes from your Kafka cluster
ADB traffic is throttled during data transfers to preserve network bandwidth

36
Replicator: Stretch Kafka Across Data Centers and Public Cloud
Protect business-critical data and metadata by replicating down to topic-level configurations
Minimize recovery time objectives (RTO) through automated failover and switchback
Meet recovery point objectives (RPO) by running more workers to increase replication throughput
Bridge your data center to the cloud with Confluent Cloud

37
Deploy Confluent Platform on K8s via a Growing Partner Ecosystem

39
Confluent Control Center: Cluster Health & Administration
Cluster health dashboard
• Monitor the health of your Kafka clusters and get alerts if any problems occur
• Measure system load, performance, and operations
• View aggregate statistics or drill down by broker or topic
Cluster administration
• Monitor topic configurations

40
Operate More Secure, Reliable and Performant Apache Kafka
Control Center enhancements in Confluent Platform 5.0
For Operators
● Broker configuration view → see config across multiple Kafka clusters or check values for specific brokers
● Consumer lag → view how consumers are performing based on offset, spot potential issues and take proactive steps to keep performance high
● Feature access controls → control customer access to topic inspection, schemas, and KSQL

41
Build More Powerful Streaming Applications
For Developers
● Topic inspection → gain insight into the actual data in Kafka topics
● Schema registry integration → view older and current schema versions in a git-like UI
● KSQL GUI → create streams and tables from topics, experiment with transient queries, and run persistent queries to filter and enrich data

42
View consumer-partition lag across topics for a consumer group
Alert on max consumer group lag across all topics
Consumer Lag Monitoring
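Consumer lag for a partition is the difference between the log end offset and the offset the consumer group has committed. The sketch below computes per-partition lag and the maximum lag for a group from two offset maps; the dictionary layout, the topic names and the choice to treat a missing commit as offset 0 are assumptions for illustration, not how Control Center is implemented.

```python
def partition_lag(end_offsets, committed_offsets):
    """Both arguments map (topic, partition) -> offset.

    Lag = log end offset - committed offset, never negative.
    A partition with no committed offset is treated as committed at 0 (assumption).
    """
    return {
        tp: max(end - committed_offsets.get(tp, 0), 0)
        for tp, end in end_offsets.items()
    }


def max_group_lag(end_offsets, committed_offsets):
    """Largest lag across all partitions of all topics for one consumer group."""
    lags = partition_lag(end_offsets, committed_offsets)
    return max(lags.values(), default=0)


# Hypothetical offsets for one consumer group.
end_offsets = {("orders", 0): 120, ("orders", 1): 80, ("payments", 0): 50}
committed = {("orders", 0): 100, ("orders", 1): 80}
lags = partition_lag(end_offsets, committed)
worst = max_group_lag(end_offsets, committed)
```

An alert on max consumer group lag would then compare `worst` with a chosen threshold.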

43
Make stream processing more accessible
Build stream processing IP in CE
Manage streams & tables
Run KSQL (transient & persistent)
View persistent queries
KSQL UI

44
White papers
https://github.com/framiere/monitoring-demo

47
Resources - Confluent Cloud Datasheet
https://www.confluent.io/wp-content/uploads/confluent-cloud-datasheet.pdf

48
Resources - Confluent Enterprise Reference Architecture
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/

49
Optimizing Your Apache Kafka® Deployment
https://www.confluent.io/white-paper/optimizing-your-apache-kafka-deployment/

50
Small Cluster Reference Architecture – 19 software nodes – 8 Hosts
10 nodes
6 nodes
2 nodes
1 node

51
Large Cluster Reference Architecture – 22 software nodes - 19 hosts
3 nodes
4 nodes
5 nodes
4 nodes
2 nodes
2 nodes
1 node

53
Resources – Community Slack and Mailing List
https://slackpass.io/confluentcommunity
https://groups.google.com/forum/#!forum/confluent-platform

55
A Kafka Story
https://github.com/framiere/a-kafka-story

56
cp-demo
https://github.com/confluentinc/cp-demo
With security inside!

57
Kafka Boom Boom
https://github.com/Dabz/kafka-boom-boom
