Kafka Practical Experience
RiCo Chen
Agenda
◎ Kafka Overview
◎ AFT Kafka Architecture
    • .NET Client - Pub/Sub
    • Key Terms Introduction
    • Message Delivery
◎ Monitor Kafka Architecture
◎ Kafka Performance Tuning
    • Producer
    • Broker
    • Consumer
    • JVM
Kafka Overview
• High-performance distributed streaming platform
• Popular project on GitHub: stars=6389, forks=3909
• Simple installation, easy scale-out, more resources (Linux)
• High availability, failover, high reliability, auto balancing, message persistence
• Use cases: log aggregation, metrics, event sourcing, messaging, ...
• Used by: LinkedIn, airbnb, Mozilla, Twitter, LINE, skyscanner, trivago, Hotel.com, PayPal, Uber, Yahoo, ...
AFT Kafka Architecture
[Architecture diagram]
• Producers: UGS API, UGS WEB, UGS WinSvr. Serialization, then publish message over TCP (batch, fire and forget).
• Kafka cluster (P2P): broker1, broker2, broker3, with replication between brokers. Labels: message (binary) queue, partition, offset manager, leader cache, topic, ReplicaManager, GroupCoordinator (rebalance).
• ZooKeeper cluster (master-slave): node1, node2, node3. Labels: topic configuration, broker status, cluster membership, controller process, coordinator process.
• Consumers: a consumer group of three Logger Services instances. Subscribe message over TCP (message set, async), then deserialization.
• Other labels: Middleware, Heartbeat.
.NET Client - Pub/Sub
Key Terms Introduction
• Broker: a message-queue server process (the smallest unit of a Kafka cluster)
• Topic: a category of messages (where the data is stored)
• Producer: pushes messages to a broker (writes data)
• Consumer: pulls messages from a broker (reads data)
• ConsumerGroup: provides fault tolerance, scalability, and parallelism for consumers
• Partition: provides fault tolerance, scalability, and parallelism for a topic
• Offset: the position of a message within its partition
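To make the terms above concrete, here is a minimal in-memory sketch of a topic split into partitions, with an offset for every appended message. It is an illustration only, not how a broker stores data: the class name `ToyTopic` and the CRC32-based partitioner are assumptions made for this example.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Hypothetical partitioner: CRC32 of the key modulo the partition count,
        # so messages with the same key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        offset = len(log) - 1  # position of the message within its partition
        return partition, offset

    def consume(self, partition, offset):
        # A consumer pulls every message from the given offset onward.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
first = topic.produce("user-1", "login")
second = topic.produce("user-1", "click")
```

Both messages share a key, so they end up in the same partition with consecutive offsets, and a consumer reading that partition sees them in the order they were produced.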
Message Delivery
At most once: messages may be lost but are never redelivered.
At least once: messages are never lost but may be redelivered.
Exactly once: each message is delivered once and only once (supported since 0.11.x).
Messages sent by a producer to a particular topic partition are appended in the order they are sent.
A consumer instance sees records in the order they are stored in the log.
A topic with replication factor N tolerates up to N-1 server failures.
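On the consumer side, the difference between at most once and at least once comes down to whether the offset is committed before or after a message is processed. The sketch below simulates a single consumer that crashes once and restarts from its last committed offset. The function `replay` and its parameters are hypothetical, and no Kafka client is involved.

```python
def replay(messages, commit_first, crash_at):
    """Read one partition with a single simulated crash; return what was processed.

    commit_first=True  -> commit the offset, then process (at most once)
    commit_first=False -> process, then commit the offset (at least once)
    crash_at           -> offset at which the consumer crashes once, between
                          the two steps for that message
    """
    processed = []
    committed = 0      # next offset to read after a (re)start
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # restart: committed but never processed (lost)
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # restart: processed but never committed (redelivered)
            committed = offset + 1
    return processed


at_most_once = replay(["a", "b", "c"], commit_first=True, crash_at=1)
at_least_once = replay(["a", "b", "c"], commit_first=False, crash_at=1)
```

With the crash at offset 1, the commit-first run skips `"b"` entirely, while the process-first run handles `"b"` twice. That is why at-least-once consumers usually need idempotent processing.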
Monitor Kafka Architecture
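[Monitoring diagram. Components: Telegraf, JMX, InfluxDB, Grafana, Kafka-Manager, Kafka Eagle, MySQL. Connection labels: http (every 10 sec), shown twice; 1. QL and 2. Result, shown around InfluxDB and Grafana; 1. Collect and 2. Store, shown around Kafka Eagle and MySQL; TCP.]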
High level architecture blueprint
[Blueprint diagram. Components: UGS Platform (producer), Logger Services (consumer), Channel, SQL Server, Kafka Eagle (consumer).]
Kafka Performance Tuning
• Producer
• Broker
• Consumer
• JVM
Producer
• Load balancing: the producer sends data directly to the broker that is the leader for the partition.
• acks=0: the producer does not wait for any acknowledgment from the broker. Lowest latency, at the cost of durability: the risk of data loss is high.
• acks=1: the producer gets an acknowledgment once the leader has written the record to its local log; the leader responds without waiting for acknowledgment from the followers. The record can be lost if the leader fails before the followers have replicated it.
• acks=-1: the producer gets an acknowledgment only after all in-sync replicas have received the record. Strongest guarantee against data loss.
• batch.size=100 (.NET client)
• send.buffer.bytes=100*1024
• producer.type=async
• compression.type=none
• max.in.flight.requests.per.connection=3
Note: min.insync.replicas>=2
ACKs   Throughput   Latency   Durability
0      High         Low       No guarantee
1      Medium       Medium    Leader
-1     Low          High      ISR
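The trade-off in the table can be captured in a small helper that picks the `acks` value from the durability level an application needs. This is a sketch under assumptions: `producer_config` and the durability labels are hypothetical names, the remaining keys and values are copied from the slide, and whether a given client library accepts each key under that exact name needs to be checked against that client's documentation.

```python
# Durability level -> acks value, following the table above.
ACKS_BY_DURABILITY = {
    "none": "0",     # do not wait for any acknowledgment
    "leader": "1",   # leader has written the record to its local log
    "isr": "-1",     # all in-sync replicas have received the record
}


def producer_config(durability):
    """Return a producer settings dict for the requested durability level."""
    if durability not in ACKS_BY_DURABILITY:
        raise ValueError(f"unknown durability level: {durability!r}")
    return {
        "acks": ACKS_BY_DURABILITY[durability],
        # Remaining values are the ones listed on the slide.
        "batch.size": 100,
        "send.buffer.bytes": 100 * 1024,
        "producer.type": "async",
        "compression.type": "none",
        "max.in.flight.requests.per.connection": 3,
    }


audit_log_settings = producer_config("isr")
metrics_settings = producer_config("none")
```

Choosing `"isr"` only gives the strong guarantee when the broker or topic also sets `min.insync.replicas` to at least 2, as the note on the slide says.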
Broker
• More partitions = more concurrent processing = more memory = more I/O access. Throughput increases, but so does latency, because the brokers have to spread their work across every partition. P.S. Keep a single topic under 1024 partitions.
• Replication factor: requires at least two brokers.
num.io.threads=8
num.network.threads=3
background.threads=10
queued.max.requests=500
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
num.recovery.threads.per.data.dir=2
log.retention.hours=24
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.threads=1
log.cleaner.backoff.ms=30000
log.segment.bytes=1073741824
replica.fetch.min.bytes=1
replica.high.watermark.checkpoint.interval.ms=5000
replica.fetch.wait.max.ms=500
min.insync.replicas=2
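The interaction between the replication factor and `min.insync.replicas` is simple arithmetic, sketched below for a single partition. The function name `tolerated_failures` and its result keys are hypothetical. The sketch assumes producers use acks=-1 and that only in-sync replicas can become leader.

```python
def tolerated_failures(replication_factor, min_insync_replicas):
    """How many replica failures one partition can absorb.

    writes_with_acks_all: failures after which acks=-1 writes are still
        accepted (at least min_insync_replicas replicas remain in sync).
    acknowledged_data: failures after which already acknowledged data still
        exists on at least one replica (the N-1 rule).
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        "writes_with_acks_all": replication_factor - min_insync_replicas,
        "acknowledged_data": replication_factor - 1,
    }


three_brokers = tolerated_failures(replication_factor=3, min_insync_replicas=2)
```

With three replicas and `min.insync.replicas=2`, one broker can fail without interrupting acks=-1 writes, and acknowledged data survives the loss of two. With only two replicas and the same setting, any single failure blocks acks=-1 writes.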
Consumer
• Provision enough partitions to keep up with the message volume from the producers
• Set the number of consumers to a multiple of the number of brokers (an even balance is better)
• max.poll.records=5000
• enable.auto.commit=true
• auto.commit.interval.ms=5000
• fetch.max.wait.ms=500
• fetch.min.bytes=1
• Keep the batch size small in our .NET client (so consumers receive data in near real time)
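Why the partition count limits consumer parallelism can be seen from a simplified assignment sketch. Within a consumer group each partition is read by exactly one consumer, so consumers beyond the partition count receive nothing. The function `assign_round_robin` is a hypothetical simplification and does not reproduce the assignment strategies that real Kafka clients ship with.

```python
def assign_round_robin(num_partitions, consumers):
    """Spread partition ids over the consumers of one group, one owner each."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


balanced = assign_round_robin(6, ["c1", "c2", "c3"])
starved = assign_round_robin(2, ["c1", "c2", "c3"])
```

Six partitions over three consumers gives each consumer two partitions. With only two partitions, the third consumer is left with an empty list and sits idle.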
JVM
• Avoid running out of memory
• Avoid triggering GC too frequently
-Xmx8g -Xms8g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:MaxMetaspaceFreeRatio=80 -XX:MinMetaspaceFreeRatio=50
-XX:G1HeapRegionSize=16M -XX:InitiatingHeapOccupancyPercent=35
-Xms: sets the initial Java heap size
-Xmx: sets the maximum Java heap size
+UseG1GC: enables the G1 garbage collector
MaxGCPauseMillis: sets the target maximum GC pause time, in milliseconds
MaxMetaspaceFreeRatio: sets the maximum metaspace free ratio
MinMetaspaceFreeRatio: sets the minimum metaspace free ratio
G1HeapRegionSize: sets the size of each G1 heap region
InitiatingHeapOccupancyPercent: sets the heap occupancy threshold that starts a concurrent GC cycle
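One way to hand these flags to a broker is through the environment variables read by the stock Kafka start scripts, `KAFKA_HEAP_OPTS` and `KAFKA_JVM_PERFORMANCE_OPTS`. The slide does not say how the flags were applied, so treat this as an assumption. The sketch only builds the two strings; the helper name `kafka_jvm_env` is hypothetical.

```python
# Flags copied from the slide, split into heap sizing and GC tuning.
HEAP_OPTS = ["-Xmx8g", "-Xms8g"]
PERFORMANCE_OPTS = [
    "-XX:MetaspaceSize=96m",
    "-XX:+UseG1GC",
    "-XX:MaxGCPauseMillis=20",
    "-XX:MaxMetaspaceFreeRatio=80",
    "-XX:MinMetaspaceFreeRatio=50",
    "-XX:G1HeapRegionSize=16M",
    "-XX:InitiatingHeapOccupancyPercent=35",
]


def kafka_jvm_env():
    """Return the environment variables to export before starting a broker."""
    return {
        "KAFKA_HEAP_OPTS": " ".join(HEAP_OPTS),
        "KAFKA_JVM_PERFORMANCE_OPTS": " ".join(PERFORMANCE_OPTS),
    }


for name, value in kafka_jvm_env().items():
    print(f'export {name}="{value}"')
```

Setting -Xms and -Xmx to the same value, as the slide does, keeps the heap at a fixed size so the JVM never has to grow or shrink it at runtime.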
Q & A
Reference
• https://kafka.apache.org/
• https://github.com/apache/kafka
• http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
• https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm
• RiCo’s blog

