Building a company-wide data
pipeline upon Apache Kafka -
engineering for 150 billion
messages per day
Yuto Kawamura

LINE Corp
Speaker introduction
• Yuto Kawamura

• Senior software engineer in LINE server development

• Works at the Tokyo office

• Apache Kafka contributor

• Joined: April 2015 (about 3 years ago)
About LINE
•Messaging service 

•Over 200 million global monthly active users¹ in countries with top market share like Japan, Taiwan and Thailand

•Many family services

•News 

•Music

•LIVE (Video streaming) 

¹ As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia.

Agenda
• Introducing LINE server

• Data pipeline w/ Apache Kafka
LINE Server Engineering is
about …
• Scalability

• Many users, many requests, lots of data

• Reliability

• LINE is already communication infrastructure in several countries

Scale metric: message delivery
LINE Server: 25 billion / day (API calls: 80 billion / day)
Scale metric: Accumulated data (for analysis)
40 PB
Messaging System Architecture Overview
[Diagram labels: LINE Apps; LEGY JP, LEGY DE, LEGY SG; Thrift RPC/HTTP; talk-server; Distributed Data Store; Distributed async task processing]
LEGY
• LINE Event Delivery Gateway

• API Gateway/Reverse Proxy

• Written in Erlang

• Features focused on the needs of implementing a messaging service

• e.g., zero-latency hot code swapping without closing client connections
talk-server
• Java-based web application server

• Implements most of the messaging functionality + some other features

• Java 8 + Spring + Thrift RPC + Tomcat 8
Datastore with Redis and HBase
• LINE’s hybrid datastore = Redis (in-memory DB, home-brew clustering) + HBase (persistent distributed key-value store)

• Cascading failure handling

• Async write from background
task processor

• Data correction batch
[Diagram: talk-server performs a dual write to two stores labeled Cache/Primary and Primary/Backup]
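The dual-write idea on this slide can be pictured with a minimal sketch. Everything here is an assumption made for illustration: plain maps stand in for Redis and HBase, a queue stands in for the background task processor, and the class and method names are hypothetical rather than LINE's code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: in-memory maps stand in for Redis and HBase.
class HybridStoreSketch {
    private final Map<String, String> inMemory = new HashMap<>();     // stands in for Redis
    private final Map<String, String> persistent = new HashMap<>();   // stands in for HBase
    private final Queue<String[]> pendingWrites = new ArrayDeque<>(); // background task queue

    // Write to the in-memory store right away and queue the persistent write.
    void put(String key, String value) {
        inMemory.put(key, value);
        pendingWrites.add(new String[] {key, value});
    }

    // What a background task processor would do: apply queued writes
    // to the persistent store. Returns the number of writes applied.
    int drainPendingWrites() {
        int applied = 0;
        String[] write;
        while ((write = pendingWrites.poll()) != null) {
            persistent.put(write[0], write[1]);
            applied++;
        }
        return applied;
    }

    // Simulates losing the in-memory copy of a key (eviction or node failure).
    void evict(String key) {
        inMemory.remove(key);
    }

    // Read from memory first, then fall back to the persistent store.
    String get(String key) {
        String value = inMemory.get(key);
        return value != null ? value : persistent.get(key);
    }
}
```

In this sketch a read still succeeds after the in-memory copy is lost, provided the queued write has already been drained; a real system also needs the failure handling and data correction batch that the slide lists.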
Message Delivery
[Diagram labels: Alice, Bob, LEGY (x2), talk-server (x2), Storage]
1. Find nearest LEGY
2. sendMessage(“Bob”, “Hello!”)
3. Proxy request
4. Write to storage
5. Find the LEGY Bob is connected to, notify message arrival
X. fetchOps()
6. Proxy request
7. Read message
8. Return fetchOps() with message
There is a lot of internal communication behind processing a user’s request
[Diagram labels: Request; talk-server; Threat detection system; Timeline Server; Data Analysis; Background Task processing]
Communication between internal systems
• Communication for querying, transactional updates:

• Query authentication/permission

• Synchronous updates
• Communication for data synchronization, update notification:

• Notify other systems of a user’s relationship update

• Synchronize a data update with another service
[Diagram labels: talk-server; Auth; Analytics; Another Service; HTTP/REST/RPC]
Apache Kafka
• A distributed streaming platform

• (in a narrow sense) A distributed persistent message queue which supports the Pub-Sub model

• Built-in load distribution

• Built-in fail-over on both server (broker) and client
How it works
[Diagram labels: Producer (x2); Brokers; Topic (x2); Consumer (x3)]
// Producer
AuthEvent event = AuthEvent.newBuilder()
    .setUserId(123)
    .setEventType(AuthEventType.REGISTER)
    .build();
producer.send(new ProducerRecord<>("events", userId, event));

// Consumer (simplified: the configuration sets "group.id" to "group-A")
consumer = new KafkaConsumer<>(config);
consumer.subscribe(Collections.singletonList("events"));
consumer.poll(100)…
// => Record(key=123, value=...)
Pub-Sub
[Diagram labels: Brokers; Topic (x2); Consumer Group A and Consumer Group B, each receiving Records[A, B, C…]]
• Multiple consumer “groups” can independently consume a single topic
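Independent consumption by groups can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Kafka client API: `TopicSketch` is a hypothetical single-partition log, and each group only keeps its own read offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a single-partition topic; not the Kafka client API.
class TopicSketch {
    private final List<String> log = new ArrayList<>();
    // Each consumer group tracks its own read position (offset) in the log.
    private final Map<String, Integer> groupOffsets = new HashMap<>();

    void append(String record) {
        log.add(record);
    }

    // Returns up to maxRecords unread records for the group
    // and advances only that group's offset.
    List<String> poll(String groupId, int maxRecords) {
        int from = groupOffsets.getOrDefault(groupId, 0);
        int to = Math.min(log.size(), from + maxRecords);
        groupOffsets.put(groupId, to);
        return new ArrayList<>(log.subList(from, to));
    }

    public static void main(String[] args) {
        TopicSketch topic = new TopicSketch();
        topic.append("A");
        topic.append("B");
        topic.append("C");
        System.out.println(topic.poll("group-A", 2)); // [A, B]
        System.out.println(topic.poll("group-B", 3)); // [A, B, C]
        System.out.println(topic.poll("group-A", 2)); // [C]
    }
}
```

Reading as group-B does not move group-A's position, which is why both groups see the full sequence of records.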
Example: UserActivityEvent
Scale metric: Events produced into Kafka
[Diagram: six boxes labeled Service]
150 billion msgs / day
(3 million msgs / sec)
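As a quick sanity check on these two figures, the daily total can be converted to an average per-second rate. The slide does not say how the 3 million msgs / sec figure was measured; reading it as a peak rate is an assumption. The class and method names below are hypothetical.

```java
// Converts a daily message count into an average per-second rate.
class ThroughputSketch {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Integer division: the fractional part of the average is dropped.
    static long averagePerSecond(long messagesPerDay) {
        return messagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long perDay = 150_000_000_000L; // 150 billion msgs / day, from the slide
        System.out.println(averagePerSecond(perDay)); // 1736111
    }
}
```

The daily average works out to about 1.7 million msgs / sec, so the quoted 3 million msgs / sec is roughly 1.7 times the average, which is consistent with a peak figure.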
Our Kafka needs to be high-performant
• Usages sensitive to delivery latency

• Broker latency impacts throughput as well

• because a Kafka topic is a queue
… wasn’t a built-in property
• KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

• 99th percentile latency of Produce requests: 150 ~ 200 ms => 10 ms (15x ~ 20x faster)

• KAFKA-6051 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown

• Eliminated ~1000x slower responses while restarting a broker

• (not yet published) Kafka broker performance degradation when a consumer requests to fetch old data

• 10x ~ 15x speedup for 99th percentile response time
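Why broker latency limits throughput can be seen with a back-of-the-envelope model. This is a sketch, not a Kafka API: it assumes a client that allows a fixed number of outstanding requests, and it pretends every request takes the quoted latency. The window size of 5 is an illustrative assumption, and the names are hypothetical.

```java
// A client with at most maxInFlight outstanding requests can complete
// at most maxInFlight / latency requests per second.
class LatencyThroughputSketch {
    static double maxRequestsPerSecond(int maxInFlight, double latencyMillis) {
        return maxInFlight * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        int maxInFlight = 5; // assumed window size, for illustration only
        System.out.println(maxRequestsPerSecond(maxInFlight, 150.0)); // at 150 ms per request
        System.out.println(maxRequestsPerSecond(maxInFlight, 10.0));  // at 10 ms per request
    }
}
```

Under these assumptions the ceiling rises from roughly 33 to 500 requests per second per connection when latency drops from 150 ms to 10 ms, the same 15x factor as the latency improvement.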
Performance Engineering
Kafka
• Application Level:

• Read and understand code

• Patch it to eliminate
bottleneck

• JVM Level:

• JVM profiling

• GC log analysis

• JVM parameters tuning
• OS Level:

• Linux perf

• Delay Accounting

• SystemTap
e.g., investigating slow sendfile(2)
• SystemTap: A kernel dynamic tracing tool

• Inject a script to probe in-kernel behavior
stap -e '
...
probe syscall.sendfile {
  d[tid()] = gettimeofday_us()
}
probe syscall.sendfile.return {
  if (d[tid()]) {
    st <<< gettimeofday_us() - d[tid()]
    delete d[tid()]
  }
}
probe end {
  print(@hist_log(st))
}
'
e.g., investigating slow sendfile(2)
• Found that slow sendfile is blocking Kafka’s event-loop

• => patch Kafka to eliminate blocking sendfile
stap -e '…'
value |---------------------------------------- count
    0 |                                             0
    1 |                                            71
    2 |@@@                                       6171
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  29472
   32 |@@@                                       3418
 2048 |                                             0
 4096 |                                             1
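To read the histogram: the script records each sendfile duration in microseconds (via gettimeofday_us), and @hist_log groups samples into power-of-two buckets, so each row label is the lower bound of its bucket. The sketch below reproduces that bucketing in Java. It assumes the usual power-of-two behavior for positive values; the sample data is made up and the names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class LogHistogramSketch {
    // Lower bound of the power-of-two bucket containing v (0 for non-positive values).
    static long bucketOf(long v) {
        return v <= 0 ? 0 : Long.highestOneBit(v);
    }

    // Counts samples per bucket, ordered by bucket lower bound.
    static Map<Long, Long> histogram(long[] samplesMicros) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long v : samplesMicros) {
            counts.merge(bucketOf(v), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] samples = {1, 3, 17, 20, 31, 40, 5000}; // made-up durations in microseconds
        System.out.println(histogram(samples)); // {1=1, 2=1, 16=3, 32=1, 4096=1}
    }
}
```

Read this way, the single sample in the 4096 row of the slide is a call that took at least about 4 ms, far longer than the bulk of calls in the 16 to 32 microsecond range.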
and we contribute it back
Want to know more?
• Kafka Summit SF 2017

• One Day, One Data Hub, 100 Billion Messages: Kafka at LINE

• https://youtu.be/X1zwbmLYPZg

• Google “kafka summit line”
Summary
• Large scale + high reliability = difficult and exciting engineering!

• LINE’s architecture will keep evolving with OSS

• … and there are more challenges

• Multi-IDC deployment

• More and more performance and reliability improvements
End of presentation.
Any questions?
