Building a Resilient Log Aggregation Pipeline Using Elasticsearch and Kafka
Rafał Kuć @ Sematext Group, Inc.
Sematext &	I
Logsene
SPM
logs
metrics
Next	30	minutes…
Log	shipping	
- buffers
- protocols
- parsing
Central	buffering
- Kafka
- Redis
Storage	&	Analysis
- Elasticsearch
- Kibana
- Grafana
Log	shipping	architecture
File Shipper
File Shipper
File Shipper
Centralized
Buffer
ES ES ES
ES ES ES
ES ES ES
data
Focus:	Elasticsearch
Elasticsearch	cluster	architecture
client
client
client
data
data
data
data
data
data
master
master
master
ingest
ingest
ingest
Dedicated	masters	please
discovery.zen.minimum_master_nodes -> N/2 + 1, where N is the number of master eligible nodes
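The N/2 + 1 rule is plain integer arithmetic on the number of master eligible nodes. A minimal sketch, with a hypothetical helper name that is not part of any Elasticsearch client:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: N/2 + 1 (integer division)."""
    if master_eligible < 1:
        raise ValueError("need at least one master eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master eligible nodes -> quorum of", minimum_master_nodes(n))
```

With three dedicated masters the value is 2, so a single master cut off from the other two cannot form a cluster on its own.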
One	big	index	is	a	no-go
Not	scalable	enough	for	time	based	data
Indexing	slows	down	with	time
Expensive	merges
Delete by	query needed	for	data	retention
Daily	indices	are	a	good	start
2016.11.18 2016.11.19 2016.11.22 2016.11.23.	.	.
Indexing is	faster for	smaller	indices
Deletes are	cheap	
Search can	be	performed	on	indices	that	are	needed
Static indices	are	cache	friendly
indexing
most	searches
We	delete whole	indices
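Retention with daily indices can be sketched as a pure date calculation: work out which index names fall outside the retention window, then drop each one whole. The `logs_` prefix matches the index names used later in the deck; the helper names and the `keep_days` parameter are assumptions for illustration.

```python
from datetime import date, datetime, timedelta


def daily_index(day: date, prefix: str = "logs_") -> str:
    """Name of the index that receives the logs for the given day."""
    return f"{prefix}{day:%Y.%m.%d}"


def expired_indices(existing, today: date, keep_days: int, prefix: str = "logs_"):
    """Return the daily indices that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days - 1)  # oldest day that is kept
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logs_2016.11.18", "logs_2016.11.19", "logs_2016.11.22", "logs_2016.11.23"]
    print(expired_indices(names, date(2016, 11, 23), keep_days=2))
```

Each returned name would then be removed with a single delete index request, which is why no delete by query is needed.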
Daily	indices	are	sub-optimal
Load is not even: black friday, saturday, sunday
Size	based	indices	are	optimal
size	limit	for	indices
logs_01 logs_02
indexing
logs_N.	.	.
around	5	– 10GB	per	shard	on	AWS
Slice	using	size
Predictable searching	and	indexing	performance
Better indices	balancing
Fewer	shards
Easier handling of	spiky	loads
Lower costs because of better hardware utilization
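Slicing by size boils down to two decisions: has the current index reached the per-shard limit, and what is the next index called. A rough sketch under stated assumptions: the 10 GB default comes from the "around 5 – 10GB per shard" guideline, while the function names and the way the index size is obtained are hypothetical.

```python
import re

GB = 1024 ** 3


def should_roll(primary_size_bytes: int, shards: int, max_gb_per_shard: float = 10.0) -> bool:
    """True when the average primary shard has reached the size limit."""
    return primary_size_bytes / shards >= max_gb_per_shard * GB


def next_index_name(current: str) -> str:
    """logs_01 -> logs_02, keeping the zero padding of the counter."""
    match = re.fullmatch(r"(.*_)(\d+)", current)
    if match is None:
        raise ValueError(f"index name has no numeric suffix: {current!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:0{len(number)}d}"


if __name__ == "__main__":
    current, size, shards = "logs_01", 50 * GB, 5
    if should_roll(size, shards):
        print("switch indexing to", next_index_name(current))
```

Because the check looks at size rather than the calendar, a spiky day simply produces more indices instead of one oversized index.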
Proper	Elasticsearch	configuration
Keep	index.refresh_interval at	maximum	possible	value
1	sec	->	100%,	5	sec	->	125%,	30	sec	-> 175%	
You can loosen up merges for higher indexing throughput
- possible because of heavy aggregation use
- segments_per_tier -> higher
- max_merge_at_once -> higher
- max_merged_segment -> lower
All prefixed with index.merge.policy
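The refresh and merge advice above can be collected into one index settings body. This is only a sketch: the setting names are the ones listed on the slide, but every value below is an illustrative assumption (the usual defaults are 10, 10 and 5gb), not a recommendation from the talk.

```python
import json


def relaxed_indexing_settings(refresh_interval="30s",
                              segments_per_tier=20,
                              max_merge_at_once=20,
                              max_merged_segment="2gb"):
    """Settings body that trades search freshness and merge work for indexing throughput."""
    return {
        "index.refresh_interval": refresh_interval,
        # Higher than the default: more segments allowed per tier before merging.
        "index.merge.policy.segments_per_tier": segments_per_tier,
        # Higher than the default: more segments merged in one go.
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
        # Lower than the default: large segments are left alone sooner.
        "index.merge.policy.max_merged_segment": max_merged_segment,
    }


if __name__ == "__main__":
    print(json.dumps(relaxed_indexing_settings(), indent=2))
```

The printed JSON is the kind of body that would be sent to an index's `_settings` endpoint; measure indexing throughput before and after changing any of the values.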
Proper	Elasticsearch	configuration
Index only	needed	fields
Use	doc	values
Do	not	index	_source
Do	not	store	_all
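A mapping that follows some of these points might look like the sketch below. It assumes Elasticsearch 5.x mapping syntax (a mapping type and the `_all` field still exist there); the type name `log` and all field names are hypothetical.

```python
import json


def log_mapping(type_name="log"):
    """Index creation body: _all disabled, only the needed fields indexed."""
    return {
        "mappings": {
            type_name: {
                "_all": {"enabled": False},
                "properties": {
                    # date and keyword fields use doc values by default
                    "@timestamp": {"type": "date"},
                    "host": {"type": "keyword"},
                    "message": {"type": "text"},
                    # kept in the document but neither searchable nor aggregatable
                    "raw_payload": {"type": "keyword", "index": False, "doc_values": False},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(log_mapping(), indent=2))
```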
Optimization	time
We	can	optimize data	nodes	for	time	based	data
Hot	– cold	architecture
ES	hot ES	cold ES	cold
-Enode.attr.tag=hot -Enode.attr.tag=cold -Enode.attr.tag=cold
curl -XPUT localhost:9200/logs_2016.11.22 -d '{
  "settings" : {
    "index.routing.allocation.exclude.tag" : "cold",
    "index.routing.allocation.include.tag" : "hot"
  }
}'
[Diagram: logs_2016.11.22 sits on ES hot and receives indexing; the next day logs_2016.11.23 is created on ES hot as well]
move	index	after	day	ends
curl -XPUT localhost:9200/logs_2016.11.22/_settings -d '{
  "index.routing.allocation.exclude.tag" : "hot",
  "index.routing.allocation.include.tag" : "cold"
}'
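The two request bodies differ only in which tag is included and which is excluded, so they can be generated from the tier name. A small sketch; the function names are hypothetical, the setting keys and tag values are the ones from the curl commands.

```python
import json


def allocation_settings(tier: str) -> dict:
    """Allocation filter that pins an index to nodes tagged with the given tier."""
    other = {"hot": "cold", "cold": "hot"}[tier]
    return {
        "index.routing.allocation.exclude.tag": other,
        "index.routing.allocation.include.tag": tier,
    }


def create_index_body() -> dict:
    """Body for creating today's index on the hot tier."""
    return {"settings": allocation_settings("hot")}


def move_to_cold_body() -> dict:
    """Body for the _settings update that relocates yesterday's index."""
    return allocation_settings("cold")


if __name__ == "__main__":
    print(json.dumps(create_index_body(), indent=2))
    print(json.dumps(move_to_cold_body(), indent=2))
```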
[Diagram: logs_2016.11.22 now sits on an ES cold node while logs_2016.11.23 is indexed on ES hot; a day later logs_2016.11.24 is created on ES hot and logs_2016.11.23 moves to the other ES cold node]
Hot	– cold	architecture
Hot	ES	Tier
Good	CPU
Lots	of	I/O
Cold	ES	Tier
Memory	bound
Decent	I/O
Hot – cold architecture summary
Optimize	costs – different	hardware	for	different	tier
Performance – use	case	optimized	hardware
Isolation – long	running	searches	don’t	affect	indexing
Elasticsearch	client node	needs
No	data	=	no	IOPS
Large	query	throughput	=	high	CPU	usage
Lots	of	results	=	high	memory usage
Lots	of	concurrent	queries	=	higher	resources utilization
Elasticsearch	ingest	node	needs
No	data	=	no	IOPS
Large	index	throughput	=	high	CPU	&	memory	usage
Complicated	rules	=	high	CPU	usage
Larger	documents	=	more	resources utilization
Elasticsearch master node needs
No	data	=	no	IOPS
Large	number	of	indices	=	high	CPU	&	memory	usage
Complicated	mappings	=	high	memory	usage
Daily	indices	=	spikes	in	resources utilization
Focus:	Centralized	Buffer
Why	Apache	Kafka?
Fast &	easy	to	use
Easy	to	scale
Fault	tolerant	and	highly	available
Supports	streaming
Works	in	publish/subscribe mode
Kafka	architecture
ZooKeeper
ZooKeeper
ZooKeeper
Kafka
Kafka
Kafka
Kafka
Kafka	&	topics
security_logs access_logs
app1_logs app2_logs
Kafka	stores	data
in topics	
written	on	disk
Kafka & topics & partitions & replicas
logs: partition 1, partition 2, partition 3, partition 4
logs replica: partition 1, partition 2, partition 3, partition 4
Scaling Kafka
logs: 1 partition -> 4 partitions -> 16 partitions
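How a topic spreads load over more partitions can be illustrated with a keyed partitioner. This is a simplified stand-in: Kafka's own default partitioner hashes keys with murmur2, while the sketch uses CRC32 from the standard library, and the `host-N` keys are made up.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition (CRC32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(keys, num_partitions: int) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    keys = [f"host-{i}" for i in range(1000)]
    for partitions in (1, 4, 16):
        counts = spread(keys, partitions)
        print(partitions, "partitions -> busiest partition holds", max(counts.values()), "keys")
```

With one partition every key lands in the same place; with 4 or 16 the same keys are divided among more partitions, and each partition can be served by a different broker and consumer.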
Things	to	remember	when	using	Kafka
Scales by	adding more	partitions not	threads
The	more	IOPS the	better
Keep	the	#	of	consumers	equal	to	#	of	partitions
Replicas used	for	HA and	FT only
Offsets stored per consumer (group) – multiple destinations easily possible
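Why the number of consumers should match the number of partitions shows up in a toy assignment: inside one consumer group a partition is read by at most one consumer, so extra consumers stay idle and too few consumers each carry several partitions. The round-robin below is a simplification of what Kafka's group coordination does, and the function name is hypothetical.

```python
def assign_partitions(partitions: int, consumers: int) -> dict:
    """Round-robin partitions over the consumers of a single group."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    assignment = {consumer: [] for consumer in range(consumers)}
    for partition in range(partitions):
        assignment[partition % consumers].append(partition)
    return assignment


if __name__ == "__main__":
    for consumers in (2, 4, 6):
        print(consumers, "consumers:", assign_partitions(4, consumers))
```

With 4 partitions, 2 consumers get two partitions each, 4 consumers get one each, and with 6 consumers two of them receive nothing.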
Focus:	Shipper
What	about	the	shipper?
logs
Centralized
Buffer
Which	shipper	to	use?
Which protocol should be used?
What about buffering?
Log to JSON, or parse, and how?
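Logging straight to JSON removes the parsing step from the shipper. A minimal sketch using only the standard `logging` module; the field names in the JSON document are assumptions, not a schema from the talk.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON document per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("user %s logged in", "alice")
```

The alternative is to keep plain-text logs and parse them in the shipper or on an ingest node, which moves the CPU cost away from the application.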
Buffers
performance: batches & threads
availability: when central buffer is gone
Buffer types
Disk || memory || combined hybrid approach
On source || centralized
On source: App + Buffer -> ES
- file or local log shipper
- easy scaling – fewer moving parts
- often with the use of lightweight shipper
Centralized: App -> Kafka / Redis / Logstash / etc… -> ES
- one place for all changes
- extra features made easy (like TTL)
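The combined hybrid approach can be sketched as a memory queue that spills to a disk file once it is full. Everything here is hypothetical (class name, spool file format, limits); real shippers with disk-assisted queues are considerably more careful about fsync and partial writes.

```python
import json
import os
import tempfile
from collections import deque


class HybridBuffer:
    """Keep events in memory up to a limit, spill the rest to a spool file."""

    def __init__(self, spool_path: str, max_in_memory: int = 1000):
        self.spool_path = spool_path
        self.max_in_memory = max_in_memory
        self.memory = deque()

    def _spooled(self) -> bool:
        return os.path.exists(self.spool_path) and os.path.getsize(self.spool_path) > 0

    def put(self, event: dict) -> None:
        # Once anything is on disk, keep writing to disk so ordering is preserved.
        if len(self.memory) < self.max_in_memory and not self._spooled():
            self.memory.append(event)
        else:
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps(event) + "\n")

    def drain(self) -> list:
        """Return all buffered events, oldest first, and empty the buffer."""
        events = list(self.memory)
        self.memory.clear()
        if self._spooled():
            with open(self.spool_path, encoding="utf-8") as spool:
                events.extend(json.loads(line) for line in spool)
            os.remove(self.spool_path)
        return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buf = HybridBuffer(os.path.join(tmp, "spool.jsonl"), max_in_memory=2)
        for i in range(5):
            buf.put({"seq": i})
        print(buf.drain())
```

Memory keeps the common case fast, while the spool file lets the shipper survive a period when the central buffer or Elasticsearch is unavailable.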
Buffers	Summary
Simple Reliable
App
Buffer
App
Buffer
ES
App
App
ES
Protocols
UDP	– fast,	cool	for	the	application,	not	reliable
TCP – reliable (almost), application gets ACK when written to buffer
Application level ACKs may be needed
HTTP: Logstash, rsyslog, Fluentd
RELP: Logstash, rsyslog
Beats: Logstash, Filebeat
Kafka: Logstash, rsyslog, Filebeat, Fluentd
Choosing	the	shipper
application
rsyslog Elasticsearch
http
socket
memory	&	disk	
assisted	queues
application
file
rsyslog
filebeat
consumer
What	about	OS?
Say	NO to	swap
Set	the	right	disk	scheduler
CFQ for	spinning	disks
deadline for	SSD
Use	proper	mount options	for	ext4
noatime
nodiratime
data=writeback, nobarrier
For	bare	metal
check	CPU	governor
disable	transparent	huge	pages
/sys/kernel/mm/transparent_hugepage/enabled=never
/proc/sys/vm/nr_hugepages=0
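Several of these settings can be checked by reading small files in which the kernel marks the active choice with square brackets, for example `/sys/block/<device>/queue/scheduler` and `/sys/kernel/mm/transparent_hugepage/enabled`. A minimal sketch for Linux; the helper names are hypothetical and the device name `sda` is an assumption.

```python
import re


def active_choice(text: str) -> str:
    """Extract the bracketed value, e.g. 'noop deadline [cfq]' -> 'cfq'."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError(f"no active value marked in {text!r}")
    return match.group(1)


def read_choice(path: str) -> str:
    """Read a sysfs file and return its active value."""
    with open(path, encoding="utf-8") as handle:
        return active_choice(handle.read())


if __name__ == "__main__":
    for path in ("/sys/block/sda/queue/scheduler",
                 "/sys/kernel/mm/transparent_hugepage/enabled"):
        try:
            print(path, "->", read_choice(path))
        except (OSError, ValueError) as error:
            print(path, "-> not available:", error)
```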
We	are	engineers!
We	develop DevOps	tools!
We	are	DevOps people!
We	do	fun	stuff	;)
http://sematext.com/jobs
Thank	you	for	listening!	Get	in	touch!
Rafał
rafal.kuc@sematext.com
@kucrafal
http://sematext.com
@sematext http://sematext.com/jobs
Come	talk	to	us
at	the	booth

DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using Elasticsearch and Kafka