Introducing	Log	Analysis
To	Your	Organization
Rafał	Kuć
Sematext and Me
logs & metrics cloud
Next	60	minutes…
Log	shipping	
- buffers
- protocols
- parsing
Central	buffering
- Kafka
- Redis
Storage	&	Analysis
- Elasticsearch
- Kibana
- Grafana
Why	&	How?
- Should	I	try?
- Open	source
- Commercial
Why	You	Should	Care
Environments	are	getting	bigger
Containers	are	everywhere
Infrastructure	work	gets	automated
Faster	diagnostics	==	less	money	spent
Logs	&	metrics	at	the	same	place
Going	For	Commercial	Solution
cloud
Going	Open-Source	– Today’s	Focus
Log	shipping	architecture
[Diagram: data flows File -> Shipper (x3) -> Centralized Buffer -> Elasticsearch (ES) nodes]
Focus:	Shipper
What	about	the	shipper?
logs -> Centralized Buffer
Which shipper to use?
Which protocol should be used?
What about the buffering?
Log to JSON, or parse, and how?
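The "log to JSON or parse" question can be sketched as follows. This is only an illustration: the line layout (timestamp, level, message), the regular expression, and the function names are assumptions made for the example, not something taken from the slides or from any particular shipper.

```python
import json
import re

# Hypothetical log layout: "<timestamp> <LEVEL> <free-text message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Turn one plain-text log line into a dict, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def to_json(line):
    """Parse a line and serialize it as JSON; unparsed lines are kept under 'message'."""
    parsed = parse_line(line)
    if parsed is None:
        parsed = {"message": line.strip()}
    return json.dumps(parsed, sort_keys=True)


if __name__ == "__main__":
    print(to_json("2017-11-22T10:15:00Z ERROR disk is almost full"))
```

If the application can log JSON directly, the parsing step disappears and only the serialization remains.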
Buffers
Performance – batches & threads
Availability – when central buffer is gone
Buffer	types
Disk ||	memory ||	combined	hybrid approach
On	source	||	centralized
On source (each App has its own Buffer)
- file or local log shipper
- easy scaling – fewer moving parts
- often with the use of a lightweight shipper
Centralized (Apps ship to a shared buffer in front of ES)
- Kafka / Redis / Logstash / etc…
- one place for all changes
- extra features made easy (like TTL)
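The "combined hybrid approach" to buffering can be sketched roughly like this. The class name, the memory limit, and the spill-file layout are all hypothetical and chosen for the example; real shippers implement their own memory and disk queues. Events are assumed to be single-line strings.

```python
import collections
import os
import tempfile


class HybridBuffer:
    """Keep up to `memory_limit` events in memory and spill the rest to a disk file."""

    def __init__(self, memory_limit, spill_path):
        self.memory_limit = memory_limit
        self.spill_path = spill_path
        self.memory = collections.deque()

    def _has_spilled(self):
        return os.path.exists(self.spill_path) and os.path.getsize(self.spill_path) > 0

    def append(self, event):
        # Once anything has spilled, keep spilling so arrival order is preserved.
        if len(self.memory) < self.memory_limit and not self._has_spilled():
            self.memory.append(event)
        else:
            with open(self.spill_path, "a", encoding="utf-8") as spill:
                spill.write(event + "\n")

    def drain(self):
        """Return all buffered events in arrival order and empty the buffer."""
        events = list(self.memory)
        self.memory.clear()
        if self._has_spilled():
            with open(self.spill_path, encoding="utf-8") as spill:
                events.extend(line.rstrip("\n") for line in spill)
            os.remove(self.spill_path)
        return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        buffer = HybridBuffer(2, os.path.join(directory, "spill.log"))
        for event in ["a", "b", "c", "d"]:
            buffer.append(event)
        print(buffer.drain())
```

The memory part serves the fast path, while the disk part covers the time when the destination is unavailable.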
Buffers	Summary
Simple (buffer on source): App + Buffer, App + Buffer -> ES
Reliable (centralized buffer): App, App -> ES
Protocols
UDP – fast, cool for the application, not reliable
TCP – reliable (almost); the application gets an ACK when data is written to the buffer
Application-level ACKs may be needed
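An application-level ACK can be sketched as below: the receiver acknowledges a message only after it has handled it, and the sender waits for that acknowledgement. The wire format (newline-terminated messages, a literal "ACK" reply) and the function names are assumptions for this example, and a local `socket.socketpair()` stands in for a real TCP connection.

```python
import socket
import threading


def send_with_ack(sock, message, timeout=2.0):
    """Send one newline-terminated message and wait for an application-level ACK line."""
    sock.settimeout(timeout)
    sock.sendall(message.encode("utf-8") + b"\n")
    with sock.makefile("rb") as reader:
        reply = reader.readline()
    return reply.strip() == b"ACK"


def ack_one_message(sock, stored):
    """Toy receiver: read one line, 'store' it, and only then acknowledge it."""
    with sock.makefile("rb") as reader:
        line = reader.readline()
    stored.append(line.decode("utf-8").rstrip("\n"))
    sock.sendall(b"ACK\n")


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    stored = []
    worker = threading.Thread(target=ack_one_message, args=(receiver, stored))
    worker.start()
    print(send_with_ack(sender, "user logged in"))
    worker.join()
    sender.close()
    receiver.close()
```

Without the reply, the sender only knows the bytes reached a buffer, not that the other side processed them.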
HTTP – Logstash, rsyslog, Fluentd
RELP – Logstash, rsyslog
Beats – Logstash, Filebeat
Kafka – Logstash, rsyslog, Filebeat, Fluentd
Choosing	the	shipper
application -> socket -> rsyslog (memory & disk assisted queues) -> http -> Elasticsearch
Final	Architecture
application -> socket -> rsyslog (memory & disk assisted queues) -> http -> Elasticsearch
application -> file -> rsyslog / Logagent / filebeat -> consumer
Parsing Done Here
Focus:	Centralized	Buffer
Why	Apache	Kafka?
Fast &	easy	to	use
Easy	to	scale
Fault	tolerant	and	highly	available
Supports	streaming
Works	in	publish/subscribe mode
Kafka	architecture
ZooKeeper ensemble: 3 nodes
Kafka brokers: 4 nodes
Kafka	&	topics
security_logs access_logs
app1_logs app2_logs
Kafka stores data in topics written on disk
Kafka	&	topics	&	partitions	&	replicas
logs topic: partitions 1 – 4
logs replica: partitions 1 – 4 (one replica per partition)
Scaling Kafka
logs topic: 1 partition -> 4 partitions -> 16 partitions
Things	to	remember	when	using	Kafka
Scales by adding more partitions, not threads
The more IOPS the better
Keep the # of consumers equal to # of partitions
Replicas used for HA and FT only
Offsets stored per consumer – multiple destinations easily possible
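Why the number of consumers should match the number of partitions can be illustrated with a simplified assignment sketch. It assumes one consumer per partition within a group and plain round-robin distribution; the function name is hypothetical and Kafka's own assignors are more involved.

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets exactly one consumer."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = [1, 2, 3, 4]
    # Fewer consumers than partitions: each consumer handles several partitions.
    print(assign_partitions(partitions, ["c1", "c2"]))
    # More consumers than partitions: the extra consumers stay idle.
    print(assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"]))
```

In this model, adding consumers beyond the partition count adds no throughput, which is why scaling means adding partitions.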
Focus:	Elasticsearch
Elasticsearch	cluster	architecture
Node roles in the diagram: 3 client, 6 data, 3 master, 3 ingest
Dedicated	masters	please
discovery.zen.minimum_master_nodes -> N/2 + 1, where N is the number of master-eligible nodes
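The N/2 + 1 rule can be worked through in a few lines. This is a sketch with hypothetical function names; it uses integer division, which is the usual reading of the formula.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: more than half of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def has_quorum(visible_master_eligible_nodes, master_eligible_nodes):
    """True if this side of a network split may elect a master."""
    return visible_master_eligible_nodes >= minimum_master_nodes(master_eligible_nodes)


if __name__ == "__main__":
    for total in (1, 3, 5):
        print(total, "master-eligible ->", minimum_master_nodes(total))
    # With 3 masters split 2 / 1, only the side that sees 2 nodes keeps a quorum.
    print(has_quorum(2, 3), has_quorum(1, 3))
```

Because two halves of a split cannot both hold more than half of the nodes, at most one side can elect a master.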
Elasticsearch	– Indices
Index – logical	place	for	data
Index – can be compared to a database in a relational DB
Index	– built	out	of	one	or	more	shards
Shard – can	be	spread	among	multiple	nodes
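How a document ends up in one of an index's shards can be sketched with the simple "hash modulo number of primary shards" idea. This is an approximation: `zlib.crc32` is a stand-in hash chosen for the example (Elasticsearch uses its own hash function), and the function name is hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a shard number for a document from its routing value (by default its ID)."""
    # zlib.crc32 is an assumption standing in for Elasticsearch's internal hash.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ("log-1", "log-2", "log-3", "log-4"):
        print(doc_id, "-> shard", shard_for(doc_id, 4))
```

Since the shard number depends on the primary shard count, changing that count on an existing index would send the same document to a different shard.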
Scaling Elasticsearch
- One index, one shard: Logs (Shard1)
- More indices: Logs (Shard1), Users (Shard1), Invoices (Shard1)
- More shards per index: Logs (Shard1, Shard2, Shard3, Shard4)
- Shards spread among nodes
- Replicas added next to the primaries: Shard1 + Replica4, Shard2 + Replica3, Shard4 + Replica1, Shard3 + Replica2
One	big	index	is	a	no-go
Not	scalable	enough	for	time	based	data
Indexing	slows	down	with	time
Expensive	merges
Delete by	query needed	for	data	retention
Daily	indices	are	a	good	start
2017.11.16 2017.11.17 2017.11.20 2017.11.21.	.	.
Indexing is	faster for	smaller	indices
Deletes are	cheap	
Search can	be	performed	on	indices	that	are	needed
Static indices	are	cache	friendly
Indexing and most searches go to the newest indices
We delete whole indices
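Retention with daily indices then comes down to picking whole indices to drop. The sketch below assumes the `logs_YYYY.MM.DD` naming that appears later in the deck; the function names and the retention value are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="logs_"):
    """Build a daily index name such as logs_2017.11.22."""
    return prefix + day.strftime("%Y.%m.%d")


def indices_to_delete(existing, today, retention_days, prefix="logs_"):
    """Return existing daily indices that fall outside the retention window."""
    keep = {
        daily_index_name(today - timedelta(days=offset), prefix)
        for offset in range(retention_days)
    }
    return sorted(
        name for name in existing if name.startswith(prefix) and name not in keep
    )


if __name__ == "__main__":
    existing = [
        "logs_2017.11.16",
        "logs_2017.11.17",
        "logs_2017.11.20",
        "logs_2017.11.21",
    ]
    print(indices_to_delete(existing, date(2017, 11, 21), retention_days=2))
```

Each returned name would be removed with a single index delete, so no delete by query is needed.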
Daily	indices	are	sub-optimal
Load is not even: Black Friday vs. Saturday vs. Sunday
Size	based	indices	are	optimal
Size limit for indices: logs_01, logs_02, . . . logs_N (indexing goes to the newest one)
around 5 – 10GB per shard on AWS
Slice	using	size
Predictable searching	and	indexing	performance
Better indices	balancing
Fewer	shards
Easier handling of	spiky	loads
Lower costs because of better hardware utilization
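Slicing by size can be sketched as a small decision function: keep writing to the current index until its shards reach a size limit, then move to the next numbered index. The 10 GiB default follows the upper end of the "5 – 10GB per shard" guidance; the function names and the way the size is obtained are assumptions for the example.

```python
GIB = 1024 ** 3


def next_index_name(current):
    """logs_01 -> logs_02, keeping the zero padding of the numeric suffix."""
    prefix, _, number = current.rpartition("_")
    return prefix + "_" + str(int(number) + 1).zfill(len(number))


def index_for_write(current, primary_size_bytes, primary_shards, max_shard_bytes=10 * GIB):
    """Return the index that new documents should go to."""
    average_shard_bytes = primary_size_bytes / primary_shards
    if average_shard_bytes >= max_shard_bytes:
        return next_index_name(current)
    return current


if __name__ == "__main__":
    # 4 primary shards holding 20 GiB in total: still below the limit.
    print(index_for_write("logs_01", 20 * GIB, 4))
    # 4 primary shards holding 50 GiB in total: time for the next index.
    print(index_for_write("logs_01", 50 * GIB, 4))
```

A quiet weekend and a Black Friday then produce indices of similar size, just a different number of them.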
Proper	Elasticsearch	configuration
Keep index.refresh_interval at maximum possible value
1 sec -> 100%, 5 sec -> 125%, 30 sec -> 175%
You can loosen up merges
- possible because of heavy aggregation use
- segments_per_tier -> higher
- max_merge_at_once -> higher
- max_merged_segment -> lower
All prefixed with index.merge.policy; together they give higher indexing throughput
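The settings named on this slide can be collected into one index settings body, for example as below. The setting names come from the slide; the concrete values are illustrative assumptions only and would need tuning and testing for a real cluster.

```python
import json

MERGE_POLICY_PREFIX = "index.merge.policy."


def indexing_friendly_settings(
    refresh_interval="30s",
    segments_per_tier=50,
    max_merge_at_once=50,
    max_merged_segment="1gb",
):
    """Build a settings dict favouring indexing throughput (values are examples)."""
    return {
        "index.refresh_interval": refresh_interval,
        MERGE_POLICY_PREFIX + "segments_per_tier": segments_per_tier,
        MERGE_POLICY_PREFIX + "max_merge_at_once": max_merge_at_once,
        MERGE_POLICY_PREFIX + "max_merged_segment": max_merged_segment,
    }


if __name__ == "__main__":
    # The JSON printed here could serve as the body of an index settings request.
    print(json.dumps(indexing_friendly_settings(), indent=2))
```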
Proper	Elasticsearch	configuration
Index only	needed	fields
Use	doc	values
Do	not	index	_source
Do	not	store	_all
Optimization	time
We	can	optimize data	nodes	for	time	based	data
Hot	– cold	architecture
logs_2017.11.22
ES	hot ES	cold ES	cold
-Dnode.attr.tag=hot -Dnode.attr.tag=cold -Dnode.attr.tag=cold
curl	-XPUT	localhost:9200/logs_2017.11.22 -d	'{	
"settings"	:	{		
"index.routing.allocation.exclude.tag"	:	"cold",	
"index.routing.allocation.include.tag"	:	"hot"	
}
}'
Hot	– cold	architecture
logs_2017.11.22
logs_2017.11.23
ES	hot ES	cold ES	cold
indexing
move	index	after	day	ends
curl -XPUT localhost:9200/logs_2017.11.22/_settings -d '{
"index.routing.allocation.exclude.tag" : "hot",
"index.routing.allocation.include.tag" : "cold"
}'
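The "move index after day ends" step can be automated along these lines. The sketch only builds the requests (method, path, JSON body) that mirror the curl call above; sending them, the function names, and the `logs_` prefix handling are left as assumptions.

```python
import json
from datetime import date

# Same allocation settings as in the curl example: leave "hot" nodes, go to "cold" ones.
COLD_ALLOCATION = {
    "index.routing.allocation.exclude.tag": "hot",
    "index.routing.allocation.include.tag": "cold",
}


def indices_to_move(existing, today, prefix="logs_"):
    """Daily indices older than today's index; zero-padded dates sort as strings."""
    current = prefix + today.strftime("%Y.%m.%d")
    return sorted(
        name for name in existing if name.startswith(prefix) and name < current
    )


def move_requests(existing, today, prefix="logs_"):
    """Build (method, path, body) tuples for the settings updates."""
    body = json.dumps(COLD_ALLOCATION)
    return [
        ("PUT", "/%s/_settings" % name, body)
        for name in indices_to_move(existing, today, prefix)
    ]


if __name__ == "__main__":
    existing = ["logs_2017.11.22", "logs_2017.11.23"]
    for request in move_requests(existing, date(2017, 11, 23)):
        print(request)
```

Applying the same settings to an index that already lives on cold nodes changes nothing, so the job can safely run every day.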
Hot – cold architecture
ES hot: logs_2017.11.23 (indexing); ES cold: logs_2017.11.22
Next day: logs_2017.11.24 is created on ES hot, move index after day ends
ES hot: logs_2017.11.24 (indexing); ES cold: logs_2017.11.22, logs_2017.11.23
Hot	– cold	architecture
Hot	ES	Tier
Good	CPU
Lots	of	I/O
Cold	ES	Tier
Memory	bound
Decent	I/O
Hot	– cold	architecture	summary
Optimize	costs – different	hardware	for	different	tier
Performance – use	case	optimized	hardware
Isolation – long	running	searches	don’t	affect	indexing
Elasticsearch	client node	needs
No	data	=	no	IOPS
Large	query	throughput	=	high	CPU	usage
Lots	of	results	=	high	memory usage
Lots of concurrent queries = higher resource utilization
Elasticsearch	ingest	node	needs
No	data	=	no	IOPS
Large	index	throughput	=	high	CPU	&	memory	usage
Complicated	rules	=	high	CPU	usage
Larger documents = higher resource utilization
Elasticsearch master node needs
No	data	=	no	IOPS
Large	number	of	indices	=	high	CPU	&	memory	usage
Complicated	mappings	=	high	memory	usage
Daily indices = spikes in resource utilization
What	about	OS?
Say NO to swap
Set the right disk scheduler
- CFQ for spinning disks
- deadline for SSD
Use proper mount options for ext4
- noatime
- nodiratime
- data=writeback, nobarrier
For bare metal
- check CPU governor
- disable transparent huge pages
- /proc/sys/vm/nr_hugepages=0
Analysis	- Kibana
Analysis	- Grafana
Where	To	Go	From	Here?
We	are	engineers!
We	develop DevOps	tools!
We	are	DevOps people!
We	do	fun	stuff	;)
http://sematext.com/jobs
Thank you for listening! Get in touch!
Rafał
rafal.kuc@sematext.com
@kucrafal
http://sematext.com
@sematext
