Nathaniel Braun
Thursday, April 28th, 2016
OpenTSDB for monitoring @ Criteo
2 | Copyright © 2016 Criteo
•Overview of Hadoop @ Criteo
•Our experimental cluster
•Rationale for OpenTSDB
•Stabilizing & scaling OpenTSDB
•OpenTSDB to the rescue in practice
Hitchhiker’s guide to this presentation
Overview of Hadoop @ Criteo
4 | Copyright © 2016 Criteo
Overview of Hadoop @ Criteo
Tokyo TY5 – PROD AS
Sunnyvale SV6 – PROD NA
Hong Kong HK5 – PROD CN
Paris PA4 – PROD / PREPROD
Paris PA3 – PREPROD / EXP
Amsterdam AM5 – PROD
Criteo’s 8 Hadoop clusters – running CDH Community Edition
5 | Copyright © 2016 Criteo
AM5: main production cluster
• In use since 2011
• Running CDH3 initially, CDH4 currently
• 1118 DataNodes
• 13 400+ compute cores
• 39 PB of raw disk storage
• 105 TB of RAM capacity
• 40 TB of data imported every day, mostly through HTTPFS
• 100 000+ jobs run daily
Overview of Hadoop @ Criteo – Production AM5
6 | Copyright © 2016 Criteo
PA4: comparable to AM5, with fewer machines
• Migration done in Q4 2015 – H1 2016
• Running CDH5
• 650+ DataNodes
• 15 600+ compute cores
• 54 PB of raw disk storage
• 143 TB of RAM capacity
• Huawei servers (AM5 is HP-based)
Overview of Hadoop @ Criteo – Production PA4
7 | Copyright © 2016 Criteo
Criteo has 3 local production Hadoop clusters
• Sunnyvale (SV6): 20 nodes
• Tokyo (TY5): 35 nodes
• Hong Kong (HK5): 20 nodes
Overview of Hadoop @ Criteo – Production local clusters
8 | Copyright © 2016 Criteo
Criteo has 3 preproduction Hadoop clusters
• Preprod PA3: 54 nodes, running CDH4
• Preprod PA4: 42 nodes, running CDH5
• Experimental: 53 nodes, running CDH5
Overview of Hadoop @ Criteo – Preproduction clusters
9 | Copyright © 2016 Criteo
Overview of Hadoop @ Criteo – Usage
Types of jobs running on our clusters
• Cascading jobs, mostly for joins between different types of logs (e.g. displays & clicks)
• Pure Map/Reduce jobs for recommendation, Hadoop streaming jobs for learning
• Scalding jobs for analytics
• Hive queries for Business Intelligence
• Spark jobs on CDH5
10 | Copyright © 2016 Criteo
Overview of Hadoop @ Criteo – Special considerations
• Kerberos for security
• High-availability on NameNodes and ResourceManager (CDH5 only)
• Infrastructure installed & maintained with Chef
11 | Copyright © 2016 Criteo
Overview of Hadoop @ Criteo
How can we monitor this complex infrastructure and the services running on top of it?
Our experimental cluster
13 | Copyright © 2016 Criteo
• Useful for testing infrastructure changes without impacting users (no SLA)
• Test environment for new technologies
• HBase
o Natural joins
o OpenTSDB for metrology & monitoring
o hRaven for detailed job data (no longer used)
• Spark, now in production @ PA4
Our experimental cluster – Purpose
14 | Copyright © 2016 Criteo
• Based on Google BigTable paper
• Integrated with the Hadoop stack
• Stores data in rows sorted by row key
• Uses regions as an ordered set of rows
• Regions sharded by row key bounds
• Regions managed by Region servers, collocated with DataNodes (data is stored on HDFS)
• Oversize regions split into two regions
• Values stored in columns, with no fixed schema (unlike an RDBMS)
• Columns grouped in column families
Our experimental cluster – HBase features
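To make the column-family model above concrete, here is a minimal HBase shell sketch (the shell is JRuby-based); the table and column names are illustrative and simply mirror the example table on the next slides:

create 'users', 'user', 'event'                       # table with two column families
put 'users', 'AAA', 'user:browser', 'Firefox'         # one cell in the "user" family
put 'users', 'AAA', 'event:type', 'Click'             # one cell in the "event" family
get 'users', 'AAA'                                    # read a row by its key
scan 'users', {STARTROW => 'AAA', STOPROW => 'DDD'}   # range scan over sorted row keys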
15 | Copyright © 2016 Criteo
Our experimental cluster – HBase architecture
Row key
(user UID)
CF0: user CF1: event
C0: IP C2: browser C3: e-mail C0: time C1: type C2: web site
AAA value Firefox NULL Click Client #0
BBB value Chrome NULL Click Client #0
CCC value Chrome ccc@mail.com Display Client #1
DDD value IE NULL Sales Client #2
EEE value IE NULL Display Client #0
FFF value IE NULL Display Client #3
∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙
XXX value Firefox NULL Sales Client #4
YYY value Chrome NULL Bid Client #5
ZZZ value Opera zzz@mail.com Click Client #5
16 | Copyright © 2016 Criteo
Our experimental cluster – HBase architecture
Row key
(user UID)
CF0: user CF1: event
C0: IP C2: browser C3: e-mail C0: time C1: type C2: web site
AAA value Firefox NULL Click Client #0
BBB value Chrome NULL Click Client #0
CCC value Chrome ccc@mail.com Display Client #1
DDD value IE NULL Sales Client #2
EEE value IE NULL Display Client #0
FFF value IE NULL Display Client #3
∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙
XXX value Firefox NULL Sales Client #4
YYY value Chrome NULL Bid Client #5
ZZZ value Opera zzz@mail.com Click Client #5
R0
R1
R5
17 | Copyright © 2016 Criteo
Our experimental cluster – HBase architecture
Row key
(user UID)
CF0: user CF1: event
C0: IP C2: browser C3: e-mail C0: time C1: type C2: web site
AAA value Firefox NULL Click Client #0
BBB value Chrome NULL Click Client #0
CCC value Chrome ccc@mail.com Display Client #1
DDD value IE NULL Sales Client #2
EEE value IE NULL Display Client #0
FFF value IE NULL Display Client #3
∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙
XXX value Firefox NULL Sales Client #4
YYY value Chrome NULL Bid Client #5
ZZZ value Opera zzz@mail.com Click Client #5
R0
R1
R5
RS1
RS2
18 | Copyright © 2016 Criteo
HBase on the experimental cluster
• 50 region servers
• 44 000+ regions
• ~90 000 requests / second from OpenTSDB
Our experimental cluster – HBase @ Criteo
Rationale for OpenTSDB
20 | Copyright © 2016 Criteo
Metrics to monitor:
• CPU load
• Processes & threads
• RAM available/reserved
• Free/used disk space
• Network statistics
• Sockets open/closed
• Open connections with their statuses
• Network traffic
Rationale for using OpenTSDB – Infrastructure monitoring
21 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Service monitoring
YARN: NodeManagers, ResourceManagers
HDFS: DataNodes, NameNodes, JournalNodes
ZooKeeper, Kerberos
HBase
Kafka, Storm
22 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Service monitoring
YARN: NodeManagers, ResourceManagers
HDFS: DataNodes, NameNodes, JournalNodes
ZooKeeper, Kerberos
HBase
Kafka, Storm
Huge diversity of services!
23 | Copyright © 2016 Criteo
• Diversity
• Many types of nodes & services
• Must be easy to extend with new metrics
• Scale
• > 2 500 servers
• ~ 90 000 requests / second
• Storage
• Keep fine-grained resolution (down to the minute, at least)
• Long-term storage for analysis & investigation
Rationale for using OpenTSDB – Scale
24 | Copyright © 2016 Criteo
• Suits the problem well: “Hadoop for monitoring Hadoop”
• Designed for time series: HBase schema optimized for time series queries
• Scalable and resilient, thanks to HBase
• Easily extensible: writing a data collector is easy (see the collector sketch below)
• Simple to query
Rationale for using OpenTSDB – Solution
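As an illustration of how little a collector has to do, here is a minimal tcollector-style collector sketched in Ruby: an executable that periodically prints one "metric timestamp value tag=value" line to stdout, which tcollector forwards to the write TSDs. The metric name below is made up.

#!/usr/bin/env ruby
require 'socket'

# Minimal tcollector-style collector sketch: one data point per line on stdout,
# in the "metric timestamp value tag=value" format. Metric name is illustrative.
loop do
  open_tcp = File.readlines('/proc/net/tcp').size - 1   # open TCP sockets (Linux-specific)
  puts "proc.net.tcp.open #{Time.now.to_i} #{open_tcp} host=#{Socket.gethostname}"
  $stdout.flush                                          # tcollector reads stdout line by line
  sleep 60                                               # one data point per minute
end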
25 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Easy to query
require 'net/http'
require 'uri'
require 'json'

uri = URI.parse("http://0.rtsd.hpc.criteo.preprod:4242/api/query")
http = Net::HTTP.start(uri.hostname, uri.port)
http.read_timeout = 300
# One sub-query: minimum of AllocatedMB on the AM5 ResourceManager, downsampled to 5-minute buckets
params = {
  'start' => '2016/04/21-10:00:00',
  'end' => '2016/04/21-12:00:00',
  'queries' => [{
    'aggregator' => 'min',
    'downsample' => '5m-min',
    'metric' => 'hadoop.resourcemanager.queuemetrics.root.AllocatedMB',
    'tags' => {
      'cluster' => 'ams',
      'host' => 'rm.hpc.criteo.prod'
    }
  }]
}
request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
request.body = params.to_json
response = http.request(request)
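The response is JSON as well: one object per matching time series, carrying the metric name, its tags, and a dps map of timestamp → value, so a plain JSON.parse(response.body) is enough to post-process it in Ruby.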
26 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Practical UI
27 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Practical UI
Metric
28 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Practical UI
Time range
Metric
29 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Practical UI
Time range
Metric
Tag keys/values
30 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Practical UI
Time range
Metric
Tag keys/values
Aggregator
31 | Copyright © 2016 Criteo
• OpenTSDB consists of Time Series Daemons (TSDs) and tcollectors
• Some TSDs are used for writing, others for reading; tcollectors gather the metrics
• TSDs are stateless
• TSDs use asyncHBase to scale
• Quiz: what are the advantages?
Rationale for using OpenTSDB – Design
32 | Copyright © 2016 Criteo
• OpenTSDB consists of Time Series Daemons (TSDs) and tcollectors
• Some TSDs are used for writing, others for reading; tcollectors gather the metrics
• TSDs are stateless
• TSDs use asyncHBase to scale
• Quiz: what are the advantages?
Rationale for using OpenTSDB – Design
1. Clients never interact
with HBase directly
2. Simple protocol → easy
to use & extend
3. No state, no
synchronization → great
scalability
33 | Copyright © 2016 Criteo
• Metrics consist of:
• a metric name
• a UNIX timestamp
• a value (64-bit integer or single-precision floating point)
• tags (key-value pairs) specific to that metric instance
• Tags are useful for aggregating time series
proc.loadavg.15min 1461781436 15 host=0.namenode.hpc.criteo.prod
• Chart: 15-minute load average with the count aggregator (a proxy for the machine count)
• Quiz: what is the chart below?
Rationale for using OpenTSDB – Metrics
proc.loadavg.15min
34 | Copyright © 2016 Criteo
• Metrics consist of:
• a metric name
• a UNIX timestamp
• a value (64-bit integer or single-precision floating point)
• tags (key-value pairs) specific to that metric instance
• Tags are useful for aggregating time series
proc.loadavg.15min 1461781436 15 host=0.namenode.hpc.criteo.prod
• Chart: 15-minute load average with the count aggregator (a proxy for the machine count)
• Quiz: what is the chart below?
Rationale for using OpenTSDB – Metrics
proc.loadavg.15min
proc.loadavg.15min
cluster=*
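That same line format is what collectors and scripts push to a write TSD over its plain-text (telnet-style) protocol using the put command; a minimal Ruby sketch, with an illustrative TSD hostname (4242 is the default port):

require 'socket'

# Send one data point to a write TSD over the plain-text protocol.
# The hostname is illustrative; 4242 is the default TSD port.
sock = TCPSocket.new('0.wtsd.hpc.criteo.preprod', 4242)
sock.puts "put proc.loadavg.15min #{Time.now.to_i} 15 host=0.namenode.hpc.criteo.prod"
sock.close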
35 | Copyright © 2016 Criteo
• A single data table (split in regions), named tsdb
• Row key: <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
• timestamp is rounded down to the hour
• This schema helps group data from the same metric & time bucket close together (HBase sorts rows based on the row key)
• Assumption: query first on time range, then metric, then tags, in that order of preference
• Tag keys are sorted lexicographically
• Tags should be limited, because they are part of the row key; usually fewer than 5 tags
• Values are stored in columns
• Column qualifier: 2 or 4 bytes. For 2 bytes:
• Encodes an offset of up to 3 600 seconds → 2¹² = 4096 → 12 bits
• 4 bits left for format/type
• Other tables, for metadata and name ↔ ID mappings
Rationale for using OpenTSDB – HBase schema
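A rough Ruby sketch of that layout, assuming the default 3-byte UIDs (the UID values themselves are made up for illustration):

# Row key = <metric_uid><base_timestamp><tagk1><tagv1>..., qualifier = 12-bit offset + 4 flag bits.
metric_uid = "\x00\x00\x01".b                    # e.g. proc.loadavg.15min
tagk_uid   = "\x00\x00\x02".b                    # e.g. "host"
tagv_uid   = "\x00\x01\x2A".b                    # e.g. "0.namenode.hpc.criteo.prod"

timestamp = 1461781436
base_time = timestamp - (timestamp % 3600)       # row timestamp rounded down to the hour
row_key   = metric_uid + [base_time].pack('N') + tagk_uid + tagv_uid

offset    = timestamp - base_time                # 0..3599 seconds within the hour
flags     = 0x7                                  # 4 format/type bits (here: 8-byte integer value)
qualifier = [(offset << 4) | flags].pack('n')    # 2-byte column qualifier

puts row_key.unpack('H*').first                  # hex view, like the slide's example
puts qualifier.unpack('H*').first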
36 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – HBase schema
Hexadecimal representation of a row key, with two tags
Sorted row keys for the same metric: 000001
Note: row key size varies across rows, because of tags
37 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Statistics
Quiz: what should we look
for?
38 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Statistics
Quiz: what should we look
for?
39 | Copyright © 2016 Criteo
Rationale for using OpenTSDB – Statistics
Quiz: what should we look
for?
367 513 metrics
30 tag keys (!)
86 194 tag values
Stabilizing &
scaling OpenTSDB
41 | Copyright © 2016 Criteo
OpenTSDB was hard to scale at first. What problem can you see?
Scaling OpenTSDB
42 | Copyright © 2016 Criteo
OpenTSDB was hard to scale at first. What problem can you see?
Scaling OpenTSDB
We’re missing data points
43 | Copyright © 2016 Criteo
• Analyze all the layers of the system
• Logs are your friends
• Change parameters one by one, not all at once
• Measure, change, deploy, measure. Rinse, repeat
Scaling OpenTSDB – Lessons learned
44 | Copyright © 2016 Criteo
Varnish & OpenResty save the day
Scaling OpenTSDB – Nifty trick
OpenResty
POST -> GET
Varnish
Cache + LB
OpenResty
POST -> GET
Varnish
Cache + LB
OpenResty
POST -> GET
Varnish
Cache + LB
RTSD
Read OpenTSDB
RTSD
Read OpenTSDB
RTSD
Read OpenTSDB
45 | Copyright © 2016 Criteo
Varnish & OpenResty save the day
Scaling OpenTSDB – Nifty trick
OpenResty
POST -> GET
Varnish
Cache + LB
OpenResty
POST -> GET
Varnish
Cache + LB
OpenResty
POST -> GET
Varnish
Cache + LB
RTSD
Read OpenTSDB
RTSD
Read OpenTSDB
RTSD
Read OpenTSDB
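Why it works: read queries arrive as POSTs with a JSON body, which Varnish passes through uncached by default; OpenResty rewrites each one into an equivalent GET on /api/query, so Varnish can cache the result and load-balance across the read TSDs. A rough Ruby sketch of that rewrite, for illustration only (the real setup does it in OpenResty/Lua):

require 'json'
require 'uri'

# Illustrative helper: turn a POST /api/query JSON body into the equivalent GET URL
# (only the first sub-query, for brevity), so a cache can key on the URL.
def query_to_get(json_body)
  q    = JSON.parse(json_body)
  sub  = q['queries'].first
  tags = sub['tags'].map { |k, v| "#{k}=#{v}" }.join(',')
  m    = "#{sub['aggregator']}:#{sub['downsample']}:#{sub['metric']}{#{tags}}"
  '/api/query?' + URI.encode_www_form('start' => q['start'], 'end' => q['end'], 'm' => m)
end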
OpenTSDB to the rescue in practice
47 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Easier to use than logs
hadoop.namenode.fsnamesystem.tag.HAState
48 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Easier to use than logs
Two NameNode failovers in one night!
hadoop.namenode.fsnamesystem.tag.HAState
49 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Easier to use than logs
Two NameNode failovers in one night!
• Hard to spot: in the morning, nothing has changed
hadoop.namenode.fsnamesystem.tag.HAState
50 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Easier to use than logs
Two NameNode failovers in one night!
• Hard to spot: in the morning, nothing has changed
• Would be impossible to see with daily aggregation
hadoop.namenode.fsnamesystem.tag.HAState
51 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Easier to use than logs
Two NameNode failovers in one night!
• Hard to spot: in the morning, nothing has changed
• Would be impossible to see with daily aggregation
• Trivia: we fixed the tcollector to get that metric
hadoop.namenode.fsnamesystem.tag.HAState
52 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
53 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
Huge memory capacity spike
54 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
Huge memory capacity spike Node not reporting points
55 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
Huge memory capacity spike Node not reporting points
Another huge spike
56 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
Huge memory capacity spike Node not reporting points
Another huge spike
No data
57 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Superimpose charts
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
58 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Superimpose charts
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
Service restart – configuration change
59 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Superimpose charts
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
Service restart – configuration change Service restart – OOM
60 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Superimpose charts
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
Service restart – configuration change Service restart – OOM
Log extract: "NodeManager configured with 192 GB physical memory allocated to containers, which is more than 80% of the total physical memory available (89 GB)"
61 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Hiccups
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
62 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Hiccups
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
OpenTSDB problem – not node-specific
63 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – Hiccups
hadoop.nodemanager.direct.TotalCapacity hadoop.nodemanager.jvmmetrics.GcTimeMillis
OpenTSDB problem – not node-specific Node probably dead
64 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystem.BlocksTotal
65 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
File deletion
File deletion
hadoop.namenode.fsnamesystem.BlocksTotal
66 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
File deletion
File deletion
File creation
hadoop.namenode.fsnamesystem.BlocksTotal
67 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystem.BlocksTotal
hadoop.namenode.fsnamesystem.FilesTotal
68 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
Slope
hadoop.namenode.fsnamesystem.BlocksTotal
hadoop.namenode.fsnamesystem.FilesTotal
69 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
Slope
hadoop.namenode.fsnamesystem.BlocksTotal
hadoop.namenode.fsnamesystem.FilesTotal
Be careful about the scale!
70 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
71 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is this pattern?
72 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is this pattern?
• Answer: NameNode checkpoint
73 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is this pattern?
• Answer: NameNode checkpoint
• Note: done at regular intervals
74 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is this pattern?
• Answer: NameNode checkpoint
• Note: done at regular intervals
• Trivia: never do a failover during a checkpoint!
75 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
76 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
77 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is the problem?
78 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is the problem?
• Answer: no NameNode checkpoint → no FS image!
79 | Copyright © 2016 Criteo
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is the problem?
• Answer: no NameNode checkpoint → no FS image!
• Follow-up: the standby NameNode could not start up after a failover, because its FS image was too old
80 | Copyright © 2016 Criteo
Criteo ♥ BigData
- Very accessible: only 50 euros, which will be given to charity
- Speakers from leading organizations: Google, Spotify, Mesosphere, Criteo …
https://www.eventbrite.co.uk/e/nabdc-not-another-big-data-conference-registration-24415556587
81 | Copyright © 2016 Criteo
Criteo is hiring!
http://labs.criteo.com/