Visualizing Systems with Statemaps
Bryan Cantrill
CTO
bryan@joyent.com
@bcantrill
The stack of abstraction
• Our software systems are built as stacks of abstraction
• These stacks allow us to stand on the shoulders of history — to reuse components without rebuilding them
• We can do this because of the software paradox: software is both information and machine, exhibiting properties of both
• Our stacks are higher and run deeper than we can see or know: software is opaque; the nature of abstraction is to seal us from what runs beneath!
Run silent, run deep
• Not only is the stack deep, it is silent
• Running software emits neither light nor heat; it makes no sound; it attracts no mass; it (mostly) has no odor
• Running software is — by all conventional notions — unseeable
• This generally isn’t a bad thing, as long as it all works
Hurricanes from butterflies
• When the stack of abstraction performs pathologically, its power transmogrifies to peril: layering amplifies performance pathologies but hinders insight
• Work amplifies as we go down the stack
• Latency amplifies as we go up the stack
• Seemingly minor issues in one layer can cascade into systemic pathological performance
• As the system becomes dominated by its outliers, butterflies spawn hurricanes of pathological performance
Debugging the hurricanes
• Understanding a pathologically performing system is excruciatingly difficult:
• Symptoms are often far removed from root cause
• There may not be a single root cause but several
• The system is dynamic and may change without warning
• Improvements to the system are hard to model and verify
• Emphatically, this is not “tuning” — it is debugging
How do we debug?
• To debug methodically, we must resist the temptation of quick hypotheses, focusing rather on questions and observations
• Iterating between questions and observations gathers the facts that will constrain future hypotheses
• These facts can be used to disconfirm hypotheses!
• How do we ask questions?
• How do we make observations?
Asking questions
• For performance debugging, the initial question formulation is particularly challenging: where does one start?
• Resource-centric methodologies like the USE Method (Utilization/Saturation/Errors) can be excellent starting points
• But keep these methodologies in their context: they provide initial questions to ask — they are not recipes for debugging arbitrary performance pathologies!
Making observations
• Questions are answered through observation
• But — reminder! — software cannot be conventionally seen!
• It is up to the system itself to have the capacity to be seen
• This capacity is the system’s observability — and without it, we are reduced to guessing
• Do not conflate software observability with control theory’s definition of observability!
• Software is observable when it can answer your question about its behavior — software observability is not a boolean!
The pillars of observability
• Much has been made of the so-called “pillars of observability”: monitoring, logging and instrumentation
• Each of these is important, for each has within it the capacity to answer questions about the system
• But each also has limitations!
• Their shared limitation: each can only be as effective as the observer — they cannot answer questions not asked!
• Observability seeks to answer questions asked and prompt new ones: the human is the foundation of observability!
Observability through instrumentation
• Static instrumentation modifies source to provide semantically relevant information, e.g., via logging or counters
• Dynamic instrumentation allows for the system to be changed while running to emit data, e.g., DTrace, OpenTracing
• Both mechanisms of instrumentation are essential!
• Static instrumentation provides the observations necessary for early question formulation
• Dynamic instrumentation answers deeper, ad hoc questions
Aggregation
• When a system is instrumented, it can become overwhelmed by the overhead of that instrumentation
• Aggregation is essential for scalable, non-invasive instrumentation — and is a first-class primitive in (e.g.) DTrace
• But aggregation also eliminates important dimensions of data, especially with respect to time; some questions may only be answered with disaggregated data!
• Use aggregation for performance debugging — but also understand its limits!
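The point about aggregation discarding the time dimension can be illustrated with a small sketch. The two sample streams and the helper names below are hypothetical, invented only for this illustration: both streams produce the same aggregate, yet only the disaggregated samples reveal when the latency spike occurred.

```python
from statistics import mean

# Hypothetical (timestamp_seconds, latency_ms) samples for two systems.
steady = [(t, 10) for t in range(10)]
bursty = [(t, 1) for t in range(9)] + [(9, 91)]


def aggregate(samples):
    """Collapse the samples into a summary that no longer carries time."""
    latencies = [latency for _, latency in samples]
    return {"count": len(latencies), "mean_ms": mean(latencies)}


def worst_moment(samples):
    """Use the disaggregated samples to find when latency peaked."""
    return max(samples, key=lambda sample: sample[1])


# The aggregates are indistinguishable; the raw samples are not.
print(aggregate(steady) == aggregate(bursty))
print(worst_moment(bursty))
```

Any question of the form "when did it happen?" needs the per-sample data that the aggregate threw away.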
Visualization
• The visual cortex is unparalleled at detecting patterns
• The value of visualizing data is not merely providing answers, but also (and especially) provoking new questions
• Our systems are so large, complicated and abstract that there is not one way to visualize them, but many
• The visualization of systems and their representations is an essential facet of system observability!
Visualization: Gnuplot
• Graphs are terrific — so much so that we should not restrict ourselves to the captive graphs found in bundled software!
• An ad hoc plotting tool is essential for performance debugging; and Gnuplot is an excellent (if idiosyncratic) one
• Gnuplot is easily combined with workhorses like awk or perl
• That Gnuplot is an essential tool helps to set expectations around performance debugging tools: they are not magicians!
Visualization: Heatmaps
Visualization: Flamegraphs
Visualization: Statemaps
• Flamegraphs help understand the work a system is doing, but how does one visualize a system that isn’t doing work?
• That is, idleness is a common pathology in a suboptimal system; there is a hidden bottleneck — but where?
• To explore these kinds of problems, we have developed statemaps, a visualization of entity state over time
Visualization: Statemaps
Statemap input data
• Statemaps operate on a payload of concatenated JSON where each line corresponds to a state transition for an entity:

{ "time": "52524411", "entity": "30080", "state": 0 }
{ "time": "52587486", "entity": "30137", "state": 0 }
{ "time": "52769425", "entity": "30080", "state": 4 }
{ "time": "52895402", "entity": "30137", "state": 1 }
{ "time": "53177670", "entity": "62308", "state": 0 }
{ "time": "53230742", "entity": "30137", "state": 0 }
{ "time": "53268043", "entity": "30137", "state": 1 }
{ "time": "53562441", "entity": "62308", "state": 4 }
{ "time": "53616633", "entity": "30137", "state": 0 }
{ "time": "53762283", "entity": "30137", "state": 6 }
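As a rough illustration of how such a stream can be consumed, here is a minimal Python sketch that totals the time each entity spends in each state. It assumes one JSON record per line, as in the sample above; the function name `time_in_state` is hypothetical and this is not the statemap renderer's own code. The time unit is whatever the stream uses, and the final state of each entity is left uncounted because no later transition closes it.

```python
import json
from collections import defaultdict

# First four records of the sample stream shown above.
SAMPLE = """\
{ "time": "52524411", "entity": "30080", "state": 0 }
{ "time": "52587486", "entity": "30137", "state": 0 }
{ "time": "52769425", "entity": "30080", "state": 4 }
{ "time": "52895402", "entity": "30137", "state": 1 }
"""


def time_in_state(stream):
    """Return {entity: {state: total_time}} for intervals that have ended."""
    current = {}  # entity -> (time of last transition, state entered)
    totals = defaultdict(lambda: defaultdict(int))
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        when = int(record["time"])  # times arrive as strings in the sample
        entity = record["entity"]
        if entity in current:
            since, state = current[entity]
            totals[entity][state] += when - since
        current[entity] = (when, record["state"])
    return {entity: dict(states) for entity, states in totals.items()}


print(time_in_state(SAMPLE))
```

Each transition closes the entity's previous interval, so a state's duration is only known once the next transition for that entity arrives.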


Statemap input data
• States are described in a JSON metadata header, e.g.:

{
    "start": [ 1544138397, 322335287 ],
    "title": "PostgreSQL statemap on HAB01436, by process ID",
    "host": "HAB01436",
    "entityKind": "Process",
    "states": {
        "on-cpu": {"value": 0, "color": "#DAF7A6" },
        "off-cpu-waiting": {"value": 1, "color": "#f9f9f9" },
        "off-cpu-semop": {"value": 2, "color": "#FF5733" },
        "off-cpu-blocked": {"value": 3, "color": "#C70039" },
        "off-cpu-zfs-read": {"value": 4, "color": "#FFC300" },
        "off-cpu-zfs-write": {"value": 5, "color": "#338AFF" },
        "off-cpu-zil-commit": {"value": 6, "color": "#66FFCC" },
        "off-cpu-tx-delay": {"value": 7, "color": "#CCFF00" },
        "off-cpu-dead": {"value": 8, "color": "#E0E0E0" },
        "wal-init": {"value": 9, "color": "#dd1871" },
        "wal-init-tx-delay": {"value": 10, "color": "#fd4bc9" }
    }
}
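The header is what turns the numeric "state" values in the transition stream back into names. The sketch below, which uses a trimmed copy of the header above, shows one way to build that reverse lookup. The helper names `state_names` and `describe` are hypothetical, and rejecting duplicate values is an assumption of this sketch rather than documented renderer behavior.

```python
import json

# Trimmed copy of the metadata header shown above.
HEADER = json.loads("""
{
    "title": "PostgreSQL statemap on HAB01436, by process ID",
    "entityKind": "Process",
    "states": {
        "on-cpu": {"value": 0, "color": "#DAF7A6" },
        "off-cpu-waiting": {"value": 1, "color": "#f9f9f9" },
        "off-cpu-zfs-read": {"value": 4, "color": "#FFC300" }
    }
}
""")


def state_names(header):
    """Map each numeric state value to its name, rejecting duplicates."""
    names = {}
    for name, info in header["states"].items():
        value = info["value"]
        if value in names:
            raise ValueError(f"state value {value} is used twice")
        names[value] = name
    return names


def describe(transition, header):
    """Render one transition record in human-readable form."""
    names = state_names(header)
    kind = header["entityKind"]
    return f'{kind} {transition["entity"]}: {names[transition["state"]]}'


print(describe({"time": "52769425", "entity": "30080", "state": 4}, HEADER))
```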
Statemap output
• Statemap rendering code processes the JSON stream and renders it into an SVG that is the actual statemap
• SVG can be manipulated interactively (zoomed, panned, highlighted, etc.) but also stands independently
• Statemaps are entirely neutral with respect to methodology!
Instrumentation for statemaps
• Statemaps themselves — like gnuplot — are entirely generic to input data: they visualize arbitrary state over arbitrary time
• We have developed example statemap-generating dynamic instrumentation for database, CPU, I/O, filesystem operations
• The data rate in terms of state transitions per second varies based on what is being instrumented: from <10/sec to >1M/sec
Coalescing states
• For even modestly large inputs, adjacent states must be coalesced to allow for reasonable visualization
• When this aggregation is required, the statemap rendering code coalesces the least significant two adjacent states — allowing for larger trends to stay intact
• The threshold at which states are coalesced can be dynamically adjusted to allow for higher resolution
• Importantly, the original data retains all state transitions!
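The coalescing idea can be sketched in a few lines of Python. This is a simplified illustration and not the renderer's actual algorithm: it assumes that "least significant" means the adjacent pair covering the least combined time, and it represents each rectangle as a hypothetical `(start, duration, Counter)` tuple, where the counter records how much time was spent in each state.

```python
from collections import Counter


def coalesce(rects, limit):
    """Merge adjacent rectangles until at most `limit` remain.

    Each rectangle is (start, duration, Counter({state: time_in_state}));
    rectangles are assumed to be contiguous and sorted by start time.
    """
    rects = list(rects)
    while len(rects) > max(limit, 1):
        # Assumed meaning of "least significant": the adjacent pair
        # that covers the least combined time.
        i = min(range(len(rects) - 1),
                key=lambda k: rects[k][1] + rects[k + 1][1])
        (start, d1, s1), (_, d2, s2) = rects[i], rects[i + 1]
        # The merged rectangle keeps the per-state time breakdown.
        rects[i:i + 2] = [(start, d1 + d2, s1 + s2)]
    return rects


# Hypothetical timeline for one entity: two long states around two blips.
timeline = [
    (0, 100, Counter({0: 100})),
    (100, 2, Counter({4: 2})),
    (102, 3, Counter({0: 3})),
    (105, 90, Counter({1: 90})),
]
for rect in coalesce(timeline, 3):
    print(rect)
```

Because each merged rectangle keeps its per-state breakdown, a renderer could still blend colors proportionally, and the input list is never modified, which mirrors the point that the original data retains every transition.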
Coalescing states
Coalescing states
Tagged statemaps
• We have found it useful to be able to tag states with immutable information that describes the context around the state
• For example, tagging a state for CPU execution with immutable context information (process, thread, etc.)
• Tags are defined separately in the stream, e.g.:

{ "state": 0, "tag": "d136827", "pid": "51943", "tid": "1", "execname": "postgres", "psargs": "/opt/postgresql/9.6.3/bin/postgres -D /manatee/pg/data" }

{ "time": "330931", "entity": "12", "state": 0, "tag": "d136827" }
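One way to consume a tagged stream is to join each transition with the tag definition it references. The sketch below assumes that any record without a "time" field is a tag definition and that a definition is identified by its state and tag together; both are assumptions drawn from the sample records above, and `resolve_tags` is a hypothetical helper rather than part of the statemap tooling.

```python
import json

# The two sample records above, with "psargs" omitted for brevity.
STREAM = """\
{ "state": 0, "tag": "d136827", "pid": "51943", "tid": "1", "execname": "postgres" }
{ "time": "330931", "entity": "12", "state": 0, "tag": "d136827" }
"""


def resolve_tags(stream):
    """Attach tag context to each transition that references a tag."""
    definitions = {}  # (state, tag) -> immutable context fields
    transitions = []
    for line in stream.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if "time" not in record:
            # Assumption: a record without "time" defines a tag.
            context = {k: v for k, v in record.items()
                       if k not in ("state", "tag")}
            definitions[(record["state"], record["tag"])] = context
        else:
            transitions.append(record)
    # Resolve after the full read so definitions may appear anywhere.
    return [
        {**t, "context": definitions.get((t["state"], t.get("tag")), {})}
        for t in transitions
    ]


for transition in resolve_tags(STREAM):
    print(transition["entity"], transition["context"])
```

Keeping the context out of the transition records means the immutable details are written once per tag rather than once per transition.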
Tagged statemaps
Stacked statemaps
• We have found it useful to be able to stack statemaps from either disjoint sources or disjoint machines
• Allows for activity in one domain or machine to be tightly correlated with activity in another domain or machine
• Across machines, can be subject to wall clock skew
• …but if wall clocks are skewing within the datacenter, there are likely bigger problems!
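Stacking depends on placing every source on a common timeline. The sketch below assumes that each transition's "time" is a nanosecond offset from its header's "start", and that "start" is a pair of seconds and nanoseconds since the epoch; the slides do not state this, so treat both as assumptions. The function names and the two sample sources are hypothetical.

```python
NS_PER_SEC = 1_000_000_000


def to_wall_clock_ns(start, offset_ns):
    """Convert a per-statemap offset into absolute nanoseconds.

    Assumes `start` is [seconds, nanoseconds] since the epoch and that
    `offset_ns` is a nanosecond offset from that start.
    """
    sec, nsec = start
    return sec * NS_PER_SEC + nsec + int(offset_ns)


def stack(sources):
    """Merge transitions from several statemaps onto one timeline.

    `sources` maps a label to (start, transitions). Any wall clock skew
    between machines carries straight through into the merged ordering.
    """
    merged = []
    for label, (start, transitions) in sources.items():
        for t in transitions:
            when = to_wall_clock_ns(start, t["time"])
            merged.append((when, label, t["entity"], t["state"]))
    merged.sort()
    return merged


# Hypothetical sources: a database statemap and an I/O statemap.
sources = {
    "db": ([1544138397, 322335287],
           [{"time": "52524411", "entity": "30080", "state": 0}]),
    "io": ([1544138397, 300000000],
           [{"time": "60000000", "entity": "sd0", "state": 1}]),
}
for event in stack(sources):
    print(event)
```

In this sample the I/O transition sorts first even though its offset is larger, because its statemap started earlier; comparing raw offsets across sources would get the order wrong.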
Stacked statemaps across domains
Stacked statemaps across machines
Stacked statemaps across many machines?
Statemaps
• Statemaps provide a generic and system-neutral tool for visualizing system state over time
• Statemaps use visualization to prompt questions
• Statemaps work in concert with system observability facilities that can answer the questions that statemaps raise
• We must keep the human in mind when developing for observability — the capacity to answer arbitrary questions is only as effective as the human asking them!
• Statemap renderer: https://github.com/joyent/statemap
