Instrumenting the real-time web: Running node.js in production

Bryan Cantrill
VP, Engineering

bryan@joyent.com
@bcantrill
“Real-time web?”

   • The term has enjoyed some popularity, but there is
     clearly confusion about the definition of “real-time”
   • A real-time system is one in which the correctness of the
     system is relative to its timeliness
   • A hard real-time system is one in which the latency
     constraints are rigid: violation constitutes total system
     failure (e.g., an actuator on a physical device)
   • A soft real-time system is one in which latency
     constraints are more flexible: violation is undesirable but
     non-fatal (e.g., a video game or MP3 player)
   • Historically, the only real-time aspect of the web has
     been in some of its static content (e.g., video, audio)
The rise of the real-time web

    • The rise of mobile + HTML5 has spawned a new
     breed of web application: ones in which dynamic data
     has real-time semantics
    • These data-intensive real-time (DIRT) applications
     present new data semantics for web-facing applications:
     CRUD, ACID, BASE, CAP: meet DIRT!
The challenge of DIRTy apps

   • DIRTy applications tend to have the human in the loop
      • Good news: deadlines are soft — microseconds only
        matter when they add up to tens of milliseconds

      • Bad news: because humans are in the loop, demand
        for the system can be non-linear

   • One must deal not only with the traditional challenge of
     scalability, but also the challenge of a real-time system!
Building DIRTy apps

   • Embedded real-time systems are sufficiently controlled
     that latency bubbles can be architected away
   • Web-facing systems are far too sloppy to expect this!
   • Focus must shift from preventing latency bubbles to
     preventing latency bubbles from cascading
   • Operations that can induce latency (network, I/O, etc.)
     must not be able to take the system out with them!
   • Implies purely asynchronous and evented architectures,
     which are notoriously difficult to implement...
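One way to keep a latency bubble in a slow dependency from cascading into its callers is to bound how long any caller will wait on it. The following is a minimal sketch, assuming promise-returning operations; `withDeadline` is a hypothetical helper written for illustration, not part of node.js.

```javascript
// Hypothetical helper: race an operation against a timer so that a slow
// dependency surfaces as a prompt, explicit failure rather than a stalled caller.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`deadline of ${ms}ms exceeded`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it
  // cannot keep the event loop alive after the race is decided.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

This bounds the caller's wait but does not cancel the underlying operation, which keeps running to completion; a production system would likely pair it with cancellation or load shedding.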
Enter node.js

   • node.js is a JavaScript-based framework for building
     event-oriented servers:
      var http = require('http');

      // The callback is invoked once per incoming request
      http.createServer(function (req, res) {
             res.writeHead(200, {'Content-Type': 'text/plain'});
             res.end('Hello World\n');
      }).listen(8124, "127.0.0.1");

      console.log('Server running at http://127.0.0.1:8124/');
node.js as building block

    • node.js is a confluence of three ideas:
       • JavaScript's rich support for asynchrony (i.e., closures)
       • High-performance JavaScript VMs (e.g., V8)
       • The system abstractions that God intended (i.e., UNIX)
    • Because everything is asynchronous, node.js is ideal for
     delivering scale in the presence of long-latency events!
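To make the role of closures concrete, here is a small sketch: each request's callback remembers its own `id` and start time even though the requests complete out of order, with no shared "current request" variable. `fakeIo` and `handleRequest` are hypothetical names; `fakeIo` merely stands in for a real network or disk operation.

```javascript
// Stand-in for slow I/O (hypothetical): calls back with value * 2 after `ms` ms.
function fakeIo(value, ms, callback) {
  setTimeout(() => callback(null, value * 2), ms);
}

// Each call creates a fresh closure that captures this request's `id` and
// `startedAt`; they remain in scope when the callback eventually runs,
// long after handleRequest itself has returned.
function handleRequest(id, ms, results, done) {
  const startedAt = Date.now();
  fakeIo(id, ms, (err, doubled) => {
    if (err) return done(err);
    results.push({ id, doubled, waitedMs: Date.now() - startedAt });
    done(null);
  });
}
```

Starting requests 1, 2 and 3 with delays of 60, 10 and 30 ms records completions in the order 2, 3, 1, and each record still carries the right `id`.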
The primacy of latency

   • Because the correctness of the system depends on its
     timeliness, we must be able to measure the system to verify it
   • In a real-time system, it does not make sense to
     measure operations per second!
   • The only metric that matters is latency
   • Distilling latency to a single number is dangerous; the
     distribution of latency over time is essential
   • This poses both instrumentation and visualization
     challenges!
Instrumenting for latency

    • Instrumenting for latency requires modifying the system
     twice: as an operation starts and as it finishes
    • During an operation, the system must track, on a
     per-operation basis, the start time of the operation
    • Upon operation completion, the resulting stored data
     cannot be a scalar: the distribution is essential to
     understanding latency
    • Instrumentation must be systemic: it must be able to
     reach the sources of latency deep within the system
    • These constraints eliminate static instrumentation; we
     need a better way to instrument the system
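The two-point bookkeeping described above can be sketched in a few lines. Note that this is exactly the kind of static, in-application instrumentation the slide argues is insufficient on its own; it only illustrates the mechanics (record a start per operation, fold each completion into a distribution rather than a scalar). `LatencyTracker` is a hypothetical class, and the power-of-two buckets are modeled loosely on DTrace's `quantize()`.

```javascript
// Hypothetical latency tracker, assuming each operation has a unique id.
class LatencyTracker {
  constructor(clock = () => process.hrtime.bigint()) {
    this.clock = clock;        // returns a nanosecond timestamp as a BigInt
    this.inflight = new Map(); // operation id -> start timestamp
    this.buckets = new Map();  // power-of-two bucket floor (ns) -> count
  }

  // First instrumentation point: remember when this operation began.
  start(id) {
    this.inflight.set(id, this.clock());
  }

  // Second instrumentation point: compute latency and add it to the histogram.
  done(id) {
    const t0 = this.inflight.get(id);
    if (t0 === undefined) return; // start was never seen; nothing to measure
    this.inflight.delete(id);
    const elapsed = Number(this.clock() - t0);
    let bucket = 0;
    if (elapsed >= 1) {
      bucket = 1;
      while (bucket * 2 <= elapsed) bucket *= 2; // largest power of two <= elapsed
    }
    this.buckets.set(bucket, (this.buckets.get(bucket) || 0) + 1);
  }
}
```

In an HTTP server one might call `start` when a request arrives and `done` from the response's `'finish'` event; the injectable clock exists only to make the class testable.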
Enter DTrace

   • Facility for dynamic instrumentation of production
     systems originally developed circa 2003 for Solaris 10
   • Open sourced (along with the rest of Solaris) in 2005;
     subsequently ported to many other systems (Mac OS X,
     FreeBSD, NetBSD, QNX, nascent Linux port)
   • Support for arbitrary actions, arbitrary predicates, in
     situ data aggregation, statically defined instrumentation
   • Designed for safe, ad hoc use in production: concise
     answers to arbitrary questions
   • Particularly well suited to real-time: the original design
     center was the understanding of latency bubbles
DTrace + Node?

   • DTrace instruments the system holistically, which is to
    say, from the kernel, which poses a challenge for
    interpreted environments
   • User-level statically defined tracing (USDT) providers
    describe semantically relevant points of instrumentation
   • Some interpreted environments (e.g., Ruby, Python,
    PHP, Erlang) have added USDT providers that
    instrument the interpreter itself
   • This approach is very fine-grained (e.g., every function
    call) and doesn't work in JIT'd environments
   • We decided to take a different tack for Node
DTrace for node.js

    • Given the nature of the paths that we wanted to
     instrument, we introduced a function into JavaScript that
     Node can call to get into USDT-instrumented C++
    • Introduces a disabled probe effect: calling from JavaScript
     into C++ incurs a cost even when probes are not enabled
    • We use USDT is-enabled probes to minimize disabled
     probe effect once in C++
    • If (and only if) the probe is enabled, we prepare a
     structure for the kernel that allows for translation into a
     structure that is familiar to node programmers
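The is-enabled idea can be illustrated with a purely JavaScript analogue: argument preparation is deferred behind a thunk and skipped unless a consumer is attached, so a disabled probe costs only a branch. This is a hypothetical sketch of the pattern, not node's actual USDT implementation; `Probe`, `attach`, `detach` and `fire` are illustrative names.

```javascript
// Hypothetical analogue of a USDT is-enabled probe.
class Probe {
  constructor(name) {
    this.name = name;
    this.enabled = false; // toggled when a (hypothetical) consumer attaches
    this.consumer = null;
  }

  attach(consumer) {
    this.consumer = consumer;
    this.enabled = true;
  }

  detach() {
    this.consumer = null;
    this.enabled = false;
  }

  // `argsFn` is a thunk: expensive argument construction happens only when
  // the probe is enabled. When disabled, the only cost is this check.
  fire(argsFn) {
    if (!this.enabled) return;
    this.consumer(this.name, argsFn());
  }
}
```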
Node USDT Provider

   • Example one-liners:
     dtrace -n 'node*:::http-server-request{
        printf("%s of %s from %s\n", args[0]->method,
            args[0]->url, args[1]->remoteAddress)}'

     dtrace -n 'http-server-request{@[args[1]->remoteAddress] = count()}'

     dtrace -n 'gc-start{self->ts = timestamp}' \
        -n 'gc-done/self->ts/{@ = quantize(timestamp - self->ts)}'



   • A script to measure HTTP latency:
     http-server-request
     {
            self->ts[args[1]->fd] = timestamp;
     }

     http-server-response
     /self->ts[args[0]->fd]/
     {
            @[zonename] = quantize(timestamp - self->ts[args[0]->fd]);
            /* Release the per-connection start time */
            self->ts[args[0]->fd] = 0;
     }
User-defined USDT probes in node.js

   • Our USDT technique has been generalized by Chris
     Andrews in his node-dtrace-provider npm module:
       https://github.com/chrisa/node-dtrace-provider
   • Used by Joyent's Mark Cavage in his ldap.js to measure
     and validate operation latency
   • But how to visualize operation latency?
Visualizing latency

    • Could visualize latency as a scalar (i.e., average):




    • This hides outliers — and in a real-time system, it is the
     outliers that you care about!
    • Using percentiles helps to convey distribution — but
     crucial detail remains hidden
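A quick numeric illustration of why the average misleads and why even percentiles can hide detail. The sketch below assumes latencies in milliseconds and uses the nearest-rank percentile definition; `mean` and `percentile` are illustrative helpers.

```javascript
// Arithmetic mean of the samples.
function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile (p in [0, 100]) over a sorted copy of the samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}
```

For 99 requests at 2 ms and one at 500 ms, the mean is 6.98 ms (a latency no request actually saw), the median is 2 ms, and even the 99th percentile reports 2 ms; only the maximum reveals the 500 ms outlier.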
Visualizing latency as a heatmap

    • Latency is much better visualized as a heatmap, with
     time on the x-axis, latency on the y-axis, and frequency
     represented with color saturation:




    • Many patterns are now visible (as in this example of
     MySQL query latency), but critical data is still hidden
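The data behind such a heatmap is just a two-dimensional histogram. A minimal sketch, assuming samples of the form `{ time, latency }`, linear latency buckets, and that outliers above `latencyMax` are clamped into the top row rather than dropped; `buildHeatmap` and `saturation` are hypothetical names.

```javascript
// Bin (time, latency) samples into a grid of counts: columns are time buckets
// (x-axis), rows are latency buckets (y-axis).
function buildHeatmap(samples, timeBucket, latencyBucket, latencyMax) {
  const rows = Math.ceil(latencyMax / latencyBucket);
  const grid = new Map(); // column index -> array of per-row counts
  for (const { time, latency } of samples) {
    const col = Math.floor(time / timeBucket);
    // Clamp outliers into the top row so they remain visible.
    const row = Math.min(Math.floor(latency / latencyBucket), rows - 1);
    if (!grid.has(col)) grid.set(col, new Array(rows).fill(0));
    grid.get(col)[row] += 1;
  }
  return grid;
}

// Map a cell's count to a color saturation in [0, 1] relative to the busiest cell.
function saturation(count, maxCount) {
  return maxCount === 0 ? 0 : count / maxCount;
}
```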
Visualizing latency as a 4D heatmap

   • Can use hue to represent higher dimensionality: time on
     the x-axis, latency on the y-axis, frequency via color
     saturation, and hue representing the new dimension:




   • In this example, the higher dimension is the MySQL
     database table associated with the operation
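Adding the fourth dimension means each cell keeps counts per category (here, per table) instead of a single count. The sketch below assumes samples of the form `{ time, latency, category }` and one simple coloring rule chosen for illustration: hue from the cell's dominant category, saturation from the cell's total count. The deck does not say how its heatmaps blend categories, so treat the rule, the table names and the helper names as hypothetical.

```javascript
// Give each category an evenly spaced hue on the color wheel (degrees).
function assignHues(categories) {
  const hues = new Map();
  categories.forEach((c, i) => hues.set(c, (360 * i) / categories.length));
  return hues;
}

// Bin samples by (time bucket, latency bucket), counting per category.
function build4dHeatmap(samples, timeBucket, latencyBucket) {
  const cells = new Map(); // "col,row" -> Map(category -> count)
  for (const { time, latency, category } of samples) {
    const key = `${Math.floor(time / timeBucket)},${Math.floor(latency / latencyBucket)}`;
    if (!cells.has(key)) cells.set(key, new Map());
    const counts = cells.get(key);
    counts.set(category, (counts.get(category) || 0) + 1);
  }
  return cells;
}

// Assumed rule: hue of the dominant category, saturation from the total count.
function cellColor(counts, hues, maxTotal) {
  let total = 0;
  let top = null;
  let topCount = -1;
  for (const [category, n] of counts) {
    total += n;
    if (n > topCount) {
      top = category;
      topCount = n;
    }
  }
  return { hue: hues.get(top), saturation: total / maxTotal };
}
```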
Visualizing node.js latency

    • Using the USDT probes as a foundation, we developed a
     cloud analytics facility that visualizes latency in real time
     via four-dimensional heatmaps:




    • Facility is available via Joyent's no.de service, Joyent's
     public cloud, or Joyent's SmartDataCenter
Debugging latency

   • Latency visualization is essential for understanding
     where latency is being induced in a complicated system,
     but how can we determine why?
   • This requires associating an external event (an I/O
     request, a network packet, a profiling interrupt) with
     the code that is inducing it
   • For node.js, as for other dynamic environments, this has
     historically been very difficult: the VM is opaque to the OS
   • Using DTrace's helper mechanism, we have developed
     a V8 ustack helper that allows OS-level events to be
     correlated with the node.js backtrace that induced them
   • Available for node 0.6.7 on Joyent's SmartOS
Visualizing node.js CPU latency

   • Using the node.js ustack helper and the DTrace profile
     provider, we can determine the relative frequency of
     stack backtraces in terms of CPU consumption
   • Stacks can be visualized with flame graphs, a stack
     visualization developed by Joyent's Brendan Gregg:
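Before rendering, sampled stacks are typically collapsed into a "folded" text format: one line per distinct stack, frames joined root-first with `;`, followed by a space and the sample count, which is the input that flame graph tools such as Brendan Gregg's `flamegraph.pl` consume. A minimal sketch, assuming each sample is an array of frame names listed leaf-first (as `ustack()` prints them); `foldStacks` and the frame names in the test are hypothetical.

```javascript
// Collapse sampled stacks into folded lines: "root;...;leaf count".
function foldStacks(samples) {
  const counts = new Map();
  for (const stack of samples) {
    // Samples are assumed leaf-first, so reverse to get root-first order.
    const key = [...stack].reverse().join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort for stable, diff-friendly output.
  return [...counts]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, n]) => `${key} ${n}`);
}
```

The width of each frame in the resulting flame graph is proportional to its share of the samples, which is why the counts, not the order of samples, are what matter.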
node.js in production

    • node.js is particularly well suited to the DIRTy apps that
     typify the real-time web
    • The ability to understand latency must be considered
     when deploying node.js-based systems into production!
    • Understanding latency requires dynamic instrumentation
     and novel visualization
    • At Joyent, we have added DTrace-based dynamic
     instrumentation for node.js to SmartOS, and novel
     visualization into our cloud and software offerings
    • Better production support — better observability, better
     debuggability — remains an important area of node.js
     development!
Thank you!

   • @ryah and @rmustacc for Node DTrace USDT
    integration
   • @dapsays, @rmustacc, @rob_ellis and @notmatt for
    cloud analytics
   • @chrisandrews for node-dtrace-provider and
    @mcavage for putting it to such great use in ldap.js
   • @dapsays for the V8 DTrace ustack helper
   • @brendangregg for both the heatmap and flame graph
    visualizations
   • More information: http://dtrace.org/blogs/dap,
    http://dtrace.org/blogs/brendan and http://smartos.org
