node.js in production:
Reflections on three years
of riding the unicorn
Bryan Cantrill
SVP, Engineering
bryan@joyent.com
@bcantrill
Tuesday, December 3, 13
Production systems

• Production systems are ones doing real work: when
  they misbehave, users or other systems are affected
• Production systems value reliability, performance and
  ease of deployment, usually in that order
• Contrast with development systems, which value ease of
  development and speed of development, in that order
• These values can be in tension: new languages and
  environments typically arise for their development
  values, not their production ones
• Would node.js be any different?

node.js advantages

• In terms of production suitability, node.js had (and still
  has) a couple of major advantages going for it:
  • It’s built on a VM (V8) that itself was designed for
    performance
  • It leverages extant (Unix) abstractions
  • It’s not a new language
  • Its pure event-oriented model aligns ease of
    programming with scalability with respect to load
• As the stewards of both node and SmartOS, Joyent had
  another advantage: we could change, improve or
  leverage SmartOS to accommodate node in production

node.js challenges

• But node.js also has a couple of major challenges:
  • JavaScript closures make it easy to accidentally
    reference memory
  • Because node.js is often used to connect backend
    components, failure to propagate back pressure can
    induce memory explosion and death
  • Single-threaded execution of JavaScript means that
    compute-bound code can entirely impede progress
  • High performance VM also implies inscrutable core
    dumps and very limited instrumentation

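To make the back-pressure challenge concrete, here is a minimal sketch using only Node's built-in `stream` module. The names `makeSlowSink` and `pump` are hypothetical, introduced for illustration and not taken from the talk. The sink stands in for a slow backend component. The producer stops writing as soon as `write()` returns `false` and resumes on `'drain'`, so unsent data does not pile up in memory.

```javascript
const { Writable } = require('stream');

// Hypothetical slow backend: acknowledges each chunk on a later tick,
// so written data accumulates in the stream's internal buffer.
function makeSlowSink(highWaterMark) {
  return new Writable({
    highWaterMark,
    write(chunk, encoding, callback) {
      setImmediate(callback);
    }
  });
}

// Write all chunks to the sink while honoring back pressure.
// Calls done(pauses) with the number of times we waited for 'drain'.
function pump(chunks, sink, done) {
  let next = 0;
  let pauses = 0;

  function writeSome() {
    while (next < chunks.length) {
      const ok = sink.write(chunks[next++]);
      if (!ok) {
        // The sink's buffer is full: stop producing until it drains.
        pauses++;
        sink.once('drain', writeSome);
        return;
      }
    }
    sink.end();
    done(pauses);
  }

  writeSome();
}

// Usage: push 100 chunks of 1 KiB through a sink that buffers about 4 KiB.
const chunks = Array.from({ length: 100 }, () => Buffer.alloc(1024));
pump(chunks, makeSlowSink(4096), (pauses) => {
  console.log('finished; paused for drain %d times', pauses);
});
```

Ignoring the return value of `write()` would still deliver every chunk, but the whole backlog would sit in the process heap while the slow side catches up, which is the memory explosion the slide warns about.
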
August 2010: DTrace in node.js

• Added simple user-level statically defined tracing
  (USDT) probes for node.js on platforms that support
  DTrace (e.g., Mac OS X, SmartOS)
• Probes were around connection establishment, serving
  HTTP requests, etc.
• Allowed questions to be dynamically asked of running,
  production node.js servers, e.g.:

  dtrace -n 'node*:::http-server-request{
      printf("%s of %s from %s\n", args[0]->method,
      args[0]->url, args[1]->remoteAddress)}'

  dtrace -n http-server-request'{
      @[args[1]->remoteAddress] = count()}'

  dtrace -n gc-start'{self->ts = timestamp}' \
      -n gc-done'/self->ts/{@ = quantize(timestamp - self->ts)}'

August 2010: Deploying 0.2.x

• In August 2010, we deployed our first node.js-based
  service into production: a Node Knockout leaderboard
  that used node.js DTrace probes to geolocate
  connections to contestants in real time
• Results were promising: it was surprisingly easy to
  develop and deploy a node.js-based service, and the
  service consumed very little CPU
• Watching the Node Knockout contestants in production
  revealed they were all light on CPU
• But there was a storm cloud...

August 2010: Deploying 0.2.x, cont.

• We had a memory leak that resulted in heap exhaustion
  after several hours under heavy load
• Our service was stateless and load balanced for HA, so
  this was more disconcerting than debilitating...
• ...but we also had quite a few contestants that would run
  their RSS up and crash; there was clearly a larger issue

February 2011: 0.4.0

• In February 2011, we deployed our first major
  node.js-based service (on 0.4.0)
• The service was built remarkably quickly, but
  with some pain points around Connect
• Despite being potentially a compute-bound service,
  CPU consumption was (again) a non-issue
• And with an updated node (and many fixed node leaks),
  memory consumption wasn’t necessarily as acute...
• …but we hit our first “spinning black hole” problem

January 2011: node-dtrace-provider

• Our DTrace probes in node were proving to be too
  low-level for higher-level services: we needed to allow
  USDT probes to be expressed in JavaScript
• Fortunately, DTrace community member Chris Andrews
  extended his libusdt to node.js, allowing statically
  defined probes in JavaScript, e.g.:

  var d = require('dtrace-provider');
  var dtp = d.createDTraceProvider('foo');
  // 'json' declares one probe argument, rendered as a JSON string
  var probe = dtp.addProbe('foo-start', 'json');
  dtp.enable();
  probe.fire(function(p) {
      return ([ { bar: 123, baz: 'bar' } ]);
  });

April 2011: Restify

• Based on our experiences with Connect/Express, we
  wanted to build a node module that was purpose-built to
  implement HTTP-based API endpoints
• Based on Chris Andrews’ work, we wanted to have
  first-class support for DTrace
• Joyent’s Mark Cavage developed node-restify, which
  quickly became the foundation for all of our services
• Built-in DTrace support allows full observability into
  per-route/per-handler latency, a capability that we could
  not live without at this point

November 2011: MDB support for V8

• In mid-2011, Joyent’s Dave Pacheco dared to dream the
  impossible dream: full postmortem support for V8 for
  MDB, the debugger native to SmartOS
• Several unspeakable layer violations later, mdb_v8
  brought postmortem debugging to node.js
• ::jsstack prints full stack including both native C++
  frames and JavaScript frames
• ::jsprint prints JavaScript objects from the dump
• Thanks to mdb_v8, we were able to go back to a core
  dump from that infinite loop in our service deployed
  several months earlier, and nail it

December 2011: DTrace ustack helper

• mdb_v8 was actually a way station to an even bolder
  dream: a DTrace ustack helper for node.js
• A ustack helper is a bit of code that accompanies a
  binary and assists DTrace in probe context to resolve
  stack frames to their higher-level names
• Once completed, it allows user-level stack traces to be
  associated with in-kernel events, like profiling events
• Can use the DTrace profile provider to determine how a
  node.js program is consuming CPU via stack sampling

December 2011: Flame graphs

• Poring over stack traces can make hot functions
  difficult to visualize
• Joyent’s Brendan Gregg developed flame graphs, which
  allow us to easily visualize thousands of sampled
  stacks

January 2012: Bunyan

• Logging was becoming more and more of a problem for
  us, especially as we were developing distributed
  systems in node.js
• Joyent’s Trent Mick developed node-bunyan, a simple
  and fast JSON logging library for node.js
• Provides standardized, JSON, line-based log output that
  can be easily processed with JSON tools, e.g.:

  {"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689-fb18f94b6bfc","pid":8393,"component":"manatee","path":"/manatee/sdc/election","level":20,"db":{"available":2,"max":15,"size":2,"waiting":0},"options":{"async":false,"read":true},"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}

• Also includes command line tool, bunyan, for displaying
  Bunyan logs

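As an illustration of why line-based JSON is easy to process, here is a minimal sketch that parses such a log and filters it by level using only the JavaScript standard library. The function names `parseLogLines` and `filterByLevel` are hypothetical and are not part of node-bunyan or its command line tool. The numeric values in `LEVELS` are Bunyan's level numbers, with 20 meaning debug.

```javascript
// Bunyan's numeric log levels.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Parse newline-delimited JSON log text into an array of record objects.
// Blank lines and lines that are not JSON objects are skipped.
function parseLogLines(text) {
  const records = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const rec = JSON.parse(line);
      if (rec !== null && typeof rec === 'object') records.push(rec);
    } catch (err) {
      // Not a JSON record (e.g. stray output from another tool): ignore it.
    }
  }
  return records;
}

// Keep only records at or above the named level.
function filterByLevel(records, minLevelName) {
  const min = LEVELS[minLevelName];
  if (min === undefined) throw new Error('unknown level: ' + minLevelName);
  return records.filter((rec) => typeof rec.level === 'number' && rec.level >= min);
}
```

Because every record is a complete JSON object on its own line, the same approach works on a file, a pipe, or a network stream, with no log-format parser to maintain.
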
February 2012: npm shrinkwrap

• npm allows for fine-grained semver control over
  package dependencies, but we found that nested
  dependencies could result in non-replicable installs
• “npm shrinkwrap” generates a file that shrinkwraps all
  nested dependencies into npm-shrinkwrap.json,
  thereby locking down all nested versions
• Guarantees that all installs will have the same semver
  versions of dependencies
• Doesn’t necessarily guarantee identical installs,
  however; for this, one needs private npm repositories

April 2012: node-vasync

• There are a number of modules that deal with some of
  the mechanics of asynchronous control flow…
• But we found we needed one that emphasized debugging
• node-vasync captures a number of popular flow patterns
  and allows state to be inspected via MDB

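The idea of a control-flow helper whose state can be inspected can be sketched as follows. This is not node-vasync's implementation or API. `parallelWithState` and the property names it uses are hypothetical, chosen only to show the pattern: the helper returns a plain object that records the status of every operation, so a debugger looking at the heap (or a developer logging the object) can see which callbacks have not yet fired.

```javascript
// Run all funcs in parallel; each func takes a node-style callback(err, result).
// Returns a state object that is updated in place as operations complete.
function parallelWithState(funcs, callback) {
  const state = {
    operations: funcs.map((func) => ({
      funcname: func.name || '(anon)',
      status: 'pending', // 'pending' | 'ok' | 'fail'
      err: undefined,
      result: undefined
    })),
    ndone: 0,
    nerrors: 0
  };

  if (funcs.length === 0) {
    setImmediate(callback, null, state);
    return state;
  }

  funcs.forEach((func, i) => {
    func((err, result) => {
      const op = state.operations[i];
      if (op.status !== 'pending') return; // ignore duplicate callbacks
      op.status = err ? 'fail' : 'ok';
      op.err = err;
      op.result = result;
      state.ndone++;
      if (err) state.nerrors++;
      if (state.ndone === funcs.length) {
        const failure = state.nerrors > 0
          ? new Error(state.nerrors + ' operation(s) failed')
          : null;
        callback(failure, state);
      }
    });
  });

  return state;
}
```

If a service hangs, the `operations` entry still marked `pending` names the function whose callback never arrived, which is exactly the question a hung asynchronous program otherwise makes hard to answer.
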
May 2012: ::findjsobjects

• Building on Dave Pacheco’s mdb_v8, we implemented a
  debugger command that iterates over all of memory in a
  core dump, looking for JavaScript objects
• Entirely brute force, but allows one to take a swing at a
  nasty node.js issue: semantic memory leaks

  > ::findjsobjects
    OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
  95709ac1      195      3 Object: socket, type, handle
  957093f9       66      9 Object: uid, windowsVerbatimArguments, stdio, …
  95f13181      130      5 <anonymous> (as exports.StringDecoder): …
  8432ff55      222      3 Buffer: length, offset, parent
  843304dd       91      9 Object: refreservation, creation, name, type, …
  8432cc55       99      9 Object: time, msg, level, hostname, pid, action, …
  95f08545       66     14 ChildProcess: _closesNeeded, stdio, …
  8432f2e1      546      2 Array
  9570cafd       47     24 Object: <sliced string>, <sliced string>, …
  8432be95      415      3 Array
  8432fb09       67     19 Socket: errorEmitted, _bytesDispatched, …

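A semantic memory leak is memory that is still reachable, so the garbage collector is right not to free it, but that the program no longer needs. The sketch below shows one common shape of such a leak, tying back to the earlier point that closures make it easy to accidentally reference memory. All names here (`configWatcher`, `handleRequestLeaky`, `handleRequestFixed`) are hypothetical and not from the talk.

```javascript
const { EventEmitter } = require('events');

// Hypothetical long-lived object that outlives every request.
const configWatcher = new EventEmitter();
configWatcher.setMaxListeners(0); // disable the listener-count warning for this demo

// Leaky: each call registers a closure that captures `body` and is never
// removed, so every request body stays reachable from configWatcher.
function handleRequestLeaky(body) {
  configWatcher.on('reload', () => {
    console.log('reload seen by a request of %d bytes', body.length);
  });
  return body.length;
}

// Fixed: the listener is removed once the request has been handled,
// so the closure and the body it captured become collectable.
function handleRequestFixed(body) {
  const onReload = () => {
    console.log('reload seen by a request of %d bytes', body.length);
  };
  configWatcher.on('reload', onReload);
  try {
    return body.length;
  } finally {
    configWatcher.removeListener('reload', onReload);
  }
}
```

In a core dump of the leaky version, one would expect the object counts for the captured buffers and closures to grow with the number of requests served, which is the kind of signal a per-constructor object census makes visible.
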
May 2012: ::findjsobjects -p

• Searching by property name allows one to find particular
  objects in the JavaScript heap, e.g.:

  > ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a
  8432b109: {
      ip4addr: 9aee115d: "10.88.88.200",
      VLAN: 9aee1199: "0",
      Host Interface: 9aee1185: "e1000g0",
      Link Status: 9aee1175: "up",
      MAC Address: 9aee113d: "02:08:20:47:93:82",
  }
  …

• While designed for postmortem debugging, this allows
  mdb_v8 to be used for in situ debugging in development
• Also guides one to a best practice: towards unique
  property names (which we have historically done in the
  operating system via structure prefixing)

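Structure prefixing can be carried over to JavaScript roughly as sketched here. The object shapes and the `conn_` and `req_` prefixes are hypothetical examples, not names used in Joyent's code. The point is that a property-name search such as `::findjsobjects -p conn_remoteAddress` would then match only this kind of object, instead of every object that happens to have a generic `remoteAddress` property.

```javascript
// Hypothetical per-connection state: every property carries a "conn_" prefix.
function createConnectionState(remoteAddress, remotePort) {
  return {
    conn_remoteAddress: remoteAddress,
    conn_remotePort: remotePort,
    conn_startTime: Date.now(),
    conn_nrequests: 0
  };
}

// Hypothetical per-request state: every property carries a "req_" prefix.
function createRequestState(conn, url) {
  conn.conn_nrequests++;
  return {
    req_conn: conn,
    req_url: url,
    req_startTime: Date.now()
  };
}
```

The prefixes cost a few characters per access, but they make both debugger searches and plain-text searches of the source unambiguous.
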
July 2012: node-fast

• While HTTP makes it very easy to put together a
  distributed system, parsing and connection
  management can become prohibitively expensive
• In building Manta, we found that we needed something
  lighter/faster; Joyent’s Mark Cavage built node-fast
• Only what you need: fully async/duplex/persistent
  connections, simple on-wire protocol (JSON), etc.
• None of what you don’t want: no IDL madness, no object
  model, no binary translation madness, etc.
• Deliberately light and limited: HTTP is still the right
  answer until it isn’t

October 2012: Bunyan + DTrace

• With all of our services using Bunyan, we could enable
  dynamic logging by adding DTrace USDT probes
• Can use the raw DTrace probes:

  # dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
  {"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c","pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,"req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,"bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c","value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},"msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
  ...

• Added the json() subroutine to DTrace to make this
  easier to process
• Can also use “bunyan -p” and avoid the lower-level
  DTrace details entirely

May 2013: --abort-on-uncaught-exception

• Crash dumps are great, but aborting after an
  uncaught exception makes it very difficult to determine
  the true origin of the exception
• Dave Pacheco implemented a V8 patch to induce a
  process abort (and a core dump) on an uncaught
  exception
• This allows us to use postmortem debugging to debug
  our everyday logic errors
• Available starting in 0.10.x; we use it wherever we
  have it!

July 2013: Thoth

• One of the most important systems we have built in
  node is Manta, our object store featuring in situ compute
• Manta is an excellent platform for building data-based
  services, especially for large data objects
• We built manta-thoth, a platform for core and crash
  dump analysis that allows us to debug core dumps
  without moving them
• Thoth has become critically important for us to track and
  automatically debug production node.js services

December 2013: Dump analysis on Linux

• Postmortem debugging has been a (the) tremendous
  breakthrough for node.js in production…
• ...but despite node’s postmortem support all being
  open source, it has been limited to SmartOS
• Some have toyed with porting MDB to Linux; this is in
  principle possible, but will be rough sledding
• Joyent’s TJ Fontaine (of node core fame) observed what
  we had done with dump analysis on Manta and had a
  simpler idea…
• What about making Linux dumps consumable on
  SmartOS, and therefore Manta?

December 2013: Linux support in libproc

• Over the course of a multiday engineering hackathon,
  TJ and Joyent’s Max Bruning added support for Linux
  crash dumps in SmartOS’s libproc
• Fortunately, because of the way the postmortem work
  was done by Dave Pacheco, it Just Works
• Do this yourself:
  https://gist.github.com/tjfontaine/de104fe058300a51f7cf
• For Linux users: put your Linux dumps to Manta, and
  you can finally debug those pesky leaks and crashes!
• Use --abort-on-uncaught-exception and you
  can use Manta and postmortem debugging to debug
  more quotidian programming errors!

Node.js in production!

• For us at Joyent, the tooling that we have built into
  node.js has resulted in what we believe to be the best
  dynamic environment for production use
• Yes, even when compared to much older platforms like
  Java and Erlang...
• There is still work to be done, especially around add-on
  development (see TJ’s shim work!) and potentially better
  bundling of objects…
• We will continue to emphasize production deployment
  and use in our stewardship of node.js!

Thank you

• @dapsays, the Patron Saint of node.js in production, for
  DTrace support, MDB support, node-vasync, Manta, etc.
• @mcavage for node-restify, node-fast, Manta, etc.
• @trentmick for node-bunyan
• @chrisandrews for node-dtrace-provider
• @brendangregg for flame graphs
• @tjfontaine for bringing postmortem debugging to an
  entirely new audience with Linux support for libproc!


node.js in production: Reflections on three years of riding the unicorn

  • 1. node.js in production: Reflections on three years of riding the unicorn Bryan Cantrill SVP, Engineering bryan@joyent.com @bcantrill Tuesday, December 3, 13
  • 2. Production systems • Production systems are ones doing real work: when they misbehave, users or other systems are affected • Production systems value reliability, performance and ease of deployment — usually in that order • Contrast to development systems, that value ease of development and speed of development — in that order • These values can be in tension: new languages and environments typically arise for their development values, not their production ones • Would node.js be any different? Tuesday, December 3, 13
  • 3. node.js advantages • In terms of production suitability, node.js had — and still has — a couple of major advantages going for it: • • It’s built on a VM (V8) that itself was designed for performance • Tuesday, December 3, 13 It leverages extant (Unix) abstractions • • It’s not a new language Its pure event-oriented model aligns ease of programming with scalability with respect to load As the stewards of both node and SmartOS, Joyent had another advantage: we could change, improve or leverage SmartOS to accommodate node in production
  • 4. node.js challenges • But node.js also has a couple of major challenges: • • JavaScript closures make it easy to accidentally reference memory • Because node.js is often used to connect backend components, failure to propagate back pressure can induce memory explosion and death • Tuesday, December 3, 13 Single-threaded execution of JavaScript means that compute-bound code can entirely impede progress High performance VM also implies inscrutable core dumps and very limited instrumentation
  • 5. August 2010: DTrace in node.js • Added simple user-level statically defined tracing (USDT) probes for node.js on platforms that support DTrace (e.g., Mac OS X, SmartOS) • Probes were around connection establishment, serving HTTP requests, etc. • Allowed questions to be dynamically asked of running, production node.js servers, e.g.: dtrace -n ‘node*:::http-server-request{ printf(“%s of %s from %sn”, args[0]->method, args[0]->url, args[1]->remoteAddress)}‘ dtrace -n http-server-request’{ @[args[1]->remoteAddress] = count()}‘ dtrace -n gc-start’{self->ts = timestamp}’ -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’ Tuesday, December 3, 13
  • 6. August 2010: Deploying 0.2.x • In August 2010, we deployed our first node.js-based service into production: a NodeKnockout leader-board that used node.js DTrace probes to geolocate connections to contestants in real-time • Results were promising; surprisingly easy to develop and deploy a node.js based service — and service consumed very little CPU • Watching the Node Knockout contestants in production revealed they were all light on CPU: • But there was a storm cloud... Tuesday, December 3, 13
  • 7. August 2010: Deploying 0.2.x, cont. • We had a memory leak that resulted in heap exhaustion after several hours under heavy load • Our service was stateless and load balanced for HA, so this was more disconcerting than debilitating... • ...but we also had quite a few contestants that would run their RSS up and crash; there was clearly a larger issue: Tuesday, December 3, 13
  • 8. February 2011: 0.4.0 • In February 2011, we deployed our first major node.jsbased service (on 0.4.0) • Service was able to be built remarkably quickly — but with some pain-points around Connect • Despite being potentially a compute-bound service, CPU consumption was (again) a non-issue • And with an updated node (and many fixed node leaks), memory consumption wasn’t necessarily as acute... • …but we hit our first “spinning black hole” problem Tuesday, December 3, 13
  • 9. January 2011: node-dtrace-provider • Our DTrace probes in node were proving to be too lowlevel for higher-level services — we needed to allow USDT probes to be expressed in JavaScript • Fortunately, DTrace community member Chris Andrews extended his libusdt to node.js, allowed statically defined probes in JavaScript, e.g.: var dtp = d.createDTraceProvider(‘foo’); var probe = dtp.addProbe(‘foo-start’); probe.fire(function(p) { return ([ { bar: 123, baz: ‘bar’ } ]); }); Tuesday, December 3, 13
  • 10. April 2011: Restify • Based on our experiences with Connect/Express, we wanted to build a node module that was purpose-built to implement HTTP-based API endpoints • Based on Chris Andrews’ work, we wanted to have first class support for DTrace • Joyent’s Mark Cavage developed node-restify, which quickly became the foundation for all of our services • Built-in DTrace support allows full observability into perroute/per-handler latency — a capability that we could not live without at this point Tuesday, December 3, 13
  • 11. November 2011: MDB support for V8 • In mid-2011, Joyent’s Dave Pacheco dared to dream the impossible dream: full postmortem support for V8 for MDB, the debugger native to SmartOS • Several unspeakable layer violations, mdb_v8 brought postmortem debugging to node.js • ::jsstack prints full stack including both native C++ frames and JavaScript frames • • ::jsprint prints JavaScript objects — from the dump Tuesday, December 3, 13 Thanks to mdb_v8, we were able to go back to a core dump from that infinite loop in our service deployed several months earlier — and nail it
  • 12. December 2011: DTrace ustack helper • mdb_v8 was actually a way station to an even bolder dream: a DTrace ustack helper for node.js • A ustack helper is a bit of code that accompanies a binary and assists DTrace in probe context to resolve stack frames to their higher-level names • Once completed, allows user-level stack traces to be associated with in-kernel events — like profiling events • Can use the DTrace profile provider to determine how a node.js program is consuming CPU via stack sampling Tuesday, December 3, 13
  • 13. December 2011: Flame graphs • Pouring through stack traces can make hot functions difficult to visualize • Joyent’s Brendan Gregg developed flame graphs, which allow us to easily visualize thousands of sampled stacks: Tuesday, December 3, 13
  • 14. January 2012: Bunyan • Logging was becoming more and more of a problem for us — especially as we were developing distributed systems in node.js • Joyent’s Trent Mick developed node-bunyan, a simple and fast JSON logging library for node.js • Provides standardized, JSON, line-based log output that can be easily processed with JSON tools, e.g.: {"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689fb18f94b6bfc","pid":8393,"component":"manatee","path":"/manatee/sdc/ election","level":20,"db":{"available":2,"max":15,"size":2,"waiting": 0},"options":{"async":false,"read":true},"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0} • Tuesday, December 3, 13 Also includes command line tool, bunyan, for displaying Bunyan logs
  • 15. February 2012: npm shrinkwrap • npm allows for fine-grained semver control over package dependencies, but we found that nested dependencies could result in non-replicable installs • “npm shrinkwrap” generates a file that shrinkwraps all nested dependencies into npm-shrinkwrap.json, thereby locking down all nested versions • Guarantees that all installs will have same semver versions of dependencies • Doesn’t necessarily guarantee identical installs, however; for this, one needs private npm repositories Tuesday, December 3, 13
  • 16. April 2012: node-vasync • There are a number of modules that deal with some of the mechanics of asynchronous control flow… • But we found that libraries that handle We found we needed one that emphasized debugging, and in particular, • node-vasync captures a number of popular flow patterns and allows state to be inspected via MDB Tuesday, December 3, 13
  • 17. May 2012: ::findjsobjects • Building on Dave Pacheco’s mdb_v8, we implemented a debugger command that iterates over all of memory in a core dump, looking for JavaScript objects • Entirely brute force, but allows one to take a swing at a nasty node.js issue: semantic memory leaks > ::findjsobjects OBJECT #OBJECTS 95709ac1 195 957093f9 66 95f13181 130 8432ff55 222 843304dd 91 8432cc55 99 95f08545 66 8432f2e1 546 9570cafd 47 8432be95 415 8432fb09 67 Tuesday, December 3, 13 #PROPS 3 9 5 3 9 9 14 2 24 3 19 CONSTRUCTOR: PROPS Object: socket, type, handle Object: uid, windowsVerbatimArguments, stdio, … <anonymous> (as exports.StringDecoder): … Buffer: length, offset, parent Object: refreservation, creation, name, type, … Object: time, msg, level, hostname, pid, action, … ChildProcess: _closesNeeded, stdio, … Array Object: <sliced string>, <sliced string>, … Array Socket: errorEmitted, _bytesDispatched, …
  • 18. May 2012: ::findjsobjects -p • Searching by property name allows one to find particular objects in the JavaScript heap, e.g.: > ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a 8432b109: { ip4addr: 9aee115d: "10.88.88.200", VLAN: 9aee1199: "0", Host Interface: 9aee1185: "e1000g0", Link Status: 9aee1175: "up", MAC Address: 9aee113d: "02:08:20:47:93:82", } … • While designed for postmortem debugging, this allows mdb_v8 to be used for in situ debugging in development • Also guides one to a best practice: towards unique property names (which we have historically done in the operating system via structure prefixing) Tuesday, December 3, 13
  • 19. July 2012: node-fast • While HTTP makes it very easy to put together a distributed system, parsing and connection management can become prohibitively expensive • In building Manta, we found that we needed something lighter/faster; Joyent’s Mark Cavage built node-fast • Only what you need: fully async/duplex/persistent connections, simple on-wire protocol (JSON), etc. • None of what you don’t want: no IDL madness, no object model, no binary translation madness, etc. • Deliberately light and limited: HTTP is still the right answer until it isn’t
  • 20. October 2012: Bunyan + DTrace • With all of our services using Bunyan, we could enable dynamic logging by adding DTrace USDT probes • Can use the raw DTrace probes:
    # dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
    {"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c","pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,"req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,"bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c","value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},"msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
    ...
  • Added the json() subroutine to DTrace to make this easier to process • Can also use “bunyan -p” and avoid the lower-level DTrace details entirely
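Because each record emitted by the probe is a single line of JSON, post-processing it is straightforward. The following is a minimal sketch, assuming records shaped like the one above; `summarize` is a hypothetical helper, the sample record is abbreviated from the slide, and the level table reflects Bunyan's standard numeric levels.

```javascript
// Bunyan's standard numeric log levels.
const LEVEL_NAMES = {
  10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal'
};

// Hypothetical helper: turn one line of Bunyan output into a short summary.
function summarize(line) {
  const rec = JSON.parse(line);
  const level = LEVEL_NAMES[rec.level] || String(rec.level);
  return `${rec.time} ${level.toUpperCase()} ${rec.name}/${rec.component}: ${rec.msg}`;
}

// Abbreviated version of the record shown on the slide.
const line = JSON.stringify({
  name: 'wf-moray-backend',
  pid: 10952,
  component: 'MorayClient',
  level: 20,
  msg: 'putObject: entered',
  time: '2013-12-03T07:22:25.135Z',
  v: 0
});

const summary = summarize(line);
console.log(summary);
```

Level 20 maps to debug, which is why this record only appears when the log-debug probe is enabled.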
  • 21. May 2013: --abort-on-uncaught-exception • Crash dumps are great, but aborting after an uncaught exception makes it very difficult to determine the true origin of the exception • Dave Pacheco implemented a V8 patch to induce a process abort (and a core dump) on an uncaught exception • This allows us to use postmortem debugging to debug our everyday logic errors • Available starting in 0.10.x; we use it wherever we have it!
  • 22. July 2013: Thoth • One of the most important systems we have built in node is Manta, our object store featuring in situ compute • Manta is an excellent platform for building data-based services, especially for large data objects • We built manta-thoth, a platform for core and crash dump analysis that allows us to debug core dumps without moving them • Thoth has become critically important for us to track and automatically debug production node.js services
  • 23. December 2013: Dump analysis on Linux • Postmortem debugging has been a (the) tremendous breakthrough for node.js in production… • …but despite node’s postmortem support all being open source, it has been limited to SmartOS • Some have toyed with porting MDB to Linux; this is in principle possible, but will be rough sledding • Joyent’s TJ Fontaine (of node core fame) observed what we had done with dump analysis on Manta and had a simpler idea… • What about making Linux dumps consumable on SmartOS, and therefore Manta?
  • 24. December 2013: Linux support in libproc • Over the course of a multiday engineering hackathon, TJ and Joyent’s Max Bruning added support for Linux crash dumps in SmartOS’s libproc • Fortunately, because of the way the postmortem work was done by Dave Pacheco, it Just Works • Do this yourself: https://gist.github.com/tjfontaine/de104fe058300a51f7cf • For Linux users: put your Linux dumps to Manta, and you can finally debug those pesky leaks and crashes! • Use --abort-on-uncaught-exception and you can use Manta and postmortem debugging to debug more quotidian programming errors!
  • 25. Node.js in production! • For us at Joyent, the tooling that we have built into node.js has resulted in what we believe to be the best dynamic environment for production use • Yes, even when compared to much older platforms like Java and Erlang... • There is still work to be done, especially around add-on development (see TJ’s shim work!) and potentially better bundling of objects… • We will continue to emphasize production deployment and use in our stewardship of node.js!
  • 26. Thank you • @dapsays, the Patron Saint of node.js in production, for DTrace support, MDB support, node-vasync, Manta, etc. • @mcavage for node-restify, node-fast, Manta, etc. • @trentmick for node-bunyan • @chrisandrews for node-dtrace-provider • @brendangregg for flame graphs • @tjfontaine for bringing postmortem debugging to an entirely new audience with Linux support for libproc!