From YouTube: Throw Me a Lifebuoy: Debugging Node.js in Production with Diagnostic Reports - Christopher Hiller
Description
Christopher Hiller, IBM
Diagnostic Reports are a recent addition to Node.js core. This feature enables insight into Node.js processes running in production—without needing to attach a debugger—and the results can be interpreted offline. If you've ever had to debug issues in production with a customer, you know this can be a life-saver.
I’ll show you how to trigger report generation manually and automatically, then use the results to diagnose a problem process. While this is fine and dandy, manual diagnosis can be tedious, so I'll also demo a toolkit I've been working on. This toolkit can help automatically detect known issues, redact secrets from a report, and much more.
A
So this talk is about Diagnostic Reports in Node.js. It's going to cover some of the material that Gireesh covered yesterday, but I'm also going to give you an introduction, show a few things that you can do with Diagnostic Reports and the basics of how to use them, and talk about some tooling I built to help you use them.
A
So my name is Chris Hiller. I come from Portland, Oregon, and I'm known as boneskull on the internet. I work for IBM, primarily on Node.js-related things. I'm a maintainer of Mocha, which is a testing framework, and as a Mocha maintainer I'm also involved in the OpenJS Foundation Cross Project Council. I am boneskull on GitHub and on Twitter. If you have nothing better to do, you can look at my tweets; that's boneskull with a zero.
A
Maybe you see stack traces in your logs. So you look at the stack trace, and it says you're doing something weird, and it points to this code, where you're calling rmdir because you want to delete a temp directory or something, and you pass this recursive flag. The error that you get looks like this: ENOTEMPTY, directory not empty, rmdir, yadda yadda yadda. So, okay, why would this fail? Some of you may have an idea. You're passing the correct flag.
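To make the failure concrete, here is a minimal sketch of the error from the story. It assumes a POSIX-like system, and it calls rmdir without any recursive option, which is effectively what an older Node.js release ends up doing when it does not know about that option. The directory and file names are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build a temp directory that contains one file, so it is not empty.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'report-demo-'));
fs.writeFileSync(path.join(tmpDir, 'leftover.txt'), 'still here');

let errorCode = null;
try {
  // Plain rmdir: refuses to remove a directory that still has entries.
  fs.rmdirSync(tmpDir);
} catch (err) {
  errorCode = err.code;
}
console.log(errorCode);

// Clean up. fs.rmSync only exists in much newer Node.js releases than the
// one in the story; it is used here just to tidy up after the demo.
fs.rmSync(tmpDir, { recursive: true, force: true });
```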
A
Your meticulous integration tests pass. It works on your machine, it works in CI, the build is green, but this still happens. One way to help you figure out this problem is to use a Diagnostic Report. Can you see that? Anyway, it says "use a Diagnostic Report." So let me describe the Diagnostic Report, and this is the gist: it's an experimental module, some functionality that was added to core. It's in Node 12, so this is in LTS. You can use it, but it is an experimental API, so that means it's behind a flag.
A
You need to pass a flag to use it. If you're not familiar, "experimental" in the Node sense essentially means the API or the behavior could break outside of the normal major release cadence. So if you do start using Diagnostic Reports, please be aware that they could break. That being said, they do their job very well, but the API might change, or the output might change slightly, before we hit the next major.
A
So essentially, what this is, is a huge JSON dump reflecting the state of the process. Most of the ones I've seen work out to be around 25 KB. You can trigger it several ways: you can give Node some command-line flags, you can create one programmatically, and you can even tell Node to dump a Diagnostic Report when the process receives a user signal. So how do we want to create a report in this case, where we've got this process that has crashed?
A
It crashed, so we're going to start that process up again, except with a few of these flags. First, --experimental-report: you need that to do any of this stuff right now. Then you set --report-uncaught-exception, and then you give it a nice filename with --report-filename. You don't need to pass the filename, but in our case it will be helpful; normally Node creates a very long filename based on the timestamp. So you run this in production, time passes, and now you have another problem.
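As a rough sketch of that setup, the snippet below launches a child Node.js process that throws, using the report flags from the talk, then reads the report back. The temp directory, the `report.json` name, and the one-line throwing script are assumptions for illustration. On Node.js 12 you would also pass `--experimental-report`; it is left out here on the assumption that a newer release is running the sketch.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Keep the report out of the current directory.
const reportDir = fs.mkdtempSync(path.join(os.tmpdir(), 'report-demo-'));

// Run a child process that dies on an uncaught exception.
const child = spawnSync(process.execPath, [
  '--report-uncaught-exception',
  '--report-filename=report.json',
  `--report-directory=${reportDir}`,
  '-e',
  'throw new Error("boom")',
]);

// The report is plain JSON, so it can be inspected offline.
const reportPath = path.join(reportDir, 'report.json');
const report = JSON.parse(fs.readFileSync(reportPath, 'utf8'));
console.log('child exit status:', child.status);
console.log('reported by:', report.header.nodejsVersion);
```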
A
So now the process has crashed and you have a Diagnostic Report, which means you have a lot of JSON. It looks kind of like this, just this blob, and we can zoom in and take a closer look. It contains a whole lot of stuff, and I'm going to try to run through this pretty quickly. There are eight or nine top-level properties, depending. The first one is header, and that's all about the report itself.
A
It has information about the Node process and the command line. You can see the Node version, the versions of the libraries that Node uses, the operating system version, CPUs, all sorts of stuff. So that's in the header. If we scroll down, and this is in order, the next one you see is the JavaScript stack, and of course it's going to give you the stack. In this case, it crashed on an error.
A
Next is this libuv property. It might need a better name, but it's essentially the state of the event loop: what's in that event loop right now. It gets a little technical, but there's stuff in this particular event loop. And over there are the environment variables. This has been trimmed, but it's everything in your environment.
A
Windows users will not get this one: user limits. If you're a user on a Linux system, you'll have limits on what you can consume. Shared objects are the shared libraries that Node is using. So what are we concerned with? What can help us solve the problem we have? Well, it's here in the header. We look in this header, and we want to focus on this: the Node.js version.
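To see those sections for yourself, a small sketch like the following asks the running process for a report and lists its top-level properties. It assumes `process.report` is available: behind `--experimental-report` on Node.js 12, as in the talk, and without a flag on current releases. The exact set of sections varies by version and platform, so none is hard-coded here.

```javascript
// Ask the current process for a report without writing a file.
const raw = process.report.getReport();

// Older releases returned a JSON string here; newer ones return an object.
const report = typeof raw === 'string' ? JSON.parse(raw) : raw;

// Top-level sections such as header, libuv, environmentVariables, ...
const sections = Object.keys(report);
console.log(sections);

// The header describes the process and the report itself.
console.log(report.header.nodejsVersion);
console.log(report.header.commandLine);
```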
A
So the problem here is that rmdir with that recursive flag didn't land until 12.10, so your Node version is too old. But a stack trace wouldn't tell you that. So great, hey, you found the problem, good job. You want to say, "Look, this is the problem, everybody," so you go to Slack and you take this big report.
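The version check itself is easy to automate once the report is parsed. This is a minimal sketch, assuming the header's `nodejsVersion` field looks like `v12.9.1`; the `isAtLeast` helper and the hand-trimmed sample report are hypothetical, and the 12.10 cutoff is the one mentioned in the talk.

```javascript
// Returns true when "vMAJOR.MINOR.PATCH" is at least the given major.minor.
function isAtLeast(version, major, minor) {
  const [maj, min] = version.replace(/^v/, '').split('.').map(Number);
  return maj > major || (maj === major && min >= minor);
}

// Hypothetical, hand-trimmed report; a real header has many more fields.
const report = { header: { nodejsVersion: 'v12.9.1' } };

const ok = isAtLeast(report.header.nodejsVersion, 12, 10);
console.log(ok ? 'recursive rmdir available' : 'Node.js too old for recursive rmdir');
```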
A
You paste it in there, and now you have another problem. What you did was leak the entire environment into Slack, or wherever you sent it. Maybe you sent it through email. Hopefully you didn't put it on Pastebin. But yeah, your AWS stuff is going to be in there, who knows. Your team lead is pissed, and that's what we need to avoid. So what are we going to do about this?
A
So there is a tool that I was working on, and it's out now. It's called report-toolkit, and it's a tool for processing and analyzing Diagnostic Reports. It's kind of a multi-tool, so it does several different things. It's not Unix. You know how multi-tools kind of suck at any one job anyway, so they don't do any one thing great. But I'm getting ahead of myself. This thing does some cool stuff. It gives you a CLI tool to consume these reports, and there's a programmable API.
A
It will look for things that it knows are potentially naughty and need to be kept secret. It's based on the blacklist that AWS's git-secrets project uses, which you may be familiar with, but you can customize it to your needs. What it will do is replace all those terrible secrets in that report file with this string, and it'll overwrite the file in place. So nobody's the wiser, right?
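The idea behind redaction can be sketched in a few lines. This is not report-toolkit's implementation or the git-secrets list: the key patterns, the `[REDACTED]` placeholder, and the sample report are all assumptions for illustration, and the sketch returns a copy rather than overwriting a file.

```javascript
const REDACTED = '[REDACTED]'; // placeholder string chosen for this sketch

// Hypothetical key patterns; a real list would be longer and configurable.
const SECRET_KEY_PATTERNS = [/secret/i, /token/i, /password/i, /^AWS_/];

// Replace the values of suspicious environment variables in a parsed report.
function redactEnvironment(report, patterns = SECRET_KEY_PATTERNS) {
  const env = report.environmentVariables || {};
  const redactedEnv = {};
  for (const [key, value] of Object.entries(env)) {
    const sensitive = patterns.some((pattern) => pattern.test(key));
    redactedEnv[key] = sensitive ? REDACTED : value;
  }
  // Return a copy so the caller decides whether to overwrite the original.
  return { ...report, environmentVariables: redactedEnv };
}

// Hand-written sample report with fake values.
const sample = {
  header: { nodejsVersion: 'v12.9.1' },
  environmentVariables: {
    HOME: '/home/app',
    AWS_SECRET_ACCESS_KEY: 'not-a-real-key',
    API_TOKEN: 'not-a-real-token',
  },
};

const safe = redactEnvironment(sample);
console.log(JSON.stringify(safe.environmentVariables, null, 2));
```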
A
Now you can safely pass this report around and share it with your colleagues, discuss it over dinner. But time passes, and you have another problem. You have this process, maybe it's even a test or something, and it's still running, but you thought it should have stopped. It's not really a zombie process, but I'm just going to call it a zombie process.
A
So you don't know why, and this is weird. You've got this process, so you open up your debugger, and it doesn't stop. It's not doing anything; it's just sitting there. It's not hitting lines of code, you set breakpoints, whatever, and you still don't know why. This is something that Diagnostic Reports can help you with: you can actually generate a Diagnostic Report on demand.
A
The process doesn't have to crash for you to get a Diagnostic Report. I know we love command-line flags, so we can set --report-on-signal. By default, what this does is make the process respond to the USR2 signal, and that's configurable. So you start your process, and then you can do this sort of thing: a kill with the process ID. That sends the USR2 signal to the process.
A
When the process receives the signal, Node says, "It's time for me to create a Diagnostic Report," and it dumps a Diagnostic Report out.
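Here is a small sketch of the signal-triggered path, assuming a POSIX system (signals like this are not available on Windows). To stay self-contained, the child process sends SIGUSR2 to itself; in the scenario from the talk you would instead send the signal from another shell using the stuck process's ID. The temp directory and the child script are made up for illustration, and Node.js 12 would also need `--experimental-report`.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

const reportDir = fs.mkdtempSync(path.join(os.tmpdir(), 'report-signal-'));

// The child signals itself only to keep this sketch self-contained.
const childScript = `
  process.kill(process.pid, 'SIGUSR2');
  setTimeout(() => {}, 500); // stay alive long enough to handle the signal
`;

spawnSync(process.execPath, [
  '--report-on-signal',
  `--report-directory=${reportDir}`,
  '-e',
  childScript,
]);

// With no explicit filename, Node.js picks a long timestamp-based name.
const reports = fs.readdirSync(reportDir).filter((name) => name.endsWith('.json'));
console.log(reports);
```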
A
So here's this timer, and it's active, so it's in the event loop, and it's referenced, so it's still on and hasn't been garbage collected. Its firesInMsFromNow is 999-something; that's a while, right? So using this you can get a clue: I must have created some setTimeout or setInterval or something, and I was off by several orders of magnitude, who knows. But that'll give you a clue to try to figure it out.
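The same clue can be reproduced in-process. This sketch creates a timer with a deliberately huge delay, takes a report, and looks in the `libuv` section for active, referenced timers. It assumes `process.report` is available (flagged on Node.js 12) and that timer entries carry `type`, `is_active`, `is_referenced`, and `firesInMsFromNow`, as described in the talk.

```javascript
// A deliberately huge delay, standing in for the accidental one in the story.
const forgotten = setTimeout(() => {}, 1000000000);

const raw = process.report.getReport();
// Older releases returned a JSON string here; newer ones return an object.
const report = typeof raw === 'string' ? JSON.parse(raw) : raw;

// Timers that are still scheduled and still keeping the process alive.
const pendingTimers = report.libuv.filter(
  (handle) => handle.type === 'timer' && handle.is_active && handle.is_referenced
);

for (const timer of pendingTimers) {
  console.log(`timer fires in ${timer.firesInMsFromNow} ms`);
}

clearTimeout(forgotten); // let this demo exit
```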
A
So there are these rules. They're heuristics, just some algorithms and functions that accept a report file; the function examines the report file and decides what to do. There are built-in rules. One of these happens to be the long-timeout rule, which will look for this very situation in your report file. So you could run this on your report file, any report file really, and it'll look and you'll see: is there anything fishy going on here?
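A rule in this sense can be pictured as a plain function from a parsed report to a list of findings. The sketch below is not report-toolkit's rule API; the function name, the one-hour threshold, the shape of a finding, and the sample report are all hypothetical.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000; // arbitrary threshold chosen for this sketch

// A "rule" here is just a function from a parsed report to a list of findings.
function longTimeoutRule(report, thresholdMs = ONE_HOUR_MS) {
  return (report.libuv || [])
    .filter((h) => h.type === 'timer' && h.is_active && h.is_referenced)
    .filter((h) => h.firesInMsFromNow > thresholdMs)
    .map((h) => ({
      rule: 'long-timeout',
      message: `active timer fires in ${h.firesInMsFromNow} ms`,
    }));
}

// Hypothetical, hand-trimmed report for illustration.
const report = {
  libuv: [
    { type: 'timer', is_active: true, is_referenced: true, firesInMsFromNow: 999999999 },
    { type: 'timer', is_active: true, is_referenced: true, firesInMsFromNow: 250 },
    { type: 'tcp', is_active: true, is_referenced: true },
  ],
};

const findings = longTimeoutRule(report);
console.log(findings);
```

Keeping each rule a pure function of the report makes it easy to run many of them over any report file and to test them with hand-written fixtures like the one above.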
A
So one of those rules is the long-timeout one, where it will let you know if there's a timeout that's far off in the future and still active. And you could write your own rules for this. It's like a plugin system. It works similarly (I can't say that word) to ESLint, so you can write your own rules and publish them to npm.
A
That's one of the rules. There are others that will make sure that your memory usage is within an expected range and that your CPU usage is within an expected range. There's another one that will examine your shared libraries versus the libraries that Node was built with, and check whether there's a mismatch there. That's not going to be something that most people are concerned about, but if you're compiling Node yourself, it might come up where you have, say, a different version of OpenSSL than Node expects.
A
Maybe it fails on one machine but not the other, and you can't really tell what the difference is. One thing that report-toolkit can help us with here is that it provides a diff subcommand. You could take two report .json files and give them to your favorite diffing tool, but that's for diffing source code or text files; it's not for diffing these report files. A neat thing, though, is that we know what data we have.
A
We can create a custom, purpose-built diff tool for this, and that's what this is. It tries to ignore stuff that it thinks you probably won't care about, to improve the signal-to-noise ratio. It tries to make it nicer for you to look at your reports and say, "Oh, well, that's how they're different," instead of this huge unified dump or side-by-side diff. So it helps answer questions about your process.
A
If you run this again and again and again, you can diff them all and ask how the process changes over time. Maybe that's a single process, maybe that's a process on several different machines, but you can diff any two reports this way, and the diff output looks something like that.
A
In this case, we see that the command-line flags are a little different. With the first report file, we actually passed -e for eval, and the command that was sent was just, "Hey, write a report." The other one, who knows, but it didn't have any command-line options. The first report was generated with 12.1, and the second one was generated with 11.2. This is only an excerpt of that diff, but yeah.
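A purpose-built diff can be as simple as comparing only the fields you care about and ignoring the rest. This sketch is not the diff subcommand's implementation; the field list, the output shape, and the two hand-written reports (loosely modeled on the example in the talk) are assumptions.

```javascript
// Fields chosen for this sketch; a real tool would cover far more of the report.
const FIELDS = ['nodejsVersion', 'commandLine'];

// Compare selected header fields and report only the ones that differ.
function diffHeaders(a, b, fields = FIELDS) {
  const changes = [];
  for (const field of fields) {
    const left = a.header[field];
    const right = b.header[field];
    // JSON.stringify gives a cheap deep comparison for arrays of strings.
    if (JSON.stringify(left) !== JSON.stringify(right)) {
      changes.push({ field: `header.${field}`, left, right });
    }
  }
  return changes;
}

// Hypothetical, hand-trimmed reports. dumpEventTime always differs between
// reports, so it is exactly the kind of noise this diff leaves out.
const first = {
  header: {
    nodejsVersion: 'v12.1.0',
    commandLine: ['node', '-e', 'process.report.writeReport()'],
    dumpEventTime: '2019-12-12T10:00:00Z',
  },
};
const second = {
  header: {
    nodejsVersion: 'v11.2.0',
    commandLine: ['node'],
    dumpEventTime: '2019-12-12T11:30:00Z',
  },
};

const changes = diffHeaders(first, second);
console.log(changes);
```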
A
Another thing: maybe you've got processes that are crashing somewhere, maybe a lot of them, and maybe you think that's not a big deal; we can just restart them, because it's Node, right? But you want to know how frequently certain exceptions are happening, and maybe this will help you prioritize bug fixes or who knows what. To be able to figure out how often a particular exception happens, you need to be able to count them.
A
So how do you count an exception? You could take the whole exception and stuff it somewhere, who knows, but what you can do here is take a hash of that exception. There's some customization that can happen here, but you can take a hash and just output a little bit of JSON with a SHA-1 in it, using report-toolkit. Of course, you could do that with a script, too.
A
The toolkit will do it out of the box. It'll also convert these Diagnostic Reports to CSV or JSON. You can filter stuff, so if you only want a couple of those fields, you have the filter transform. Table, of course, is that kind of output you saw before. Newline would be something like newline-delimited JSON, if you need that sort of thing. Numeric is kind of an experiment, where you can use it in a shell context and actually pipe it to something.
A
There are these neat little tools that will generate graphs in your console. You could do that: just combine it with filter to pick out only a certain field, and keep running that over time. Redact, of course, is essentially the same thing as the redact command. So you can combine these transforms. You might want to write your own and publish it on npm; no, you can't do that.
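Combining transforms amounts to function composition over parsed reports. The sketch below chains a field filter with a newline-delimited JSON serializer; the `pick` and `toNdjson` helpers, the dotted-path syntax, and the sample reports are hypothetical and not report-toolkit's transform API.

```javascript
// Pick a few dotted paths out of a parsed report.
function pick(report, paths) {
  const out = {};
  for (const p of paths) {
    out[p] = p
      .split('.')
      .reduce((node, key) => (node == null ? undefined : node[key]), report);
  }
  return out;
}

// Newline-delimited JSON: one compact JSON document per line.
function toNdjson(records) {
  return records.map((record) => JSON.stringify(record)).join('\n');
}

// Hypothetical, hand-trimmed reports taken from the same process over time.
const reports = [
  { header: { nodejsVersion: 'v12.9.1' }, javascriptHeap: { usedMemory: 1000 } },
  { header: { nodejsVersion: 'v12.9.1' }, javascriptHeap: { usedMemory: 1500 } },
];

const fields = ['header.nodejsVersion', 'javascriptHeap.usedMemory'];
const ndjson = toNdjson(reports.map((report) => pick(report, fields)));
console.log(ndjson);
```

Output in this line-per-record form is easy to pipe into other command-line tools, which is the use case the talk describes.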
A
This is what it would look like: you get this stack hash, and you can see there's a SHA-1 hash calculated for it. I think you need to be able to customize this a bit. Maybe your exceptions have some user information in them and you want to get rid of that; maybe there's some personally identifiable information in there. You should be able to pass it a regular expression or just a function, write your own and plug it into this thing, and it'll help.
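Counting exceptions by hash can be sketched with Node's built-in crypto module. This is not report-toolkit's stack-hash transform: the `stackHash` function, the optional `scrub` hook for stripping user-specific text, and the sample reports are assumptions. It relies on the report's `javascriptStack` section having a `message` string and a `stack` array.

```javascript
const crypto = require('crypto');

// Hash the exception message and stack so identical crashes share one key.
// `scrub` lets the caller remove user-specific text before hashing.
function stackHash(report, scrub = (text) => text) {
  const { message, stack } = report.javascriptStack;
  const text = scrub([message, ...stack].join('\n'));
  return crypto.createHash('sha1').update(text).digest('hex');
}

// Hypothetical reports: the same crash for two different users.
const stack = ['at loadCart (/app/cart.js:10:11)', 'at handle (/app/server.js:42:5)'];
const reportA = { javascriptStack: { message: 'Error: no cart for user 1234', stack } };
const reportB = { javascriptStack: { message: 'Error: no cart for user 5678', stack } };

// Hypothetical scrubber that hides the user ID.
const scrubUserId = (text) => text.replace(/user \d+/g, 'user <id>');

// Tally how often each distinct exception occurs.
const counts = new Map();
for (const report of [reportA, reportB]) {
  const hash = stackHash(report, scrubUserId);
  counts.set(hash, (counts.get(hash) || 0) + 1);
}
console.log([...counts.entries()]);
```

Without the scrubber the two reports hash differently, which is why some customization hook is useful when messages embed user data.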
A
There's a tutorial written by Gireesh, who spoke about Diagnostic Reports yesterday, and he was also the one who got this code into core. There's a tutorial there, which links to those, too, at developer.com. And I apologize, this is not very legible, but the documentation site for report-toolkit is ibm.github.io/report-toolkit, and I'll leave that up for a second. It is an IBM project. I'm the only person working on it, but it's still an IBM project. So again, I am Christopher Hiller.