From YouTube: Monitoring ZFS by Richard Elling
Description
From the OpenZFS Developer Summit 2018
Slides: https://drive.google.com/open?id=1Q-I4xD6q_wWkmhCy5oJ0Gyt00EgvfCec
A
Our next speaker is Richard Elling. Richard comes from the Sun Microsystems days, so he has been exposed to ZFS from a long time ago, and he has been playing around with ZFS and making changes for many, many years. So today he is going to talk about observing and monitoring in ZFS, and how to improve in that aspect.
B
Thanks, and good to see everybody here again this year. How many years has it been, six? Wow, very good. Glad to see a lot of familiar faces. Like many of you, I deal with people with issues, and with monitoring systems, especially systems demonstrating misbehaviors.
B
For me, telemetry is very important. It's been a while since I worked on the Space Shuttle program, but STS stands for Space Transportation System, which is more commonly known as the Space Shuttle, and we had a lot of telemetry in there. This is a failure mode, or the result of a failure: STS-107 is Columbia, which burned up on re-entry when the wing burned through, and this is the telemetry, where you can clearly see that that was not normal, right?
B
We have a bunch of normal flights, and then we have the flight that was abnormal, and this happens to our ZFS systems in the world as well. And so the question is: how can we not only capture this and deal with it immediately for operations, but also handle it forensically? How do we go back and understand what happened back in time? It's very important. At NASA we collected all kinds of telemetry and stored that stuff forever, and so they were able to go back through and get that information. So, how?
B
Here's an example (love the font conversion). This is what you actually see in a Linux kstat file: you'll see some metadata in the first line, and then, not quite like this since it came out in a proportional font, you'll see name, type, and data as a tuple. Until recently, all the types were 4, which is unsigned int.
B
Recently
we
started
adding
in
some
strings,
so
you'll
see
type
seven,
so
any
if
anybody's
written
code
that
assumed
that
four
was
always
going
to
be
four
surprised.
It's
not
it's.
It
is,
in
fact
the
data
type
and
so
most
of
within
what
we
see
is
unsigned
in
64
for
K
stats
and
ZFS
on
Linux
that
becomes
important
soon
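For illustration, here's a minimal sketch of reading one of these files in Python, assuming the /proc/spl/kstat/zfs layout just described; the path and the two-line header skip are assumptions based on ZFS on Linux, not any stable API:

```python
# Minimal sketch: parse a Linux ZFS kstat file (e.g. arcstats) into
# {name: (type, value)}. Type 4 is unsigned int, type 7 is a string;
# don't hard-code type 4, decode whatever the type column says.

def read_kstats(path="/proc/spl/kstat/zfs/arcstats"):
    stats = {}
    with open(path) as f:
        lines = f.read().splitlines()
    # Line 0 is kstat metadata, line 1 is the "name type data" header;
    # the name/type/data tuples start after that.
    for line in lines[2:]:
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        name, ktype, data = fields[0], int(fields[1]), fields[2]
        stats[name] = (ktype, data if ktype == 7 else int(data))
    return stats

if __name__ == "__main__":
    for name, (ktype, value) in sorted(read_kstats().items()):
        print(f"{name:32s} type={ktype} {value}")
```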
There are also a few command-line tools that you can run to observe these things. In the ZFS on Linux tree there's zpool iostat, of course, built into the zpool command.
B
There's a couple of commands out there with history that goes way back, and they've been carried along: arcstat and arc_summary tell you a lot of information about the ARC and how it's currently using all your RAM. The dbuf stuff, dbufstat, and the SPL slab data are a little bit more developer-friendly, or maybe developers are more interested in them than, say, operations people, but they're there. And then, of course, in the performance world, you will always hear us say:
B
Can you give us the iostat -x output from that? The reason is that it shows us the device stats, including operations (reads, writes, operation counts) but, most importantly, latency, or at least an average latency. That's the first place where we go to say: hey, your average latency is 1.3 seconds, you might have something wrong with your disk. We see that a lot in the ZFS on Linux world. It also picks up zvols, so for people who were looking for instrumentation for zvols, bandwidth and latency are covered.
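As a rough sketch of the arithmetic behind that average, assuming the documented Linux /proc/diskstats field order (this approximates iostat -x's r_await/w_await, it is not the iostat source):

```python
# Rough sketch: average read/write latency per device from two samples
# of /proc/diskstats, roughly what iostat -x reports as r_await/w_await.
# Fields after (major, minor, name): reads completed, ms spent reading,
# writes completed, ms spent writing at indexes 3, 6, 7, 10.
import time

def snapshot():
    devs = {}
    with open("/proc/diskstats") as f:
        for line in f:
            f_ = line.split()
            devs[f_[2]] = (int(f_[3]), int(f_[6]), int(f_[7]), int(f_[10]))
    return devs

a = snapshot()
time.sleep(5)
b = snapshot()
for name, (r1, rms1, w1, wms1) in b.items():
    if name not in a:
        continue
    r0, rms0, w0, wms0 = a[name]
    r_ops, w_ops = r1 - r0, w1 - w0
    r_await = (rms1 - rms0) / r_ops if r_ops else 0.0
    w_await = (wms1 - wms0) / w_ops if w_ops else 0.0
    if r_ops or w_ops:
        print(f"{name}: r_await={r_await:.2f} ms  w_await={w_await:.2f} ms")
```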
B
It's all right there in iostat -x. Then there are a couple of others that are a little bit more specific: zfetchstat, which just looks at how well your prefetcher is working or not working, and then there's one you may not have heard of (maybe three people have it) that I wrote, called kstat analyzer, which started in the illumos space, and it tries to do a performance-guy analysis of all these kstats.
B
So that's what you can get at the command line, and there are probably a thousand others out there as well; it's pretty easy to write a tool to scrape that stuff. But we really want to use our eyeballs, right? So what I'm going to talk about next is really three stacks, or three major components, of modern databases and monitoring tools.
B
One thing I've seen in a couple of slides (I think Christian showed it) is using Prometheus with Grafana. Grafana is by and large a terrific open source project to help us take telemetry and other data and present it in dashboards; really good stuff, I can't recommend it highly enough. The other two are InfluxData's TICK stack, which is Telegraf, InfluxDB, Chronograf, and Kapacitor, and the other project you'll hear a lot about is Prometheus, which comes... where's that originally from? Yeah.
B
Grafana runs in the browser, in the sense that JavaScript in the browser runs everywhere, and it actually turns out to be quite good, even on small phones. So that's pretty cool. It's got a plug-in architecture and lots of community stuff; check it out, it's good stuff. I then store stuff into a time-series database. In the old days we would use SQL, and you'll see some old people talking about using things like Cassandra,
B
you know, databases which really weren't specifically designed for time series. But nowadays we recognize the need to really focus in on doing real time-series databases, and they're becoming commercially viable, with big open source environments. So, the difference between the two, InfluxDB versus Prometheus: push versus pull, discussed a little bit earlier, and they've both got query languages. To me, the most important thing is data types. You remember earlier today George was talking about the ZIO pipeline, and where are you in the ZIO pipeline when you get an error?
B
Well, you'll get an event with that error, and in that event it will tell you where in the pipeline you were: which pipeline stages you were instructed to go through, and where you are in that pipeline stage, which is really useful information. It's an integer, right? It's an enum; sorry, it is a bit field. And the problem is, if I store that... I certainly won't do it as a boolean, but if I store it as a float in Prometheus, then I have a different version.
B
A different version has a different pipeline set. Suppose we add encryption: now that number doesn't make sense unless I also know which version of the OS it came from. So the reason I like strings as data types is that when I generate that event information, I can decode it right there and give you a string that says: these are the pipeline stages, and this is where it broke. And it comes out nicely and gets stored in my database.
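Here's a minimal sketch of that decode-at-the-source idea; the stage names and bit positions below are illustrative placeholders, not the real ZIO stage assignments, which is exactly the property that makes decoding at the source valuable:

```python
# Illustrative sketch: decode a pipeline bit field into stage names
# where the event is generated, so the telemetry stores a string.
# These names/bit positions are made up for illustration; real ZIO
# stage assignments differ between OpenZFS releases.
STAGES = [
    (1 << 0, "OPEN"),
    (1 << 1, "READ_BP_INIT"),
    (1 << 2, "CHECKSUM_VERIFY"),
    (1 << 3, "VDEV_IO_START"),
    (1 << 4, "VDEV_IO_DONE"),
    (1 << 5, "DONE"),
]

def decode_pipeline(bits):
    names = [name for mask, name in STAGES if bits & mask]
    return "|".join(names) if names else "NONE"

# The string means the same thing no matter which OS version emitted
# it, because it was decoded where the bit field was defined.
print(decode_pipeline(0b101101))  # OPEN|CHECKSUM_VERIFY|VDEV_IO_START|DONE
```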
B
The other thing you'll notice about Prometheus is its float issue. We do high-speed data systems where I work, and so we look at counters. Just for example, bandwidth counters: I'm counting bytes going across my I/O path. For the high-speed systems we have, we will roll over 43 bits of mantissa in just a few days, and in the systems we have on the drawing board, we'll roll over that counter in maybe two days. It's just not going to fit in a float, and so.
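A quick worked example of that rollover, using a hypothetical 100 GB/s path:

```python
# Worked example: float64 carries 53 significand bits, so a byte
# counter stored as a float loses integer exactness past 2**53.
limit = 2 ** 53        # last range where every integer is representable
rate = 100e9           # hypothetical path speed, bytes/second
print(limit / rate / 3600, "hours")   # ~25 hours at 100 GB/s

x = float(2 ** 53)
print(x + 1 == x)      # True: the counter can no longer count by one
```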
B
Collectors, agents, and aggregators: there are others I'm leaving out here, but these are the two that I keep track of, trying to keep the ZFS on Linux data collection in them up to date. The Prometheus project has a node exporter, which puts out the generic set of I/O statistics (network bandwidth, CPU, memory, those kinds of things for a compute node), and there's a ZFS plugin for that which will give you all the ZFS data. And then there's Telegraf, for the InfluxData environment.
B
Telegraf also collects most of those things as well, and there I'm able to convert those enums to strings so they're a little bit more useful. Both of those are there, and my Christmas project was to get them up to date with where we were as of Christmas 2017, so they should be pretty current, and they're out in the released and supported versions, which is pretty cool. So let's take a look, then.
B
So I only have two colors, sorry, but basically there are times when the ARC has been asked to shrink, and then for a while it's not going to try to grow again; that's when arc_no_grow has ticked over to on, and so it goes into a phase where it won't grow. You can actually see that in the bottom graph, where we were growing the ARC, and then we got to a point where we started to reduce the ARC size.
B
This is what we still need to figure out: then we tip over to a no-grow phase, and then after a while there's a timeout for no_grow and it's allowed to grow again; you can see it grow a little bit, then come back down and go back up. This is the kind of thing where, when you look at it this way, it's immediately obvious that, oh, I have a problem here, and we are working on this problem.
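A minimal sketch of watching those two values together, reusing the read_kstats() sketch from earlier; the arcstats field names size and arc_no_grow are the ZFS-on-Linux ones, but treat the exact names as an assumption for your release:

```python
# Minimal sketch: sample ARC size alongside arc_no_grow so the
# shrink / no-grow / regrow phases show up in one telemetry stream.
# Assumes the read_kstats() sketch above and ZFS-on-Linux field names.
import time

while True:
    s = read_kstats("/proc/spl/kstat/zfs/arcstats")
    print(f"{int(time.time())} arc_size={s['size'][1]} "
          f"no_grow={s['arc_no_grow'][1]}")
    time.sleep(10)
```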
B
It's annoying, but it's a new thing, so we know it's something we've introduced. By contrast, imagine I had been in an email conversation with somebody about that: so, send me the arcstats... no, no, not those ones, I need this one; I need to know what arc_no_grow is, and I need to know these various levels for these other parameters. It would take a week to get through that email chain, right? But if they had all the data always collected, and it's all right there, then I can just say: pull up this dashboard, or I will email you a dashboard; you take a look at the timing in question, and then we can dive into it further. And with that, I'll make a shameless plug for tomorrow, for the people at the hackathon.
B
We start to peel this onion when we start to look at this data. When I have all the stats available at my fingertips in the database, going back in time, going back two and a half years in my lab, for example, then there are questions we can ask now of the experiments we did two years ago, things that we didn't understand might have been a problem back then, and we can go back and take a look at that, thanks to the databases. And with new dashboards, we can deliver new analysis as well.
B
Here's another example of the same sort of thing, where we can show clearly hit rates and that kind of thing. And since this was done in the browser, when I hover over a particular time point, you can actually see all the values numerically, and so this is very useful; it becomes very interactive once you start to get into it. So tomorrow, for those that are interested, at the hackathon we can go and do some stupid pet tricks.
B
This is the graphical version of what you would see scroll off as a few pages of text if you ran iostat -x. This is just a transition from ways of doing random writes, and then we started the scrub, and these are hard drives, so you can kind of see it. And with that, I really wanted to point out that we can start to view these things differently: on the bottom graph,
B
I use a model where writes go down, so I use a negative y-axis for the writes, and reads go up. When you do that, it's very obvious, when I look at the graph, whether I am writing versus reading; my brain will tell me. And then, if you look at the total bandwidth when I'm doing both reads and writes, I can get an appreciation of the total bandwidth in the system, all without looking at any numbers, right? Let your eyes do the math for you. So that's an example.
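A minimal sketch of that plotting convention with made-up numbers, assuming matplotlib is available:

```python
# Sketch: reads on a positive y-axis, writes negated, so the eye sees
# read-vs-write mix and total bandwidth without reading any numbers.
import matplotlib.pyplot as plt

t = range(10)
reads = [120, 130, 90, 0, 0, 200, 210, 190, 180, 175]    # MB/s, made up
writes = [80, 85, 90, 300, 310, 20, 25, 20, 30, 25]      # MB/s, made up

plt.fill_between(t, reads, step="mid", alpha=0.6, label="read MB/s")
plt.fill_between(t, [-w for w in writes], step="mid", alpha=0.6,
                 label="write MB/s (down)")
plt.axhline(0, color="black", linewidth=0.5)
plt.ylabel("bandwidth (MB/s), writes negative")
plt.legend()
plt.show()
```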
B
Then we did some random fills, and we did some writes and scrubs. So next I want to talk about the dashboards I build. In the ones I release to people, I put some documentation in at the bottom row of the dashboard, and you can do markdown there, for those who are interested, so you can get a little bit fancy with the documentation. Earlier today we were talking about the ZIO pipeline and briefly mentioned the fact that we break these I/Os out into queues, and so we have five queues.
B
They are sync read, sync write, async read, async write, and the scrub queue, and those all merge back together and actually get sent down to the disk, which also has a queue. And we want to understand: where are we spending all the time? Because we have these tunables in the scheduler, but to date,
B
you kind of need to understand what's in the queue during your experiments, and then also what you want it to look like at the end, right? So: what is it now, and what do I want it to look like? If you look at the full zpool iostat -w output on Linux, you'll actually see a histogram of the latencies in all of these queues. Again, great font translation.
B
But trust me when I say that you will see these things. And we gather this data not only at the top level, which is what I'm showing here (when you run zpool iostat, you get the top-level roll-ups for all of the vdevs underneath), but in fact, in the system, there is a histogram for all of these queues for every vdev, all the way down to the leaf vdevs.
B
In the groups that do dashboards, you'll find heat maps that don't do anything like what you thought heat maps did; these are latency heat maps, made very famous by early work at Joyent a few years ago, and now readily available to you. So this is the same data over time in the queues. End to end, we have the top-level read and write latencies for the pool, and in this dashboard I put reads on the right.
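For the curious, here's a sketch of the binning that produces one column of such a heat map: counts in power-of-two latency buckets, in the spirit of the zpool iostat -w histograms (the bucket layout here is illustrative):

```python
# Sketch: one sampling interval's latencies binned into power-of-two
# buckets, i.e. one column of a latency heat map.
import math

def histogram_column(latencies_ns, buckets=32):
    col = [0] * buckets
    for ns in latencies_ns:
        col[min(int(math.log2(max(ns, 1))), buckets - 1)] += 1
    return col

# Made-up scrub-read queue residency times: a few around 2 us,
# the bulk around 8 ms, echoing the distribution described below.
samples = [2_000, 2_500, 7_500_000, 8_000_000, 9_000_000]
for b, count in enumerate(histogram_column(samples)):
    if count:
        print(f"[2^{b}, 2^{b + 1}) ns: {count}")
```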
B
So in this experiment, which I just ran on my laptop on the plane over, I was doing a fill of a small pool, and then, when it finally filled up, I did a scrub. I've called out that time period: at the top level we see a bunch of writes, and then at the top level the scrub does a bunch of reads. Okay, so that should be pretty readily, hopefully intuitively, obvious.
B
When we go down to the next level, async and sync reads versus writes, then you can really start to see it break out. You can see clearly that we've got synchronous reads going along; the only async read that I'm aware of is a prefetch, so the only time you're going to see async reads is prefetch, and all other reads are basically synchronous. And then we have scrub reads, and scrubs have an intentionally lower priority than the others.
B
So these are the queues waiting to get to the disk, the three columns on the left side, and then the actual disk queues are on the right. What you can clearly see, or what I tried to intentionally show you, is that in the scrub read queue there are a few I/Os, a few residency times, down around the two-microsecond kind of range, and then the bulk of them are way up around, you know, eight milliseconds or so.
B
It's time going up from microseconds to milliseconds to seconds on the y-axis, and we get a distribution across there, and you can see that a whole bunch of these I/Os are queued in the scrub read queue for, you know, an eight-millisecond kind of timeframe. If we look at that versus the other queues, then it becomes interesting: do we need to bump up the priorities for the scrub queue or not? I don't know the answer, but this is where I can kind of get a feeling for that.
B
Similarly, on the bottom row we have writes, and you can see the writes coming through: I filled up the pool, paused for a little while, and then did the scrub, so you can kind of get that. So hopefully, if we can get more and more people to take a look at these, especially when you're off tinkering in the lab with new code and a repeatable workload, and even some ZTS stuff, then we can get an idea: is this really an improvement? Is it doing what I think it is?
B
I have a question coming: how am I going to get this to you guys? I'm going to put together a page that tells you how to build this and get to the point where we can collect the data, and then somewhere on my GitHub site I've got the start of a series of dashboards that are specifically oriented towards my ZFS work. And there are a few dashboards in the public domain, out on the Grafana dashboard lists, that do some ZFS work.
B
Oh, so how much space do I need to store all this stuff? It's surprisingly compact. Most of my data is stored in InfluxDB, and InfluxDB has an extraordinarily compact ability to compress the data: since it knows it's time series and it knows the types, it really compacts it quite well. I've got two and a half years of data, I'm up to about 140 machines right now, and it's using about 200 gigabytes of disk space.
B
So the question is: how do we parse the data and get it up into InfluxDB? The way it actually works is that inside Telegraf, and inside the Prometheus node exporter, there's a ZFS agent that will read the kstat files and then convert them into the format for the telemetry stream that's handed to you. So it's a transformation kind of workload; it's not rocket science or anything. The biggest impediment to us doing more of these is: how do we get the code?
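Here's a sketch of that transformation, reusing the read_kstats() sketch from earlier and emitting InfluxDB line protocol; the measurement and tag names are illustrative, not Telegraf's actual schema:

```python
# Sketch: turn kstat tuples into InfluxDB line protocol, the same
# shape of transformation the Telegraf/node-exporter ZFS agents do.
import socket
import time

def to_line_protocol(stats, host):
    ts = int(time.time() * 1e9)          # line protocol timestamps are ns
    for name, (ktype, value) in stats.items():
        if ktype != 7:                   # numeric kstats -> integer fields
            yield f"zfs_arcstats,host={host} {name}={value}i {ts}"

for line in to_line_protocol(read_kstats(), socket.gethostname()):
    print(line)
```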
B
You know, that was obviously a red-alert kind of situation, and you want to get those as early as can be into the operations sense. At the same time, there's an operations group that usually does capacity planning, and you want to help the capacity planners predict how much more gear they need to buy from us. And so all these things are very useful for us, and we want to do that and to help that, and so we have different consumers for the data.
B
Yeah, I mostly do performance work, and in the performance work I care entirely about latency; all right, not entirely, mostly about latency, because that's where it's most painful. But for an operations guy, maybe they care about, you know, what is my daily increase in load, right? Which is a very different question. The tooling can handle both, and then it's a matter of: how do I communicate a dashboard that makes sense?
B
I'm of the opinion: don't delete anything. So I'm like, oh yeah, I collected all this data, every 10 seconds or 15 seconds or 1 second, depending on what it is, and I don't ever want to delete it, which is why my lab has this database that's been around since I started working there. But that becomes impractical at large scale, and so those guys want to downsample and all that, and there are ways to do that. Fortunately, most of the things we do in ZFS are counters, and they're unsigned int64.
B
They're always going to go up, and so they're easy to downsample: I can just take a sample and throw away an intermediate sample and still retain that growth over time, which is really cool. To developers: gauges are troublesome, averages are appalling, and incrementing counters are good; just take a count, we love it. Any other questions?
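A tiny sketch of why those monotonic counters downsample so gracefully: dropping intermediate samples still preserves the growth between the samples you keep:

```python
# Sketch: monotonic counters survive downsampling; throwing away
# intermediate samples keeps the growth between retained samples.
samples = [(0, 0), (10, 1_000), (20, 2_200), (30, 2_900), (40, 4_100)]

def downsample(series, keep_every=2):
    return series[::keep_every]

def rates(series):
    return [(t1, (v1 - v0) / (t1 - t0))
            for (t0, v0), (t1, v1) in zip(series, series[1:])]

print(rates(samples))               # full-resolution rates
print(rates(downsample(samples)))   # coarser, total growth unchanged
```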