From YouTube: Debugging ZFS: State of the Art on Linux by Tom Caputi
Description
From the 2019 OpenZFS Developer Summit
slides: https://drive.google.com/open?id=1YZ1RW13yY8umhQF5CQ82zzk_mVUodYbT
I was out really late, and between the time zones and eating pretty much only junk food, I'm... sorry, just stay with me and we'll all get through this. Okay, anyway, I'm here to talk about ZFS debugging techniques. I've been spending a lot of time over the past six months to a year or so just doing a bunch of debugging, so let's just get into it. Okay, so here I have a system, and I created a pool here.
So this is a problem. People should be able to make datasets, last time I checked. I guess the first thing that I try to do when I hit something like this is... I'm gonna... That's amazing! Right there, that worked. Okay, usually when I break ZFS, or when most of the people in this room break it, the first question is whether or not you can Ctrl-C the process. If you are able to Ctrl-C it, what that basically means is that your process is, for the most part, busted in userspace somewhere, and that means this is probably a lot easier to debug than anything else that could possibly happen.
So let's try this. Okay, I have tmux loaded up here. Let's go back to this and try it again, since this is obviously a reproducible problem.
Let's go over here and just run htop. This is one of the first things I do whenever I encounter a problem, and you'll notice nothing's happening. Let's search for the zfs process and see... okay, there it is, and it's asleep. It's not doing anything, and it's not really using a whole lot of memory. But this is good: this is all information, and we know that we can Ctrl-C it.
strace is a wonderful tool whenever you're debugging anything, because it will usually at least tell you where the problem you're trying to debug started: it shows you the system call the process is stuck trying to execute. So let's try this. Let's run strace, and you can see it did a whole bunch of nonsense, getting all kinds of information about what features the pool supports and all these other things, but that last one is weird. Okay, so it's a sleep!
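If you save strace's output to a file (for example with strace -o trace.log followed by the command), the call a hung process is blocked in is normally the last, unfinished line. A minimal Python sketch that pulls that name out; the helper name and the sample lines in the comments are illustrative assumptions, not output from the talk:

```python
import re
from typing import List, Optional

# An strace line starts with the syscall name, e.g.
#   openat(AT_FDCWD, "/dev/zfs", O_RDWR) = 3
# optionally prefixed with "[pid 1234] " when following threads with -f.
# Signal lines ("--- SIGINT ...") and exit lines ("+++ ...") do not match.
_SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")


def last_syscall(strace_lines: List[str]) -> Optional[str]:
    """Return the name of the last system call in a captured strace log.

    For a hung process this is usually the call it is blocked in, which
    strace prints without a "= <result>" because it has not returned yet.
    """
    for line in reversed(strace_lines):
        match = _SYSCALL_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:  # path to a file written by strace -o
        print(last_syscall(log.read().splitlines()))
```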
So let's start with that, shall we? That sounds like a good place to start. Let's go into the code. Let me bring this over, and I apologize, this is a little bit of back-and-forth; it'll be easier if I just switch to it and do it like this.
Okay, hopefully that's easier. Anyway, let's open up zfs_main.c and take a look at what's going on here. We know that the program didn't really call anything before it got to that sleep. It basically just loaded a bunch of info about the features, but it didn't really get to any of the actual code.
So let's just start at zfs_main.c and try to follow it around and see what might be the problem here. We're going to go to cmd/zfs, because that's the command we ran, and open zfs_main.c. Over here, let's search for zfs_do_create; it's pretty easy to find. Let's see if there's anything suspicious here.
The comment says: "Dear future Tom, I know how much you love fixing bugs in ZFS. I also know your birthday is coming up in five months" (that's not even accurate) "so I wanted to give you an early birthday present: I added even more bugs. I didn't think it was possible to add more. I'm the best. Past Tom." Great. Okay, so we have some things here that we need to fix. Let's get rid of this one, because this one's obviously a problem.
Let's talk about userspace lockups. This was the first problem we just saw: the whole process was stuck in userspace, but we were able to get out of it just by hitting Ctrl-C. That's probably the first thing you'd try as somebody who's familiar with a UNIX system, but it's actually a really valuable piece of debugging information: whether or not you can cancel it.
After you've determined that, check the process state. Ours was stuck in S, the sleep state, which basically means it can be cancelled whenever we want. It's also important to keep in mind that the process might just be slow: it might be doing its job, but taking a lot longer than you expect. If that's the case, then you have more work to do, but for right now we've gotten to something pretty easy.
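The state letter that htop and ps show comes from the third field of /proc/<pid>/stat. A small Linux-only Python sketch for checking it from a script; the helper names are hypothetical and only the common letters are mapped:

```python
import os

# Meaning of the state letter in /proc/<pid>/stat (see proc(5)).
STATE_MEANING = {
    "R": "running",
    "S": "interruptible sleep: signals such as Ctrl-C can still wake it",
    "D": "uninterruptible sleep: usually blocked inside the kernel",
    "T": "stopped",
    "Z": "zombie",
}


def parse_state(stat_text: str) -> str:
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The line looks like "1234 (zfs) S 1000 ...". The command name in
    parentheses may itself contain spaces or ')', so split on the LAST ')'.
    """
    return stat_text.rpartition(")")[2].split()[0]


def process_state(pid: int) -> str:
    """Read and describe the state of a running process (Linux only)."""
    with open(os.path.join("/proc", str(pid), "stat")) as f:
        letter = parse_state(f.read())
    return f"{letter} ({STATE_MEANING.get(letter, 'other')})"
```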
So let's go back here. It's rebuilt. Let's try this again.
So this is what a crash looks like. Pretty obvious; I'm sure most of you have seen this kind of thing before. If you're using a GUI, you will see this message. Otherwise you will just see this message, which is very helpful and tells you almost nothing: "Segmentation fault (core dumped)". And it's kind of a lie, because if you look around... this one here is from before, so ignore that.
rm does still work. So you run this and it segfaults. Let's take a look at how you debug this. This is also pretty easy to debug, and the reason is that there are all kinds of tools that have existed since, you know, the beginning of time to help you figure out this stuff. So let's use the biggest one, and the biggest one is obviously gdb.
Let's get rid of this message if I can, or maybe it's just there forever. Oh yeah, it is, but it's not letting me alt-tab to it, so go away. There you go. Okay, now there should be a core file here, and there's not. Sorry, I brought this up before: usually when you crash a program, it will tell you that the core was dumped, and that is a lie. What you need to do first is raise the core file size limit with ulimit -c (for example, ulimit -c unlimited).
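Inside a script you can do the same thing as ulimit -c through the rlimit API. A Python sketch, with one extra check worth knowing about: on many distributions kernel.core_pattern pipes cores to a helper such as systemd-coredump, in which case no core file appears in the working directory even after the limit is raised. The helper names are hypothetical:

```python
import resource


def enable_core_dumps() -> tuple:
    """Raise this process's core-file size limit (RLIMIT_CORE) to its hard limit.

    Roughly what `ulimit -c unlimited` does in a shell whose hard limit is
    unlimited. Child processes started afterwards inherit the new limit.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_is_piped(core_pattern: str) -> bool:
    """True if kernel.core_pattern hands cores to a helper program.

    A pattern starting with '|' means the core will not appear as a plain
    file in the crashing process's current directory.
    """
    return core_pattern.strip().startswith("|")


if __name__ == "__main__":
    print("RLIMIT_CORE (soft, hard):", enable_core_dumps())
    with open("/proc/sys/kernel/core_pattern") as f:
        print("cores piped to a helper:", core_is_piped(f.read()))
```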
Now that we've done this, it will actually create a core file, and if we do ls core, there it finally is, for real this time, unlike before, when it was just a figment of your imagination. Now we can use gdb to solve it. What you'd normally type is gdb zfs core, where zfs is the name of the binary you're trying to work on and core is the name of the core file.
The core file basically represents the state of your program at the moment it crashed. So by doing this, you can see right here the stack trace of exactly where it crashed, conveniently with no line numbers. We can see that the top of it is this nice, convenient crash-the-program function, which might have something to do with the problem.
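Interactively you type bt at the (gdb) prompt to get that stack trace. When collecting it for a bug report, a batch-mode run is handy. A small Python wrapper, assuming gdb is installed; missing line numbers usually mean the binary was built without debug symbols:

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(binary: str, core: str) -> List[str]:
    """Non-interactive equivalent of `gdb <binary> <core>` followed by `bt`."""
    return ["gdb", "--batch", "-ex", "bt", binary, core]


def backtrace(binary: str, core: str) -> str:
    """Run gdb on a core file and return the printed backtrace (needs gdb on PATH)."""
    result = subprocess.run(gdb_backtrace_cmd(binary, core),
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Binary first, then the core file, exactly as on the command line.
    print(backtrace("zfs", "core"))
```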
Let's take a look at that. If I just search around here for it... yep, there it is, and it says this will crash the application. Wonderful; I wonder what it does. "This function serves two purposes. First, it crashes the program. Second, it crashes the program. It's worth repeating for emphasis." Okay, that's not good. Let's just get rid of this. Now that we've gotten rid of it, hopefully we should be able to create a dataset and move on to the actual hard stuff to debug.
So let's quickly get out of this. We found this one, so again make -j7 (I can just hit up, if I know what I'm doing). There we go. This brings us to the second kind of thing that can happen when you're debugging issues, and this one's a lot more common: a userspace crash. Usually when you find a problem, it won't just be that somebody put a sleep somewhere, because most of the time people don't want their programs to just go to sleep.
When that happens (and this is the wrong slide; I deleted the right slide, so anyway), basically the operating system will tell you that the program crashed. It will usually say it was killed with a SIGSEGV or a SIGABRT, or some other signal like that which you did not issue.
And if you have ulimit -c set, it will dump out a core file and allow you to debug it with gdb. Let's go back to this. Okay, let's try this one more time. I have a good feeling about this... and still nothing, but this is something different. This time we've tried to Ctrl-C the program, and it hasn't returned.
This is, more often than not, what I see whenever I'm debugging ZFS, because most of the code that I (and I'm sure a lot of people here) work with is written for the Linux kernel, or the BSD kernel, or any other kernel there is. And usually when the kernel crashes, there's not a lot you can do. As soon as I see this, the first thing I do is go to dmesg, so I come over here and type it.
When you see a message like this in dmesg that says VERIFY followed by some statement, it's usually indicating that the statement is false. In this case, the statement is that the current time minus some start time is greater than or equal to five minutes, and right now it's saying that no time has passed. So that's weird, and I don't know what would be doing this, but conveniently it gives us a line number again. So again, whenever you're creating an issue report for ZFS developers, this is a very important thing to include. Not just the stack trace: a lot of times the stack trace is all that ends up in the issue report, but this message is the important stuff.
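When filing an issue, it helps to copy the assertion and its PANIC location verbatim rather than paraphrasing them. A Python sketch that pulls those pairs out of dmesg text; the exact message formats it matches are assumptions noted in the code, and the class and function names are hypothetical:

```python
import re
from typing import List, NamedTuple, Optional


class Failure(NamedTuple):
    assertion: str  # e.g. "VERIFY3(a >= b) failed (0 >= 300)"
    location: str   # "file.c:line:function()" from the PANIC line, or ""


# ASSUMED message formats (modelled on the SPL's assertion output in
# ZFS on Linux 0.8; check them against your own dmesg):
#   VERIFY(<expr>) failed    or    VERIFY3(<expr>) failed (<lhs> <op> <rhs>)
#   PANIC at <file>:<line>:<function>()
_VERIFY_RE = re.compile(r"(VERIFY\d?\(.*\) failed.*)$")
_PANIC_RE = re.compile(r"PANIC at (\S+:\d+:\S+\(\))")


def find_failures(dmesg_text: str) -> List[Failure]:
    """Pair each failed VERIFY message with the PANIC location printed after it."""
    failures: List[Failure] = []
    pending: Optional[str] = None
    for line in dmesg_text.splitlines():
        verify = _VERIFY_RE.search(line)
        if verify:
            if pending is not None:  # previous failure had no PANIC line
                failures.append(Failure(pending, ""))
            pending = verify.group(1)
            continue
        panic = _PANIC_RE.search(line)
        if panic and pending is not None:
            failures.append(Failure(pending, panic.group(1)))
            pending = None
    if pending is not None:
        failures.append(Failure(pending, ""))
    return failures


if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for failure in find_failures(out):
        print(failure.location or "(no PANIC line)", "::", failure.assertion)
```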
So let's go take a look at that. This is in zfs_ioctl.c, line 3349. So I come over here, go to line 3349, and it says ZFS allows us to create too many datasets (and I misspelled "too").
We should make sure that we don't create more than one dataset every five minutes. Then right here it says that in debug mode you can create datasets every hour: this will give you an excuse to go get some lunch. At least it was thoughtful. That's good. Okay! So anyway, let's get rid of this. We found this problem. We know that this probably isn't... actually, you know, this isn't actually true.
But let's take this and get rid of it for right now. Now, interestingly, we have crashed the kernel, so the only way to actually get back to a state where we can retest this is to restart the whole thing.
Oh, there were the userspace crashes; they just moved one slide down. Let's talk about kernel crashes. The processes basically appear to be stuck, and they cannot be terminated with Ctrl-C. If you look at them in ps aux or htop or anything like that, they'll be stuck in the D state, and if you're using a bash shell, they may or may not just say the word "Killed".
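To list every task in the D state at once instead of eyeballing ps aux, here is a Linux-only Python sketch (helper names are hypothetical). Run it a few times: a task that is briefly in D while waiting on a disk read is normal, and one that stays there is not.

```python
import os
from typing import Dict, List, Tuple


def parse_stat(stat_text: str) -> Tuple[str, str]:
    """Return (command, state letter) from a /proc/<pid>/stat line."""
    comm = stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]
    state = stat_text[stat_text.rindex(")") + 1 :].split()[0]
    return comm, state


def find_d_state(stats: Dict[int, str]) -> List[Tuple[int, str]]:
    """Given {pid: stat text}, list (pid, command) of tasks in uninterruptible sleep."""
    stuck = []
    for pid, text in stats.items():
        comm, state = parse_stat(text)
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


def read_all_stats(proc_root: str = "/proc") -> Dict[int, str]:
    """Snapshot /proc/<pid>/stat for every process (Linux only)."""
    stats = {}
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            try:
                with open(os.path.join(proc_root, entry, "stat")) as f:
                    stats[int(entry)] = f.read()
            except OSError:
                pass  # the process exited while we were scanning
    return stats


if __name__ == "__main__":
    for pid, comm in find_d_state(read_all_stats()):
        print(pid, comm)
```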
The reason they say "Killed" is that, literally, if you cause a real problem that actually crashes the kernel, that is how the kernel is built to handle it: it just kills the process and leaves it hanging there forever. In ZFS it's a little different. We have some debugging support for the asserts that's a little bit nicer, and basically we just hang the process forever, quote, "for inspection"; that's what it says in the comment. Whenever you hit a problem like this, just remember that the most important thing you can report to developers (and, if you're trying to debug these issues yourself, the first thing to look at) is dmesg, depending on how the thing crashed and what other problems may have arisen. The system may have become completely unresponsive, and you might really just be stuck.
This is just going to touch a file and do a zpool sync. For those of you who don't know zpool sync, it basically just makes sure that all of your transaction groups are synced and that everything's out on disk, so that you're sure your data is safe. And here again, we've hit a problem: the command is stuck. I wonder why that is. If we come here to dmesg, we don't have any information. That's not very helpful, and it can be pretty hard to figure out what's going on.
The interesting thing about problems like this (thank you, ten minutes left) is that a lot of times what you end up having to do is wait, because in exactly two minutes, theoretically, the kernel will print out a stack trace of what is stuck (that's the kernel's hung task detector, whose default timeout is 120 seconds), and that can usually help out a lot. But there is a slightly faster way to do it.
If you don't want to sit around and wait, you can use this handy-dandy, quick little (very long) bash script, which is included at the end of this presentation so that you can copy and paste it. Basically, what it does is print out all of the stack traces in the system and dedupe them, so that you can see which ones may or may not be stuck.
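The script from the talk is not reproduced here. As a rough Python equivalent of the same idea, assuming a Linux kernel that exposes per-thread kernel stacks in /proc/<pid>/task/<tid>/stack (readable by root), this collects every stack and groups identical ones; the function names and the offset-stripping choice are this sketch's own:

```python
import glob
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def read_kernel_stacks(proc_root: str = "/proc") -> Dict[str, str]:
    """Map "pid/tid" -> kernel stack text for every thread. Needs root."""
    stacks = {}
    pattern = os.path.join(proc_root, "[0-9]*", "task", "[0-9]*", "stack")
    for path in glob.glob(pattern):
        parts = path.split(os.sep)  # [..., pid, "task", tid, "stack"]
        try:
            with open(path) as f:
                stacks[f"{parts[-4]}/{parts[-2]}"] = f.read()
        except OSError:
            continue  # thread exited, or we lack permission
    return stacks


def normalize(stack: str) -> str:
    """Reduce lines like "[<0>] taskq_thread+0x2a0/0x4a0 [spl]" to "taskq_thread"
    so that threads parked at the same place compare equal."""
    frames = []
    for line in stack.splitlines():
        if line.strip():
            frames.append(line.split("]", 1)[-1].split("+")[0].strip())
    return "\n".join(frames)


def dedupe_stacks(stacks: Dict[str, str]) -> List[Tuple[int, str, List[str]]]:
    """Group identical stacks as (count, stack, thread ids), rarest first,
    because one-off stacks are usually more interesting than idle pools."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tid, stack in stacks.items():
        groups[normalize(stack)].append(tid)
    return sorted(((len(t), s, sorted(t)) for s, t in groups.items()),
                  key=lambda group: group[0])


if __name__ == "__main__":
    for count, stack, tids in dedupe_stacks(read_kernel_stacks()):
        print(f"=== {count} thread(s), e.g. {', '.join(tids[:3])}")
        print(stack)
```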
Now, a lot of times you will get some that are pretty standard, especially if you scroll to the bottom. ZFS spawns a whole bunch of taskqs, so you will find a lot of threads that normally just sit like this, waiting for some work to do; in this case, in taskq_thread. These are all just threads waiting for more work, and there's nothing really there.
But if you scroll up and look at the unique ones, we can see that we're stuck in txg_wait_synced. Now that's kind of interesting, because a txg sync should usually happen every 5 seconds or so, so there's no reason this should have needed to wait long. Sorry: the zpool sync command is waiting for that to finish, because that's its entire job, but there's no reason it really should have been stuck.
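That "every 5 seconds or so" is governed, as far as I know, by the zfs_txg_timeout module parameter, whose default is 5 seconds. A Python sketch that reads it and applies a crude rule of thumb; the looks_stuck helper and its threshold are hypothetical, not anything ZFS defines:

```python
import os

PARAMS = "/sys/module/zfs/parameters"  # ZFS on Linux module parameters


def zfs_param(name: str, base: str = PARAMS) -> str:
    """Read one ZFS module parameter, e.g. zfs_txg_timeout (in seconds)."""
    with open(os.path.join(base, name)) as f:
        return f.read().strip()


def looks_stuck(waited_s: float, txg_timeout_s: float, factor: float = 10.0) -> bool:
    """Heuristic only: flag a txg wait far longer than the normal sync interval.

    The factor of 10 is an arbitrary illustrative threshold, not a ZFS value.
    """
    return waited_s > factor * txg_timeout_s


if __name__ == "__main__":
    timeout = float(zfs_param("zfs_txg_timeout"))
    print(f"a txg is normally pushed out within about {timeout:g}s")
    print("waited 120s, stuck?", looks_stuck(120, timeout))
```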
Let's take a quick look. The only thing I did before that was touch a file, so let's go look at that code real quick. We're going to search through the code for zfs_create, because that is the function responsible for creating a file, and we're going to search in the right directory.
So let's take a look around here. Here's the function. Let's see if we spot anything suspicious, because I've very helpfully been commenting all of the bugs that I've added. This function does do a lot, but I think pretty soon... whoa, okay: "I'm just not ready for that kind of commitment."
Okay, so what happened here? If you look at the code, I basically commented out this line, which is dmu_tx_commit(). What this represents is effectively what's called a kernel deadlock. The way it works is that the zpool sync thread is waiting on some resource that will never come, and this can happen for a couple of different reasons. One of the big reasons is that the thread is itself waiting on something else.
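The mechanism, as I understand it: a transaction assigned to a txg holds that txg open until dmu_tx_commit() releases it, so skipping the commit means the txg can never finish and the sync waiter never returns. A toy Python model of that dependency, not ZFS code; the class and method names are made up, and the timeout exists only so the demo terminates (the kernel waiter has none):

```python
import threading


class ToyTxg:
    """Toy model (NOT real ZFS code): a txg can only sync once every
    transaction assigned to it has been committed."""

    def __init__(self) -> None:
        self._open_txs = 0
        self._cond = threading.Condition()

    def assign(self) -> None:
        """Stand-in for assigning a tx: takes a hold on this txg."""
        with self._cond:
            self._open_txs += 1

    def commit(self) -> None:
        """Stand-in for dmu_tx_commit(): releases the hold and wakes waiters."""
        with self._cond:
            self._open_txs -= 1
            self._cond.notify_all()

    def wait_synced(self, timeout: float) -> bool:
        """Stand-in for a zpool sync waiting on the txg. Returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._open_txs == 0, timeout=timeout)


def create_file(txg: ToyTxg, skip_commit: bool) -> None:
    txg.assign()
    # ... modify on-disk state inside the transaction ...
    if not skip_commit:
        txg.commit()  # the call that was commented out in the demo


if __name__ == "__main__":
    for skip in (False, True):
        txg = ToyTxg()
        create_file(txg, skip_commit=skip)
        print(f"skip_commit={skip}: synced={txg.wait_synced(timeout=0.5)}")
```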
Let's just get rid of this and rebuild. As before, in order to make this work, we need to reboot the operating system, because there's no real way to get the whole system out of this kind of state. But while we're waiting for it to reboot, let's go over the symptoms we see here. Whenever you have a kernel lockup, it can be the result of any of a number of things. It could be a rogue call to sleep; it could be really simple like that, but it almost never is. It could be waiting on something else, as in this case, where it was waiting on the tx commit. It could also be waiting on a resource that simply isn't there or isn't available, or on a signal.
So now that we have that, let's do one last thing: let's try to create a bunch of files and see what the I/O looks like if we write a whole bunch of data. I guess in this case we'll just do one file, so I'm going to take my convenient little command right here, for those of you who are not super familiar with some of the things I did here...
So let's take a look. The first thing I want to do is run htop, and that's kind of weird. For those of you who didn't immediately see it: we're writing data to a file, and what's interesting here is that we're using a whole lot of CPU. htop helpfully color-codes this, so you can see that this is red CPU time, which means time spent in the kernel.
And if you look around in the options, you can go here to "Hide kernel threads" and disable that, and you can see that we are using 84... you know, a whole lot of percent, way more than you'd expect for writing data, in these z_wr_iss threads, which is kind of interesting. Now, those threads do a lot in themselves: they are the threads that do all of the issuing of I/O, the calculations related to doing I/O, and things like that.
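htop's red bars are kernel ("system") time. The kernel keeps per-thread user and system CPU counters in /proc, so you can see the same split for the threads of a single process. A Python sketch (helper names hypothetical); note the counters are cumulative since each thread started, so for a current rate you would sample twice and subtract:

```python
import os
from typing import Iterator, Tuple


def cpu_ticks(stat_text: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/.../stat line.

    utime and stime are fields 14 and 15 (1-based, see proc(5)); after
    splitting on the last ')', field 3 (the state) is at index 0.
    """
    rest = stat_text.rpartition(")")[2].split()
    return int(rest[11]), int(rest[12])


def kernel_share(stat_text: str) -> float:
    """Fraction of a thread's CPU time spent in the kernel (red in htop's meters)."""
    utime, stime = cpu_ticks(stat_text)
    total = utime + stime
    return stime / total if total else 0.0


def thread_kernel_shares(pid: int) -> Iterator[Tuple[int, str, float]]:
    """Yield (tid, thread name, kernel share) for each thread of a process."""
    task_dir = os.path.join("/proc", str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                text = f.read()
        except OSError:
            continue  # the thread exited while we were reading
        name = text[text.index("(") + 1 : text.rindex(")")]
        yield int(tid), name, kernel_share(text)
```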
So let's do a little bit more analysis here. Earlier (I forget who showed it) we saw what a flame graph is, and that's very helpful, because now I don't really need to explain it. Just as a quick recap, a flame graph is a visualization of how much time your CPU spends doing any given task.
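In Brendan Gregg's FlameGraph tools, sampled stacks are first "folded" into one line per unique stack with a sample count, and the width of each box in the graph is proportional to that count. A toy Python version of the folding step (not the real stackcollapse scripts), plus a helper that answers "what share of samples went through this function?":

```python
from collections import Counter
from typing import List, Sequence


def fold_stacks(samples: Sequence[Sequence[str]]) -> List[str]:
    """Collapse sampled call stacks (root frame first) into folded lines.

    Each output line is "frame1;frame2;...;leaf count", the input format of
    flamegraph.pl; a frame's width in the graph is proportional to its count.
    """
    counts = Counter(";".join(frames) for frames in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def fraction_in(samples: Sequence[Sequence[str]], func: str) -> float:
    """Share of samples whose stack contains func anywhere."""
    if not samples:
        return 0.0
    return sum(func in frames for frames in samples) / len(samples)
```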
So let's restart this. I'm just restarting it because I want to make sure I get the same picture.
So this is what it was doing. You can see it spent about half of its time idle; that's all this stuff right here, where it says do_idle. So that's kind of a good indication, but you can see here that the z_wr_iss threads are spending pretty much all of their time in zio_checksum_compute. That's interesting, because usually in ZFS we use fletcher4 for checksumming, and that's pretty non-intrusive: it doesn't take a whole lot of CPU.
Okay, right here: this is immediately suspicious. The comment wonders whether anyone would notice if I started mining bitcoins here, and the code mines bitcoins for a bit every time we do a checksum. That would probably do it. So here we have this thing, which is obviously eating up a whole lot of CPU; it's just a tight loop. And let's see... oh, here's another interesting thing: helpfully, I decided to dump the bitcoins to the ZFS debug message log. For those of you who are not familiar with it:
This is a kind of virtual file: a self-updating list that gets recycled circularly, so it does not have everything in it all the time. You can go through the whole thing, and as of 0.8.0 it is enabled by default, so you should always be able to at least see these messages and get an idea of what's going on.
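On ZFS on Linux this log is exposed as /proc/spl/kstat/zfs/dbgmsg. Because it is a ring buffer, older entries disappear as new ones arrive, so it is worth grabbing the recent lines soon after the event. A Python sketch for that, optionally filtered by a substring; the helper names are hypothetical:

```python
from collections import deque
from typing import Iterable, List

DBGMSG = "/proc/spl/kstat/zfs/dbgmsg"  # ZFS on Linux debug log


def tail_lines(lines: Iterable[str], n: int = 20, grep: str = "") -> List[str]:
    """Keep the last n lines that contain grep (all lines if grep is empty)."""
    matching = (line.rstrip("\n") for line in lines if grep in line)
    return list(deque(matching, maxlen=n))


def tail_dbgmsg(n: int = 20, grep: str = "") -> List[str]:
    """Read the most recent entries of the ZFS debug ring buffer."""
    with open(DBGMSG) as f:
        return tail_lines(f, n, grep)


if __name__ == "__main__":
    for entry in tail_dbgmsg(10):
        print(entry)
```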
So, performance issues: what do they look like? In general, it's whenever somebody complains that a process isn't moving as quickly as it should, or that it's using up more resources than it should; in other words, whenever you're debugging a performance problem and not a hard crash. The important thing is to figure out what your bottleneck is. First, it could be CPU, in which case the best things to check are top or htop.
If it's RAM, you can also check top or htop and look at how much RAM the process is using, or run free -m. For disk I/O, you can check iostat -mx 1 or iotop; those are all really good tools. For network, which probably doesn't come up so much for those of us here at this conference, you can look at iftop to get an idea of how quickly data is moving in and out over the network. The process might also be waiting on another process.
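iostat -x's %util column is derived from the "time spent doing I/Os" counter in /proc/diskstats: the fraction of wall-clock time a device had at least one request in flight. A Python sketch of that calculation (helper names hypothetical). Keep in mind that for devices serving many requests in parallel, such as SSDs and NVMe drives, 100% does not necessarily mean saturated.

```python
import time
from typing import Dict


def parse_io_ticks(diskstats_text: str) -> Dict[str, int]:
    """Map device -> ms spent doing I/O (13th whitespace-separated field)."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(before: Dict[str, int], after: Dict[str, int],
                elapsed_ms: float) -> Dict[str, float]:
    """Percent of wall-clock time each device had I/O in flight, like %util."""
    return {dev: min(100.0, 100.0 * (after[dev] - before[dev]) / elapsed_ms)
            for dev in after if dev in before}


def read_diskstats() -> str:
    with open("/proc/diskstats") as f:
        return f.read()


if __name__ == "__main__":
    t0, a = time.monotonic(), parse_io_ticks(read_diskstats())
    time.sleep(1.0)
    t1, b = time.monotonic(), parse_io_ticks(read_diskstats())
    for dev, pct in sorted(utilization(a, b, (t1 - t0) * 1000).items()):
        print(f"{dev:12s} {pct:5.1f}%")
```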
So you might need to check for other slow processes that your process depends on. When it comes to finding the culprit, the best tools you can use are flame graphs, as we showed earlier, and another really good one is perf top, which I'll show really quick. We'll run this again, and if I run perf top, you can see the same kind of information, though it's a bit harder to see where the stack traces are.
Finally, for disk bottlenecks, ZFS, being a file system, has a ton of tools, including zpool iostat, to which you can add any number of flags (for example -l for latency, -r for request size histograms, and -q for queue statistics). Brian already showed what kinds of things they're capable of displaying, but basically they can give you the latency, the size, and the queuing statistics of pretty much all the I/Os in the system.
You can add -v if you're really crazy and want to see everything for every disk. arcstat can give you a good idea of what's going on with the ARC, how much memory it's using, and those kinds of things. And again, iotop can be really helpful in determining how busy disks are, because it just gives you a nice percentage number.
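arcstat reads its numbers from the ARC kstat, which on ZFS on Linux lives in /proc/spl/kstat/zfs/arcstats (two header lines, then "name type data" rows). A Python sketch of pulling a few headline values out; the summary fields chosen here are just examples, and the hit ratio is cumulative since the module was loaded:

```python
from typing import Dict

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"  # ZFS on Linux


def parse_kstat(text: str) -> Dict[str, int]:
    """Parse an SPL kstat file: two header lines, then "name type data" rows."""
    stats = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_summary(stats: Dict[str, int]) -> Dict[str, float]:
    """A few headline numbers: current size, target maximum, and hit ratio."""
    hits, misses = stats.get("hits", 0), stats.get("misses", 0)
    lookups = hits + misses
    return {
        "size_gib": stats.get("size", 0) / 2**30,
        "c_max_gib": stats.get("c_max", 0) / 2**30,
        "hit_ratio": hits / lookups if lookups else 0.0,
    }


if __name__ == "__main__":
    with open(ARCSTATS) as f:
        for key, value in arc_summary(parse_kstat(f.read())).items():
            print(f"{key:10s} {value:.3f}")
```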
For network bottlenecks, I don't have a whole lot to say here, because I don't deal with these too much, but basically, usually...
New tools that are coming out: there's bpftrace, which was mentioned before, and which can help you print information about kernel function calls. funcgraph is a similar utility, but it gives you the call hierarchy of everything. And finally, there's the ZFS debug message log. For resources: these are all the things that I talked about, including this nice, really long bash one-liner that my coworker provided, and you can find that and all the rest of the stuff here.