Description
From the 2019 OpenZFS Developer Summit. Slides: https://drive.google.com/open?id=1oho9X5bkW-I-yJ-pVD8VqkaloxhGepzT
Just to give a quick background: as I said, I work at Delphix, and our product is basically an appliance powered by ZFS. Our customers run it either in the public cloud or on-premises on a hypervisor. We recently made a switch where we had been using illumos and we transitioned our operating system to Linux, and one of the main questions brought up during this transition was: can we, as an engineering organization, maintain the existing debugging processes that we have from illumos for issues that we encounter in production with Linux?
One of our main goals in that respect is to root-cause on first failure. Whenever we hit an issue in production, we want to debug it there and then. For performance pathologies, what this looks like most of the time is hopping on the VM, gathering some logs and, if that's not enough, tracing the system to analyze its runtime behavior. But for more severe issues, like panics or deadlocks, what we generally do is collect the crash dump and do post-mortem debugging, basically analyzing it with the custom in-house tooling we had on illumos.
Everything was built with these use cases in mind, so our procedures were created accordingly. Now, switching to Linux, many things carried over, but specifically for post-mortem debugging we found the workflows and the tools available not exactly sufficient for our needs. So at this point I'd like to take some time and talk about post-mortem debugging in general and, specifically, why it is so important for us at Delphix as an appliance company. Well, consider what would happen if we didn't have crash dumps, so we couldn't do any post-mortem debugging, and a customer VM crashes.
The only state that you have, the only information that you have related to the failure, is whatever the customer thought to mention at the time, whatever support thought to ask, and whatever logs were recorded while the VM was running that the developers thought, during development, would be useful. But most of the time this is not enough for us to root-cause issues. So what happens if we don't have crash dumps?
What would end up happening is that we would iterate with the customer, and by iterating I mean you basically add some more logging statements or change some code, give it to the customer, the customer runs it and reproduces the issue, and then they come back to you with the new data. Then maybe you can root-cause the issue, or maybe you need to go through another iteration, and you can see where this is going.
This is a very slow, very error-prone process, and if you've done this iteration four or five times, it can be a little bit embarrassing, because it looks like you don't know what you're doing. Now, I don't say that crash dumps and post-mortem debugging are a silver bullet; sometimes you still need to do all the things that I mentioned. But a crash dump is comprehensive.
Most of the time it has all the state that you would ever need to root-cause an issue, and it never lies; it's facts, not people's opinions. Another great point about crash dumps is that they decouple the activity of root-causing the failure from the process of restoring the system.
So when you look at the failure, the first time you see the message you can think, "oh, I can apply this workaround", and, having taken a crash dump, the customer can continue running their system while I root-cause the failure on my own. I mean, this is a great thing: it lowers the severity of the issue right away. So I have a real-world example here, something that George and Matt worked on two or three years ago.
So basically, what happened is that a customer hit an issue, and as the engineer handling the escalation you see something like this; it's very similar to the dmesg output that Tom showed earlier. You're handling the escalation and you want to start debugging the issue. So you look at the console and you think: okay, we're lucky, we hit an assertion failure, so we get a nice output that actually points out the file name and the line where the problem occurred.
In this specific case, we see that the problem is an assertion within zio_done(), so you start taking investigation notes. You look at the assertion, and basically what the assertion says is that the block pointer of that zio is not the same as the original block pointer, whatever that means for now; we're just looking at the code. Looking at the if condition, we see that, okay, this is a nopwrite zio.
Basically, what happens is that ZFS compares the checksums of incoming blocks to their destination blocks on disk, and if the checksums match, it means that nothing has changed, so we can actually skip doing that I/O. This is actually pretty common when you do full backups of large random-access files that have almost identical data.
So with that in mind, we basically rephrase the problem that we just saw: we're using a nopwrite, but the block pointer that we are about to write, that we would be writing, is not the same as what's on disk. So that's basically a bug in the nopwrite code. An extra note is that the block pointer is not an embedded block pointer. So straight off the bat we know that we can relieve the customer by disabling nopwrites right away, and here I have this one-liner.
There, we are basically passing a command to MDB that tells it to look up the zfs_nopwrite_enabled variable and write zero to it, basically disabling that functionality. But before doing that, we also generate a crash dump, so we can analyze it in-house. All right, so the system is running now, the customer is happy, but there's still this bug that other customers may be hitting.
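From memory, that one-liner was shaped roughly like this; this is a sketch of the illumos MDB write syntax, not a verbatim copy of the slide:

```
# echo "zfs_nopwrite_enabled/W 0" | mdb -kw
```

Here `-kw` opens the running kernel writable, and `name/W value` stores a 32-bit value at the named symbol, so this flips the nopwrite tunable off on the live system.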
So we basically have two choices. We can start reading the tens of thousands of lines of related code, the zio code, the DMU code, all the entrances to the write code paths; or we can start analyzing the crash dump that we got and ask targeted questions toward the root cause of the issue.
So I really like this example, because the zio code path is a little bit complex; George Wilson had a whole talk last year explaining all the pipeline stages of these things, and following the control flow just by reading the code is not very easy.
The other thing about this is that, looking at the stack trace, we can't tell where the zio came from. Basically, some kind of thread was spawned running zio_execute() right away, and it failed. What's even more interesting is that in the zio code, many times the thread that actually issued the zio may not be around anymore, because this may be an asynchronous write.
So even if you were to print all the stack traces in the system, there's still a possibility that you wouldn't be able to find the thread that actually issued the zio. What I'm trying to say is that you need to inspect the actual data on the system, specifically that zio, to see what it looks like, and there's no better place to do that than actually analyzing the crash dump.
So here's what we would do in illumos to start inspecting the zio. We would use MDB, the kernel debugger in illumos that works on live systems and crash dumps. We would print the stacks with function arguments, and we would look at the first function argument of zio_done(): it's a zio_t pointer, and we would print what the actual structure looks like. So in this case, what we're basically saying is: take that pointer, cast it to a zio_t, and print it.
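A rough sketch of what that MDB session might look like; the address is invented, and the specific members printed are my assumption about what the slide showed:

```
> ::stacks -m zfs
> ffffff01d2e48c68::print zio_t io_done io_bp io_bp_orig
```

The first command groups kernel stacks for the zfs module; the second takes the zio_t pointer from the failing frame and prints selected members of the structure.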
One interesting thing about this is that io_done is set to dbuf_write_override_done, which means that this is a write-override zio; this is going to be helpful later, I just wanted to point it out. And we can do cool things like examining what the block pointers look like. Now, I'm not going to go through what these commands exactly mean, but I'm basically pointing out that the zio that we were about to write looks like the first image, and the second image shows what the block pointer on disk looks like.
So these are obviously different; therefore it makes sense why the assertion tripped. But what's even more interesting is that the block pointer on disk is a hole, which means that the block pointer was freed. And when was it freed? We look at the birth time, and we see that it was freed at txg 3891. Okay, that's kind of interesting.
What's the current txg? We can actually walk our way from the zio pointer all the way to the spa structure, which basically describes a pool, and ask things like: what's the currently syncing txg? We see that the currently syncing txg is 3892, so basically that block was freed one txg before the current one.
All right, so that's kind of interesting. So we go back to io_done being set to dbuf_write_override_done, which is basically a callback, and looking at the code, there is only one place where io_done is set to that, and that's in dbuf_write(). All right, so we're going somewhere. Looking at the code around this code path, we find a new clue.
Basically, it's a comment telling us that this block was provided by open context, and we got here either via dmu_sync() or via dmu_buf_write_embedded(). So we have our two suspects for where the zio could potentially have come from. But if you remember, we made that note earlier that the block pointer is not an embedded block pointer, so it must be dmu_sync().
So we look at the dmu_sync() code and we stumble upon this interesting part, where there's this huge comment and an if case that potentially disables nopwrites. Reading the comment, what it basically says is that there's a possibility that our block may have been freed or its data may have been modified, so we should disable the nopwrite, because the context has changed.
What's even more interesting, though, is that the if statement checks whether the record has been dirtied, which means that it only checks whether the data was modified; it doesn't check whether the block has been freed. So, lo and behold, that's our problem. So the actual fix would be something like this, and this is what ended up happening: Matt and George reproduced this in-house, and they came up with this fix.
They basically added this extra case here in the if statement, and they restructured the comment to be explicit and say: hey, this could have been caught by the record being dirtied, the first condition, or by the block being freed, the second condition. So, case closed.
The problem was that we were using a nopwrite, but the block pointer underneath us had actually changed. The root cause was that dmu_sync() wasn't doing a good enough job of deciding whether it should disable nopwrites or not, and the fix was adding the extra check.
So just to do a recap, and I hope that was motivating enough: post-mortem debugging is very important, especially for us at Delphix. It allows you to examine the processor and in-memory state at the time of the crash, and what we've seen is that if you bundle it with zdb output, which is basically the corresponding on-disk state, most of the time that is all the state that you need to debug a ZFS issue.
It also decouples system recovery from the process of root-cause analysis. All these things are great, and crash dumps are very helpful, but there is no use for a crash dump if you don't have a tool that can analyze it efficiently. We did some research on post-mortem debuggers in Linux, and we found that there was some work there, but we couldn't find anything that was sufficient for our needs, so we leveraged some existing projects and we created sdb.
The Slick Debugger, sdb, is a post-mortem and live debugger. When I say live debugger, it means that you can attach it to a live system and introspect memory without actually halting the system; the system is still running, because you don't want to just halt a system in production. Its user experience is similar to MDB.
So, I could go on for many hours talking about sdb, but this is the ZFS conference, so we're going to look at some examples of debugging ZFS with sdb. Very similarly to what Tom showed earlier about the deduplicated stacks, most of the time you want to see what's going on in the system; and keep in mind that all the command output that I'm going to show in this presentation works both for live systems and for crash dumps.
You can see, for example, that in this system we have 64 threads that are NFS servers waiting for something and, all the way down (all these entries are sorted by count), some networking threads doing something. But we want to examine ZFS issues, so we can ask what's going on specifically in ZFS. So we say `stacks -m zfs`; basically, this says: show me all the stacks related to the ZFS module. Here we can see that, okay, 17 threads are waiting for something, and we have four quiescing threads.
We also have, at the very end, the dbuf evict thread doing something. And you can be even more specific than that; you can ask questions like: are there any threads issuing any ZFS ioctls? You can do that by writing something like `stacks -c zfsdev_ioctl`: basically, you tell stacks to filter for all the threads that have this function in their stack trace.
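As a sketch, the two sdb invocations look roughly like this; the output is omitted, and `zfsdev_ioctl` is my reconstruction of the ZFS-on-Linux ioctl entry point named on the slide:

```
sdb> stacks -m zfs
sdb> stacks -c zfsdev_ioctl
```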
So in this example we can see that, okay, there's one ioctl in flight for one operation, there's another ioctl getting the pool history, and there's another ioctl exporting a pool. But the real power of sdb is examining actual data structures. So in this example, let's say that we want to introspect a pool's state. All the pools are part of this AVL tree called spa_namespace_avl, and what I have here is basically the command that says: okay, I want you to examine whatever is at the address of spa_namespace_avl. You can see the fields of the AVL struct, and what you would do in general with another debugger, if you wanted to, say, traverse this AVL tree and see all the data in it, is basically start from the root: you can pipe whatever you found into the member command, which dereferences a member out of a structure, starting from the root.
Then you can print that root, and then you can go to the left child and then the right child, one by one. But this is not very scalable if you have, say, a million nodes. So, since we have sdb and we have pipes, we can just make a command that walks it for us. What's great about this (and it definitely drew from our experience with MDB, where you would do something very similar and pipe to the avl walker) is that in sdb we don't just pipe integers and pointers around; we pass whole objects with their own state, like type information, addresses, values, symbol names and things like that. So the walker can actually figure out: hey, I'm being passed an AVL tree, and I'm going to walk it appropriately. So in this case we see that we have four nodes in our spa_namespace_avl, and we can cast these nodes to spa_t structures and, lo and behold (the output was humongous, so I just printed the first one), we have a pool named "application".
We can get a pointer to its config, and so on and so forth. We can actually continue our pipeline and print all the names of the pools by just walking the AVL tree, casting everything to a spa_t structure, and dereferencing the spa_name member. And this is something that we've done so many times that we actually created a shorthand for it that also does some slightly nicer printing, so you have the addresses of the spa_t structures on the left and their names on the right.
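The pipeline and its shorthand look roughly like this; the exact command spellings are from memory of sdb and may differ slightly:

```
sdb> addr spa_namespace_avl | avl | cast spa_t * | member spa_name
sdb> spa
```

The first line walks the AVL tree of pools, casts each node, and dereferences the name; the second is the shorthand that prints spa_t addresses on the left and pool names on the right.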
So you can continue introspecting spa_t structures just like that. In this example, I'm just printing the syncing txgs of all these structures. But you might say: okay, we have all these syncing txgs, but that output is not that great; what if I wanted to know, specifically, what is the currently syncing txg of the rpool? These are just numbers here. Well, what you could do is actually use filter, and filter is a pretty powerful command. Let me show what filter does in this example.
First of all, we take all the spa_t structures, and we filter all the objects that are passed through the pipe, saying: okay, I just want the objects whose spa_name equals "rpool"; and then you can print the spa_syncing_txg. Again, we do this so often that we actually added argument parsing to the spa command, so we can do this right away; but filter by itself is just a powerful command.
I have another example, actually, of how good filter is, drawing from what we learned from Paul earlier about metaslabs being loaded. Let's say that we wanted to know how many metaslabs are currently loaded in the rpool. This is the pipeline that we would construct, and we can walk through it together; these are the relevant C structures in the code. So what happens is that we go through all the spa_t structures and look specifically for the one whose spa_name is "rpool".
We dereference the spa_metaslabs_by_flushed member, which is an AVL tree, and then we walk that AVL tree, which is basically the AVL tree that contains all the metaslabs in the pool. We cast its nodes to metaslab_t pointers, and then we filter all of these structures by the ms_loaded field, which basically indicates whether a metaslab is loaded or not. And then we can pipe all of that to count, which basically prints: okay, how many objects were passed through this pipe? So that's pretty cool.
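Put together, the pipeline reads roughly like this; this is a sketch, and the filter expression syntax and the `count` spelling are from memory and may differ in the real sdb:

```
sdb> spa | filter 'obj.spa_name == "rpool"' | member spa_metaslabs_by_flushed | avl | cast metaslab_t * | filter 'obj.ms_loaded' | count
```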
We can basically ask almost any question that we want, and we can create a pipeline to answer that question. Another thing, though, besides the pipelines, is pretty-printing.
So let's say that you wanted to print all the unflushed allocation segments in the rpool. You construct a pipeline that's similar: you walk all the metaslabs, you cast them, and you get the ms_unflushed_allocs member, which is a range tree; and we have a command where you pass in a range tree and it pretty-prints things. For example: okay, the first range tree didn't have any entries, with zero bytes in it, but the second range tree has 151 entries, and you can see them all printed nicely with their offsets and their lengths. And what's again cool about this is that the range tree command is not just a pretty-printer.
You can actually continue the pipe from there and say: okay, out of these segments, how many are above this offset? So you use the filter command at the end, saying that I want rs_start, the range-segment start, to be bigger than this value, and then you can count.

So, focusing a little bit more on this concept of pretty-printing: most debuggers, for something like a block pointer (here I dereference the block pointer of the uberblock), would print something like this. This is pretty standard: it's a C structure, so I'm going to show you each field and its value. That's good, that's helpful sometimes, but most of the time you have things like blk_prop, which is basically an integer where each bit range is a flag, or you have DVAs, data virtual addresses, which here are represented as two uint64_ts but actually mean so much more.
A
So
we
have
the
block
by
turn
command,
which
can
basically
teach
GDB
GDB
stb
to
make
things
more
readable,
decode
these
values
for
you
and
actually
present
useful
information.
That
answers
your
question
so
in
this
example,
we
decoded
the
DBA
words
to
the
axle
DVS
here
and
you
can
see.
Okay,
the
va0
is
an
addressing
vid
of
0
at
this
offset
with
that
length
continue
in
this
example,
you
know
blk
prop,
as
I
said,
it's
like
a
bit
field.
Where
now
we
can
pretty
print
and
say,
okay,
what
Oh?
A
This
big
number
means
is
that
this
this
is
a
level
0
block
we're
using
flat,
sir
for
for
checksumming
we're
using
LG
for
for
compression
and
so
far
and
so
forth.
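The idea behind such a pretty-printer can be sketched in plain Python: extract named bit ranges from a packed 64-bit word and map the enum values to names. The field layout and enum values below are illustrative stand-ins, not the exact on-disk blk_prop definition.

```python
def bits(word, lo, hi):
    """Extract bits [lo, hi] (inclusive) from a 64-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# Hypothetical field layout for the packed property word.
FIELDS = {
    "level":    (56, 60),
    "checksum": (40, 47),
    "compress": (32, 39),
}

# Illustrative enum-value-to-name tables.
CKSUM_NAMES = {7: "fletcher4"}
COMP_NAMES = {15: "lz4"}

def decode_prop(word):
    """Turn the raw word into a human-readable dict, like a pretty-printer."""
    raw = {name: bits(word, lo, hi) for name, (lo, hi) in FIELDS.items()}
    return {
        "level": raw["level"],
        "checksum": CKSUM_NAMES.get(raw["checksum"], str(raw["checksum"])),
        "compress": COMP_NAMES.get(raw["compress"], str(raw["compress"])),
    }

# Pack a word with level 0, checksum 7, compression 15, then decode it back.
word = (0 << 56) | (7 << 40) | (15 << 32)
print(decode_prop(word))  # {'level': 0, 'checksum': 'fletcher4', 'compress': 'lz4'}
```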
Another very useful command, which is actually still work in progress (this is the only sample that I could get working for now), can actually print all the zios on the system; and again, this can be done either on a running system or on a crash dump. This is pretty powerful, because you can see the relations of all the zios, parent to child; you can see what type of zio each one is; you can see its current state; and you can see whether there is any waiter waiting on that zio. And with that, let me continue this small tour of commands.
There's a command that shows which caches hold the most active memory, and what's even more interesting is that you get a mapping of where each cache actually resides. In ZFS on Linux, depending on the size of the allocation, we do different things: for smaller allocations we use an underlying Linux slab cache, and for bigger ones we use caches maintained in the SPL layer. So that's pretty useful for knowing where to look for different things, especially since the Linux slab allocator does merging, basically putting two different caches together, sometimes just to save space. You can be even more specific and say: okay, arc_buf_hdr_t_full is a cache that's backed by the Linux slab, and it's actually merged into one of the generic Linux caches. Then, just adding on top of this command, you can see that, okay, it's using 316 kilobytes of memory, it's backed by a Linux slab cache named kmalloc-4096 whose total memory is 3.7 megabytes, and it's utilizing basically 8% of that cache. You can then break out of that and ask more questions, like: what does the actual cache look like? We have the slabs command, which does for Linux kernel slab caches what our other command does for SPL kmem caches, and you can look up that cache and see that the actual cache utilization is 65%.
A
All
right,
I
have
even
more
examples
and
I
can
we
can
talk
about
them
during
questions
so
and
if
we
have
time
but
I
just
want
to
quickly
go
through
how
SDB
actually
works.
You
can't
talk
about
s
DB
without
talking
about
dragon.
Dragon
was
a
is
a
small
C
library
in
Python
developed
by
Omar
Sandoval
at
Facebook.
It
basically
enables
the
Python
interpreter
to
introspect
live
systems
and
grass
tons.
It
comes
with
a
nice
Python
API
and
object
model.
It's
pretty
fast.
To
start
up.
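For a taste of what that Python-level experience looks like, a drgn session is roughly like this; this is a sketch, and the printed value is invented:

```
$ sudo drgn
>>> prog["spa_namespace_avl"]       # look up a kernel global by name
(avl_tree_t){ ... }                 # a typed object, not just an address
```

drgn's REPL pre-defines `prog`, a handle to the live kernel or a crash dump.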
A
It's
still
like
some
fixture
features
like
function
arguments,
but
we're
working
on
that
the
community
is
still
small,
but
they've
been
they've,
started
marketing
pretty
aggressively
and
it's
growing
and
their
overall
open
to
patches.
So
you
could
actually
do
a
lot
of
these
things
that
I
just
saw
just
with
dragon
as
long
as
you
are
willing
to
just
type
on
a
Python
or
apple
just
Python
code,
and
this
can
be
very
cumbersome
because
you
know
imagine
that
you're
debugging
an
issue
in
production
and
you
have
the
Python
or
Apple
in
front
of
you
like
now.
Your focus should be debugging, but you're actually distracted by trying to write a program, getting your spaces right and making sure that you don't get syntax errors. I mean, there is a reason why you use the shell for everyday things instead of something like a Python REPL. So, okay, that's drgn. What is sdb?
sdb is basically a layer that leverages the drgn API to provide the debugging experience that I just showed. It can be extended, as I said, in Python with new commands. These commands generally use the drgn API to query whatever they need from the kernel, and the code that you write to query the kernel becomes reusable through the sdb constructs, so a command can receive and pass objects through a pipe. And again, I want to point out that the pipe is a pretty powerful concept, because we're passing drgn objects that carry a whole context with them; they're not just pointers that we pass along.
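The object-passing pipe can be sketched in a few lines of Python; the class names here are invented for illustration and are not sdb's real API:

```python
class Obj:
    """Stand-in for a drgn object: a value plus its type context."""
    def __init__(self, type_name, value):
        self.type_name = type_name
        self.value = value

class Filter:
    """Pass through only the objects matching a predicate."""
    def __init__(self, pred):
        self.pred = pred
    def run(self, stream):
        return (o for o in stream if self.pred(o))

class Count:
    """Emit a single integer object: how many objects flowed through."""
    def run(self, stream):
        yield Obj("int", sum(1 for _ in stream))

def pipeline(stream, *cmds):
    """Chain commands left to right, like `a | b | c` in the sdb shell."""
    for cmd in cmds:
        stream = cmd.run(stream)
    return list(stream)

# Nine fake metaslabs, every third one loaded; count the loaded ones.
metaslabs = [Obj("metaslab_t *", {"ms_loaded": i % 3 == 0}) for i in range(9)]
out = pipeline(metaslabs, Filter(lambda o: o.value["ms_loaded"]), Count())
print(out[0].value)  # 3
```

Because each stage receives whole objects rather than bare addresses, a stage can inspect `type_name` and decide how to handle its input, which is how a generic walker can recognize an AVL tree.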
So just to recap: sdb is a debugger for live systems and crash dumps. It leverages the drgn library to introspect its targets. It can be extended in Python to walk the complex data structures that developers write, and to provide things like filters, aggregators, and pretty-printers of data. Basically, what we're going for here is allowing the user to ask almost any question that can be answered given the available state that they have.
No, it's not; it's actually out. So the question was: where is this tool located, and is it a work in progress? Yes, it is a work in progress, but we've already started using it, and you can use it too. I have a resources page with a GitHub repo; it's currently out there under the Delphix GitHub organization, but you should be able to clone it and work on it, and I'd be more than happy to accept patches too.
One discussion that I would actually like to bring up at some point is potentially having some sdb scripts decoupled from the sdb repo and living in ZFS. Ideally, when a developer introduces new code and adds new structures or makes some changes, they would also change the corresponding sdb Python scripts together with it. This is kind of like the model that we had at Delphix before, with illumos and MDB, and it's something that has worked pretty well for us. It also decreases the burden of having to maintain two different things in two different repos. And when you submit a PR, at least for me as a reviewer, it makes me a little bit more confident to see that you've actually added some scripts together with your code: if something fails in production and I have no idea about your feature, I can at least see what's wrong using the commands that you provided.
Yes, so during the transition we actually looked at a lot of different things. You could ask: why didn't we just use GDB or crash or something else existing in Linux, and why did we not port MDB? There are multiple trade-offs to all of these things. Specifically for MDB, the porting effort is pretty big, meaning that MDB, even if we just ported it as it is out of the box, currently works only with CTF.
A
It
has
some
dwarf
support,
basically
kind
of
like
translating
dwarf
to
CTF
on
the
fly,
while
reading
it
and
I
actually
work
with
Robert
Misaki
to
do
that
initially,
but
we
had
so
many
problems,
because
city
of
convert
couldn't
handle
new
dwarf
constructs
after
Dorf
version,
2
and
other
weird
things
like
that.
So
basically
porting
wouldn't
be
that
easy.
Now
you
could
say:
okay,
like
you,
already
have
this
infrastructure
right
built
on
this,
like
don't
you
care
about
that
too?
But it turns out that once we found something like drgn, which does most of the stuff that we currently need, implementing a new interface on top of it is not a lot of engineering effort, and it already works with the Linux ecosystem: the DWARF symbol information that's everywhere, the different kinds of crash dumps. We wouldn't have to deal with writing that code anymore, and drgn was something that was already in use.
All right, so I just have a future-work slide; these are some of the things that we're thinking of doing in the future, and I just wanted to point them out, along with the GitHub repo. As for the actual community, you can start by checking out the GitHub repo, and you may actually find some references to something that we've started doing.
Basically, we've created a small organization through which we're trying to attract people working across the Linux debugging landscape, from debugger authors to actual kernel developers, for example people changing the crash dump format and things like that.
Unfortunately, I don't have a slide with all this information, but I'd be more than happy to share it with you. We have monthly meetings, and we're basically trying to sync up all together to make sure that there's no duplicate work, that no two people are working on the same things, and to basically make decisions on how we want things to look in the future.

All right; oh yeah, the first thing over there in future work is that we need more commands, for ZFS or even outside of it if that's your jam. I'll be at the hackathon tomorrow, and I'd be more than happy to help everyone write new commands, or even just set up sdb and use it to introspect the system.