A: Hello everybody, and welcome to another Ceph Code Walkthrough. I'm here with Patrick, and we're going to be going over the CephFS codebase, in particular the metadata servers, which keep various metadata on objects and inodes and such. So Patrick, will you please take it away?
B: That's the one we're going to be looking at through most of this talk: the src/mds directory, which is all the source code for the MDS itself, which manages the metadata in CephFS. There's also the src/client directory, which we're not going to be getting into, but that's where libcephfs and the FUSE client live, along with all the code for handling talking to the MDS and also the OSDs. And then, finally, there's another module here called osdc, the OSD object cacher, which is used by both the MDS and the client.
B: The client uses it to read and write file data to a range of objects, and the MDS also uses the object cacher to delete files — delete the range of objects backing a file's data — and also to handle its journal. And then, finally, there's another bit of code, for those who want to get even more adventurous, in the Linux kernel itself, in the fs/ceph directory.
B: That's where the CephFS kernel driver lives; again, we're not getting into that today. So, getting into the MDS: a good place to start, as usual, would be the main function, and that is in this ceph_mds.cc, which, unlike the rest of the MDS code, is actually just in the top-level source directory of the Ceph tree. This is just a standard Ceph daemon that configures signal handling and reads arguments.
B
So
here's
the
main
function,
it
configures
its
thread,
name
parses
arguments.
Does
some
global
initialization,
that's
common
to
all
theft
demons
and
nothing
too
special
about
or
unique
to
the
mds
in
this
code,
fairly
common
to
all
the
sf
demons.
The
interesting
bits
here
are
when
the
the
mds
actually
creates
a
messenger
sets
some
policies
for
the
messenger
sets
the
buying
address
for
the
address
and
port
that
the
clients
are
going
to
connect
to
configures
and
monitor
clients.
B: Then it creates a class that lives in the src/mds directory tree, and that's going to be the entry point for all the work that the MDS is going to do. And finally, this main function just waits in the messenger loop; that's the place where it sits, letting the messenger pick up messages, which the MDS daemon will actually handle. When this method ends, that's when the daemon is shutting down.
B: So, getting into the MDS: we'll now talk about the MDS daemon.
B: So this is the startup code for the MDS, and also the handling of the standby state for the MDSs. All MDSs start in a standby state, waiting to be given a place in an MDS cluster for a file system by the monitors, so the MDS daemon will sit and listen for a new MDSMap, and that's handled in this handle_mds_map function.
B: Open the .cc file at the bottom and you'll see this method is actually called in a method called handle_core_message, and this is driven by this ms_dispatch2 method. This is a method common to all messenger dispatchers in Ceph: this ms_dispatch method is called when any new message arrives, and the MDS daemon is going to try to actually process that message. So it's going to try to handle it as a core message.
B
And
it
goes
to
a
bunch
of
different
message
types.
So
is
it
a
minor
map?
Is
it
an
mds
map
and
so
on?
It
is
an
mds
map
is
processed
here.
Coming
back
to
this
handle
mds
map
method,
and
this
involves
decoding
the
actual
mds
map
from
the
message.
B: And then finally processing it: it's going to do a number of things for checking the state. So it's going to do a number of state transitions according to what the monitors tell it it should be doing. There's a bunch of sanity checking in here; one of particular note is: if it gets removed from the MDSMap, then the MDS is actually just going to respawn itself.
B: So again, there's a bunch of state-transition code here for handling the MDSMap. Notably, if it's actually not a standby MDS but an active MDS, it's going to hand this new MDSMap off to the MDSRank, which we'll be talking about next, where further processing may occur.
B: Let's move on to the MDSRank. The MDSRank is a fairly recent bit of code in CephFS.
B: It was added about four years ago as an abstraction for the state the MDS holds when it actually is a member of a CephFS file system and not a standby, and this is where the MDS will handle moving between major states — like during recovery, or during failover when an MDS is starting up. It goes through a number of states, including replay, rejoin, resolve, etc., and this is all handled within the MDSRank class.
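The recovery sequence described here can be sketched as a tiny transition table. This is a hedged illustration only: the state names mirror the CephFS daemon states mentioned in the talk (replay, resolve, reconnect, rejoin, active), but the linear `next_state` function is a simplification for illustration, not the real MDSRank transition logic.

```cpp
// Illustrative, simplified sketch of the failover state sequence a
// recovering MDS rank moves through. Not the real CephFS code.
enum class MDSState { Standby, Replay, Resolve, Reconnect, Rejoin, Active };

// Assumes a single linear recovery sequence (the real logic has more
// branches, e.g. resolve only applies with multiple active MDSs).
inline MDSState next_state(MDSState s) {
    switch (s) {
        case MDSState::Standby:   return MDSState::Replay;    // picked up a rank
        case MDSState::Replay:    return MDSState::Resolve;   // journal replayed
        case MDSState::Resolve:   return MDSState::Reconnect; // peers resolved
        case MDSState::Reconnect: return MDSState::Rejoin;    // clients reconnected
        case MDSState::Rejoin:    return MDSState::Active;    // cache rejoined
        case MDSState::Active:    return MDSState::Active;    // steady state
    }
    return MDSState::Active;
}
```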
B: It's also going to handle some queuing for continuations in the MDS according to changes in state.
B: We also have these methods for when the MDS hits certain bugs. For example, if the MDS finds damaged metadata, it will call these methods for actually notifying the monitors that the rank is damaged and then committing suicide, so it doesn't cause further damage to the metadata.
B: Another bit of code here: the MDS keeps track of the MDSMaps from other ranks, and uses that to make sure that the MDS cluster is consistent in terms of what the state of all the MDSs is.
B: And, let's see, the MDSRank is also responsible for handling its admin socket commands. So if you ever tell an MDS to drop its cache or list its sessions, that's handled in this method.
B: So, as we saw earlier in the MDS daemon, it actually will dispatch certain... actually, I don't believe we saw that. Let me...
B: All right, so here's the ms_dispatch method in the MDS daemon. If it doesn't handle the message as a core message — for example, an MDSMap update or a monitor map update — then it checks whether it actually has a rank, and if it does, it'll call the MDSRank's ms_dispatch method.
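The two-tier dispatch described here — core messages consumed by the daemon, everything else forwarded to the rank if one exists — can be sketched roughly as follows. All type and member names here are made up for illustration; the real MDSDaemon/MDSRank interfaces differ.

```cpp
// Hypothetical sketch (not the Ceph Messenger API) of the MDS daemon's
// two-tier dispatch: try handle_core_message first, then fall back to
// the rank-level dispatcher when the daemon holds a rank.
enum class MsgType { MonMap, MDSMap, ClientRequest, ClientSession };

struct FakeRank {
    int handled = 0;
    void ms_dispatch(MsgType) { ++handled; }  // rank-level handling
};

struct FakeDaemon {
    FakeRank* rank = nullptr;   // null while the daemon is in standby
    int core_handled = 0;

    // Returns true if the message was consumed as a "core" message.
    bool handle_core_message(MsgType t) {
        if (t == MsgType::MonMap || t == MsgType::MDSMap) {
            ++core_handled;
            return true;
        }
        return false;
    }

    void ms_dispatch(MsgType t) {
        if (handle_core_message(t)) return;  // core messages stop here
        if (rank) rank->ms_dispatch(t);      // otherwise hand off to the rank
    }
};
```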
B: There are various checks here making sure that the sources of the message are valid; then we're calling into this dispatch method, and again more checks are being done. For example, the MDS determines whether it's laggy: this is checked by the MDS sending a beacon, or heartbeat, to the monitor cluster periodically, and if it doesn't receive an acknowledgement response back from the monitors, then it begins to believe that it's laggy, or that there's a network partition. In that situation, the MDSRank will actually just put the message on a queue, waiting to be processed later, until the MDS is no longer laggy. This is one of those queues we talked about earlier, and this would be a continuation for that.
B: This is all done through the Server dispatch, in addition to the client request, which is the message that the MDS is going to be processing the most, perhaps along with capability updates: various peer requests from other MDSs, heartbeats from other MDSs. This drives the metadata balancing; we'll take a look at that later.
B
Here's
where
the
mds
locker
will
receive
updates
to
the
to
the
capabilities
or
from
other
mds's
locked
messages
that
update
the
locks
that
the
mds
has
maintained
for
all
the
metadata.
We'll
look
at
that
again,
some
more
later
and
also
client
capability
updates
from
from
the
clients
themselves,
also
go
to
the
locker,
so
the
mds
rank
is
sort
of
the
gateway
to
all
the
dispatch
loops
for
the
other
parts
of
the
mds.
B: Right, let's move on.
B: So again, the monitors keep track of all of the MDSs in the file system, and this is done through an FSMap. It's similar to the OSDMap; it's going to keep track of — and I should start out by describing the old way. The old way we kept track of the MDSs in the cluster was through what was called an MDSMap, and this is still present in CephFS, but that was because we only had one file system.
B
There
was
only
one
mds
map
and
back
when
john
spray
was
working
on
adding
multiple
file
system
support
to
cefs.
We
added
this
new
class
called
the
fs
map,
which
maintains
multiple
nds
maps
for
each
file
system
we
might
have,
and
that
will
so
here's
the
fs
map
that
the
monitor
is
keeping
track
of.
You
want
to
look
at
the
data.
B
This
is
hard
to
navigate
on
a
small
screen,
so
here's
the
main
data
structures
that
the
fs
map
is
keeping
track
of
the
main
one
here.
That's
interesting
is
this
file
systems
list.
B
So
looking
next
at
the
file
system
class,
this
is
mostly
a
wrapper
around
mds
map
which,
as
I
said
now,
we're
having
a
number
of
mds
maps
according
to
the
number
of
file
systems.
B
So
this
class
just
holds
three
field.
Three
three
members
used
to
be
two
as
of
just
a
week
or
two
ago,
this
new
one.
This
mirror
info
is
brand
new,
but
this
is
just
the
cluster,
the
file
system
id
and
then
also
the
mds
map
for
it.
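The relationship just described — an FSMap holding one Filesystem per file system, each pairing a file system ID with its own MDSMap — can be sketched like this. Struct and field names are assumptions for illustration, not the real Ceph types.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hedged sketch of FSMap -> Filesystem -> MDSMap containment.
struct MDSMapSketch {
    uint64_t epoch = 0;     // bumped on every change to the map
    std::string fs_name;    // the file system name
    int max_mds = 1;        // number of active ranks requested
};

struct FilesystemSketch {
    int64_t fscid = 0;      // file system cluster id
    MDSMapSketch mdsmap;    // the per-filesystem MDS map
};

struct FSMapSketch {
    std::map<int64_t, FilesystemSketch> filesystems;

    void add_fs(int64_t fscid, const std::string& name) {
        FilesystemSketch fs;
        fs.fscid = fscid;
        fs.mdsmap.fs_name = name;
        fs.mdsmap.epoch = 1;
        filesystems[fscid] = fs;
    }
};
```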
B
All
the
data
members
are
at
the
bottom
of
the
class,
so
the
mds
map
keeps
track
of
a
number
of
things,
including
this,
the
epoch,
which
allows
us
to
keep
track
of
changes
that
have
been
made
to
the
to
the
mds
map.
This
is
mostly
useful
for
for
the
monitors,
keeping
track
of
changes,
but
also
clients
that
need
to
know
if
there's
been
an
actual
update
to
the
map
or
not
here
we're
going
to
keep
track
of
the
the
file
system
name.
B: The max_mds: so when you're setting max_mds on a file system to increase the number of file system ranks, that's where this change would be set. Then there's a number of data pools: if you use the fs add_data_pool command, that data pool ID would be added to this vector. You can't actually set a file layout in CephFS with a separate data pool without first adding it to this list of data pools in the MDSMap.
B
That's,
of
course,
a
form
of
protection,
because
you
don't
want
clients
to
be
writing
to
arbitrary
pools,
we're
also
keeping
track
of
a
number
of
mds
ranks
that
are
are
available,
so
the
in
set
is
the
number
or
the
various
mds
ranks
that
are
in
the
file
system.
This
will
just
be,
could
actually
just
be
an
integer
now,
rather
than
a
set,
it's
going
to
be
a
consecutive
mds
rank
0
to
n
n
being
max
mds
for
the
cluster.
B: Then, finally, we also have this up map, which keeps track of which MDS global IDs are actually associated with a given rank.
B
So
this
is
another
class
or
struct
in
the
mds
map
which
supports,
or
is
really
just
keeping
track
of,
the
number
of
fields
associated
with
an
mds
daemon,
including
its
global
identifier,
the
name
of
the
demon
so
like
mdsa,
which
rank
is
following,
if
any,
what
stated
it
is
in
and
the
file
system
that
it
wants
to
join.
This
is
a
new
feature
we
added
in
ffs
to
have
and
to
allow
you
to
have
an
mds
join
a
particular
file
system.
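The bookkeeping just described — a rank-to-GID up map plus per-daemon info keyed by GID — can be sketched as follows. Struct and field names are illustrative assumptions, not the real MDSMap types.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hedged sketch of the MDSMap's "up" map and per-daemon info records.
struct MDSInfoSketch {
    uint64_t global_id = 0;
    std::string name;    // e.g. "mds.a"
    std::string state;   // e.g. "up:active"
};

struct MDSMapUpSketch {
    std::map<int, uint64_t> up;                  // rank -> daemon gid
    std::map<uint64_t, MDSInfoSketch> mds_info;  // gid  -> daemon info

    // Look up which daemon currently holds a rank, if any.
    const MDSInfoSketch* find_by_rank(int rank) const {
        auto it = up.find(rank);
        if (it == up.end()) return nullptr;
        auto jt = mds_info.find(it->second);
        return jt == mds_info.end() ? nullptr : &jt->second;
    }
};
```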
B: And also what addresses the MDS is on. So this is all kept track of in the MDSMap. And perhaps I should have mentioned the states that an MDS can be in: that's here in this daemon state enum, and you've probably seen most of these if you've ever administered a CephFS cluster before. Creating is done when we're creating a new MDS rank, or restarting an MDS rank after shrinking a cluster and then growing it again; and then there are a number of states associated with failover, when an MDS starts up taking over for a rank.
B: It goes through a number of these states, and I showed earlier that the MDSs keep track of what MDSMap other MDSs are aware of; that's mostly to have consistent operation when they're doing recovery as a group, and that's especially important during cache recovery — recovering the state of the global cache.
B: So the next thing we're going to look at is this MDS Server — let me take a drink of water — and this is the module that's predominantly handling client request dispatch. So a good place to start here would be this dispatch method.
B: Again, remember that the MDSRank calls this dispatch method if it's one of the types of messages that the Server would handle. Here we have this switch based off of the message type, and the Server will fork off — didn't mean to say fork — will call off to a method for handling each different type of message. So here's the client reconnect: during MDS failover, one of the states is reconnect.
B: And then certain types of messages require waiting for active. So if the message requires that the server be active — for example, we're just doing a standard mkdir or open — then that message will be retried once the MDS rank becomes active.
B
Then.
Finally,
another
switch
on
the
message
type
here,
we're
looking
at
the
client
session.
So
whenever
a
client
opens
a
new
session
with
mds.
This
is
where
all
the
bookkeeping
for
that
would
would
be,
would
be
done
in
this
handle
client
session
message.
Then
we
have
into
client
requests.
This
is
where
the
the
main
entry
point
for
for
all
the
the
request
handling
in
in
the
mds
client
reclaim
is
a
new
feature.
B
So,
let's
look
at
handle
client
requests,
so
any
client
message
that
will
access
metadata
on
in
cefs
will
go
through
this
code
path
and
there's
a
number
of
different
types
of
client
requests.
B: Here we're doing some bookkeeping for the request. So if you've ever seen the MDS make note of a slow operation for a client — like a getattr that's being served too slowly — this is part of that bookkeeping. The MDS will track every request in this MDRequest class, which is in this Mutation.h.
B
I
mean
I'm
going
fast.
It's
going
to
keep
track
of
all
the
metadata
requests
that
the
clients
have,
and
this
is
going
to
keep
track
of
locks.
The
the
distributed
metadata
locks
that
the
the
client
acquires
as
part
of
a
request
and
any
other
metadata,
but
also
how
long
it
takes
to
actually
process
the
request.
B: And then here we pull the message off the client request, because we're going to take a look at it again, and here we're doing a number of checks. For example, this is a check to see if the file system is full, so we're looking at the type of metadata operation we're doing — whether it's doing a mutation.
B
If
the
cluster
is
full,
then
it's
going
to
respond
with
if
there's
no
space,
otherwise
we're
going
to
get
into
more
specific
code
paths
and
that's
going
to
be
as
a
switch
based
off
of
the
type
of
operation
we're
doing.
B: We have getattr, which is looking up by path; setattr, which would be flushing metadata mutations on the client side to the MDS — for example, a chmod; there's not a separate chmod RPC, it would all be processed through setattr. We have our xattr operations, directory readdir, setting up POSIX file locks, creating a new file, link, unlink, rename, mkdir, mknod, etc.
B: Again, notice there's no chmod; that's all done through setattr. All right, so we don't have time, of course, to go through all of these, so I thought we would just take a look at handle client getattr, which is akin to a stat in POSIX.
B: So here again we're getting the client request message, and we're going to be doing some basic checks on the message itself to make sure that it's a valid request. For example, the file name path can't be empty; if it is, we return that it's invalid.
B: Okay, so as part of a getattr request, we also issue what's called a capability in CephFS. That's a type of lease — although leases and CephFS capabilities are separate concepts; there are also leases in CephFS, and they're slightly different — but there's an old file system concept from the '90s called a lease, which is a type of lock that clients get on metadata to ensure that the metadata is not changed, so that they can cache the metadata and improve the latency of operations locally.
B: So the MDS has fine-grained capabilities tracking the different rights that clients have on a given inode, including reading and writing the file, the file layout, the number of hard links on the file, and the state of the authorizations on the file — like the UID or the permission bits, etc.
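The fine-grained rights described here can be pictured as a bitmask, with a "shared" bit allowing caching and a matching "exclusive" bit additionally allowing local modification. Note this is a made-up encoding for illustration: the real CephFS constants (CEPH_CAP_LINK_SHARED, CEPH_CAP_XATTR_EXCL, and so on) use a different layout.

```cpp
#include <cstdint>

// Simplified, illustrative capability bits; values are NOT Ceph's.
enum : uint32_t {
    CAP_AUTH_SHARED  = 1u << 0,  // may cache uid/gid/mode
    CAP_LINK_SHARED  = 1u << 1,  // may cache the link count
    CAP_XATTR_SHARED = 1u << 2,  // may cache extended attributes
    CAP_XATTR_EXCL   = 1u << 3,  // may locally modify extended attributes
    CAP_FILE_RD      = 1u << 4,  // may read file data
    CAP_FILE_WR      = 1u << 5,  // may write file data
};

// Either the shared or the exclusive bit allows caching xattrs...
inline bool can_cache_xattrs(uint32_t caps) {
    return (caps & (CAP_XATTR_SHARED | CAP_XATTR_EXCL)) != 0;
}

// ...but only the exclusive bit allows modifying them locally.
inline bool can_modify_xattrs(uint32_t caps) {
    return (caps & CAP_XATTR_EXCL) != 0;
}
```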
B: Apologies — I don't know what this FIXME is for — but here we're looking at the different things that the getattr is asking for. For example, if we want the link count — the number of hard links for the file — we would be setting this CEPH_CAP_LINK_SHARED bit, and we would add a read lock for getting the link lock on the inode.
B: This ref here is a kind of poorly named variable; it's the actual inode associated with what we're doing a getattr on. We'll see this later, but there are a number of metadata locks on the metadata itself, corresponding to who has permission to change it, or whether we have a collection of read locks on that lock — multiple clients that want to have the same access.
B: The xattr lock: the inode — the client — has a collection of extended attributes set on a file, so it would have this lock as well. That's all controlled by what the client asks for, and the client can limit itself to what it wants, and that can help make sure the request is processed more quickly: if we don't have all the clients asking for write locks, we don't have to revoke a lot of permissions on other clients and slow everything down.
B: So the clients can be minimalistic in what they ask for. After adding all the different kinds of locks that the client needs in order to actually get this desired state, it's all driven through this MDS Locker acquire locks. The Locker is going to be responsible for looking at each of the locks being asked for, trying to change the state of the distributed locks across all the MDSs and also the clients, and trying to poke the state of the locks in a direction that gets us to a point where we can actually finish this request. You can see, if we don't succeed, we're going to return, because the operation will be retried later — but that's part of trying to acquire locks if there are conflicts; for example, another client has exclusive extended-attribute permissions and we're trying to get all the file's inode state.
B: Likewise, the Locker is also going to handle issuing capabilities. So as part of acquiring the locks on this file, the Locker is also going to take care of the details of issuing the capability on the inode in question, so that the client knows what it can cache according to the getattr. In a general stat operation, that would be pretty much everything, and it would have read capabilities on all the associated metadata.
B: As far as what a read capability would look like: for example, with the xattrs, that would be CEPH_CAP_XATTR_SHARED — shared, because it's going to be sharing the data with other clients; it wouldn't have permission to write — whereas CEPH_CAP_XATTR_EXCL (exclusive) would grant permission to make modifications to the extended attributes.
B
And
finally,
we
have
this
check
to
make
sure
that
the
client
has
permission
to
read
that
inode
and
that
would
be.
For
example,
we
have
in
the
step
x,
capabilities
for
clients.
B
We
can
limit
access
based
off
of
the
network
address
or
what
path
the
client
should
be
able
to
to
read,
and
that's
all
done
through
this
check
access
method,
and
if
the
client
lacks
permission
to
read,
then
we
return,
and
here
we're
going
to
set
a
number
of
fields
in
the
reply
for
this
get
adder
request,
which
we're
also
calling
that
this
debug
message
and
then
we
finally
respond
to
the
request.
B
See
here
here
we're
actually
gonna
reply
to
the
client
request
and
make
a
m-client
reply
message.
B
And
there's
various
bookkeeping
done
on
the
metadata
mutation
so
that
mutation
class
I
talked
about
earlier.
They
were
marking
an
event
that
we're
replying
to
the
message.
So
if
you
did
see
a
slow
operation,
you
could
see
what
events
have
been
processed
and
when.
B
Okay,
so
let's
go
back
to
the.
B
You
know
client
get
at
her.
We
spoke
a
bit
at
length
about
what
acquire
locks
is
doing.
So
now
we
can
go,
have
a
look
at
that.
B: So here we're getting our mutation reference for the actual operation, the vector of locks that the mutation requires, and some other metadata concerning the auth pins. So here we're just...
B
The
locker
is
an
incredibly
complex
bit
of
code
because
we're
handling
a
distributed
locks
across
mdf's
and
clients,
but
for
those
who
are
interesting
and
we're
not
going
to
go
through
this
really
in
depth.
But
I
wanted
to
point
out
here's
where
the
caps
that
we
would
eventually
be
issuing
to
the
client.
So
there
might
be
a
set
of.
I
knows
that
we're
going
to
issue
caps
for.
B: All right, so again, the Locker is a very complex bit of code. There's a state machine in these lock .cc and .h files; this is some fairly old code, which is responsible for managing the different states of the locks associated with each type of lock that an inode might have.
B: That's all managed here. All right, so let's move on to — I don't think we'll get through all of this.
B
Mdcash.H,
so
this
is
the
structure
that
manages
the
global
cat
or
the
cash
for
the
for
a
given
mds
rank,
and
this
is
where
all
the
management
for
changes
to
the
cache
are
gonna
take
place.
Access
to
like
traversing
a
file
path
is
all
gonna
go
through
that
amd
cache.
B
So,
there's
a
number
of
structures
that
keep
track
of
of
the
the
metadata
in
the
md
cache.
So
you
know
the
bottom
again,
there's
the
various
trucks
that
we're
keeping
track
of
here's
the
the
inode
map.
So
it's
going
to
associate
the
inode
number
that
we're
all
familiar
with
and
the
actual
ci
node,
which
is
a
structure
for
the
cache
inode.
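The inode map just described — inode number to in-memory cached inode — can be sketched minimally like this. `CInodeSketch` and `MDCacheSketch` are illustrative stand-ins, not the real CInode/MDCache classes.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for the real CInode (the cached in-memory inode).
struct CInodeSketch {
    uint64_t ino = 0;   // the inode number
    std::string note;   // illustrative payload only
};

// Minimal sketch of the MDCache's inode map and its lookup wrapper.
struct MDCacheSketch {
    std::unordered_map<uint64_t, CInodeSketch*> inode_map;

    void add_inode(CInodeSketch* in) { inode_map[in->ino] = in; }

    // Mirrors the get_inode() wrapper the talk mentions: find an
    // in-memory inode by number, or null if it is not cached.
    CInodeSketch* get_inode(uint64_t ino) {
        auto it = inode_map.find(ino);
        return it == inode_map.end() ? nullptr : it->second;
    }
};
```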
B: The Filer, which is again used for managing — being able to delete — file data, and then also setting backtraces on files in the default data pool, which allows us to keep track of the file path the file data is associated with, if we need to do some kind of recovery. And then there's various state we need to keep track of in the cache — for example, uncommitted peer renames.
B
Alder
is
going
to
be
used
when
we're
doing
rename
requests
that
affect
multiple
nds's.
B: The cache has a set of subtrees. If you're familiar with using multiple active metadata servers in CephFS, we have this concept of a subtree: we'll actually split the file system tree into multiple pieces called subtrees — they can be hierarchical and nested — and these are distributed across MDSs.
B
Each
mds
rank
will
keep
track
of
a
list
of
its
own
subtrees
that
it
can
see,
and
if
you
have
multiple
layers
of
subtrees,
some
mdss
may
not
actually
be
aware
of
the
full
subtree
map
across
all
the
mds
ranks,
and
it
does
it
keeps
track
of
the
ones
they
can
see.
However,
and
it
also
keeps
track
the
sub
trees
that
under
their
boundaries
to
the
to
that
subtree.
B: One of the primary methods of the metadata cache class is this path traverse method, and we saw it called earlier in the getattr code path. It's going to take this metadata request, a factory method for setting up continuations if we have to retry the command later (so we can keep track of when it should finally be retried), the file path we're trying to traverse, and various flags; that's all done in this method. So we're going to be looking at various flags — like whether or not we want the directory entry associated with the path, or if we need to get certain read locks as well — and that can all be batched together as we traverse the path.
B
File
names
from
the
path,
so
this
is
all
managed
here
and
there's
a
number
of
lock
operations
that
we
might
need
to
do
associated
with
traversing
the
path,
and
that's
also
bundled
up
into
this
method
as
well,
for
example,
as
part
of
path
traversal,
we
might
need
to
acquire
certain
read
locks
on
on
a
directory
fragment
as
we
as
we
traverse
the
path,
and
if
we
don't
have
it,
then
we
need
to
retry.
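The component-by-component walk described here can be sketched as a toy path traversal: split the path into file names, then resolve one directory entry at a time, failing when a component is missing so the request can be retried or rejected. The real path traverse also deals with locks, frozen trees, and remote subtrees, which this hypothetical sketch omits entirely.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy directory tree: parent path -> child entry names ("" is the root).
struct DirTree {
    std::map<std::string, std::vector<std::string>> entries;

    bool has_entry(const std::string& dir, const std::string& name) const {
        auto it = entries.find(dir);
        if (it == entries.end()) return false;
        for (const auto& n : it->second)
            if (n == name) return true;
        return false;
    }

    // Resolve "path" one component at a time; returns false as soon as
    // a component is missing (where the real code would retry or fail).
    bool traverse(const std::string& path) const {
        std::istringstream ss(path);
        std::string component, cur;
        while (std::getline(ss, component, '/')) {
            if (component.empty()) continue;       // skip leading/double slashes
            if (!has_entry(cur, component)) return false;
            cur += "/" + component;                // descend into the child
        }
        return true;
    }
};
```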
B
Similarly,
we
also
have
this
frozen
path,
so
I
spoke
about
how
the
sub
trees
would
be
distributed
across
multiple
ndss.
B
B
B
B: So again, there's also this get inode method, and this is just a wrapper around the inode map, allowing us to find an inode; this is used in several places in the MDS, for obvious reasons. And then there's also getting a directory fragment.
B: So this dirfrag_t is a structure for keeping track of the inode and the associated fragment of a directory. Talking briefly about fragments in CephFS: a directory may be sharded across multiple objects, and this is for efficiency and performance reasons.
B
We
don't
want
in
an
object
that
represents
a
directory
in
rados
who
have
too
many
omap
key
value
pairs,
and
so
that
may
be
fragmented
into
multiple
objects
and
those
directory
fragments
can
be
distributed
across
multiple
mdss
or
or
just
kept
separate
for
for
space
for
performance
reasons,
but
we
can
also
again
distribute
them
across
mds's
so
that
we
can
improve
throughput
on
a
large
directory
by
allowing
operations
on
a
given
fragment
to
go
to
separate
mds's.
B: Okay, well, I kind of went through the code path for processing a request — kind of a whirlwind tour of the MDS — and I think where we just stopped was about as good a place as any.
B: Thank you. All right, so if there are no questions, I'll just briefly touch on a few other files. So we have this — the MDCache, you know, is going to keep track of various metadata objects, for example starting with the root directory, so you can use path traversal, but also the number of inodes that are in the cache. I talked about how the MDSs keep track of locks for each type of metadata object.
B: There's an abstract base class that keeps track of a number of different basic operations on cache objects, including inodes, directories, directory fragments, and also directory entries. And so there are a number of pins that we might set on a cache object. For example, here's a pin for a request: when we pin something in the MDS cache, we don't want it to be trimmed from the MDS cache if the cache is too full, because there's an outstanding client request that is trying to utilize this object.
B
This
this
object,
and
so
this
is
the
type
of
pin
we
might
put
on
on
the
the
cache
object,
there's
also
locked,
pins
or
or
replicated
pins.
So
if
metadata
is
replicated
across
the
number
of
mds's
for
for
performance
reasons,
we
don't
want
it
to
be
removed.
While
it
still
has
outstanding
replicas,
and
then
we
have
the
number
of
state
opera
states,
for
example,
state
auth.
This
is
the
most
important
one.
B: This just indicates whether the MDS holding this object is the authority for that object and therefore has the capability to actually make modifications to the object, assuming it has the required locks. And so there are a number of basic operations we have on here, like setting the state and checking whether we're the authority.
B: So again, that's just an abstract class. So let's look at the CInode class — the cached inodes themselves; the C stands for cache, of course. Each inode in memory is going to inherit from this InodeStoreBase class, and this is all of the metadata that's actually persisted to the RADOS metadata pool. There are three primary structures associated with that; let me dig them up here.
B
We
are
well
there's
a
number
of
things,
actually
there's
the
symbolic
link
string
if
it
is
a
symbolic
link,
the
the
tree
of
directory
fragments.
So
if
this
is
a
directory
inode,
we
would
keep
track
of
what
the
directory
fragmentation
is
on
that
on
that
directory.
B
Another
one
is
the
the
inode
data
itself,
so
there's
the
same
relatively
new
bit
of
code
here
I
know
constant
pointer,
which
is
just
this:
a
shared
pointer
of
this
memphis
inode,
which
itself
is
just
this
inode
t
structure
and
we'll
take
a
very
quick
look
at
that.
So
a
lot
of
the
metadata
types
and
step
effects
are
in
this
mds
types,
header
file-
and
you
can
see
that
here.
B: Coming back to InodeStoreBase: this is just all of the data that's stored in the metadata pool — for example, here's also the xattr map associated with an inode. All right, and then that's inherited by InodeStore — InodeStore is some C++ organizational class, and InodeStoreBare more of that — and finally we have CInode.
B: And locks, and what state the inode is in — for example, is the rstat information dirty, or is it frozen?
B
Are
we
exporting
the
inode
and
that's
all
managed
here
another
the
one
more
interesting
bits
from
our
mds
hacking
standpoint?
Is
this
projected
inode?
So
this
is
where
we
would
actually
make
mutations
to
the
inode.
There's
a
there's,
a
vector
of
of
mutations
on
the
ino,
which
we
call
projected
inodes.
B
So
anytime
we
make
a
change
to
the
x-matter
map
or
the
inode
data
like
the
uid.
We
would
store
that
in
this
projected
inode
struct.
You
can
see
the
inode
pointer
and
the
x
saturn
map
pointer
here
and
when
those
changes
are
finally
persisted
to
the
mds
journal,
then
we
would
pop
that
projection
and
actually
set
it
in
the
the
main
part
of
the
inode
class,
but
as
part
of
journaling.
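The project-then-pop cycle just described can be sketched as follows: mutations are staged as "projected" copies of the inode data, and only once the change is journaled is the oldest projection popped and applied to the stable copy. Field and method names here are illustrative assumptions, not the real CInode API.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the persisted inode data (a tiny subset, for illustration).
struct InodeDataSketch {
    uint32_t uid = 0;
    uint64_t version = 0;
};

// Hedged sketch of projected inodes: staged mutations applied in order.
struct CInodeProjSketch {
    InodeDataSketch stable;                  // the committed inode data
    std::vector<InodeDataSketch> projected;  // pending, not yet journaled

    // Start a mutation: copy the newest data and stage the change.
    InodeDataSketch& project() {
        InodeDataSketch next = projected.empty() ? stable : projected.back();
        next.version++;                      // each projection bumps the version
        projected.push_back(next);
        return projected.back();
    }

    // Called once the journal entry is persisted: pop the oldest
    // projection and make it the stable copy.
    void pop_and_dirty() {
        if (projected.empty()) return;
        stable = projected.front();
        projected.erase(projected.begin());
    }
};
```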
B: As part of journaling, we have this projected-inode concept. So I think I'm out of time. Again, there are other cache objects, like the cache directory, the cache directory entry, and the cache directory fragment; they have similar operations on them. And then, finally, there are also the capabilities themselves, which are just an in-MDS-memory concept for the locks that clients have through capabilities.
A: Patrick, I'm looking in chat right now; I don't see any questions. Anybody still on the call have any questions? All right.
A: Well then, I think that wraps it up for another Ceph Code Walkthrough. Some thanks to Patrick in the chat. So thank you, Patrick, for taking the time — even though it was broad, this was a very in-depth walkthrough, I feel — and I appreciate everybody joining us live as well. We do these every month, so we'll have another one next month. And thanks, everybody, for joining us.