From YouTube: CephFS Code Walkthrough: kclient overview
A: So, anyway, this is a talk about, a little overview of, the kclient code. I'm not going to go into a lot of real depth here; I'm just going to give sort of a lay of the land in case you're interested in doing some work in the kclient and want to get started.
So the first thing, when you pull down a kernel tree, is to understand what the kclient is, and really what it is, is a re-implementation. Ceph is a big, complicated tree that we've all done some work in, but the kclient is really an independent implementation of all that functionality. It doesn't really share much with the userland client, aside from maybe a few header files here and there. So we've got several pieces in the kernel to handle all the Ceph client stuff, and the first piece is libceph, which is a kernel module; really, it's just the underlying transport layer for the Ceph code in the kernel.
All of this code is contained in net/ceph, where most of it is anyway, aside from a few header files that are under an include directory. In here, you can see there's a bunch of stuff that you can pretty much understand from the names: there are parts for handling authentication, there's some stuff for handling crypto. The messenger is here, and there are v1 and v2 variants. We have an osdmap, there's the OSD client code, the mon client code, stuff for handling OSD and mon maps, et cetera, et cetera. In any case, all of this code basically lives in here. I'm not going to go over it in great detail, but I'll talk about a couple of pieces of it in a minute. Aside from that, we have two more pieces in the kernel. There is drivers/block.
And so this is the rbd driver. It's not very big; basically it just calls down into libceph to do most of what it does. This is how the RADOS block device is implemented, and you can see it's mostly self-contained in a single file, and it's not huge. It's maybe a couple thousand lines... let's see... or seven thousand lines, excuse me. And then we've got:
fs/ceph, which is where all the CephFS code lives. This is a complete VFS-layer kernel driver for the CephFS client. (rbd, when you look in there, what it's doing is creating a block device driver.) Now, when you think of file systems in the kernel:
When we're dealing with a file system, most of what we're doing is responding to user-driven events, things that come in from, say, syscalls, or it could be nfsd activity or smbd activity, probably, these days; any of that kind of stuff. Most of it comes in through the VFS layer, and the VFS layer is essentially object-oriented too, in a sort of poor man's object orientation, like a lot of the kernel is. Anyway, there's a main header file for it.
So, essentially, in include/linux/fs.h we have a bunch of different objects. The canonical one for a driver is this struct file_system_type thing, and it has a couple of operations vectors in it: there's init_fs_context, there's mount, which is for the old mount API, and there's kill_sb, which will kill the superblock eventually, once you're ready to destroy it.
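For illustration, here is a minimal sketch of what such a file_system_type declaration looks like; the cephfs_fs_type and cephfs_init_fs_context names here are illustrative stand-ins, not the actual fs/ceph definitions:

```c
#include <linux/fs.h>
#include <linux/fs_context.h>
#include <linux/module.h>

/* Hypothetical fs_context init hook; the real one parses mount
 * options and sets up per-mount state. */
static int cephfs_init_fs_context(struct fs_context *fc);

static struct file_system_type cephfs_fs_type = {
	.owner		 = THIS_MODULE,
	.name		 = "ceph",		/* what mount -t matches on */
	.init_fs_context = cephfs_init_fs_context, /* new mount API */
	.kill_sb	 = kill_anon_super,	/* tear the superblock down */
};

/* register_filesystem(&cephfs_fs_type) at module init is what makes
 * the "ceph" type available to mount(2). */
```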
So what happens when you go to mount? We end up with a struct super_block object, and a superblock is what represents a mounted file system in the kernel. We've got a bunch of fields in here, and a lot of these may look self-explanatory, but in particular we've got s_fs_info. This field will end up pointing at what we call the ceph_fs_client. So in this generic superblock you'll have a generic pointer, a void * pointer, and it points to the per-superblock info that we keep for Ceph. From here, there are two main things that we deal with.
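As a sketch of that pattern (the accessor name below is hypothetical; fs/ceph defines its own inline for this):

```c
#include <linux/fs.h>

/* Forward declaration of the kclient's per-superblock state. */
struct ceph_fs_client;

/* s_fs_info is a bare void *, so every file system wraps the cast
 * in a little accessor like this (name is illustrative). */
static inline struct ceph_fs_client *sb_to_fs_client(struct super_block *sb)
{
	return sb->s_fs_info;
}
```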
The big one here is struct inode. This represents an inode in the kernel, and these structures are usually embedded inside of the ceph_inode_info. The superblock is a little different: it has a pointer to its private info, but when we get down to inodes, what we do is embed them. Then, from here, we've got:
Struct dentry, which represents a path name component, per superblock. Linux actually has probably the most advanced dentry caching, or lookup handling, in the world; it's one of the places where it really shines compared to anything else that's out there. But in any case, we have:
There's also a struct file, which represents an open file; I'll look at that in a minute. Each of these structures, or objects, usually has an operations struct hanging off of it. This is sort of where all the class methods for the thing end up being. So you've got a bunch of dispatch operations here for various things, like for a lookup, if you wanted to look something up in a directory.
When you're going to do the lookup, it's going to call this lookup vector. We've got a bunch of other stuff in here: there's create, to create a file in there, and link, unlink, symlink, all that stuff. Whenever a syscall comes in, we usually end up doing some pathname walking to go do lookups, and then we end up calling a sort of terminal operation that will actually do the thing we want to do. One of the particular ones
to note is this atomic_open, which matters for network file systems in particular. Usually, when we go to open a file on, say, a local file system, we're going to do a path walk down to where the thing is, look it up, turn it into an inode, and then turn around and issue an open call against the thing. But that's a bit of a waste for a network file system, because if you have to do a lookup and then go back and open the thing, that's two round trips to the server. We don't want to do that.
What we'll do instead is just issue an open request, and if it comes back with an ENOENT or whatever, then we just say: okay, this dentry just didn't exist. So we have a combined lookup-and-open call here in atomic_open. In any case, you can see all this stuff; some of these operations are also somewhat filesystem-specific,
like fiemap, which some file systems just generally don't implement. If you don't define these operations, usually what happens is that if they're set to NULL you get a generic error back when you try to call them, or there's some sort of fallback code that will do something different. And so each of these objects has one of these operations structures hanging off of it. There's an inode_operations.
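Here is a minimal sketch of the shape of such a table, with hypothetical handler names (the field names are the real VFS ones, though their exact signatures have varied across kernel versions):

```c
#include <linux/fs.h>

/* Hypothetical handlers standing in for the real fs/ceph/dir.c ones. */
static struct dentry *myfs_lookup(struct inode *dir, struct dentry *dentry,
				  unsigned int flags);
static int myfs_unlink(struct inode *dir, struct dentry *dentry);
static int myfs_atomic_open(struct inode *dir, struct dentry *dentry,
			    struct file *file, unsigned open_flags,
			    umode_t mode);

static const struct inode_operations myfs_dir_iops = {
	.lookup		= myfs_lookup,      /* resolve one path component */
	.unlink		= myfs_unlink,      /* remove a name */
	.atomic_open	= myfs_atomic_open, /* combined lookup + open */
	/* .create, .link, .symlink, ... the other terminal operations */
};
```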
Here's the super one, for if you want to mount or unmount, that kind of stuff, and to create new inodes in the superblock. Because an inode can be different sizes for different file systems, we have to have an allocator method in here as well.
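A sketch of that allocator hook, again with made-up names (alloc_inode is the real super_operations field; a real file system would use a slab cache rather than kzalloc):

```c
#include <linux/fs.h>
#include <linux/slab.h>

/* Illustrative per-fs inode with the VFS inode embedded at the end,
 * mirroring the layout the talk describes for ceph_inode_info. */
struct myfs_inode_info {
	u32 flags;               /* fs-private state would go here */
	struct inode vfs_inode;  /* the embedded generic inode */
};

static struct inode *myfs_alloc_inode(struct super_block *sb)
{
	struct myfs_inode_info *ci;

	ci = kzalloc(sizeof(*ci), GFP_KERNEL);
	return ci ? &ci->vfs_inode : NULL;
}

static const struct super_operations myfs_super_ops = {
	.alloc_inode	= myfs_alloc_inode,
	/* .free_inode, .statfs, .put_super, ... */
};
```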
Now, let's talk more about the Ceph-specific stuff. So, like we were talking about, most of the info for the CephFS client is in fs/ceph/super.h.
Here, for instance, is ceph_inode_info, which is the inode representation for a Ceph inode. If you look in here, we have a bunch of fields: Ceph-specific layout info, say, and there's stuff for caps, an rbtree to hold cap structures, which we'll talk about a little later. And then, finally, down here is where the VFS inode is actually embedded. Note that we embed a full VFS inode at the end of the thing, so
to allocate an inode, you essentially get all this junk at the top and then the VFS inode allocated after it. You can see here, too, we've also got some ifdef'd-out pieces like this: if you don't compile with FSCACHE, you won't get this field in here, because we don't need it.
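Because the VFS inode is embedded, going from the struct inode the VFS hands you back to the containing fs-private structure is just pointer arithmetic; a sketch of the standard idiom (the wrapper name is illustrative; fs/ceph has its own inline for this):

```c
#include <linux/fs.h>
#include <linux/kernel.h>

struct myfs_inode_info {
	u32 flags;
	struct inode vfs_inode;  /* embedded VFS inode */
};

/* Given the embedded inode, recover our wrapper structure. */
static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
{
	return container_of(inode, struct myfs_inode_info, vfs_inode);
}
```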
Also, when you go to mount the thing: per superblock, when we mount, what happens is we open a bunch of sockets to different servers. We first have to figure out where the MDS is, then we go talk to it, and then we might have to go fetch some stuff off an OSD later. And so, at the superblock level,
we have this ceph_fs_client struct, and it actually points to another thing called the ceph_mds_client, which is the part that talks to the MDS. We've also got some things in here like workqueues, that sort of thing, and again ifdef'd pieces: if you don't compile with debugfs, you won't get any of that stuff, and the same with FSCACHE.
The interesting bit here is this ceph_mds_client. This is the thing that represents our connection to the MDSes as a whole, and you can see here we've got an array of sessions, and then in each session,
each of these guys has a connection as well, and this thing dials back into libceph; it represents the connection to the Ceph daemon on the other end.
The ceph_connection represents the actual TCP socket that it uses to talk to the daemon on the other end, and there are some other pieces in here too. One of the interesting bits: we've had this sort of unfortunately named mutex in here, and pretty much everything is done under it. The libceph code is highly serialized, probably unnecessarily so, and my suspicion is that this kills performance. One of the things I'm actually looking at doing right now, or have been looking at over the last month or two, is trying to at least get some pieces of the activity out from under this mutex, so we don't need to serialize quite everything. In any case:
So let's look at what happens. Let's say we've mounted a file system, and now we want to open a file, maybe, and write to it. Okay, that's a pretty simple sort of operation that we do.
The first thing we're going to do is walk down the path until we get into CephFS, maybe doing lookups for the different path components, until we get to the file we want to open. And in the directory where we're going to open that file, we're going to issue an atomic_open. When you get to atomic_open, you get a bunch of different parameters.
There's a directory; here's the dentry, the name of the entry that we want to open; here's a struct file that is not quite filled out yet, because we haven't actually done the open; and then there are some open flags, and then a mode, for if we're doing a create of a new file.
So essentially what we'll do is walk down this thing. If the name is longer than NAME_MAX, we should send back ENAMETOOLONG. Then we've got some other stuff too: if this thing is a create, we're going to do some setup, and if we're not being looked up, then this dentry is not negative. So there's a whole bunch of rules for doing atomic opens.
It's probably more complex than it needs to be; Al Viro has been raging about this for years, but he just doesn't want to change how it works. In any case, eventually we're going to come down here and prepare an open. This is actually the request that we're going to send to the MDS to say: hey, we want to open this file. So we set up some stuff here again.
We fill out the request, this MDS request, which is what this req thing is here, and then we submit it to the engine to do its thing. This is a synchronous request in most cases, definitely a synchronous request here, and then, when the reply comes back:
If we get ENOENT, we say: oh, that file doesn't exist, and we send that back to the caller to say hey, that file wasn't opened, this file doesn't exist. Or otherwise, if we've got an O_CREAT... so we've got a bunch of rules here that happen, but at the end of it all, we will call this finish_open, which will finish filling out the struct file, and let's say we've got a full one at that point.
And then we pass it back to the user. So now let's say we've done that, and now we're going to issue a write. The first thing that happens is it calls this function, and the write gets turned into, well, we have this sort of generic iterator called an iov_iter. We get a write request from userland, or maybe from a splice request, or who knows what, but it gets turned into this iov_iter, and it says: write this data
to these positions in the file. We've also got this iocb, which is sort of an I/O context. So the first thing it's going to do is look in the iocb and find the file; this ki_filp is what points to the struct file, and then from that we can get the inode that we're going to talk to.
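A sketch of how a write_iter handler gets from the iocb to the file and inode (the handler name is made up; ki_filp, ki_pos, file_inode(), and iov_iter_count() are the real VFS pieces):

```c
#include <linux/fs.h>
#include <linux/uio.h>

/* Hypothetical ->write_iter handler, just showing the navigation the
 * talk describes: iocb -> struct file -> inode. */
static ssize_t myfs_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *file = iocb->ki_filp;       /* the open file */
	struct inode *inode = file_inode(file);  /* the inode behind it */
	loff_t pos = iocb->ki_pos;               /* where the write lands */
	size_t count = iov_iter_count(from);     /* how much data */

	/* ... check caps, decide buffered vs. sync, do the write ... */
	return count; /* a real handler returns bytes written or -errno */
}
```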
If we're trying to do an O_APPEND write, it will do a getattr to try to get the size of the file. This is terribly racy, but it does sort of semi-work. Then there's the question of when we want to do a synchronous write; I'll talk about that in a second, but if we've got certain conditions, then we're going to want to do the write synchronously, for instance if it turns out that the thing was setuid.
And then what we do is start checking for caps. So the next thing we do here is call down and say: okay, we're going to get caps. We have to have the Fw caps, the file write caps, but we also want the file buffer (Fb) cap, or maybe the lazy I/O cap. Now, the MDS will issue these Fb caps if it wants to allow the client to cache, or to buffer writes, and if this is the case, then we can go down here. (Also, here is where we increment the i_version.) Down here we'll look and see: okay, what did we get? This "got" represents what caps we got, and if we didn't get the file buffer caps, we're going to need to do the whole synchronous thing.
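In pseudo-C, the decision described here looks roughly like the following; CEPH_CAP_FILE_WR and CEPH_CAP_FILE_BUFFER are real cap bits from include/linux/ceph/ceph_fs.h, but get_caps(), do_buffered_write(), and do_sync_write() are simplified stand-ins for the real fs/ceph machinery:

```c
#include <linux/fs.h>
#include <linux/uio.h>
#include <linux/ceph/ceph_fs.h>

static ssize_t myfs_do_write(struct kiocb *iocb, struct iov_iter *from,
			     struct inode *inode)
{
	int got = 0;
	int err;

	/* Wait until the MDS grants at least Fw; ask for Fb too. */
	err = get_caps(inode, CEPH_CAP_FILE_WR, CEPH_CAP_FILE_BUFFER, &got);
	if (err < 0)
		return err;

	if (got & CEPH_CAP_FILE_BUFFER)
		return do_buffered_write(iocb, from); /* page cache path */

	return do_sync_write(iocb, from); /* no Fb: write through to OSDs */
}
```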
If we're doing an O_DIRECT write, we can't use the page cache there. If the thing was opened O_SYNC, or there's been a writeback error recently, we also switch to doing sync writes. In any case, we come down here, we get some snap context info, and then, if we're doing a synchronous write, we call down into here and do that.
When it's not buffered, we actually go and issue a synchronous write to the server. But let's say we have the caps and we're going to do a buffered write.
In addition, every inode also has what we call an address space, and that's what we use when we're going through the page cache. When we don't have caps, or we're doing particular types of writes like O_DIRECT, we don't use the page cache; but when we do, we have to call into the operations here, and the first stop for this is Ceph's write_begin. What it does is go and try to fault in the page.
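The entry points involved hang off the inode's address_space; a sketch of the shape of that table, with invented handler names (note that the real field names and signatures have shifted across kernel versions, e.g. readpage becoming read_folio):

```c
#include <linux/fs.h>
#include <linux/pagemap.h>

/* Hypothetical handlers standing in for the real fs/ceph/addr.c ones. */
static int myfs_read_folio(struct file *file, struct folio *folio);
static int myfs_writepages(struct address_space *mapping,
			   struct writeback_control *wbc);

static const struct address_space_operations myfs_aops = {
	.read_folio	= myfs_read_folio,  /* fill one folio from the server */
	.writepages	= myfs_writepages,  /* flush dirty pages back */
	/* .write_begin / .write_end bracket each buffered write:
	 * write_begin faults the folio in (reading from the server if
	 * needed), the VFS copies the user data in, and write_end
	 * marks the folio dirty and up to date. */
};
```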
We've ceded a lot of this code, or moved a lot of this code, into the netfs layer now, but effectively what it does is call into that layer. In the netfs layer we now have this set of Ceph netfs read ops, and it has a bunch of operations to do things like issue read operations; we can also resize them, that kind of thing.
So here we're given this page. We have struct folio now, but essentially a folio represents a page, and we're given a page here and we're going to go and do this operation to fill it. And so what this does is:
By the way, if anyone has questions, please speak up and ask; you don't have to wait for me to finish.
I'll take that as a no. In any case, we have this OSD request that we build here: we call down into ceph_osdc_new_request, passing a whole bunch of parameters based on what we're trying to do, which is, you know, to do a read. So we're building a read request here, and then we also build this other structure, which I won't go into right now, but
in any case, essentially this is the thing that holds all the pages; it represents a range of the file.
The thing will then turn around and start the request running, and then eventually, once it finishes, we call this finish_netfs_read function, which will then go and handle whatever came back.
So when we pass this thing down here to the libceph engine, what happens is we've given libceph an array of pages and said: okay, plop all the data that you read into here. These pages will end up being populated, and then we call this finish_netfs_read, which goes and checks the result.
If there's no object where we went to go do this read, we basically zero-fill the page and then pass it back; ditto here if we've got different errors, then we handle them as well. And then we call netfs_subreq_terminated, which will then go and finish off the request.
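A rough sketch of that completion logic, modeled on what is described here; the surrounding netfs API has changed between kernel versions, so every name below (my_read_subreq, zero_fill_range, subreq_terminated) is an illustrative stand-in, with only the control flow mirroring the real code:

```c
#include <linux/errno.h>
#include <linux/mm.h>

struct my_read_subreq {
	struct page **pages;	/* where libceph deposited the data */
	unsigned int nr_pages;
	size_t len;		/* length of the requested range */
};

static void myfs_finish_read(struct my_read_subreq *subreq, int err,
			     size_t bytes_read)
{
	if (err == -ENOENT) {
		/* Reading a hole: no RADOS object backs this range, so
		 * it reads as zeroes rather than failing. */
		zero_fill_range(subreq->pages, subreq->nr_pages);
		err = 0;
		bytes_read = subreq->len;
	}
	/* Hand the outcome to the generic machinery, which unlocks the
	 * folios and completes the read. */
	subreq_terminated(subreq, err ? err : (ssize_t)bytes_read);
}
```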
Okay, so now, back to our write: at that point, we pass things back to the netfs and the VFS layer.
It will then take the pages that it got from userland and copy them into the page cache pages that we're using, and then Ceph's write_end will get called. This will go and mark the pages up to date, so that the page cache knows it can satisfy reads out of these pages, and then we maybe go and increase the size of the file, which is what this does; because if we wrote to the end of the file, the file's now longer, so we have to go and do that. And we have to mark these pages dirty, so that later on the kernel will turn around and write these pages back to the server. At this point, all we did was read in the unwritten parts of the page, copy the data into them, and mark the thing dirty so that we can write it back later. Then, at some point, we unlock the pages and put them, and then we might also check and see if we have caps we need to deal with.
From there: okay, we've got the data in the page cache, and now, later on, let's say someone calls sync, or fsync, or something like that. Now we have to write the pages back. At that point, we usually get a call from the VFS to do something called writepages, and we have the ceph_writepages_start vector that will do this. What this thing does
is walk down the tree of pages that we've got, the "array" of pages; it's not really an array, but it looks like one, and set them up to be written back, and then start. And you can see in here (sorry, I'm probably going kind of fast) there's a bunch of special cases: oh, if we're beyond the size of the file we may have to invalidate some things; we might have to deal with snapshots.
It's pretty complicated, but essentially at some point we will come down here and set the page writeback bit in the page, which basically tells the kernel: hey, I'm writing this thing back. And then we clear the dirty bit for the page. That means that if someone comes in later and dirties the page again, we will ensure that we don't miss a race and miss the write. And then, anyway, we gang up a whole list of these pages.
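The flag dance described there uses real page cache primitives; a sketch of the per-page sequence (page-based names shown; newer kernels use the folio variants):

```c
#include <linux/mm.h>
#include <linux/page-flags.h>

/* For each page we are about to send to the OSD: */
static void start_page_writeback(struct page *page)
{
	/* Clear the dirty bit in a way that catches re-dirtying races
	 * with concurrent writers. */
	if (clear_page_dirty_for_io(page))
		set_page_writeback(page); /* tell the kernel: I/O in flight */
}

/* And once the OSD write reply comes back: */
static void finish_page_writeback(struct page *page)
{
	end_page_writeback(page); /* no longer under writeback */
}
```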
It's going to walk down and build this up until we've finished writing back what the VFS says it wants written back. So we gang up all these pages, and then we call ceph_osdc_new_request; we generate a new write request, in this case.
A lot of times we're writing back data because we're trying to free memory. If we are under memory pressure and we can't allocate pages in order to write that data back, then we have a big problem; we can't go anywhere. So, in any case, we set our callback at the end to writepages_finish, because at the end of the thing we're going to need to handle some stuff, and then we go and start marshalling up pages.
The OSD client engine will submit the request to the OSD and collect the reply, and then, when it gets the reply, it will call writepages_finish to finish things up. At that point it will come down here, mark the pages clean, and end the page writeback, as you can see here, so we can tell the kernel: okay, it's not under writeback anymore. We might update some metrics, and then we put whatever cap refs we've held, because we take references to the buffer caps while we're doing all this, and then eventually we put the request, and the write is done.
Okay, so that's all fine and dandy. That's a good example of how complex it can be to actually write out data; reading is also pretty complex in the same way. But in addition to all that, we also have some more autonomous activity that sort of happens in the background, or that's driven by the MDS.
So, I mentioned that ceph_connection earlier. Each connection has what's called a dispatch routine, or a bunch of operations, actually, just like we have operations for the inodes that are driven by syscalls. When we get certain socket activity or certain connection activity, we will call different operations vectors in this struct. So, for instance, for the MDS, this is the MDS operations vector. You can see here we've got a bunch of stuff that we do depending on which sort of message comes in: if you get a reply to one of our requests, it will handle the reply; if you've got caps, you've got to handle caps, and so on and so forth.
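A sketch of what such a dispatch hook looks like, with invented names throughout (the real one in the kclient switches on the Ceph message type):

```c
#include <linux/printk.h>

/* Illustrative dispatch callback for an incoming message on a
 * connection; the types, constants, and handlers are stand-ins. */
static void myfs_dispatch(struct my_connection *con, struct my_msg *msg)
{
	switch (msg->type) {
	case MSG_CLIENT_REPLY:	/* reply to one of our MDS requests */
		handle_reply(con, msg);
		break;
	case MSG_CLIENT_CAPS:	/* asynchronous cap grant/revoke */
		handle_caps(con, msg);
		break;
	default:
		pr_warn("unknown message type %d\n", msg->type);
	}
}
```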
Caps: we have a structure, an object that we track just like everything else, to represent a cap, and these are held in an rbtree in the inode. And so when we get, say, a request from the MDS to flush some caps, or maybe it wants us to start writing back, or to revoke some caps, or to grant some caps, we get this call from the MDS.
And we will call Ceph's cap handler. On an open or something like that, we might add some new cap structures to the inode, just in the response to the request itself, a normal MDS request; but we can also get an asynchronous grant from the MDS saying: hey, okay, maybe some caps came free that you had requested earlier.
So, depending on what sort of cap message we get, there's a switch statement here for the different types of cap operations, and, for instance, if we get a cap revoke or grant, we're going to call this thing, which is handle_cap_grant.
That basically walks down the tree and handles all the cap grants or revokes, and at the end of that we might end up kicking off writeback. Particularly, let's say, someone tried to open a file that we had open and had buffered writes for, before we had written them back; the buffer caps get revoked, and we might kick off writeback then.
That's how we handle asynchronous cap requests from the server, and things like returning caps to the MDS.
Let's see... yeah, that's really about all I had, to go over sort of the 10,000-foot view of the client. It gets pretty complex down in there, but maybe that gives you at least a start on what to do. Does anybody have questions?
C: You worked on something generic for all file systems, so I mean, can you briefly say what was the solution you came up with to handle errors when you're trying to write back pages?
A: Yeah. Oh, that; that's not really directly related to Ceph, but I can talk about it. So, effectively, when you write data to the inode, or to the page cache, eventually we have to write it back, but a lot of times that writeback can happen behind the scenes. You might not even be aware that it's happening; it may be flushing pages out because of memory pressure. But at some point, the idea is that you will call fsync, and at that point, if there's been an error while you were writing back:
Yeah, so we put some infrastructure in several years ago, called errseq, which sort of splits up an integer that we track in the inode. It can basically hold an errno up to -4095, and then the rest of the 32-bit integer will hold a counter, and that counter tells us whether
this error has been seen yet or not on a particular struct file. I could go into that; it's a little more complex. But essentially, here's what we did.
So if we have a 32-bit integer, we declare 12 bits of that to represent the errno that we're going to store. Usually, when we store an error in an integer, we have a bunch of bits left over if it's only ever going to store an errno.
So when an error is recorded, we bump this counter every time. Let's say you wrote back once and you got an error, okay, and then someone called fsync right after; they get their error. And then maybe you write back some more data and you get another error, and then it's hard to tell whether:
If you call fsync two times in a row, you don't want to report two errors in a row when there hasn't been another writeback in between. What we want to do is ensure that we report the latest error only once per fsync, or msync, or whatever. So, yeah:
Essentially, we have this counter in here, and then we have this SEEN flag, and all this thing does is: in each struct file, we keep another 32-bit integer, and we basically take a snapshot of whatever the value was right after we did the fsync.
So if you do an fsync, we take a snapshot of what it was at that time, and we report an error if it looks like we need to; but when we look at the struct file's version of this counter, if it looks like this error has already been reported, we don't.
Say there's a new writeback error: we bump this counter, and then, when we go to report, we compare it to the one that's in the struct file from the last time we did it. And we say: well, the counter got bumped, so even though the error number is still the same, we know that there was a new error recorded since the last time we reported this error, and then we can report an error again on the subsequent fsync.
So that's the idea, basically: it allows you to record errors in one place, but report them over multiple file descriptors. Does that sort of answer your question? I could probably give a whole talk on this alone.
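The errseq_t API that came out of this work lives in include/linux/errseq.h; a minimal sketch of the record/check pattern (the shared cursor normally lives in the address_space; it is a static here just to keep the sketch self-contained):

```c
#include <linux/errseq.h>
#include <linux/errno.h>

static errseq_t wb_err;

static void writeback_failed(void)
{
	/* Recording side: note the error and advance the counter. */
	errseq_set(&wb_err, -EIO);
}

static int my_fsync(errseq_t *since)
{
	/* Reporting side: each struct file carries its own 'since'
	 * snapshot; this returns any error recorded after it and
	 * advances the snapshot, so each new error is reported only
	 * once per file description. */
	return errseq_check_and_advance(&wb_err, since);
}
```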
C: Okay, yeah, thanks. I'll look at the documentation. I had this question when you were talking about the issues that you might have during page writeback, so that covers part of it.
A: Yeah, so the problem we used to have was that we tracked a lot of these errors with, basically, a flag that was just in the inode. So what would happen is: you might have this file open multiple times, with lots of people writing to it, and then when a writeback error happened,
we would just set that flag, and then, whenever someone called fsync, they would see the error and clear the flag. So if another person called fsync on a different file descriptor, they wouldn't see an error, and they would think: oh, my writes must have gone through.
They'd think they were fine, but they might not be. And so this was the mechanism that we came up with to basically ensure that, when you have people writing to multiple file descriptors on the same file, they all will see errors if there was one. That way it's reliable: we report errors everywhere. So, yeah, take a look.
B: Hey Jeff, quick question: do we use readdirplus in the kclient?
A: Yeah, we do. Well, sort of: there is no real readdirplus operation in the kernel. You just do a readdir, but what we do is, we will do a readdir and then pre-populate inodes and dentries in the dentry cache and in the inode cache.
So we have a readdir; well, it's called iterate now, and the readdir infrastructure is actually pretty complex at the VFS layer. But yes, essentially, when we get the dentries back, we usually also get inode attributes for them as well, and so we will pre-populate the inode cache. We'll allocate inodes and fill in all the info, so that later, when someone goes to do, say, a stat or something like that against that particular entry, we don't need to go to the server for it, because we've already got it from the last readdir request.
B: Okay. So I remember at some time there was this readdirplus call added, where in one call you can fetch the directory entries and the stat information too. I mean, currently doing a stat obviously always fetches the stats for an inode from the dcache or something, I mean, from the kernel cache.
But this was one call that fetches the entries and then all the stat information for those entries. I mean, it was added in FUSE, if I'm not wrong, but maybe it was wired up through the VFS and every other file system; I'm not sure, but it looks like it.
A: Sorry, are you talking more about the readdir, or are you talking more at the VFS layer? There is no readdirplus system call, okay, so we don't have anything like that. Now, the NFS server may get a readdirplus request; in fact, an NFSv4 readdir request is also like readdirplus, it also gets to fetch attributes. Okay.
A: Yeah, they didn't go in. I mean, we did add statx a while back, which is like an extended stat thing, and there's been some discussion about doing a readdirplus operation, but it's not really clear what applications would use it. I could see it being used by, like, Samba maybe, or the userland NFS servers, maybe, but it's not as useful as you might think. And so, effectively:
I mean, we are populating the cache when we do a readdir in the background anyway. So, like I said, say we get a readdir request: we will go ahead and get the list of dentries that are in the directory, but we'll also get more:
The MDS will also forward along a lot of the inode info as well, for each of those inodes, and again we populate the cache with that. So if you do a readdirplus system call, the only part you're really saving is the context switches from
the multiple system calls. And with something like Ceph, the time that you are spending in those is pretty negligible; most of your time is dominated by the round trips to the MDS. So it turns out not to be as big a win as you might think it would be, to just cut out that sort of system call activity.
It's a bigger deal if you've got something like a local file system; maybe there was some talk about that. That's really where I think you would see it. Like, xfs might want to do that, because dealing with local disk is pretty fast, and so cutting out the system call overhead, because any time you do a system call you have to do a context switch, can be a big deal there, but less so on network file systems.
B: Any other questions? Yeah, it was probably related to xfs; someone added a bulkstat call or something, but I'm not sure if those made it. It was along similar lines: getting a list of entries and all the stat information, so you don't have to do the stat call again. So yeah, I think you're coming from the same place.
A: Okay, if not, I'll take that as a no. Thanks for your time, everybody; have a good day.