From YouTube: Code walk-through: new developers
Description
Sage Weil provides a high-level introduction to the Ceph code base, targeting new contributors. This talk focuses on the code itself and how to get oriented.
You may also want to look at https://www.youtube.com/watch?v=t5UIehZ1oLs for an introduction to the development environment (how to clone, build, test, find and open a bug, open a pull request, etc.).
A
All right, hello, everyone. Sorry for the short notice last week. The goal is to do a bit of a high-level walkthrough, just to introduce people to how the source code is organized and how to find things. There was an earlier talk, I guess about a year ago, that was targeting just new contributors: how you check out the code, clone it, how you run do_cmake, how you build, how you open a bug, commit something and sign it off, open the pull request, and all that stuff.
A
So for that, I would refer back to that other talk; it's recorded and it's on YouTube. This one is going to skip all that and jump straight into the code: what files are in git, how they're organized, all that stuff. Let me use the right keyboard here. Okay, so let's start by just looking at what is in ceph.git. If you check out Ceph, the top level has a bunch of files. There are a few interesting things like, you know, the source code licenses, the COPYING files, whatever. There's a coding style file.
A
That might be worth looking at. It just outlines what our coding style is for C++, and it's basically based on the Google guide for simplicity, with a few variations that are noted in that file. Nothing too exciting, but that's where you find it. The other big one here is SubmittingPatches, which is sort of a procedural thing: how to sign off your commits, what the DCO, the Signed-off-by line, actually means, all that stuff.
A
All this stuff's there. The main things in this top-level directory are the doc directory, which is all the documentation in reStructuredText that generates docs.ceph.com, so that's one big tree, and then the other big tree is src. That's where most of it is. All the other ones have little scripts, packaging information. Oh, I guess the third one is qa; all our QA files are in /qa.
A
There are scripts that you run that generate workloads. There are YAML files that generate teuthology tasks, a whole bunch of stuff in there, and that's probably its own whole other talk that just covers that. We're going to focus on the src directory, because that's where all the source code is.
A
A
A
There's an include directory and there's also a common directory. These are sort of interchangeable. They have all the common code; sometimes there's a header in include and the .cc file in common, and sometimes both the header and the .cc file are in common. That just sort of grew that way over the years, but they're more or less the same: just random infrastructure, common stuff. There's a subdirectory for each of the daemon types. So all the OSD code, mostly, is in osd; there's one for the mon, for the MDS, and so on.
A
You can find those there. There's a directory for librbd, there's a whole directory for rgw, all that. Beyond that, there are a couple of other important shared directories. The first one that's probably worth noting is the msg directory, and there are really two of these. There's msg, which has the messenger; the messenger is one of the key components in Ceph, and it handles all the passing of messages between daemons.
A
So it basically hides the network from everybody. You have sort of an abstracted entity, which is usually a daemon or a client or some participant in the distributed system. They instantiate a messenger, and then they can send messages to each other. That message passing is asynchronous; it's sort of like RPC, but you just send a message one way and then maybe you get a reply back. You can define the protocol however you want, but it is defined in terms of messages instead of RPC calls.
A
So the msg directory has the classes that define that interface. The main one is Messenger, which is sort of your endpoint. You can create one of a particular type, and there are calls in here that send a message to a particular destination and so on. There's another class called Dispatcher, which is basically an interface at the receiving end of a message.
A
So each entity that is actually receiving messages is a child of the Dispatcher class, and the main thing that happens here is that there is a virtual dispatch method that you have to implement. That basically gets called for every message that comes in off the wire. That's a very high-level introduction there.
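To make the messenger and dispatcher relationship concrete, here is a minimal sketch. Every name in it (ToyMessage, ToyDispatcher, ToyMessenger, PingCounter, the MSG_ constants) is hypothetical and much simpler than the real Ceph classes; it only illustrates the pattern of deriving from a dispatcher interface and implementing one virtual method that is called for each incoming message.

```cpp
#include <string>
#include <vector>

// Hypothetical message types for this sketch only.
enum ToyMsgType { MSG_PING = 1, MSG_OTHER = 2 };

// Simplified stand-in for a message: an integer type plus a payload.
struct ToyMessage {
  int type;
  std::string payload;
};

// Receiving side: anything that wants messages derives from this.
class ToyDispatcher {
public:
  virtual ~ToyDispatcher() {}
  // Called once per incoming message; returns true if it was handled.
  virtual bool dispatch(const ToyMessage& m) = 0;
};

// Endpoint that hands incoming messages to its registered dispatchers.
class ToyMessenger {
public:
  void add_dispatcher(ToyDispatcher* d) { dispatchers_.push_back(d); }
  // A real messenger would read this off the network; here we deliver locally.
  bool deliver(const ToyMessage& m) {
    for (ToyDispatcher* d : dispatchers_) {
      if (d->dispatch(m)) return true;
    }
    return false;  // nobody wanted it
  }
private:
  std::vector<ToyDispatcher*> dispatchers_;
};

// Example entity: counts pings and ignores everything else.
class PingCounter : public ToyDispatcher {
public:
  PingCounter() : pings(0) {}
  int pings;
  bool dispatch(const ToyMessage& m) override {
    if (m.type != MSG_PING) return false;
    ++pings;
    return true;
  }
};
```

The point of the split is that the sender never needs to know which object handles a message; it only needs an endpoint to hand the message to.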
There are two main implementations of this. There's a simple messenger, which is the older one, which is under the simple directory.
A
Don't look at that. There's a newer implementation of this, called the async messenger, that's already the default as of Luminous. It uses a more constrained thread pool and it's a better design; that's what everything uses now. The implementation sits in there. But along with the messenger, this abstracted thing that lets you pass messages around, there are all the messages that actually get sent over the wire. Those are all defined in the messages directory, and you'll see that there are a bazillion of them.
A
A
...and we pass in a, sorry, meaning you pass in a type to the Message constructor. So each of these messages has a unique integer type that distinguishes it from all the other message types. Then you define methods that encode and decode the payload. This one doesn't actually have a payload, so it's pretty simple. Maybe a more complicated one is the command message, which is used to send a command to the monitor or to a daemon, as with ceph tell.
A
So there are a couple of class members in that message that get sent over the wire. There's a structure that you use when you're actually using this. This determines which cluster you're talking to, and this is the actual command. These are helper methods; they're just used for debug output, so that they show what the contents of the message are when you're looking at the debug logs. But the important ones are encode_payload, which basically takes whatever's in the message and generates a byte buffer.
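The encode and decode step can be sketched as follows. This is an illustration under stated assumptions, not Ceph's wire format: ToyCommand, ByteBuffer, put_u32, get_u32 and the type value 50 are all made up for the example, and the layout (a little-endian 32-bit length followed by the bytes) is just one simple choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::vector<uint8_t> ByteBuffer;

// Append a 32-bit value in little-endian byte order.
inline void put_u32(ByteBuffer& b, uint32_t v) {
  for (int i = 0; i < 4; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a 32-bit little-endian value at `off` and advance `off` past it.
// Throws std::out_of_range (via at()) if the buffer is too short.
inline uint32_t get_u32(const ByteBuffer& b, size_t& off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(b.at(off++)) << (8 * i);
  return v;
}

// Hypothetical command message: an integer type tag plus one string field.
struct ToyCommand {
  std::string cmd;

  // Arbitrary value chosen for this sketch; each message class gets its own.
  uint32_t get_type() const { return 50; }

  // Turn the in-memory fields into bytes.
  ByteBuffer encode_payload() const {
    ByteBuffer b;
    put_u32(b, static_cast<uint32_t>(cmd.size()));
    b.insert(b.end(), cmd.begin(), cmd.end());
    return b;
  }

  // Rebuild the fields from bytes produced by encode_payload().
  void decode_payload(const ByteBuffer& b) {
    size_t off = 0;
    uint32_t len = get_u32(b, off);
    if (off + len > b.size()) throw std::out_of_range("truncated payload");
    cmd.assign(b.begin() + off, b.begin() + off + len);
  }
};
```

A receiver would look at the type tag first, construct the matching message class, and then call its decode method on the payload bytes.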
A
A
Okay, so that's messenger and messages. Let's look at a daemon and walk through its main and see if we can make some progress here. So this is the OSD code. In main we parse the arguments and then we call this function called global_init, and this is some of the crufty infrastructure that all the Ceph daemons share. There are sort of two pieces of this. The first piece is in common; there's something called a CephContext.
A
But the main thing that the CephContext contains is a copy of the configuration, so all the configuration settings are there in memory and they're associated with your CephContext. Whenever you're doing something in the code that's configuration-dependent, you have to pass in a reference to that CephContext so it can figure out what configuration options to apply, generally. There are other things like the logging infrastructure, so this Log class is the thing that generates the files under /var/log.
A
Those are the log files: you generate entries and they all get sent there and written, and so on. That's associated with the CephContext class. Some background threads, to flush the logs, things like that, are all associated with this CephContext: the service thread, reopen_logs, these types of helpers. And, you know, there's a whole bunch of random, weird stuff in here dealing with daemonization and so on. But at a high level, a CephContext is associated with one process.
A
So if you look back here at the main for the OSD, one of the first things we do is call this global_init function, which is a helper for, you know, starting up this sort of daemon environment, and it returns this CephContext, just sort of a handle for that newly instantiated thingamajig. Notably, one of the important things that's in there is the md_config_t. These are all the config options; that's stored in config.h, and this is like a generic class.
A
It's a big bucket for all the config options, with methods like get_val and set_val for getting and setting configuration options. This is some of the oldest code in the tree, so there's some cruft around the interface, but at a high level the interesting part is that here we have a map of options to values, options that get applied to runtime stuff. Okay.
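As a rough illustration of "a map of options to values that you reach through a context", consider the sketch below. ToyConfig, ToyContext, log_path and the log_dir option are hypothetical; the real configuration class has typed options, defaults, and observers that are not modeled here.

```cpp
#include <map>
#include <string>

// Hypothetical miniature of a config bucket: option name -> string value.
class ToyConfig {
public:
  void set_val(const std::string& key, const std::string& value) {
    values_[key] = value;
  }
  // Returns true and fills `out` if the option has been set.
  bool get_val(const std::string& key, std::string* out) const {
    std::map<std::string, std::string>::const_iterator it = values_.find(key);
    if (it == values_.end()) return false;
    *out = it->second;
    return true;
  }
private:
  std::map<std::string, std::string> values_;
};

// Hypothetical per-process context that owns the configuration.
struct ToyContext {
  ToyConfig conf;
};

// Configuration-dependent code takes the context explicitly instead of
// reaching for a global, so two contexts can hold different settings.
inline std::string log_path(const ToyContext& cct) {
  std::string dir;
  if (!cct.conf.get_val("log_dir", &dir)) dir = "/tmp";  // fallback default
  return dir + "/toy.log";
}
```

Passing the context as an argument is what lets a library host more than one configuration in the same process.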
So you'll notice that this lives in common. Generally speaking, everything in common is shared code.
A
That's shared between everything in Ceph, so that includes daemons. It includes librados, libcephfs, librbd, so both client-side libraries that might be shared objects, and also command-line utilities and full-blown daemons. There's another directory that's sort of a parallel, called global, and this is stuff that only applies to the daemons or the command-line utilities, because one of the cardinal rules of writing shared libraries is that you shouldn't have any sort of hidden shared state.
A
A
You know, like the thing that handles a segfault and generates a nice backtrace in your log files; that all goes here. The main thing here is global_init, which is a set of helpers for starting up processes. So that's what's getting called over here, and it does things like, you know, it creates a CephContext here somewhere, and it does things like install signal handlers.
A
D
B
B
A
A
C
A
C
A
So in Ceph there are different entities: there are clients and daemons and whatever. Each of those is sort of an endpoint or an entity within the distributed system, and each of those has a messenger that it uses to send messages to other points in the system. They also all have a CephContext, because they have their own local configuration settings and so on. Yes.
A
So, let's work our way through the OSD to where we actually store objects. Let's see, how should we do this? So ceph_osd.cc does the initial startup. It does some daemonization settings, really boring stuff just to start up a daemon. Somewhere in here it will create what's called an ObjectStore.
A
You'll see that you instantiate one, and there are a couple of different types. This interface is implemented as FileStore, which is the old way of storing objects in files. There's a BlueStore backend, which is the new way, using block devices. There's a MemStore, which is one that's all in memory; it's used mostly for benchmarking. And there's one called KStore, which stores everything in key-value pairs; it's sort of a toy. And yeah, so this defines the whole interface. You have objects.
A
You have collections, which are sort of like directories of objects; you have all the operations; you have transactions that mutate those objects, and so on. But the ObjectStore class is basically, you know, loading up a class that allows you to access that data. So we need, let's see, sorry, that starts up, it does that, yeah.
C
A
A
A
It's a simplified implementation of the interface, and MemStore is usually the place to start, because it's basically just putting everything in memory. You know, an object has attributes. It has an omap header. It has omap data. Somewhere in here there's the actual byte data; I think it's abstracted into a subclass, I can't remember. So this is the easiest way to understand that interface.
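The shape of such an interface, with an in-memory implementation, might look like the sketch below. All names (ToyObject, ToyTransaction, ToyObjectStore, ToyMemStore) are hypothetical, and the real interface is far larger. In particular, this toy applies a transaction's operations one by one, whereas a real store has to make the whole transaction take effect atomically.

```cpp
#include <map>
#include <string>
#include <vector>

// One object, with the kinds of state mentioned in the talk.
struct ToyObject {
  std::map<std::string, std::string> attrs;  // small named attributes
  std::string omap_header;                   // header blob for the omap
  std::map<std::string, std::string> omap;   // key/value data
  std::string data;                          // byte data
};

// A transaction is an ordered list of mutations handed over as one unit.
struct ToyTransaction {
  enum OpType { WRITE, SETATTR, OMAP_SET };
  struct Op { OpType type; std::string coll, oid, key, value; };
  std::vector<Op> ops;

  void write(const std::string& c, const std::string& o, const std::string& bytes) {
    Op op = {WRITE, c, o, "", bytes};
    ops.push_back(op);
  }
  void setattr(const std::string& c, const std::string& o,
               const std::string& k, const std::string& v) {
    Op op = {SETATTR, c, o, k, v};
    ops.push_back(op);
  }
  void omap_set(const std::string& c, const std::string& o,
                const std::string& k, const std::string& v) {
    Op op = {OMAP_SET, c, o, k, v};
    ops.push_back(op);
  }
};

// Abstract interface that every backend would implement.
class ToyObjectStore {
public:
  virtual ~ToyObjectStore() {}
  virtual void apply(const ToyTransaction& t) = 0;
  // Returns nullptr if the collection or object does not exist.
  virtual const ToyObject* get(const std::string& coll,
                               const std::string& oid) const = 0;
};

// In-memory backend: collections are maps of object name -> object.
class ToyMemStore : public ToyObjectStore {
public:
  void apply(const ToyTransaction& t) override {
    for (const ToyTransaction::Op& op : t.ops) {
      ToyObject& o = colls_[op.coll][op.oid];  // created on first touch
      switch (op.type) {
        case ToyTransaction::WRITE:    o.data = op.value; break;
        case ToyTransaction::SETATTR:  o.attrs[op.key] = op.value; break;
        case ToyTransaction::OMAP_SET: o.omap[op.key] = op.value; break;
      }
    }
  }
  const ToyObject* get(const std::string& coll,
                       const std::string& oid) const override {
    auto c = colls_.find(coll);
    if (c == colls_.end()) return nullptr;
    auto o = c->second.find(oid);
    return o == c->second.end() ? nullptr : &o->second;
  }
private:
  std::map<std::string, std::map<std::string, ToyObject> > colls_;
};
```

Callers only see the abstract class, which is what allows a file-based, block-based, or in-memory backend to be swapped in behind it.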
A
It stops listening for messages, it cleans up, and then all these things unblock and then we shut down. That's the basic flow of the main thread, but most of the action happens in the OSD, in OSD.cc, the implementation of that class. You'll notice in here there is a dispatch function that gets called when a message comes in over the wire. It basically takes a lock and makes sure we're not shutting down.
A
A
So, for example, if we get a new OSD map, then we'll look at the OSD map and do all the processing associated with that. In the OSD case, it takes all the maps in the message, writes them all to disk and commits them, and then it does a bunch of stuff. So that's that. The most interesting stuff that happens in the OSD is in this other variant of dispatch called fast dispatch. It's like dispatch, but it's called directly, sort of synchronously, from the messenger.
A
So there you have to be careful what locks you take. The regular dispatch is in a worker thread, so you can block and do all kinds of random stuff, but in fast dispatch you should be very careful what you do. Anyway, in here, if we get a request, we instantiate a wrapper class around it so that we can track its progress, and then we enqueue it on a work queue. The OSD has this multi-threaded, sharded work queue for processing operations.
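One way to picture a sharded work queue is the sketch below. It is an assumption-laden illustration, not the OSD's actual queue: ToyOp, ToyShardedQueue and the choice of "placement group number modulo shard count" as the sharding rule are invented for the example. Each shard has its own lock and FIFO, so operations that map to the same shard keep their order while other shards can be drained in parallel by other worker threads.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical unit of work: which placement group it targets, and a label.
struct ToyOp {
  unsigned pg;
  std::string what;
};

class ToyShardedQueue {
public:
  // nshards must be greater than zero.
  explicit ToyShardedQueue(size_t nshards) : shards_(nshards) {}

  // The same pg always maps to the same shard, which preserves per-pg order.
  size_t shard_of(unsigned pg) const { return pg % shards_.size(); }

  void enqueue(const ToyOp& op) {
    Shard& s = shards_[shard_of(op.pg)];
    std::lock_guard<std::mutex> l(s.lock);
    s.q.push_back(op);
  }

  // A worker thread bound to shard `i` would call this in a loop.
  // Returns false if that shard currently has nothing queued.
  bool dequeue(size_t i, ToyOp* out) {
    Shard& s = shards_[i];
    std::lock_guard<std::mutex> l(s.lock);
    if (s.q.empty()) return false;
    *out = s.q.front();
    s.q.pop_front();
    return true;
  }

private:
  struct Shard {
    std::mutex lock;
    std::deque<ToyOp> q;
  };
  std::vector<Shard> shards_;
};
```

Because the enqueue path only takes one shard's lock briefly, it is the kind of cheap, non-blocking work that is safe to do from a latency-sensitive receive path.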
A
A
A
The idea was always that we would add multiple types and they would all implement this abstract PG interface. The way things grew up, because there was only the one implementation for such a long time, that interface has gotten really fat, and so it doesn't really quite work that way. In fact, the replicated pools and erasure-coded pools work so similarly, they're both log-based, that they're actually both sort of specializations of something called PrimaryLogPG.
A
A
And PrimaryLogPG is the actual implementation of that, which is the only implementation of it. It's structured so that all the common code around PG logs and peering and stuff is in PG and PrimaryLogPG, and then it has two different backends, one which does replication and one which does erasure coding. So there's PGBackend, which is sort of this backend interface; that's the abstract interface. There's ReplicatedBackend, which is for replicated pools, and there's ECBackend, which is for erasure-coded pools.
A
So there's an interface and different implementations, but the way it grew, they're sort of all separated and the same at the same time. So it's a bit silly, but regardless, in this case, let's say you have an OSD op, a write, that came in over the wire. You would end up calling this do_request method in the PG. We just saw the caller for that in OSD.cc, where it pulls something off its work queue and passes it to do_request; that comes in here.
A
A
If it's an operation, then we call do_op, and this is where all the action happens for actual read and write requests. So again, there's some backoff stuff here. We check whether it's a read or write operation. We make sure the operation was sent to the right OSD, and that the client is actually allowed to do the operation that it's asking to do; these are the cephx capabilities getting enforced right here.
A
Sometimes operations are PG-wide, like listing objects, and so those get passed off to another function. But assuming it's a per-object thing, we check that the object name is valid, and then we make sure the client isn't blacklisted because it was fenced out of the cluster, and there are a bunch of other checks here to make RADOS actually work right.
A
D
They do as the old piece has one what will be context, but in bacterial piece. So why here we have the new piece.
A
So, RADOS operations. Unlike a lot of other systems, a RADOS operation is actually a compound operation. One of those messages that goes over the wire, one RADOS request as we call it, can have multiple operations. It might just have one read: read this object, read this byte range. But it could read, like, three different byte ranges and the attribute all at the same time, and they all get done together and the reply gets sent over the wire together. Or, more commonly, in the write case,
A
you can do multiple operations. You might have a RADOS op that will write some data to the object, set an attribute, and set an omap key, and it can do all those things atomically, and it'll get committed atomically to the system. That's why this is a vector: each of these requests is actually a list of things to do at the same time.
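A compound request of this kind can be sketched as a vector of sub-operations that are executed in a loop with a switch on the opcode. The names here (ToyOpcode, ToySubOp, ToyObj, do_ops) and the four opcodes are hypothetical, and the all-or-nothing behavior is modeled in the simplest possible way, by working on a scratch copy and publishing it only if every sub-operation succeeds.

```cpp
#include <map>
#include <string>
#include <vector>

enum ToyOpcode { OP_WRITE, OP_SETXATTR, OP_OMAP_SET, OP_READ };

// One entry in the request's vector of operations.
struct ToySubOp {
  ToyOpcode code;
  std::string key;      // attribute or omap key (unused for data ops)
  std::string indata;   // filled in by the sender
  std::string outdata;  // filled in while executing, returned in the reply
};

struct ToyObj {
  std::string data;
  std::map<std::string, std::string> xattrs, omap;
};

// Run every sub-op against a scratch copy; publish only if all succeed.
// Returns false (leaving obj untouched) if any sub-op is rejected.
inline bool do_ops(ToyObj& obj, std::vector<ToySubOp>& ops) {
  ToyObj scratch = obj;
  for (ToySubOp& op : ops) {
    switch (op.code) {
      case OP_WRITE:
        scratch.data = op.indata;
        break;
      case OP_SETXATTR:
        if (op.key.empty()) return false;  // reject: attribute needs a name
        scratch.xattrs[op.key] = op.indata;
        break;
      case OP_OMAP_SET:
        scratch.omap[op.key] = op.indata;
        break;
      case OP_READ:
        op.outdata = scratch.data;         // sees earlier writes in this request
        break;
      default:
        return false;                      // unknown opcode
    }
  }
  obj = scratch;
  return true;
}
```

Copying the whole object is obviously not how a storage system would do it; the copy only stands in for "stage the changes, then commit them as one unit".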
A
So that's implemented here in this do_osd_ops function. Again, we iterate over the vector here in this big for loop, and we again just switch over the opcode. There are all these different opcodes that are implemented, and they're sort of grouped by what kind they are. So all the read operations are here, you know, do_read, for example, or, let's see...
A
I sort of glossed over this MOSDOp request before, so let's go back and look at that. This is one of the most important messages in the system, because that's the I/O request that gets sent from a client to the OSD, and then a reply gets sent back. There's another one over here called MOSDOpReply; that's the reply.
C
A
It's sort of hidden in these message definitions, but it's part of the message envelope around it. Each of the entities in the system that has a messenger has a unique address. That's an entity_addr_t. So in msg, sorry, there's msg_types.h with entity_addr_t, and it's basically an IP address. There's a type, which basically says whether this is, well, I guess this is not usually used; it's the legacy address, or, since we're about to implement a new version of the messenger on-wire protocol,
A
it's which protocol you're speaking. And then there's a nonce, which is something like the PID or some other unique thing, so that if there are multiple clients on the same IP address, they have a unique identifier. That's what the key is. And if you look in the include directory, there are a few header files in here that were originally shared, well, they're still shared, between the user-space code and the kernel client code.
A
Some of the underlying types for this messenger are here. For example, it has things like the message header, the old instance of it, and the new header, which is the envelope that surrounds each of these messages. It includes, you know, a sequence number, because there's an ordered stream of messages. There's a transaction ID field that you can use so you can associate requests with replies, and there's the length of the payload. So anyway, if you go look at the MOSDOp, which is an I/O request,
A
it includes this vector of OSD operations. If you want to know what those are, you can look in osd_types.h and see what an OSDOp is. It's basically the op structure, an object name, and then input and output data. That might be a little bit confusing, but it's because we also use this class on the OSD side.
A
When you send a message, you populate the indata, and then when the OSD processes it, it fills in the outdata, and when you send the reply, it always sends the outdata back to the client. So each of these OSD ops has its own written data and read data, I guess, in the general sense.
B
For every object, when you write data, how does it check whether that's present or not before even committing to disk or a solid-state drive? How does it check the validity of the data? And how many reads happen before a write? How many reads are associated with the write? Okay.
A
B
A
It depends. So, first, on the data integrity side, there are sort of two pieces to this. One is on the network side of things. When these messages are encoded and sent over the wire, there's the envelope: there's a header that I showed you, and there's also a footer, and that includes a CRC of all the data that's included for that message. So as they're sent over the wire, we're doing CRC checks to make sure that we got what was sent from the other end, because the TCP checksum is just too weak.
A
You get all kinds of errors if you rely on that. So that covers the integrity of data getting from point A to point B.
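The footer check described here can be illustrated with a small sketch. The talk only says "a CRC"; the code below uses the standard CRC-32C (Castagnoli) definition as a familiar example, and it should not be read as the exact seed, framing, or implementation used on the wire. ToyFrame, seal and verify are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32C, reflected polynomial 0x82F63B78, initial value and
// final XOR of 0xFFFFFFFF (the common "check value" convention).
inline uint32_t crc32c(const std::string& bytes) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < bytes.size(); ++i) {
    crc ^= static_cast<uint8_t>(bytes[i]);
    for (int k = 0; k < 8; ++k) {
      // If the low bit is set, shift and fold in the polynomial.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical frame: payload bytes plus a checksum carried in a footer.
struct ToyFrame {
  std::string payload;
  uint32_t footer_crc;
};

// Sender side: compute the checksum over the payload and attach it.
inline ToyFrame seal(const std::string& payload) {
  ToyFrame f = {payload, crc32c(payload)};
  return f;
}

// Receiver side: recompute and compare before trusting the payload.
inline bool verify(const ToyFrame& f) {
  return crc32c(f.payload) == f.footer_crc;
}
```

A table-driven or hardware-accelerated CRC would be used in practice; the bitwise loop is only here because it is short enough to read in full.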
Once you're writing it, it all depends on what the backend is, so which ObjectStore implementation you're using. That determines how many I/Os you do and what the data integrity guarantees are and so on. In FileStore,
A
A
mostly we try to layer on CRCs, but it's sort of an opportunistic thing. The new backend, BlueStore, the newest implementation of the ObjectStore interface, uses the block device directly and it does checksums on everything that it writes. So it'll allocate some space on disk, it'll write new data there, and then, when it stores metadata, it includes a checksum of the data along with the pointer to those bytes. So anytime we read data off of disk, we also fetch and verify the checksum, so that what we got back is what was written before.
A
So that's where the data integrity is. On a write, we don't write the data, wait for the device to say it wrote it, and then read it back again to make sure that it actually did. We don't do anything like that. That would be pretty paranoid, and it wouldn't really be trustworthy anyway, because if you read back data you just wrote, you'd probably hit the device's cache or something like that. So it wouldn't necessarily mean that it was actually successfully written.
A
B
A
B
A
It depends, so I'll talk about BlueStore, since that's the new backend and it's probably the easiest to understand. Let's say you're writing to a new object. You have your message come across the network, and the OSD will unpack it and turn it into an ObjectStore transaction that says write to this object. As part of its processing, the OSD is going to do a check to see if the object already exists. So it's going to call into the ObjectStore and say, load the metadata for this object. In the BlueStore implementation,
A
...or it's a non-existent object, one of those two things. If you're unlucky, then RocksDB will have to load one of its SST files and do one or two I/Os in order to make sure that the object doesn't actually exist. How many depends on how many levels you have in your database, how much memory you have, whether your index filters and bloom filters and stuff are hitting, and how effective those are. So there's a whole world of RocksDB behind how well that actually works.
A
There's a lot of work happening in BlueStore right now to make the caching in BlueStore much smarter, so that we are very aggressively caching the index filters and bloom filters for RocksDB to eliminate those I/Os. Those have the highest priority in the cache, because they have the best bang for the buck as far as eliminating I/Os to the device, but you might still miss. In the optimistic case you don't hit the device at all, and so the OSD decides that the object doesn't exist.
A
It generates a transaction to write it, and that gets passed into the ObjectStore. At that point, BlueStore is going to say, I'm just going to pick a region on disk that isn't allocated. It's going to queue an I/O to actually write to that space, and it's going to wait for that I/O to actually go to the device and come back. It'll do this for a bunch of operations, and at some point, in sort of a batched fashion,
A
it'll then issue a flush to the block device, a request that the hardware actually make sure everything is durable and committed to the underlying stable medium or whatever. When that returns, BlueStore will turn around and write the metadata that points to that new space. So the object data will get written over here, and once that's committed, it'll submit a RocksDB transaction that says there's now this object with name foo that is stored in these blocks.
A
That goes through the RocksDB transaction log, and it then also does a write to the device and a flush in order to make sure that it actually commits to disk. So in the normal case, when you're doing a write to an object, maybe there's a read, because you have a cache miss, to determine whether the object exists or not. But assuming it doesn't, you'll do two I/Os: one to write the data and then one write to metadata. Once that's committed, that'll tell the OSD the transaction is safe.
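The ordering described above (write the data into unallocated space, flush, and only then commit the metadata that points at it) can be sketched with a toy model. ToyDevice, ToyExtent and ToyStore are hypothetical, the "device" is just two maps, and the key/value database is reduced to a std::map; the only thing the sketch is meant to show is the order of the steps.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical block device: writes sit in a volatile cache until flush().
struct ToyDevice {
  std::map<int, std::string> cache, stable;
  void write(int block, const std::string& bytes) { cache[block] = bytes; }
  void flush() {
    for (auto& kv : cache) stable[kv.first] = kv.second;
    cache.clear();
  }
};

// Hypothetical metadata record: where an object's bytes live.
struct ToyExtent {
  int block;
};

struct ToyStore {
  ToyDevice dev;
  std::map<std::string, ToyExtent> meta;  // stands in for the key/value DB
  int next_free;
  std::vector<std::string> steps;         // trace of what happened, in order

  ToyStore() : next_free(0) {}

  void write_new_object(const std::string& name, const std::string& bytes) {
    int block = next_free++;   // 1. pick space that is not yet allocated
    dev.write(block, bytes);   // 2. write the data there
    steps.push_back("data");
    dev.flush();               // 3. make the data durable
    steps.push_back("flush");
    ToyExtent e = {block};
    meta[name] = e;            // 4. only now publish the pointer to it
    steps.push_back("meta");
  }
};
```

The reason for this order is that a crash before step 4 leaves only unreferenced space behind, whereas publishing the pointer first could leave metadata pointing at bytes that never reached stable storage.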
A
A
We make sure we're not recovering the object and, assuming all that stuff looks good, then there's something called an object context, which is basically the OSD layer's handle to an object that might have some in-flight writes or reads to it. It's sort of like the inode, but it's one layer up. And so it calls a function called...
B
A
Let's see, find_object_context, yeah. This is what it calls right here. So this is a helper in the OSD; this is where we actually load the metadata. Let's find find_object_context and you'll see what it does here. It calls get_object_context, and first it looks up in its cache, because the OSD has a cache at that layer, so hopefully you'll usually hit that.
A
But assuming it doesn't hit, it basically does a getattr that goes through the PG backend back into the ObjectStore and does a getattr on the object to load the attribute, and that's the test of whether it exists. If that's successful, then it installs an entry in this object context cache, and if it's ENOENT it creates a new one and also installs that into the cache, and then you get a ref to that cache entry, basically.
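The lookup path just described (check the cache, otherwise load an attribute from the store, treat "not found" as a new object, and install the result either way) can be sketched as below. ToyAttrStore, ToyObjectContext and ToyObcCache are hypothetical names, the store is a plain map, and there is no eviction or locking.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical backing store: object name -> value of its info attribute.
typedef std::map<std::string, std::string> ToyAttrStore;

// In-memory handle for one object.
struct ToyObjectContext {
  std::string name;
  bool exists;
  std::string info;  // contents of the info attribute, if the object exists
};
typedef std::shared_ptr<ToyObjectContext> ToyObcRef;

class ToyObcCache {
public:
  explicit ToyObcCache(const ToyAttrStore* store) : store_(store), loads_(0) {}

  ToyObcRef get(const std::string& name) {
    std::map<std::string, ToyObcRef>::iterator hit = cache_.find(name);
    if (hit != cache_.end()) return hit->second;  // cache hit: no store access

    ++loads_;                                     // cache miss: ask the store
    ToyObcRef obc = std::make_shared<ToyObjectContext>();
    obc->name = name;
    ToyAttrStore::const_iterator it = store_->find(name);
    obc->exists = (it != store_->end());          // "not found" means new object
    if (obc->exists) obc->info = it->second;
    cache_[name] = obc;                           // install in both cases
    return obc;
  }

  // How many times the store had to be consulted.
  int loads() const { return loads_; }

private:
  const ToyAttrStore* store_;
  std::map<std::string, ToyObcRef> cache_;
  int loads_;
};
```

Handing out a shared reference means every in-flight operation on the same object sees the same handle, which is what makes it useful as a place to track per-object state.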
A
I'm trying to remember; I can't remember if it does. Let me look at object context. So you remember, an object has sort of three ways that it stores data. It has attributes, which are analogous to extended attributes in a file system. They're meant to be small: a small number of them, with relatively small values and small names, and so it might load all of those at once. It might only load the ones...
A
So there's one sort of magic attribute, called underscore, that has what's called the object info, and that's the RADOS metadata about that object, like what version it is. That's the main thing, plus a few other things, you know, what its checksum is if it has a full-object checksum, that sort of thing. So I think that's probably all it loads.
A
A
So there are the attributes. There's omap, which is also key-value data, but it's meant to be unbounded; you could have megabytes of it in a single object, and it's, you know, random access, values can be big, that sort of thing. And then there's the data portion, which is just a byte stream, kind of like a file. Any object can have all three; it can have just byte data; it could have just omap. Usually it's just one or the other.
B
A
It depends. So yeah, if you miss all of your caches, then in the worst case you would miss the object context cache, so you'd have to read the attribute. You'd miss the RocksDB cache, so you'd have to read the SST index and you'd have to get the key-value pair. Then it would get installed in BlueStore, which has its own cache, and then it would also get installed in the OSD cache, and so on.
A
But in sort of the steady-state workflow, you tend to hit those caches, and so you don't have all those reads.
B
B
B
A
Yeah, I'm not sure, it depends. I think in the best case there are no reads, it's just a write, and the worst case would be a bunch of I/Os to the device to prime all this stuff. So what it averages out to is going to depend on how big your caches are, what your workload is, how big your objects are; there are like 20 million different variables.
A
A
That's for a new put. In the case of RBD, you tend to write to existing objects, and so, assuming you have a big enough data set that all the metadata doesn't fit in cache, every write is going to load some metadata about that existing object and then make an update to it. But it all depends on how big your caches are.
B
B
A
Yeah, again, it depends on how much memory you give the OSD, so I already answered that question. Assuming you have a more normal deployment, where you have a few gigabytes per OSD and you're writing to new objects, then yeah, the RocksDB caches are all going to easily fit in memory, so those aren't going to generate reads. So a write is going to result in one write, and there won't be any reads.
B
B
A
A
Let's look at the top-level directory here. So we looked at the osd code, which is the OSD daemon. This is the part of the OSD that talks to other OSDs and does replication or erasure coding, all that stuff. We looked at os, which is the object store. That's the actual backend that stores data on the local device. We looked at...
A
Let's see, the msg directory, which is the API and implementations for passing messages, and the actual message definitions themselves. There's mon. This is the mon daemon. It runs Paxos and keeps track of who's participating in the cluster. The manager is the new one; it's similar to the mon, there's only one of them active at a time, and it embeds the whole Python runtime.
A
So you can write modules that run in the manager, and those modules live in a directory called pybind, which is where all of the Python code and Python bindings are. So this includes wrappers for librbd, rgw, librados and so on, and there's a directory for all the manager modules. In here you can see there are more than 10 now; the dashboard is in here, and there are like a bazillion files.
A
A
...if you ever go looking for them. Let's see, other things at the top level that are interesting: kv. These are wrappers; this is our internal abstraction for a key-value database. There are three implementations right now. One wraps LevelDB, one wraps RocksDB, and one is an in-memory benchmarking one called MemDB. Pretty much everything now defaults to RocksDB, so the LevelDB one isn't really used anymore.
A
BlueStore uses the RocksDB one to talk to RocksDB, and the monitor uses RocksDB to store its own database stuff, but kv is the interface that they all go through: KeyValueDB. It's a LevelDB-like interface that gives you key-value gets, puts, and transactions, basically. So you can imagine plugging something like Berkeley DB in here if you really wanted to. And then, let's see, the other interesting one is osdc, which is short for OSD client.
A
...you know, whatever; it handles the replies and then does callbacks to the upper layer, and so on. So if you look deeply in here, there's a transaction class here that I'm looking at right now that's not super interesting. But if you look at, like, a read function down here, you can tell it what object to read, and it'll give you a return transaction ID, and you pass it a Context, which is like a callback that gets triggered.
A
A
That
takes
a
number,
that's
pure
abstract,
so
you
implement
your
own
callback
by
overloading
this
finish
function
to
do
whatever
you
need
to
do
in
your
bottom
half
and
you
pass
the
pointers
to
these
around
and
the
code,
when
it
triggers
it,
will
call
either
finish
or,
actually,
it'll
call
this
complete,
which
basically
just
calls
finish
and
then
deletes
itself.
So
this
is
kind
of
how
lambdas
work,
but
it's
not
built
into
the
language.
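The callback pattern described above can be sketched as follows. This is a simplified stand-in written for illustration, not a copy of the real header: the Context class follows the description in the talk (a pure abstract finish that takes a number, and a complete that calls finish and then deletes itself), while C_RecordResult, ToyObjecter, and context_demo are hypothetical names invented here to show a read that returns a transaction ID and fires the callback later.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified callback base class, as described in the talk.
class Context {
public:
  virtual ~Context() = default;
  // Called by whoever triggers the callback: runs finish(), then frees it.
  void complete(int r) {
    finish(r);
    delete this;
  }
protected:
  // Pure abstract: subclasses put their "bottom half" here.
  virtual void finish(int r) = 0;
};

// Hypothetical callback that records the result code it was completed with.
class C_RecordResult : public Context {
public:
  explicit C_RecordResult(int* out) : out_(out) {}
protected:
  void finish(int r) override { *out_ = r; }
private:
  int* out_;
};

// Hypothetical toy client: read() returns a transaction id immediately and
// the callback fires later, when the matching reply is handled.
class ToyObjecter {
public:
  ~ToyObjecter() {
    // Free callbacks for requests that never got a reply.
    for (auto& p : in_flight_) delete p.second;
  }
  std::uint64_t read(const std::string& /*object_name*/, Context* on_finish) {
    std::uint64_t tid = ++last_tid_;
    in_flight_[tid] = on_finish;
    return tid;
  }
  void handle_reply(std::uint64_t tid, int result) {
    auto it = in_flight_.find(tid);
    if (it == in_flight_.end()) return;
    Context* c = it->second;
    in_flight_.erase(it);
    c->complete(result);  // calls finish(result), then deletes c
  }
private:
  std::uint64_t last_tid_ = 0;
  std::map<std::uint64_t, Context*> in_flight_;
};

inline int context_demo() {
  int result = -1;
  ToyObjecter objecter;
  std::uint64_t tid = objecter.read("foo", new C_RecordResult(&result));
  objecter.handle_reply(tid, 0);
  return result;
}
```

Because complete deletes the object, the caller allocates the callback with new and never touches it again after handing it over, which is the ownership convention the talk is describing.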
A
Let's
see,
client
is
the
CephFS
client,
so
it
has
its
client-side
inode
cache
and
implements
read
and
write.
It
has
a
buffer
cache,
all
the
stuff
that
you'd
kind
of
expect
from
a
file
system
client
API,
and
that
implements
all
the
complicated
protocol
between
the
client
and
the
MDS
for
leases
and
locks
on,
you
know,
data
or
metadata,
and
so
on.
A
All
right,
one
more
here:
librados
is
basically
a
wrapper
around
Objecter
that
packages
that
up
as
a
shared
library.
And
so,
you
know,
if
you
look
at
librados.cc,
you
have
things
like
rados_create
or
whatever,
and
they
basically
just
sort
of
instantiate
an
Objecter,
basically,
and
then
pass
things
through,
and
wrap
things
in
the
nice
C++
way
or
the
nice
C
way,
to
call
into
the
Objecter
and
do
the
right
thing
and
set
up
the
CephContext
data
that's
associated
with
it,
all
that
stuff.
And
so
it's
a
relatively
thin
directory,
but
that's
all
there.
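As a rough illustration of the "thin wrapper" idea above, the sketch below shows a C-style facade over a C++ object: an opaque handle plus free functions that just instantiate the object and pass calls through. Every name here (ToyClient, toy_cluster_t, toy_create, toy_connect, toy_shutdown) is hypothetical and chosen only for this example; it is not the librados API, and the real library does considerably more setup.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical C++ core object standing in for the client engine.
class ToyClient {
public:
  explicit ToyClient(const std::string& id) : id_(id) {}
  int connect() { connected_ = true; return 0; }
  bool connected() const { return connected_; }
  const std::string& id() const { return id_; }
private:
  std::string id_;
  bool connected_ = false;
};

// C-style facade: an opaque handle plus free functions with C linkage.
extern "C" {

typedef void* toy_cluster_t;

// Instantiate the C++ object and hand back an opaque handle.
int toy_create(toy_cluster_t* cluster, const char* id) {
  if (!cluster || !id) return -EINVAL;
  *cluster = new ToyClient(id);
  return 0;
}

// Pass the call straight through to the C++ object.
int toy_connect(toy_cluster_t cluster) {
  if (!cluster) return -EINVAL;
  return static_cast<ToyClient*>(cluster)->connect();
}

// Destroy the C++ object behind the handle.
void toy_shutdown(toy_cluster_t cluster) {
  delete static_cast<ToyClient*>(cluster);
}

}  // extern "C"
```

The facade holds no logic of its own beyond argument checks, which is why a directory built this way stays relatively thin.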
A
That's
mostly
that.
There's
an
auth
directory
that
has
all
the
cephx
and
related
authentication
code;
Kerberos
stuff
is
going
to
land
in
there
shortly.
There's
a
class
directory,
cls.
These
are
all
of
the
RADOS
classes
that
can
be
dynamically
loaded
into
the
OSD
to
implement
new
RADOS
operations,
and
so
you'll
see
there's
a
bunch
of
RBD
classes
that
everybody
uses.
There's
RGW
classes
that
RGW
uses
on
the
OSD
side
to
make
atomic,
complicated
updates
to
objects
on
the
OSD
side.
A
That's
mostly
it.
I
think
those
are
the
interesting
parts,
and
we're
out
of
time.
So
I
think
if
you're
here
now,
or
if
you
watch
this
later,
and
there
are
specific
areas
of
the
code
that
you
would
like
to
get
more
detail
on,
follow
up
on
the
list,
reply
to
the
email,
or
mention
it
on
IRC,
and
we
can
do
another
one
of
these
code
walk-throughs
that
sort
of
zooms
in
on
something
else.
C
I
would
like
to
have
the
librbd
client
in
a
little
bit
more
detail,
if
that
is
possible.
I'm
trying
to
do
some
caching
with
a
prototype.
The
one
I
tried
to
work
with
is
the
osdc
ObjectCacher,
but
there
are
a
lot
of
dependencies
in
that
layer
that
I
don't
want
to
deal
with.
So
I
was
just
thinking
of
doing
a
prototype
layer
over
it
in
the
cache,
pass-through
and
write-back,
so
I
was
just
interested
in
librbd
a
little
bit
more.
Okay.
A
It
has...
oh,
that's
nice,
it
has
a
manifest.
object_info_t,
here
it
is.
Yeah:
the
name
of
the
object,
the
version,
a
user-visible
version,
the
last
request
that
touched
it,
size,
its
mtime,
local
mtime.
It
has
flags,
flags
about
whether
it
has
omap
data,
or
is
a
whiteout,
or
it's
dirty,
for
tiering.
This
is
used
for
CephFS
to
order
truncates
in
this
sort
of
weird
way.
Watchers:
this
is
part
of
watch/notify,
the
sort
of
pub/sub
thing
in
RADOS.
A
At
this
layer:
whole-object
digests,
which
are
sort
of
opportunistically
set,
and
some
hints.
I
think
that's
it,
though.
The
underlying
object
store
layer,
its
concept
of
an
object
includes
attributes
and
data
and
omap.
Of
all
the
attributes
on
the
object,
there's
one
of
them
that
is
just
underscore,
and
that
stores
an
encoded
object_info_t,
and
then
all
the
other
attributes
are
the
ones
that
are
visible
to
the
user,
that
are
exposed
to
the
RADOS
client
or
whatever.
A
In
bluestore_types.h,
if
you
look
at
bluestore_onode_t,
which
is
sort
of
its
inode-type
structure,
it
has,
you
know,
some
hints
and
so
on,
and
somewhere
in
here
it
stores
the
attrs,
yeah,
the
attributes
for
that
object.
So
when
BlueStore
faults
an
onode
into
its
cache,
it
has
all
the
attributes
right
there,
and
so
we'll
call
into
that
layer,
and
that
layer
will
be
able
to
get
them
quickly,
or
get
all
of
them
at
once,
or
whatever
it
is.
A
In
the
general
ObjectStore,
the
interface
looks
like...
there
isn't
actually
a
data
type,
it's
an
interface.
So
if
you
do
getattr,
for
that
you
give
it
a
handle
for
the
collection
and
the
object
name,
you
give
it
the
attribute
name,
and
you
get
a
value
out.
So
it's
like
getxattr,
the
system
call,
and
there's
a
variation
of
this
called
getattrs
that
just
fetches
all
of
them,
and
it'll
give
you
an
STL
map
of
all
attributes
to
their
values.
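The getattr and getattrs calls described above can be sketched with a toy store. This is a hypothetical, simplified stand-in (ToyObjectStore and its string-based signatures are assumptions made for this example, not the real ObjectStore interface): one call looks up a single attribute by collection, object, and attribute name, and the other returns a map of every attribute to its value, including the "_" attribute that the talk says holds the encoded object info.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy store keyed by (collection, object name).
class ToyObjectStore {
public:
  using Attrs = std::map<std::string, std::string>;

  void setattr(const std::string& coll, const std::string& oid,
               const std::string& name, const std::string& value) {
    objects_[std::make_pair(coll, oid)][name] = value;
  }

  // Fetch one attribute, similar in spirit to getxattr.
  // Returns 0, -ENOENT (no such object), or -ENODATA (no such attribute).
  int getattr(const std::string& coll, const std::string& oid,
              const std::string& name, std::string* value) const {
    auto o = objects_.find(std::make_pair(coll, oid));
    if (o == objects_.end()) return -ENOENT;
    auto a = o->second.find(name);
    if (a == o->second.end()) return -ENODATA;
    *value = a->second;
    return 0;
  }

  // Fetch every attribute on the object as a map of name to value.
  int getattrs(const std::string& coll, const std::string& oid,
               Attrs* out) const {
    auto o = objects_.find(std::make_pair(coll, oid));
    if (o == objects_.end()) return -ENOENT;
    *out = o->second;
    return 0;
  }

private:
  std::map<std::pair<std::string, std::string>, Attrs> objects_;
};
```

In this toy version the "_" attribute is just another key; the values would be opaque encoded bytes in practice, and plain strings are used here only to keep the example short.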
A
So
this
is
the
interface
that
consumers
of
that
will
use.
At
the
RADOS
level,
this
is
what
the
OSD
consumes.
At
the
librados
layer,
on
the
other
side
of
the
network,
there's
a
whole
different
set
of
operations
that
you
can
do,
and
I
don't
remember
what
they
are.
I
can't
remember
whether
there's
a
get-all-attributes
or
not,
but
it's
just
something
you
can
go
A
look
at
in
the
librados
headers.
I
apologize,
I'm
out
of
time,
I
need
to
go
eat
dinner,
but
thanks
everyone
for
coming,
and
we're
going
to
post
this
on
YouTube
tomorrow
or
whatever.
Follow
up
on
the
email
list
if
you
have
questions
about
this,
or
if
you
have
other
requests
for
other
deep
dives
or
topics
you
want
to
cover.
That's
it,
pretty
much.