From YouTube: CDS G/H (Day 1) - OSD review
https://wiki.ceph.com/Planning/CDS/CDS_Giant_and_Hammer_(Jun_2014)
24 June 2014
Ceph Developer Summit G/H, Day 1: OSD review session
A
Alright, on to the next session: the OSD review. Primarily it looks like we're going to focus on the back end, with discussion of Kinetic and RocksDB and some of the work that's going into that. I don't think there was anything else as far as the review goes, but I do know we have some discussion later in the day about the new work that's coming. So, Sage, if you want to give us the review.
B
Sure, I can give the high-level overview, and if Sam or Josh or others want to chime in too, that would be great. There's a bunch of stuff that we proposed originally during the Giant summit, and then a bunch of other work that was in progress and that people were discussing on the mailing list, a lot of it around the back ends for the OSD in general. So I wanted to highlight the work that's in progress and where it might be going.
B
First, just to level set and remind everyone: the way the OSD is architected, there is an internal abstraction called ObjectStore. That's the interface the OSD daemon uses to talk to its local storage, usually a file system and whatever disk is attached on that node. It cleanly encapsulates everything the OSD needs to do to store data locally, and there's currently one implementation that everybody uses.
B
It's called FileStore, and that's the piece that stores objects as files, usually on XFS, though it can also do Btrfs or ext4, plus a journal that we normally put on an SSD. So it's the thing that's responsible for managing that local disk. The nice thing about the interface is that you can plug in alternative back ends that implement things differently. One of the things we added maybe three to six months ago was the MemStore back end.
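The pluggable back-end idea described here can be sketched roughly as follows. This is an illustrative Python sketch with invented names and signatures, loosely modeled on the ObjectStore/MemStore concepts under discussion; it is not Ceph's actual C++ interface.

```python
# Illustrative sketch of a pluggable object-store abstraction.
# Names and signatures are invented; this is NOT the real Ceph API.
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Interface the OSD uses to talk to its local storage."""

    @abstractmethod
    def write(self, name, data):
        ...

    @abstractmethod
    def read(self, name):
        ...


class MemStore(ObjectStore):
    """Keeps everything in RAM: useful as a reference implementation
    and for benchmarking against an 'infinitely fast' back end."""

    def __init__(self):
        self._objects = {}

    def write(self, name, data):
        self._objects[name] = data

    def read(self, name):
        return self._objects[name]


# The OSD-side code only sees the ObjectStore interface, so any
# back end implementing it can be slotted in.
store = MemStore()
store.write("obj.0", b"hello")
```

Because the daemon codes against the abstract interface only, swapping in a different implementation requires no change on the OSD side, which is the property the discussion keeps returning to.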
B
We use it for benchmarking, mostly, and validation: to see how a cluster would perform if it had a sort of infinitely fast, DRAM-speed back end. It's useful for benchmarking and some other things, and also just as a reference implementation of what a back end has to do in order to be functional. The second new back end (our third overall, I guess), added recently from Haomai Wang at UnitedStack, is a key/value store implementation. The idea here is that basically everything you're putting into the OSD that it's storing locally is just dumped into a key/value store like LevelDB, and all the data lives there.
B
If that's your workload, then just dump it all into LevelDB and don't bother creating individual files in the file system; things will behave better. The nice thing about that interface, LevelDB in particular, is that it has a transactional interface: you can say "update these 37 keys" and it will atomically do that, in an efficient way, which matches very well the semantics that our ObjectStore interface has. So you can get rid of the journaling weirdness that FileStore does. That went in a while ago, and Haomai has been doing a lot of work to improve performance, test it, tweak it, and so forth, so that's been fun.
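The all-or-nothing batch semantics being described, in the spirit of LevelDB's WriteBatch, can be sketched like this. The class and method names below are invented for illustration; a real store would make the swap durable and crash-safe rather than a simple in-memory assignment.

```python
# Sketch of atomic batch ("update these 37 keys") semantics,
# in the spirit of LevelDB's WriteBatch. Names are invented.
class KVStore:
    def __init__(self):
        self._data = {}

    def apply_transaction(self, sets, deletes=()):
        """Apply key updates and deletes atomically: stage the new
        state first, then swap it in, so a failure partway through
        staging leaves the store untouched."""
        staged = dict(self._data)
        for key, value in sets.items():
            staged[key] = value
        for key in deletes:
            staged.pop(key, None)  # deleting a missing key is a no-op
        self._data = staged        # single atomic swap

    def get(self, key):
        return self._data.get(key)


db = KVStore()
db.apply_transaction({f"key{i}": i for i in range(37)})  # all 37 land together
```

The point made in the session is that this batch-update shape already matches what ObjectStore transactions need, so a key/value back end doesn't need FileStore's separate journal.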
The other thread of this, though, is that the interface we use to talk to that key/value store also has an abstract wrapper in Ceph land, and that's called KeyValueDB.
B
That's just a thin wrapper around LevelDB's API, essentially. So there's another branch outstanding right now called wip-rocksdb, which takes RocksDB, a different key/value store that came out of Facebook. I believe it was actually based on LevelDB originally, but it's optimized specifically for flash and SSD stuff, and it's supposed to be super fast.
B
We only use it for a few different things, but you could throw those few things into RocksDB instead of LevelDB. That's in progress; it'll be the second key/value back end. The idea, then, is that you can slot in others depending on what makes sense for your workload, or on which one ends up being better long term.
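The "slot in others" idea rests on all the stores sitting behind one thin wrapper. A minimal sketch, with invented names and in-memory stand-ins for the real LevelDB/RocksDB wrappers:

```python
# Sketch of a thin KeyValueDB-style wrapper: OSD code talks to one
# small interface, and concrete stores slot in behind it.
# Both "back ends" here are in-memory stand-ins with invented names.
class KeyValueDB:
    def submit(self, ops):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError


class BackendA(KeyValueDB):        # stand-in for e.g. a LevelDB wrapper
    def __init__(self):
        self._d = {}

    def submit(self, ops):
        self._d.update(ops)

    def get(self, key):
        return self._d.get(key)


class BackendB(BackendA):
    # Same API: swapping it in is nearly "search and replace",
    # which is what made the RocksDB port straightforward.
    pass


def make_db(backend):
    """Pick a back end by name, e.g. from a config option."""
    return {"a": BackendA, "b": BackendB}[backend]()
```

Choosing the back end then becomes a one-line configuration decision rather than a code change, which is the property the speaker is describing.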
B
[The Kinetic drive] has a little ARM processor running stuff on there, and you talk to it over TCP. The other thing is that, instead of sending it block I/O, you send it key/value operations, so internally on the disk it has a key/value database that's presumably well integrated with the disk firmware, to do it all efficiently and so forth. So Josh did some patches.
B
There are a few rough edges: they're using the C++ API, and it required C++11, which we don't build everything with yet because it's not supported on older distros. And there are some issues with the iterators, like reverse iterators not being exposed through the C++ API, so we're working around that in some annoying ways. But modulo some annoying issues that need to get cleaned up, it's mostly ready to get pulled into the tree, just as a second reference.
B
Or rather our third, I guess, reference back-end implementation that people can play with. Again, it's not recommended for production use yet (you can't even buy these drives yet anyway), but it's there as an experiment, so people can play with it and see what kinds of workloads and so forth make sense on this particular class of Ethernet drives. Josh, do you want to talk about the Kinetic stuff at all? I don't see him; Josh, I don't think he's here yet.
B
Yes. So the current generation of Kinetic, as it stands, doesn't let you run your own code on the disk. My understanding is they wanted a new standard-ish interface: open up the API so people can code against that interface. It's really about what the next generation of disk interface is supposed to look like; Kinetic says, go use that instead. So you can't run the Ceph OSD on the ARM processor.
B
There are other hard drive manufacturers making Ethernet disks that do let you run whatever code you want. HGST made this big announcement at OpenStack, where they have their own version of the Ethernet drive that has just a general-purpose ARM processor with a couple gigs of RAM and a couple of cores, or maybe just one core, I don't remember. But it's basically like running Debian or something: you just install the OSD on there.
B
The ARM build, which we already did a bunch of work on last year, overall works: you just install Ceph, and you can run Ceph on the disk. So you can imagine data centers that are just full of racks and racks of disks in shelves plugged directly into Ethernet, for cold storage; that's pretty cool. And then you drop the host entirely, which is just what we were thinking when we designed Ceph and called them OSDs, object storage devices: the OSD would actually be a device. So yes, that's happening. For those drives, though, they're not doing anything weird with the disk interface, so it's still just a block device, and on that little ARM chip on the disk you still have to run XFS or whatever. So I think there's still a lot of room to improve that situation. But again, broad strokes:
B
The FileStore back end that everybody uses is something we arrived at incrementally over many years. It works well and is well tested and robust, but it isn't always the most performant, and it isn't necessarily what you would end up with if you started with a clean slate. So people should put their thinking caps on, I guess, and see what else makes sense. Josh has arrived! Josh is coming, yep. Hey.
C
I'm here now. I didn't hear the first part of it, unfortunately, but basically there's a prototype back end that I'm going to push to a branch later this week, which uses the existing KeyValueStore and KeyValueDB interfaces and implements them in terms of the Kinetic API, talking directly to the Kinetic back end from those.
B
So there is a bunch of versioning that already happens: in terms of how OSDs communicate with each other, they exchange feature bits, and they avoid sending certain things to peers that they know won't understand them. That's just the basic step we have to do so that we can do rolling upgrades in the system.
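The feature-bit negotiation described here boils down to a bitmask intersection: a capability is used on the wire only if both ends advertise it. A minimal sketch, with invented bit names:

```python
# Sketch of feature-bit negotiation for rolling upgrades.
# Bit names are invented for illustration.
FEATURE_BASE = 1 << 0
FEATURE_NEW_ENCODING = 1 << 1  # hypothetical newer wire feature


def can_use(feature, my_features, peer_features):
    """A feature is usable only if both ends advertise it."""
    return bool(feature & my_features & peer_features)


# An upgraded daemon talking to a not-yet-upgraded peer falls back
# to the older encoding, which is what makes rolling upgrades safe:
old_peer = FEATURE_BASE
new_me = FEATURE_BASE | FEATURE_NEW_ENCODING
use_new = can_use(FEATURE_NEW_ENCODING, new_me, old_peer)  # False
```

Because the check is symmetric and per-feature, a cluster can run mixed versions during an upgrade, with each pair of peers independently settling on the subset both understand.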
B
Let's talk a little bit about that. Yeah, I think it just came in over the list, is my understanding. I don't know. Sam, is there anything else you want to talk about, maybe not back-end specific, just stuff that's in progress on the OSD?
D
Together with that, the other notable thing that happened since last time would be the threading and perf improvements in the OSD from Greg; that's cool. We were also looking at some librados changes to improve ops and throughput. There were a bunch of changes made to the way the OSD dispatches messages, to decrease the CPU overhead and improve the locking.
B
Just
probably
worth
mentioning
yeah,
this
is
all
post
firefly
as
far
as
when
all
this
one
in,
but
there
was
a
piece
that
I'm
that
changed
the
way
that
the
messaging
layer
that's
pulling
stuff
off
the
network,
hands
off
those
ops
to
the
OSD
and
then
there's
a
a
piece
that
makes
it
the
work
queue.
That's
processing
those
behave
better
when
you
have
lots
of
different
cores,
I.
D
So that means RocksDB is Facebook's solution to that? Yes.
B
Okay, yeah. The nice thing about the RocksDB back end, from what I've seen, is that it was almost a search-and-replace of the LevelDB one, because I think they started with LevelDB, or at least adopted the exact same API, so it sort of slots in easily. There's another project called HyperLevelDB out there, where somebody else made their own set of changes to LevelDB, I think mostly around threading.
B
When you have lots of cores dispatching transactions, I think LevelDB sort of piles it all up into a single thread. So that would be another one that, for the same reasons, would be trivial to swap in and do experiments with, for people who are interested in this stuff.
B
One that I think is going to make it in for Giant is what's called read-forward. The idea there is, when you have a hot tier and a cold tier and you're using RADOS cache tiering, and an object doesn't exist in the cache tier but the operation is a read, you're better off just forwarding that to the base tier and reading from there, not promoting it into the cache tier. Apparently, with high-end flash versus low-end flash (and that's probably a very imprecise way to put it), the flash that's intended for colder storage can't do as many write cycles and tends to be slower on writes, but it turns out the read performance is essentially the same as the good stuff. So if it's a read-only event, just forwarding it to the cold tier and reading from there is just as good.
B
That's
a
relatively
simple
changed.
Lewis
I
went
over
it
with
luis
last
week
or
like
before.
So
we
might.
We
might
stick
that
in
20
minutes,
not
whatever,
but
I
think
that
one's
going
to
go
ahead
and
giant.
It's
a
pretty
simple
change
actually
to
add
that
at
that
additional
mode,
the
second
one
that
we
were
talking
about
was
would
actually
proxy
the
read
through
that
cash
OSD.
And
so
you
don't
go
back
to
the
client
and
sort
of
redirect.
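The three ways of handling a cache-tier read miss that come up here (promote, read-forward, proxy) can be laid out as a small decision sketch. The enum values and the policy function are invented for illustration and do not mirror Ceph's actual configuration names.

```python
# Sketch of the cache-tier read-handling choices discussed above.
# Invented names; not Ceph's actual modes or configuration.
from enum import Enum


class ReadPolicy(Enum):
    PROMOTE = "promote"      # pull the object into the cache, then serve
    FORWARD = "read-forward" # redirect the client to the base tier
    PROXY = "proxy"          # cache OSD reads base tier on client's behalf


def handle_read(in_cache, policy):
    if in_cache:
        return "serve from cache tier"
    if policy is ReadPolicy.FORWARD:
        return "redirect client to base tier"
    if policy is ReadPolicy.PROXY:
        return "cache OSD reads from base tier and replies"
    return "promote object, then serve from cache tier"
```

The trade-off described in the session: read-forward avoids burning write cycles on the cache flash for read-mostly objects, while proxy keeps the extra round trip away from the client.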
B
Wouldn't that be great. There are a bunch of different caps that limit the different parts that consume memory, but unfortunately there's no total cap on the process. Going forward, in general, that's something we need to do a better job of: figuring out how to ensure the OSDs don't get too carried away, particularly if we want to talk about using this new generation of Ethernet drives and running the OSD on the disk, where you have a much more constrained environment, apparently a couple gigs of RAM. You need to be quite sure that you fit within that and don't dig yourself into a hole, because it will be challenging to get out of it.
B
The thing that worries me is situations where the data distribution changes such that suddenly a subset of OSDs is mapped to a whole bunch of PGs, and just to go through all the current peering and recovery machinery, even to shed that responsibility back off to another device, they have a lot of work to do. But we need to.
B
So, again, with all these key/value back ends, we have this internal interface that we use, which wraps the individual back end so that we can map onto it. The goal with the Kinetic work was also to get something that will work against a generic key/value store, and then be able to plug in other ones easily. So I would love to see a set of patches that just wire up the NVMKV interface. The one challenge there is that our key/value interface is transactional.