Videos from Ceph Developer Summit: Infernalis (Day 2.2)
04 March 2015
https://wiki.ceph.com/Planning/CDS/Infernalis_(Mar_2015)
B
But in the case of our RBD, things are mostly better. In our RGW, though, there are lots of cases where you have objects that are written somewhere in RADOS, and you have some sort of metadata that points to them, or references them in some other way, that you need to update atomically. It's just frustrating every time you have to do that, to have to reinvent your own two-phase commit thing. So the basic idea is to expose a multi-object transaction to librados.
B
There are a few different variations on how this would actually get done. The one that makes the most sense, which we've discussed recently on the list, is noted here in the blueprint. So the basic idea would be: send this full transaction to the master. It would just hold that in memory for the duration, and it would send a prepare to each of the slaves that has just their portion of the update, just their sub-part of that ObjectWriteOperation.
B
Once the slaves have acked their prepares, the master would apply the transaction locally and mark somewhere that it's committing, and then it would send an ack to the client, because at that point we're committed. It would also send messages to all the slaves again, saying that they should take the thing they durably prepared and apply it to the actual object. Once the slaves ack, the master can forget, discard, its sort of committing state, and everything is done. So the key thing that doing this in RADOS buys you is that all of the failure paths and rollback stuff can be handled by the OSDs themselves.
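The flow just described (the master keeps the full transaction in memory, each slave durably prepares only its own portion, the master applies locally and acks the client, then tells the slaves to commit) can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not Ceph code: `Master`, `Slave`, `submit`, `prepare`, `commit` and `abort` are hypothetical names, messages are plain method calls, and "durable" state is just a dict.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """A participant OSD (hypothetical model, not Ceph code)."""
    objects: dict = field(default_factory=dict)   # object name -> data
    prepared: dict = field(default_factory=dict)  # txid -> staged writes ("durable")

    def prepare(self, txid, writes):
        self.prepared[txid] = dict(writes)  # stage only this slave's sub-part
        return True                         # ack the prepare

    def commit(self, txid):
        self.objects.update(self.prepared.pop(txid))  # apply the staged writes
        return True                                   # ack the commit

    def abort(self, txid):
        self.prepared.pop(txid, None)  # discard the staged writes


@dataclass
class Master:
    """The coordinator; it keeps the full transaction only in memory."""
    objects: dict = field(default_factory=dict)
    committing: set = field(default_factory=set)  # txids marked "committing"

    def submit(self, txid, local_writes, remote_writes, slaves):
        # Phase 1: send each slave a prepare carrying just its portion.
        prepared = []
        for name, writes in remote_writes.items():
            if not slaves[name].prepare(txid, writes):
                for p in prepared:  # undo prepares that already succeeded
                    slaves[p].abort(txid)
                return "aborted"
            prepared.append(name)
        # All prepares acked: apply locally and mark the transaction committing.
        self.objects.update(local_writes)
        self.committing.add(txid)
        reply = "committed"  # in the real protocol the client is acked at this point
        # Phase 2: tell the slaves to apply what they durably prepared.
        for name in remote_writes:
            slaves[name].commit(txid)
        # Every slave acked, so the committing state can be forgotten.
        self.committing.discard(txid)
        return reply
```

For example, `Master().submit("tx1", {"head": b"v2"}, {"osd.1": {"index": b"entry"}}, {"osd.1": Slave()})` returns `"committed"` and leaves the index entry applied on the slave.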
B
They can communicate with each other. So, for example, if there's a peering change, then when we reactivate we just look at our PG state and ask: do we have any slave operations that we've prepared but haven't committed yet? And if so, we'll just ask the master whether we should roll them forward or roll them back, and the master can tell us.
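The recovery step above, where a slave that finds prepared-but-uncommitted operations after peering asks the master whether to roll them forward or back, might look roughly like this. A minimal sketch: `resolve_in_doubt` and its arguments are hypothetical, the master's answer is modeled as membership in its set of committing txids, and treating "the master has no committing record" as roll back is an assumption that fits a master which keeps the transaction only in memory.

```python
def resolve_in_doubt(prepared, staged, objects, master_committing):
    """Resolve a slave's prepared-but-uncommitted operations after peering.

    prepared: txids this slave durably prepared but has not committed
    staged: txid -> {object: data} stored by each prepare (consumed here)
    objects: the slave's object store, updated in place on roll forward
    master_committing: txids the master reports as committing (hypothetical query)
    """
    outcome = {}
    for txid in prepared:
        writes = staged.pop(txid)
        if txid in master_committing:
            objects.update(writes)          # master committed: roll forward
            outcome[txid] = "roll forward"
        else:
            outcome[txid] = "roll back"     # no commit record: drop the prepare
    return outcome
```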
B
We don't really have to do the same thing on the master, because if there's a peering change and the master's primary moves to a different OSD, the client is going to resend the whole transaction anyway, just like it does for every other operation. So there's not really any point. That's the reason why the master holds the transaction in memory and doesn't write it to disk, at least in the current formulation on the email list.
B
Right, so, any questions so far?
C
Okay, let's say we get to part, or bullet point, three: three of the slaves have persisted the prepares, three of them haven't, and everyone goes through a peering change. When we come back up, three of them have uncompleted transactions and the other three don't. Who decides whether this is a rollback situation or a roll-forward?
B
That does work, yeah. Well, there are lots of ways we can make it slightly more complicated, more interesting. For example, each of those object write operations could potentially have read guards at the front, like "make sure that the version is X", or call some class operation, or whatever it is. If those fail, then the prepare would fail, basically, and the transaction would abort. So that's one sort of easy extension.
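The read-guard extension can be sketched as: every prepare first evaluates its guard (here, a version check), and a single failing guard aborts the whole transaction before anything is applied. This collapses the distributed prepares into one process for illustration; `prepare`, `transact` and `GuardFailed` are hypothetical names, objects are dicts with `version` and `data`, and bumping the version on each write is an assumption of the sketch.

```python
class GuardFailed(Exception):
    """A read guard on some object's write operation did not hold."""


def prepare(obj, expected_version, new_data):
    """Check the read guard, then stage (but do not apply) the write."""
    if obj["version"] != expected_version:
        raise GuardFailed(f"expected version {expected_version}, found {obj['version']}")
    return {"version": obj["version"] + 1, "data": new_data}


def transact(ops):
    """ops: list of (obj, expected_version, new_data); all-or-nothing."""
    try:
        staged = [(obj, prepare(obj, ver, data)) for obj, ver, data in ops]
    except GuardFailed:
        return False            # a prepare failed: abort, nothing was applied
    for obj, new_state in staged:
        obj.update(new_state)   # every guard held: apply all the writes
    return True
```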
B
You might use this, for example, to make sure that we don't go over quota on the index, or something, I don't know, whatever it is. I imagine lots of uses there. I probably won't worry about it in the first iteration, although we want to make sure that we adequately test all the repair, well, peering, I guess.
B
When I think about this, it's just this struct here at the top, which is just a map of object to ObjectWriteOperation. I think it just works, but I sort of expect that we're missing something and it's not that simple. I won't find out till we actually go do it.
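The struct mentioned above, a map from each object to its write operation, is enough for the master to work out which placement group each sub-part belongs to. In this sketch the transaction is a dict of object name to a list of op names, and `pg_of` uses `zlib.crc32` purely as a stand-in hash; Ceph's real object-to-PG mapping (and CRUSH placement of PGs onto OSDs) is different. `split_transaction` and `pg_of` are hypothetical helpers.

```python
import zlib
from collections import defaultdict


def pg_of(obj_name, pg_num):
    # Stand-in hash for illustration only; NOT Ceph's real object-to-PG mapping.
    return zlib.crc32(obj_name.encode()) % pg_num


def split_transaction(txn, master_obj, pg_num):
    """Split {object: [ops]} into per-PG sub-transactions.

    Returns (master_pg, {pg: {object: [ops]}}): the master's PG keeps its own
    part, and every other PG would get a prepare containing only its objects.
    """
    parts = defaultdict(dict)
    for obj, ops in txn.items():
        parts[pg_of(obj, pg_num)][obj] = ops
    return pg_of(master_obj, pg_num), dict(parts)
```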
C
If you're worried specifically about the case where you're creating an object and also updating a pointer, then we can get that one while still doing the prepare. If you're worried about a four-megabyte overwrite, though, you're right, we would have to do it that way. I'm not sure a four-megabyte overwrite is that common, yeah.
B
How would you detect or avoid deadlocks? Yeah, I think that's the hard part. And I could cheat and just say that's up to the client to do. I mean, in practice, with RADOS Gateway, which is sort of the motivating example here, the master is always going to be the head object, or, you know, that object.
B
Or one of the objects in the RGW object, like a shadow or whatever, and the index is going to be a secondary thing. There's never really a combination in which you would end up deadlocking, because you don't have weird dependencies, but that's kind of dangerous. That's good.
C
I mean, we could push that to the client. We could have them submit a list of... The interface could force them to submit an ordered list of object write operations, pair<hobject_t, ObjectWriteOperation>, such that they promise that no two clients will submit operations with flipped or cyclic orderings.
C
That is, it's a DAG. Um, that's not an unreasonable approach, but it's also hard to use correctly. Yeah, yep. It also prevents them from having any kind of parallelism. So if we wanted there to be parallelism, for example in the case where you're creating new objects, which are necessarily at the bottom of the lock hierarchy because no one else can possibly be looking at them, then you'd want to be able to say these...
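The client-side rule above (each transaction locks its objects in the order the client lists them, and no two clients may use flipped or cyclic orders) holds exactly when the union of all those orders forms a DAG. A minimal sketch of that check using Kahn's topological sort; `orderings_are_acyclic` is a hypothetical helper, and each ordering is just a list of object names.

```python
from collections import defaultdict, deque


def orderings_are_acyclic(orderings):
    """Each ordering lists the objects one transaction locks, in lock order.
    Returns True if all the precedence edges together form a DAG."""
    edges = defaultdict(set)
    nodes = set()
    for order in orderings:
        nodes.update(order)
        for a, b in zip(order, order[1:]):  # consecutive pairs imply the rest
            edges[a].add(b)
    indegree = {n: 0 for n in nodes}
    for a in edges:
        for b in edges[a]:
            indegree[b] += 1
    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in edges[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)  # leftover nodes sit on a cycle
```

Sorting every transaction's objects by one global key would satisfy the rule automatically, at the cost of the flexibility discussed above.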
B
Seems to me like deadlock detection might be a simpler path here. So what if both the master and all of the prepares have a list of all the objects that are involved in the transaction, and if, while a prepare or a master coordinator is in flight, we see another transaction that...
B
It's my... BlueJeans likes to adjust my microphone volume. Sorry, so I'm not sure where it was. If you cut out, just...
B
I was explaining the perfect solution. Okay, so intuitively it feels like this might be enough information, but I'm not sure. The master obviously has a list of all the objects that are involved. If all the prepares also have a complete list of the objects involved in the operation, then, if a second transaction comes along while the first one is in progress, we should be able to look at those two sets of objects and tell whether they conflict.
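If every prepare carries the full list of objects in its transaction, an OSD can compare an incoming transaction's object set against the transactions it already has in flight. A minimal sketch with hypothetical names; note that overlap only means the two transactions may contend, not that they deadlock, so this check is conservative.

```python
def conflicting_inflight(incoming_objects, inflight):
    """Return the in-flight txids whose object sets overlap the incoming one.

    incoming_objects: full object list carried by the incoming prepare
    inflight: txid -> set of objects in that transaction (from its prepare)
    """
    incoming = set(incoming_objects)
    return sorted(txid for txid, objs in inflight.items()
                  if not incoming.isdisjoint(objs))
```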
B
Another one's happening on B and C, another's happening on A and C. For those three OSDs, any one of them only sees operations on two objects. Oh, each one of those sees two operations, across three objects, that don't conflict, or rather that don't form a loop. The only way they would be able to detect that there is a loop is if all three of them got together and agreed, even though all three of them are not in the same transaction. Yeah, yeah, that's the...
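The three-way case above is a classic distributed deadlock: the cycle only appears in the global wait-for graph, while each OSD's local view is acyclic. A minimal sketch with hypothetical names, assuming T1 holds the lock on A and waits for B, T2 holds B and waits for C, and T3 holds C and waits for A, with each object on a different OSD.

```python
def wait_for_edges(holders, waiting):
    """holders: object -> txn holding its lock; waiting: txn -> object it is blocked on."""
    graph = {}
    for txn, obj in waiting.items():
        graph.setdefault(txn, set()).add(holders[obj])
    return graph


def find_cycle(wait_for):
    """Depth-first search for a cycle; returns it as a list of txns, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color, stack = {}, []

    def dfs(n):
        color[n] = GREY
        stack.append(n)
        for m in sorted(wait_for.get(n, ())):
            c = color.get(m, WHITE)
            if c == GREY:                  # back edge: m is on the current path
                return stack[stack.index(m):]
            if c == WHITE:
                found = dfs(m)
                if found:
                    return found
        stack.pop()
        color[n] = BLACK
        return None

    for n in sorted(wait_for):
        if color.get(n, WHITE) == WHITE:
            found = dfs(n)
            if found:
                return found
    return None
```

Given the global graph, `find_cycle` returns `["T1", "T2", "T3"]`; the OSD that stores only object B sees just the edge T1 → T2 and finds nothing.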
B
My initial instinct is to put them in the PG meta object in the [unclear] branch. If that merges, that's sort of the only place that lives within the placement group but is outside of the namespace, and that persists, or can persist, across intervals. The temp directory gets zapped all the time, I thought.
C
Possibly. I mean, that doesn't strike me as a particularly awkward thing. Like, if you're looking at the librados interface: "Oh, lookie, I have atomic operations. I would like to atomically update these two one-megabyte chunks of these two objects." I can't think of anything specific, but something, yeah.
B
Yeah, yeah, okay, you're right. Okay, no, put them in temp, in which case it's just a [unclear]. There's going to be a PG log item, a special PG log entry called prepare, that is associated with that internal object and has some stuff in it. When you load it in, though, you'll need to load up the object context on the actual object that you're preparing against, so you can update it.
B
It's locked, so that when a read comes along you block it, that sort of thing, and also so that when peering completes it'll send the notify back to the master that asks whether to roll forward or roll back. And then you need another entry. That's the actual update, which I think is just a standard update on the actual target object, and it's committed along with an entry of type delete on the internal object, and then...
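The slave-side bookkeeping just described (a prepare entry tied to an internal object, the target locked so reads block, then the target update committed together with a delete of the internal object) can be sketched as below. This is a model under stated assumptions, not OSD code: the "prepare" entry type, the `__txn_<txid>` internal object name, `SlavePG` and its methods are all hypothetical, and lists and dicts stand in for the PG log and the object store.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    op: str    # "prepare", "modify" or "delete" (hypothetical entry types)
    obj: str
    txid: str


@dataclass
class SlavePG:
    objects: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    locked: dict = field(default_factory=dict)  # target object -> txid

    @staticmethod
    def internal_name(txid):
        return f"__txn_{txid}"  # hypothetical name for the internal object

    def prepare(self, txid, target, data):
        # Stage the update in an internal object and log a prepare entry for it.
        self.objects[self.internal_name(txid)] = (target, data)
        self.log.append(LogEntry("prepare", self.internal_name(txid), txid))
        self.locked[target] = txid  # reads of the target block until resolved

    def read(self, obj):
        if obj in self.locked:
            raise BlockingIOError(f"{obj} is locked by {self.locked[obj]}")
        return self.objects.get(obj)

    def commit(self, txid):
        internal = self.internal_name(txid)
        target, data = self.objects.pop(internal)
        self.objects[target] = data
        # The update on the target and the delete of the internal object
        # are logged together as one unit.
        self.log += [LogEntry("modify", target, txid), LogEntry("delete", internal, txid)]
        del self.locked[target]

    def roll_back(self, txid):
        internal = self.internal_name(txid)
        target, _ = self.objects.pop(internal)
        self.log.append(LogEntry("delete", internal, txid))
        del self.locked[target]
```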
C
Probably actually worth a brief sanity check: does this fully handle the RGW use case?
B
Sounds remarkably like the conversation we had a year and a half ago when we were first talking about this. I think the RADOS Gateway one is like a series of... it's like a five-phase or three-phase type thing, where it's object, index, object, index, or something like that. So it's actually like two transactions in sequence.
C
Sharded RGW buckets would be a lot easier.
C
No, the client, the Objecter, just... it would just look like the Objecter knows all the OSDs involved in the transaction, or rather all of the placement groups involved. If any of them go through an acting set change, it resends the transaction, because it knows that if any of the primaries for those placement groups see an interval change for any of the other primaries, they will assume that the transaction was canceled early.
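The client-side rule above can be sketched as: remember the acting set of every PG the transaction touches, and resend the whole transaction if any of them changes. This is a simplification and an assumption of the sketch: Ceph's notion of an interval change covers more than the acting set, and `needs_resend` is a hypothetical helper, not an Objecter method.

```python
def needs_resend(involved_pgs, acting_at_submit, acting_now):
    """Resend the whole transaction if ANY involved PG's acting set changed.

    involved_pgs: PG ids touched by the transaction
    acting_at_submit / acting_now: pgid -> tuple of OSD ids in the acting set
    """
    return any(acting_at_submit[pg] != acting_now[pg] for pg in involved_pgs)
```

Changes to PGs the transaction does not touch are ignored, so unrelated remapping does not cause a resend.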
E
Or with those, we kind of already have, like... it would give us more guarantees with [unclear] there. It would be simpler to program to, but it's also a lot of extra writes to do it.
B
In that case, yeah. Or was it the other thing, like the locks?
E
So we could, well, I mean, we could certainly try using this, and if it doesn't add a lot more latency, then it would certainly simplify things, not having to have these extra locks on [unclear], and the other thing. The other thing that's immediately on my mind is the possibility of having atomic writes in F2FS in the future, if that gets into Linux.
E
It basically guarantees you atomic writes, I think, to a memory-mapped... well, let's see, to a file used by one process, and when you close it, it atomically makes all those writes persistent. I see.
B
So, I figure it's an O(n) operation, which is sort of tricky. The other thing, though, which doesn't fit with this problem: in POSIX, writes are supposed to be atomic, but we break that when we cross stripe boundaries in CephFS, so this would let us do that very simply.
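To see why a single POSIX write can span several RADOS objects, here is the offset mapping for the simplest striping layout, where the stripe count is 1 and each object holds one contiguous `object_size` range of the file. The 4 MiB default and `objects_touched` are assumptions of this sketch; with a stripe count above 1, stripe units are interleaved across objects and the mapping is more involved. Any write that yields more than one piece would need a multi-object transaction to stay atomic.

```python
def objects_touched(offset, length, object_size=4 * 1024 * 1024):
    """Return (object_index, offset_in_object, length_in_object) for each piece
    of a file write, assuming stripe_count == 1."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // object_size
        off_in_obj = offset % object_size
        n = min(object_size - off_in_obj, end - offset)  # stop at the object boundary
        pieces.append((index, off_in_obj, n))
        offset += n
    return pieces
```

For example, a 2 MiB write at offset 3 MiB splits into the last 1 MiB of object 0 and the first 1 MiB of object 1.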
C
They're difficult to apply with RADOS right now. Yeah, that's... yeah. I mean, our existing use cases are things that are designed on top of librados, with its limitations in mind, that also need good performance. This is more for applications that are not currently written for librados because it's too hard. Yeah, which is frustrating, because we don't have good examples of those. Yeah, yep.
C
Okay, although if we could get... I mean, there is already an example in the tree of a sort of splitting B-tree implementation on top of librados. So one exercise might be to take that design and work out how much easier it would have been if she'd had access to transactions.
B
Did they ever show up? [unclear] But yeah. Okay, okay, so I think we agree this would be useful. It's also not at the top of the list of things to do. It's hard, but not so hard.
C
My worry is that for applications like that, you want to relax the atomicity guarantees so that you can get acceptable performance. So we need an example of something where the atomicity guarantees here are actually completely necessary. Because with the log-structured file system thing, half the advantage of a log-structured design is that you can elide the end of the log if it turns out to have been inconsistently applied, since the log only grows.
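The property mentioned above, that a log can simply drop an inconsistently applied tail, can be sketched with length-and-checksum framed records: recovery replays records in order and stops at the first torn or corrupt one. The framing (a little-endian length plus CRC-32 header) and the function names are illustrative assumptions, not any particular file system's on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC-32 of the payload


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one framed record to an in-memory log."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log: bytes) -> list:
    """Replay records in order; stop at the first torn or corrupt one and drop the tail."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(log) or zlib.crc32(log[start:end]) != crc:
            break  # torn or corrupt: everything from here on is discarded
        records.append(bytes(log[start:end]))
        pos = end
    return records
```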
E
Yeah, I wouldn't think that the general, generic writing to the log would use transactions, but some kind of background garbage-collection type of operation, or reading back data, or that kind of thing, where we actually want to repack it atomically into... I guess you'd still have the old data around until you get rid of it. I don't know, maybe it's not necessary. That's...