From YouTube: CDS G/H (Day 1) - RBD review
Description
https://wiki.ceph.com/Planning/CDS/CDS_Giant_and_Hammer_(Jun_2014)
24 June 2014
Ceph Developer Summit G/H
Day 1
RBD review session
A
Alright, so this is the RBD review session. Looks like we're going to give Sage a little bit of a break; you won't have to run things. We'll put Josh in the spotlight here. It looks like the focus was mostly on journaling and mirroring, so Josh, if you can give us a little bit of an overview there. But after that, if there are any questions on RBD work that was in flight for Giant, we can also answer those. So Josh, take it away.
B
Sure. So it's asynchronous replication for RBD, possibly from one datacenter to another, or within one cluster from one pool to a different pool. The general structure would be to have a journal of all the data written to an image, striped over RADOS objects, similar to the way the MDS stripes its journal, and to have another process somewhere reading that journal and replaying it on a different cluster, or a different pool, or even a different site.
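A minimal sketch of the replay loop being described, assuming a hypothetical JournalReader and record format; librbd had no journaling API at the time of this session, and only rbd.Image is a real python-rbd binding:

```python
# Hypothetical mirror-agent replay loop. JournalReader and its record
# format are assumptions for illustration.
import rbd

def replay_journal(reader, dest_ioctx, image_name):
    """Apply journaled writes, in commit order, to the mirror image."""
    image = rbd.Image(dest_ioctx, image_name)
    try:
        for record in reader:                     # one journaled write
            image.write(record.data, record.offset)
        image.flush()                             # persist before acking
    finally:
        image.close()
```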
B
And so there were a number of things we talked about at the Giant summit for how that could work, in terms of the journaling. But what we didn't really define well was how the actual mirroring would be done and how that agent would be structured, so I want to talk a little about that.
B
To my mind, the simplest way might be to have sort of a scheduling process that keeps track of all the images that exist that need to be mirrored, and runs subprocesses that actually do the mirroring for each individual image. That wouldn't really scale past the network interface of one node, though, but it might be a good starting point.
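As a rough illustration of that starting point; the rbd-mirror-image helper command here is purely hypothetical:

```python
# Sketch of the simple scheduling process: one subprocess per mirrored
# image. "rbd-mirror-image" is a hypothetical helper, not a real tool.
import subprocess

def run_mirror_scheduler(images_to_mirror):
    workers = {
        image: subprocess.Popen(["rbd-mirror-image", image])
        for image in images_to_mirror
    }
    # All traffic funnels through this one node, which is the scaling
    # limit mentioned above.
    for image, proc in workers.items():
        if proc.wait() != 0:
            print(f"mirroring failed for {image}")
```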
B
And that would make the most sense to run in a pull model, where that kind of agent runs in the destination cluster, mirroring from the source cluster to the destination cluster. If it's run in the destination cluster, and it's mirroring a bunch of images, then you're limited by the network interface of each destination cluster, rather than only by one node in the source cluster. Perhaps.
C
Yeah, to chime in here briefly, we had a discussion about this a couple of months back too, just sort of looking at what the I/O model for this is. So it's definitely not free, right. The cost is basically that for each image you want to enable mirroring on, you're also writing a journal for it, and that's journaling inside of, or on top of, RADOS.
B
Yeah, another thing we mentioned there was that we could have an option to ack writes after they've been written to the journal, rather than after they've been written to both the journal and the normal location, and perhaps use the already-cached data to serve reads for any in-progress writes.
C
And it might be that you put the journal in a different pool; like maybe you have the base images on hard disks and the journals on SSDs, just sort of a familiar theme, I guess. And then you get sort of low write latency, and the reads would be coming from the disks, which might be okay, maybe.
B
Yeah, that's something you'd want to be able to tune, just enabling it to be in a different pool, since I think it might be a common thing to maybe have the journal not replicated three times, for example, while the RBD image itself is, or maybe even have the journal erasure coded.
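A sketch of that kind of tuning: the pool commands below are real, but directing RBD journals at the separate pool is the hypothetical part, since none of this was implemented at the time:

```python
# Create a separate, more lightly replicated pool for journals, while
# image data stays in the default 3x-replicated pool.
import subprocess

subprocess.check_call(["ceph", "osd", "pool", "create", "rbd-journals", "128"])
subprocess.check_call(["ceph", "osd", "pool", "set", "rbd-journals", "size", "2"])
```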
B
Yes, and I think the general model I've been thinking about is one where you mark an RBD image as wanting to be mirrored, and it sets some flag on that image. Then there's a question about whether we want to support more than one thing reading from a journal, and how we handle trimming the journal in that case. If there's only a single follower of the journal, say one replaying it onto a different cluster or separate location,
B
it's easy enough to just update a single position marker. But otherwise we have to know how many there are, which replication processes are reading the journal, and what the position of each of them is, and then something else has to determine when to trim as well; maybe that could be the same process that's doing the replication.
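A sketch of that bookkeeping, with the per-follower commit positions assumed for illustration; the journal can only be trimmed up to the slowest reader:

```python
# Illustrative trim bookkeeping for a journal with multiple followers.
class JournalPositions:
    def __init__(self):
        self.commit_positions = {}          # follower name -> offset

    def update(self, follower, offset):
        self.commit_positions[follower] = offset

    def safe_trim_offset(self):
        # Everything below the minimum committed position can go; with
        # a single follower this degenerates to one position marker.
        return min(self.commit_positions.values(), default=0)
```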
C
Questions are around where the priorities are going to fall, as far as focusing on current issues, you know, improving performance or robustness or usability or whatever, versus sort of biting off a big thing. I think this is definitely something that would be very valuable for a lot of different use cases.
C
You can do replication right now by using RBD snapshots and snapshot mirroring. If you periodically take a snapshot of the image, there's a moderately efficient way to get a diff between two snapshots, essentially, and stream just that diff over the network, and one of the other blueprints can actually help improve that somewhat. But this would be more near-real-time, where, you know, you have a mirror that's seconds of delay or whatever behind, which you can configure.
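That existing snapshot-based approach can be driven with the real rbd export-diff and import-diff commands; the wrapper below is just an illustration of the periodic loop, not Ceph tooling:

```python
# Ship the delta between two snapshots to a remote cluster using the
# existing rbd CLI.
import subprocess

def replicate_snapshot_diff(image_spec, prev_snap, new_snap, remote_host):
    export = subprocess.Popen(
        ["rbd", "export-diff", "--from-snap", prev_snap,
         f"{image_spec}@{new_snap}", "-"],          # "-" = stdout
        stdout=subprocess.PIPE)
    subprocess.check_call(
        ["ssh", remote_host, "rbd", "import-diff", "-", image_spec],
        stdin=export.stdout)
    export.stdout.close()
    if export.wait() != 0:
        raise RuntimeError("export-diff failed")
```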
C
But it is sort of a nice enterprise-y feature that's not present in any other, you know, open-source software-defined storage type systems, kind of. I don't know if there are questions about that; we can talk more about it. I can also use this as a catch-all slot for any of the other just random stuff in RBD that we want to address over the balance of the cycle. I think one of the things we talked about was enabling caching by default, or at least the option.
B
There's an option, writethrough until flush, that's meant to be safe with older guests that don't necessarily send flush operations down. The cache stays in writethrough mode until it sees the first flush come through, and then it switches to writeback, since that shows the guest supports it.
B
And for a lot of things, people will have to configure it in the system that's managing their VMs anyway, since those will always have their own default value there that kind of overrides whatever default we set. But it would help for more general operations, like command-line import and export, that kind of thing.
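For reference, these cache behaviors map to real client options, which can be set in ceph.conf or passed as librados config overrides, e.g.:

```python
# Enable the RBD cache with the writethrough-until-flush safety net,
# using real option names via librados config overrides.
import rados

cluster = rados.Rados(
    conffile="/etc/ceph/ceph.conf",
    conf={
        "rbd_cache": "true",
        # Writethrough until the guest's first flush, then writeback.
        "rbd_cache_writethrough_until_flush": "true",
    })
cluster.connect()
```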
C
Let's see, one of the other things: there was an ObjectCacher fix that Haomai found when he was doing performance testing, where we fixed the easy one, but there's a different variant. It still had an iteration, when doing writeback, to identify which objects were dirty, or something like that. Yeah.
B
That ties into that. Then I think we discussed a little bit about maybe making librbd keep better track of which objects had dirty blocks in the first place, to avoid even iterating through everything.
B
There are some generic things that we might want to change for the performance work, things like adding more tracepoints, both for the ObjectCacher and for librbd itself, to see where things are slowing down.
C
Yeah, yep, I was going to say, to talk about tracepoints and tracing: Adam Crume is looking at some of that over the summer too. I think one of the cool goals is that you could have an existing running QEMU/KVM process, or any other librbd client, attach to it over a socket, and start slurping off trace information. So you can capture a workload and then replay it later, model it, or do whatever.
C
Let's see, an rbd df command is something that has come up. RBD images are thinly provisioned, so if you do ceph df you find out how much space is actually used in the pool, but you don't know how much you've sort of promised to users by creating images that are, you know, hundreds of terabytes in size or whatever. So rbd df could add up that total separately.
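A sketch of the provisioned total such an rbd df could report, using the existing python-rbd bindings:

```python
# Sum the sizes users have been "promised" (thin-provisioned capacity),
# as opposed to the actual usage that "ceph df" reports.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")

provisioned = 0
for name in rbd.RBD().list(ioctx):
    image = rbd.Image(ioctx, name, read_only=True)
    try:
        provisioned += image.size()
    finally:
        image.close()
print("provisioned bytes:", provisioned)
```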
C
You can also run LIO to re-export RBD as iSCSI, but if you want an HA situation, where you do multipath to multiple targets and that's handled properly in the backend with locking and blacklisting and so forth, so you do the failover in a safe way, there's a bunch of, probably, Pacemaker-like stuff that needs to be done to do that correctly, and I don't think there's anything,
C
at least anything open, that does that right now. But having sort of a robust iSCSI backend is something that people definitely want to see for getting legacy environments migrated over.
B
The interface for that might be like a memory-mapped region that you read and write to, for receiving commands and setting responses once they're complete.
C
It might be useful for RBD, or for the OSDs, to just have sort of an ephemeral map in memory that's sort of tallying up the I/Os they're doing, broken down by client, that the Calamari bits could slurp up and publish somehow. So if you wanted to, you could get a snapshot of, you know, over the last 10 seconds, or a minute, or whatever: these are the clients that are doing the most work in the cluster.
C
What do you think about having the OSDs, since they're counting all this stuff internally anyway, just keep track of, like, these are the top 10 clients that are doing I/O with me, and then having that be something that Calamari or something else could slurp up and aggregate across the cluster? So you could see these things aggregated in, like, a ceph iotop or whatever.
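A sketch of the aggregation half of that idea; where the per-OSD top-client counters come from is the assumed part, since no such interface existed:

```python
# Merge hypothetical per-OSD client-op counters into a cluster-wide
# "ceph iotop"-style top-N view.
from collections import Counter

def top_clients(per_osd_counts, n=10):
    """per_osd_counts: iterable of {client_id: op_count} dicts."""
    totals = Counter()
    for counts in per_osd_counts:
        totals.update(counts)
    return totals.most_common(n)

# Example with three OSDs' counters:
print(top_clients([
    {"client.4201": 1200, "client.4388": 300},
    {"client.4201": 800, "client.5120": 950},
    {"client.4388": 400},
]))
```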
F
I mean, in Lustre you do it with the job ID, and then you look that up via the scheduler. But that's obviously use-case specific, and it's easy because you already have the scheduler server.
C
Yeah, I mean, I wonder if it would make sense to actually bump it up a layer. So the client says, I have a workload X that I want to associate my I/O with, and it goes to, not the monitor, because Greg will beat me about the head and shoulders, but somebody, and says I want to register this workload, and it gets an ID back.
D
You can either compress it or just use the token itself, sure. There's no real need to have a global lookup table.
D
I don't think so, I suspect; if you just compress the resulting blob, you wouldn't wind up with all of that token explosion. Yeah.
C
What else? So, high-impact RBD priorities: what other things are there?
B
There's been some interest in, and work on, copy-on-read, which means that for clones, instead of only copying data from a parent image when you do a write, also when you're reading, you read the entire object instead of just the section you need, and then write that out to the child image. So it's kind of like an online flattening, rather than the standard offline flattening.
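A sketch of the copy-on-read path just described; the object-level helpers here are assumptions for illustration, not librbd API:

```python
# Copy-on-read for a clone: on a parent hit, pull the whole backing
# object into the child, so reads incrementally flatten the image.
def read_with_copy_on_read(child, parent, object_no, offset, length):
    data = child.read_object(object_no, offset, length)   # assumed helper
    if data is not None:        # child already owns this object
        return data
    whole = parent.read_object(object_no, 0, parent.object_size)
    child.write_object(object_no, 0, whole)               # online flatten
    return whole[offset:offset + length]
```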
A
That was great.
C
Yeah, I was just going to say, librbd already lets you do fancy striping, where instead of just chunking the image across objects, you can stripe across sets of objects with a small stripe size. That's mostly useful for workloads where you have something that is itself like a journal, say a database that's doing a bunch of small writes sequentially, and you want to spray those across multiple disks, instead of just hammering on one disk for a while and then hammering on the next disk.
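That fancy striping is exposed at image-creation time; for example, with the real rbd CLI flags:

```python
# Create a format-2 image striped across sets of objects with a small
# stripe unit; the flags shown are the real rbd CLI options.
import subprocess

subprocess.check_call([
    "rbd", "create", "rbd/db-volume",
    "--size", "102400",           # 100 GB (size is in MB)
    "--image-format", "2",
    "--stripe-unit", "65536",     # 64 KB per stripe unit
    "--stripe-count", "8",        # spread across 8 objects at a time
])
```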
C
So it's just adding that support to the kernel side too; right now it still just does the chunking, not the fancy striping modes. That one is a bit of work, though, because the code basically needs to be restructured to do more scatter/gather, and sort of piece I/Os together, to map one request onto multiple requests and dispatch them in parallel, and it doesn't do that yet. The file system supports this,
C
even though RBD doesn't, but the file system does it inefficiently: it will just do each little piece at a time as separate I/Os, so it'll just go really slowly. So hopefully, when we do this work, we can have that generic infrastructure in there to do it effectively, and that'll benefit both the file system and the RBD use cases. That's on Ilya's plate, I think, but again, it's a chunky piece of work, while he's also focusing on some of the...
E
Well, I don't have anything parallel working yet, so I have, I guess what you'd call, one thing semi-working for the non-parent case, when there's just a single image without the need to go to the parent, because the parent can have a different layout, which is important. So that's all complicated. Yeah, yep.