From YouTube: 2019-01-02 :: Ceph Developer Monthly
Description
Monthly developer meeting for the coordination of Ceph project development.
http://tracker.ceph.com/projects/ceph/wiki/Planning
B: We'll see. Weren't there a couple of things from before? Are there? Oh... maybe he added them... oh yeah, there we go. Yes.
C: Before we start, yes, I had a quick question for you. Yep: when you're adding a new monitor, I was thinking that you would be able to do it with a minimal ceph.conf, with just the mon entries, like just "mon host" equals the monitor addresses.
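For context, the minimal ceph.conf being described would look something like the following. The addresses are placeholders, and whether "mon host" alone is sufficient depends on the release, so treat this as a sketch rather than a guaranteed-working configuration:

```ini
[global]
; enough for a client to find the monitors; everything else
; can in principle be fetched from the cluster once connected
mon host = 192.168.0.10, 192.168.0.11, 192.168.0.12
```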
C: ...zlog, and then Skyhook; Jeff from Southern Cal, over to you. Okay, so we tried to... we have a lot of use cases, but we tried to boil them down into something pretty easy to digest. I don't know which one is first on the list; I don't have the CDM page open. Is it the PG one? Yeah. Could you also just paste it in the chat? Okay, yeah. So there's a couple of use cases for this.
C: Really, what we'd like to do is define some custom PG-level ops, very much analogous to object class methods, and we'd like them to run with the same atomicity semantics as object operations. (I think that is already the case. Yeah, although I'm not sure.) And then the two use cases we have that are pretty compelling are: we have some cases where we'd like to put a guard in front of object operations, but we'd like to do it scalably, at the pool level.
C: So we don't want to iterate over all the objects to update the value that's being checked on the object I/O path.
C: The simplest case is that we have an application-level epoch value, or version value, that we're controlling, and the app is going to go around and update this value on each PG. But then all of the object operations would do a read and just make sure that their assumptions are correct. It's really a scalable way to do some very lightweight synchronization between clients. I think there's a case in Skyhook where we're interested in doing collection-level locking.
C: But we need to do this... we want to do the coordination through RADOS, by collection. You mean a RADOS pool? Yeah, but it's not necessarily at that granularity; it's arbitrary groupings of objects. So it would be like a table-level lock in Skyhook, but that table spans some number of objects. Yeah.
A: The problem here is that I think using the word "scalable" is misleading. You can get to the point where you basically push the enumeration over objects into the OSD, but it's still O(n) in objects; it's just the OSD that's doing the iteration, and you're not sending everything across the wire.
C: We don't need to change the state of all the objects, so we're not talking about pushing an op to the PG level and then doing a local iteration over the objects. Okay? We'd like to have a value, say a "foo version", in the PG; there's physically only one value of it. And then in the I/O path for an object operation, we would just read this one value from the PG. I see, so just...
A: It sounds like what you want to do is set, say, a pool attribute foo_epoch = 12, and you want to set that sort of globally for the entire dataset, right? And then have RADOS op guards that check that the pool attribute is equal to twelve, or greater than or equal to it, or whatever; do your little comparison operation, and have it sharded by PG. Actually, that seems harder, because then when you set it, you have to set it...
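The guard scheme being discussed can be sketched as follows. This is a toy model, not the librados API; all names (ToyPG, guarded_write, assert_epoch) are illustrative. The point it demonstrates is that bumping the epoch costs O(number of PGs), while the check on the write path is a single read of PG-local state, never an iteration over objects:

```python
# Toy model of a per-PG epoch guard: one value stored per PG, checked on the
# object I/O path. All names are illustrative; this is not a real Ceph API.

class EpochMismatch(Exception):
    pass

class ToyPG:
    def __init__(self):
        self.epoch = 0        # the single per-PG value
        self.objects = {}     # object name -> data

class ToyPool:
    def __init__(self, num_pgs=4):
        self.pgs = [ToyPG() for _ in range(num_pgs)]

    def pg_for(self, name):
        return self.pgs[hash(name) % len(self.pgs)]

    def set_epoch(self, value):
        # O(number of PGs), not O(number of objects).
        for pg in self.pgs:
            pg.epoch = value

    def guarded_write(self, name, data, assert_epoch):
        pg = self.pg_for(name)
        if pg.epoch != assert_epoch:      # guard: one read of PG-local state
            raise EpochMismatch(f"pg epoch {pg.epoch} != {assert_epoch}")
        pg.objects[name] = data           # write proceeds with the check

pool = ToyPool()
pool.set_epoch(12)
pool.guarded_write("obj.0", b"row-data", assert_epoch=12)
stale_rejected = False
try:
    pool.guarded_write("obj.1", b"stale", assert_epoch=11)
except EpochMismatch:
    stale_rejected = True
```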
C: I think, for... I mean, it's unclear.
C: Well, let's... I mean, maybe it would help to separate the two cases. So I think there is a case... in the zlog case, it really is a very tiny amount of metadata per pool, and it's not going to scale up; it's just constant. I think you're right that that might not work for all the different use cases that exist in Skyhook, like where you have one per Postgres table. Right, right.
A: But what would happen in Skyhook if, instead of having one epoch per table, you had a global epoch?
C: If I had one... it would be nice to basically reach for something like watch-notify, where the clients are using a cooperative scheme to coordinate their writes. The problem there is... there's sort of a fundamental issue with watch-notify where, when the client goes away and fails, you have to rely on... you have to wait a certain amount of time for the operations to time out before you go into some recovery mode or something, and that might not be appropriate for a database.
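The failure mode being described can be illustrated with a minimal toy: a cooperative lease-based lock where a dead holder leaves no signal except silence, so other clients must wait out the timeout before recovering. This is a self-contained sketch, not the watch/notify API; the class and parameter names are made up, and the "now" argument exists only to make the example deterministic:

```python
# Toy cooperative lock illustrating the watch/notify failure mode: if the
# holder crashes, everyone else is stuck until its lease ages out.
import time

class CoopLock:
    def __init__(self, timeout=0.05):
        self.holder = None
        self.renewed_at = None
        self.timeout = timeout

    def acquire(self, client, now=None):
        now = time.monotonic() if now is None else now
        if self.holder is not None:
            # The only signal that the holder died is silence, so we must
            # wait until its lease expires before breaking the lock.
            if now - self.renewed_at < self.timeout:
                return False              # still have to wait
            self.holder = None            # recovery: break the stale lock
        self.holder, self.renewed_at = client, now
        return True

lock = CoopLock(timeout=0.05)
assert lock.acquire("client-a", now=0.0)
# client-a crashes without releasing; client-b is stuck until the timeout:
assert not lock.acquire("client-b", now=0.01)
assert lock.acquire("client-b", now=0.06)  # only after the lease expires
```

For a database-style workload, that forced waiting period on every holder failure is exactly the latency cost the speaker is objecting to.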
C: But fundamentally the objects are storing relational rows, and you end up in this situation where you actually need a lot more intelligence at the storage level than just reads and writes. We're going to be controlling things like transactional consistency, so there's a lot of metadata flying around that we need to track, and some of that metadata is harder to scale than the rest. So there are things like table-level mapping, which is a coarse-grained solution, but it just gets more complicated.
D: The second stage, asking for a list of all the objects that satisfy some query condition, is something I'm a lot more interested in than we've been comfortable with in the past. In fact, we actually have something sort of like this: there's a PGLS filter. Mm-hmm.
D
If
you
guys
have
seen
this
and
I,
don't
remember
what
all
it
matches
with
it
might
only
be
named
based,
but
I
know,
rgw
uses
that
for
doing
some
searches
and
the
file
system
does
for
what
it's
looking
for.
Think
directories
to
do,
effort
to
do
it's
backwards,
grub
and
repair
stuff.
A: But it's enumerating objects and then checking each object; it's not actually an index. Yeah.
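The distinction just made can be shown in a few lines. A PGLS-style filter saves wire traffic by evaluating the predicate at the OSD, but it still examines every object; an index would not. This is a toy illustration with made-up names, not the actual PGLS interface:

```python
# A PGLS-style filter still enumerates every object; the OSD just does the
# checking locally instead of shipping everything to the client.
objects = {f"obj.{i}": {"size": i} for i in range(100)}

def pgls_filter(pred):
    """Enumerate all objects and test each one: O(all objects)."""
    checked = 0
    out = []
    for name, meta in objects.items():
        checked += 1                 # every object is touched...
        if pred(meta):
            out.append(name)         # ...even though few match
    return out, checked

matches, checked = pgls_filter(lambda m: m["size"] >= 95)
```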
A: Where does it live? It doesn't live inside the PG, because PGs split and merge, and you don't want to have to rewrite and re-shard the index; in fact, in general, that's probably... yeah, that's not an efficient thing to do.
A: But that means that we're basically just pushing all the atomicity and transactional stuff down into RocksDB, which works fine today... you think it works fine today... because we have RocksDB underneath, and it, you know, has its own big global lock, and we aren't hitting it quite hard enough that it has mattered (or maybe we could argue about that). But this will have the same issues, basically, in that you have independent operations on different PGs, on different objects, that both have to atomically update this shared index structure that's global to the OSD sitting underneath. And so that might cause more contention, even in today's OSD. But it makes life really hard when you start thinking about the future Crimson OSD, where we want to be much more explicit and much more strict about the sharding, so that we can shard more aggressively across the PGs. Mm-hm.
A
You
were
to
do
this
by
actually
having
/
PG
indexes.
That
means
that
when
you
merge
to
VG's,
you
have
to
do
this,
they
merge
of
these
two
structures.
That's
gonna,
be
a
big
latency,
blip
or
any
split.
You
have
to
actually
the
same
thing
and
so
far
we've
done
a
pretty
good
job
of
making
split
and
merge.
Not
do
that.
The
exception
emerged
has
a
bit
of
a
flip,
but
it's
it's
rare,
but
it's
it's
yeah.
It's
not
a
it's,
not
a
price
on
an
order.
A
Mean
we
could,
if
you
words,
if,
if
we
knew
that
there
was
like
a
maximum
degree
of
charting
in
the
pool,
for
example,
we
could
just
like
pre
shard
the
index
so
that,
if
you
have
one
PG,
you
actually
have
a
hundred
little
indexes
to
update
and
you
any
particular
object
is
calling
it
a
modify
one
of
those
charts
and
when
you
do
a
query,
you
sort
of
walk
across
them
or
if
you
had
an
index
structure
that
sort
of
naturally
lended
itself
that
being
restarted
at
the
same
I.
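The pre-sharding idea can be sketched as follows. The shard count is fixed up front, so an insert touches exactly one shard, a query walks all shards, and a PG split or merge only regroups whole shard ids instead of rewriting index contents. All names and the shard count of 100 are illustrative assumptions, not anything from the Ceph codebase:

```python
# Sketch of a pre-sharded index: fix the shard count up front so PG split
# or merge moves whole shards around instead of rewriting the index.
SHARDS = 100

def shard_of(key):
    return hash(key) % SHARDS

class PreShardedIndex:
    def __init__(self):
        self.shards = [dict() for _ in range(SHARDS)]  # shard -> {key: value}

    def insert(self, key, value):
        self.shards[shard_of(key)][key] = value  # touches exactly one shard

    def query(self, pred):
        # A query walks across the shards.
        return [k for s in self.shards for k, v in s.items() if pred(v)]

def split_pg(shard_ids):
    # Splitting a PG in two: each child takes half the shard *ids*;
    # no shard is rewritten, so no latency blip.
    mid = len(shard_ids) // 2
    return shard_ids[:mid], shard_ids[mid:]

idx = PreShardedIndex()
for i in range(10):
    idx.insert(f"obj.{i}", i)
left, right = split_pg(list(range(SHARDS)))
```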
D: I think you want to think real hard about what exactly you're trying to push into the Ceph cluster versus what you're doing on your own. Because whether you're talking about PG metadata or PG-level indexes, you're basically asking for cross-object atomic operations, and you're saying: oh, it's on the PG, so it's free. But that's just not as true as you might like it to be.
D: Maybe try to figure out what kinds of operations you need to do, and if we can break it into smaller pieces, maybe we can make it happen. Well, yeah: we don't have anything that really works at a PG level that's accessible to clients, but maybe we could come up with something where we say: okay, we want a PG op, and we know what versions of these ten objects we're working on, so run a thing on these ten objects and store the result in this other object that's also there. I don't know... I haven't thought about whether this works; I'm not saying that it will work, but it's a lot more plausible than what I've heard so far.
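The hypothetical PG op just floated can be made concrete with a toy: the client names the object versions its computation was based on, the op runs only if none of them have moved, and the result lands in another object in the same PG. This is purely illustrative; none of these names exist in Ceph:

```python
# Toy version of the proposed PG op: version-guarded computation over a set
# of objects, with the result stored in a sibling object in the same PG.
class VersionMismatch(Exception):
    pass

class ToyPG:
    def __init__(self):
        self.data = {}       # name -> value
        self.version = {}    # name -> int, bumped on every write

    def write(self, name, value):
        self.data[name] = value
        self.version[name] = self.version.get(name, 0) + 1

    def pg_op(self, expected, fn, out_name):
        # expected: {name: version the client saw when it planned the op}
        for name, ver in expected.items():
            if self.version.get(name) != ver:
                raise VersionMismatch(name)   # an input moved; reject
        result = fn({n: self.data[n] for n in expected})
        self.write(out_name, result)          # result lands in the same PG

pg = ToyPG()
for i in range(10):
    pg.write(f"in.{i}", i)
pg.pg_op({f"in.{i}": 1 for i in range(10)},
         lambda objs: sum(objs.values()),
         "out.sum")
```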
D
Okay,
say
just
being
really
nicely
talked
about
the
snapper
just
like
okay,
so
we
have
a
snapper
that
sort
of
is
like
this,
but
like
the
degenerate
case,
it's
like
it's
its
stores
of
religious
virgins
for
objects,
but
only
when
those
virgins
exist
and
join
us
all.
Anyone
yeah
the
entire
Rados
codebase
not
just
like
a
set
of
random
data,
and
it
works
pretty
well
now,
but
it
didn't.
A
It's
it's
it's
hard
because
I
can
see.
A
million
uses
for
a
reverse
index
would
be
a
pretty
powerful
thing.
At
the
same
time,
we've
managed
to
build
a
bunch
of
like
robust,
complete
stuff
on
top
of
the
sort
of
the
more
primitive
stuff
that
readers
does
today,
and
so
it's
it's
hard
to
I.
Don't
know
it's
hard
to
argue
that
we
need
to
make
it
more
complicated
because
it's
already
very
complicated
and
the
challenge
yeah.
D: No, this would... I mean, this would have to be a thing that runs on a PG, and if it's doing backfill or something, it just blocks until it has all the objects; but not across PGs. And I think we can make that work more easily, because we can just check them against their data and say yes, it matches, and we're not backfilling; you know, we know which PG this lives in, so we're good. It's just split and merge that's the thing that...
C
Okay,
let's
play
bridge
Oh,
consider
that
yeah,
probably
okay,
all
right
all
right!
Well,
I,
don't
want
to
do
all
on
this
room
too
long.
I
think
that
was
good
though
so
the
next
one
is
pretty
simple.
I
think
there's
a
we
have
some
use
cases
for
allowing
some
data
to
be
returned
when
OSD,
ops,
mutate.
A: Hammer? Jewel? It was in Jewel, I think... I can't recall, it was a while ago. Okay, so, yes, pre-Luminous I think it was, anyway. So it goes in the PG log; yeah, that's where the error code is recorded, because the writes have to be idempotent. So if you resend an op, it's the PG log, or the adjacent structure, that's consulted to see if that operation is a replay, and if so, then it returns the same...
A: ...the same result that it recorded. So it's pretty easy to extend that to include a payload; it just means that the PG log has the potential to get bigger, by about 16 bytes. I'm not too concerned, because PG log entries already have a bunch of other stuff; erasure-coded pools, for example, keep a copy of all the attributes for the object, for every time you write something like that, or maybe just the changed ones, I can't remember, but it's on that order. Or at least it did in the old code; maybe with the new EC overwrite stuff we don't need that anymore, but previously we did. In any case, yeah, I think it would be reasonable to include it there, as long as it's bounded, and yeah, 16 bytes sounds reasonable enough.
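The replay mechanism described here can be sketched in miniature: per-request log entries record the return code (and, under this proposal, a small bounded payload), so a resent write consults the log and returns the original result instead of re-executing. This is an illustration of the idea only; the names and the request-id format are made up:

```python
# Toy replay log: a resent mutation returns the recorded (rc, payload)
# instead of re-executing, which is what makes writes idempotent.
MAX_PAYLOAD = 16  # bytes; the bound discussed above

class ToyOSD:
    def __init__(self):
        self.store = {}
        self.pg_log = {}   # reqid -> (rc, payload)

    def mutate(self, reqid, name, value, payload=b""):
        if reqid in self.pg_log:          # replay: consult the log, don't redo
            return self.pg_log[reqid]
        assert len(payload) <= MAX_PAYLOAD
        self.store[name] = value
        entry = (0, payload)              # rc 0 plus the small result payload
        self.pg_log[reqid] = entry
        return entry

osd = ToyOSD()
first = osd.mutate("client1:42", "obj", b"v1", payload=b"\x01" * 8)
osd.store["obj"] = b"changed-elsewhere"   # a later write lands
replay = osd.mutate("client1:42", "obj", b"v1", payload=b"\x01" * 8)
# The replay returns the recorded result and does not clobber the object.
```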
A: ...for the last one, or the one that failed, and not for the intermediate ones. And there are some tests that do writes and look at the return values for the individual operations; right now there are unit tests that fail on this. If you have an ill-timed reset, we're asked to resend that op, it hits the replay path, and whatever it returns as a return value for the intermediate operations doesn't match. Mm-hmm. And it's the same basic problem.
A
You
get
at
50,
not
just
16
bytes
worth
yeah
I'm
practice.
You
don't
really
actually
return
those
right,
yeah
I,
don't
know
if
anything
really
needs
it.
So
it's
like.
Maybe
we
should
care
to
that,
but
but
it
does
exist
and
there
is
a
you
know:
a
functional
test
that
tests
that,
in
the
liberators
test,
suite
or
whatever.
C
I
think
there's
I
think
there's
well
so,
for
my
specific
use
case,
I
want
a
64-bit
number
plus
some
metadata
and
that
to
fit
in
sixteen
bytes,
but
16.
Bytes
also
seems
like
enough
that
if
you
really
wanted
to
return
a
larger
amount
of
data
that
was
atomically
consistent
with
an
update,
you
could
stash
it
in
the
object
and
just
return.
Some
sort
of
unique
identifier.
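The stash-and-return-an-identifier idea can be sketched like this: the op writes the large result into the object alongside the update and returns only a 16-byte handle, keeping the replayable payload bounded. This is an illustration with invented names, not a real librados interface:

```python
# Stash a large, atomically-consistent result in the object itself and
# return only a fixed 16-byte handle to it.
import hashlib

object_store = {}   # (object, handle) -> stashed large payload

def mutate_with_large_result(obj, large_result):
    # A content hash makes a convenient deterministic 16-byte identifier.
    handle = hashlib.blake2b(large_result, digest_size=16).digest()
    object_store[(obj, handle)] = large_result  # stashed with the update
    return handle                               # only 16 bytes travel back

h = mutate_with_large_result("obj.0", b"x" * 4096)
fetched = object_store[("obj.0", h)]            # follow-up read by handle
```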
A: I mean, this could even be something that, like you said, is a pool property or something, right? With this pool, operations are allowed this much write payload. Sure. And the main purpose of it is just to prevent a poorly behaved operation from putting, like, two megs in the bufferlist, returning it, and blowing up. Yeah.
A: ...currently, right now, we toss it on the floor. You could just keep it, capped at whatever the max size is. That's right; it seems like all of the plumbing is already there. Yeah, yeah, except for storing it in the PG log entry. And at the same time that we fetch the return code when we're doing a replay, we would also fetch that. It would be easy enough.
A
Made
sense
at
the
time
because
or
reads
it
made,
it
made
sense
for
reads
and
I
think
it
reads
and
writes
are
sort
of
similar,
and
so
the
writes
got
them
for
free
without
really
thinking
about
it,
but
the
way
that
the
ops
are
set
up
for
each
up.
You
have
an
input
and
output
buffer
and
a
return
value,
and
you
like
to
have
n
in
them
in
your
transaction,
yeah,
so
I
think
for
reads:
it's
generally
useful
right.
A: In practice, they're usually always zero, so we could fix that bug just by storing only the non-zero ones. And yes, if you put a bazillion write ops that return non-zero return codes in the same transaction, then you'd have big PG log entries, but maybe that doesn't matter, I don't know. Or we could just make them non-existent for write operations. I...
D: From memory, it was something else, something about being able to build the transaction efficiently, where the return value was set late; but that doesn't make sense either, because it doesn't seem like it would be hard. Maybe it was a problem that got fixed up as a result of the earlier work, I don't know. But people have tried to do this before and not been successful.