From YouTube: CDS G/H (Day 2) - RGW: bucket index scalability
Description
https://wiki.ceph.com/Planning/CDS/CDS_Giant_and_Hammer_(Jun_2014)
25 June 2014
Ceph Developer Summit G/H
Day 2
RGW: bucket index scalability
A
B
C
Okay, thanks. I'm going to present some work that we are interested in doing on the bucket index. There was a blueprint created earlier, I think, which talked about bucket index scalability in the RADOS Gateway. I think there are a couple of issues here. The first one is that parallel writes to a single bucket run into a scalability issue at the bucket index object.
C
Parallel updates do not scale on the OSD side, because all the put, update, and copy operations eventually update a single object. That is, they read the version, update the version, and put a new record there. From our testing, puts to a single bucket can only scale to about 10 to 20 requests per second.
C
Our testing also showed that, with at least three million records in the bucket, backfilling could take up to two minutes to finish. On the mailing list, we also saw some customers report that deep scrubbing of the bucket index object stalls all the operations on the bucket. By operations I mean put, update, and copy, which eventually need to talk to the bucket index object. So those are the problems.
C
There were a couple of options talked about to improve the situation, including those from the original blueprint. One option is sharding the bucket index object. That means that instead of one single object mapped to a bucket, we create multiple objects, each mapped to the same bucket. For a put operation, we hash the object key to see which bucket index shard it belongs to, and only update that one.
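The key-to-shard mapping described here could be sketched roughly as follows. This is a minimal illustration, not the prototype's code: the talk does not say which hash function is used, so CRC32 is an assumption, and the names `shard_for_key` and `index_object_name` and the index object naming scheme are hypothetical.

```python
import zlib


def shard_for_key(object_key: str, num_shards: int) -> int:
    """Pick the bucket index shard that holds the entry for object_key."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Assumption: CRC32 stands in for whatever hash the gateway really uses.
    return zlib.crc32(object_key.encode("utf-8")) % num_shards


def index_object_name(bucket_id: str, shard: int) -> str:
    """Hypothetical naming scheme for the per-shard index objects."""
    return f"index.{bucket_id}.{shard}"


if __name__ == "__main__":
    shard = shard_for_key("photos/2014/06/25.jpg", num_shards=8)
    # Only this one index object has to be updated for the put.
    print(index_object_name("mybucket", shard))
```

Because the mapping depends only on the key and the shard count, every gateway instance picks the same shard for the same object without any coordination.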
C
C
So that means that if we want to do a prefix listing, we need to talk to multiple objects and aggregate the results on the client side. But the benefit of this solution is also very obvious: it addresses the scalability issue. Originally we had 20 requests per second; with, say, 50 shards, we have 1,000, which is more than enough for most use cases. And let's say that originally a bucket has
C
10 million records. With a shard count of 10, we have 1 million records in each object, which reduces the time needed for scrubbing, backfilling, and recovery. So that is one option. Another one is in another dimension: the blind bucket, which we tried before. That completely disables the bucket index, because some use cases don't need the bucket listing feature, and for those use cases we can just disable it. It has the benefit that we
C
are not bothered by the previous problems we talked about, like backfilling and bucket scalability. We also reduce the load on the entire cluster, because each upload operation needs to talk to the bucket index object twice, which increases the number of ops significantly for puts. So that is a brief introduction to the problem we have and the options we can explore. Is there any question or comment on top of that, or is there anything I missed?
D
I think everything you said is right. I think you've identified the two main solutions, and those definitely make sense to me. Are you most interested in going down the path of sharding, and not doing the blind buckets in this case? For your particular use case, you want to retain the bucket listing abilities?
C
Yeah, currently we already have a prototype. We have a configuration option for the RADOS Gateway, set in ceph.conf, that configures the number of shards we need for a bucket. We also put that shard information into the bucket metadata, which means that if one day we want to support more shards, we just update the configuration, and it does not break the original buckets, which have another shard number. That is one thing.
D
Is there a way to make it so that bucket creation can also pass the number of shards? Because it might be that some buckets, you know, are going to be small and can use one shard or the default, and other buckets, you know, are going to become very big and you want to give them a larger number of shards.
C
D
Have you looked at adding an argument to the bucket creation operation that lets you specify the number of shards?
E
D
D
G
G
A
G
F
F
We need to make sure it's supported through the RADOS Gateway admin APIs, so that when you get the bucket metadata, it includes the number of shards for that bucket, and when you put that, the bucket is created with the appropriate number. As well, if we're adding the shard number to the bucket index log, we need to make the RADOS Gateway agent aware of that and able to use it if it exists, or fall back if it doesn't, to be backward compatible.
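The "use it if it exists, or fall back" behavior could look something like the sketch below. The metadata layout is an assumption: the field name `num_shards`, the dictionary representation, and the default of one shard for older buckets are all hypothetical and are not taken from the gateway's actual metadata format.

```python
# Assumption: buckets created before sharding have a single index object.
DEFAULT_NUM_SHARDS = 1


def num_shards_for(bucket_metadata: dict) -> int:
    """Return the shard count recorded for a bucket, with a legacy fallback."""
    value = bucket_metadata.get("num_shards")  # hypothetical field name
    if value is None:
        # Older metadata has no shard field, so treat the index as unsharded.
        return DEFAULT_NUM_SHARDS
    shards = int(value)
    if shards < 1:
        raise ValueError("num_shards must be at least 1")
    return shards


if __name__ == "__main__":
    print(num_shards_for({"name": "old-bucket"}))
    print(num_shards_for({"name": "new-bucket", "num_shards": 16}))
```

Reading the count from per-bucket metadata rather than from global configuration is what lets buckets with different shard counts coexist after the configured default changes.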
A
D
C
I can give one update on that. For the bucket index log, there is a monotonically increasing version number stored in the bucket index object, so that from the client side he or she can list all the operations starting from one version. Because we shard the bucket index object, that breaks the original semantics that there is a single version that keeps monotonically increasing.
C
So in order to address that issue, we issue the request to multiple shards and echo back the shard ID to the user, so that the client side, which is the RADOS Gateway agent, can issue a request with a marker that includes a shard ID together with a version number. That way we can still maintain the semantics that, from the client side, he or she knows from which point he wants to list the operations afterwards. Does that make sense?
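One way to picture a marker that carries a shard ID together with a version number is the sketch below. The `shard#version` text encoding, the `LogMarker` type, and the in-memory log layout are assumptions made for illustration; the talk does not specify the real marker format.

```python
from typing import Dict, List, NamedTuple, Tuple


class LogMarker(NamedTuple):
    shard_id: int
    version: int


def encode_marker(marker: LogMarker) -> str:
    # Hypothetical wire format: "<shard id>#<version>".
    return f"{marker.shard_id}#{marker.version}"


def decode_marker(text: str) -> LogMarker:
    shard_text, version_text = text.split("#", 1)
    return LogMarker(int(shard_text), int(version_text))


def entries_after(shard_logs: Dict[int, List[Tuple[int, str]]],
                  marker_text: str) -> List[Tuple[int, str]]:
    """Return log entries of one shard that are newer than the marker.

    Versions only increase within a single shard, so the marker has to
    name the shard it refers to.
    """
    marker = decode_marker(marker_text)
    return [(version, op) for version, op in shard_logs[marker.shard_id]
            if version > marker.version]


if __name__ == "__main__":
    logs = {0: [(1, "put a"), (2, "put c")], 1: [(1, "put b"), (2, "del b")]}
    print(entries_after(logs, encode_marker(LogMarker(1, 1))))
```

The client keeps one such marker per shard, which restores the "list everything after this point" semantics within each shard.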
C
C
C
Okay, thanks. Another proposal I would like to discuss is for certain use cases. Currently, for the listing operation, we issue the listing request to all the shards and merge the data at the client side, and we preserve the S3 API, which takes a start object and a prefix and returns a certain number of objects based on the start object, prefix, and so on.
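The merge-at-the-client listing could be sketched as below. This assumes each shard returns its keys in sorted order and models shards as plain in-memory lists; the function name `list_bucket` and its parameters only loosely mirror the S3 listing parameters (prefix, marker, max keys) and are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterator, List, Sequence


def list_bucket(shards: Sequence[Sequence[str]], prefix: str = "",
                start_after: str = "", max_keys: int = 1000) -> List[str]:
    """Ordered listing across shards: query every shard, then merge."""

    def shard_stream(keys: Sequence[str]) -> Iterator[str]:
        # Assumption: each shard keeps its keys sorted.
        return (k for k in keys if k > start_after and k.startswith(prefix))

    # heapq.merge lazily combines the already sorted per-shard streams.
    merged = heapq.merge(*(shard_stream(keys) for keys in shards))
    return list(islice(merged, max_keys))


if __name__ == "__main__":
    shards = [["a/1", "b/2"], ["a/2", "c/1"], ["a/3"]]
    print(list_bucket(shards, prefix="a/", max_keys=2))
```

Every call touches every shard even when only a few keys are returned, which is the extra listing cost that comes with sharding.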
C
So, as I mentioned at the beginning, the trade-off is that the listing operation is a little bit heavier than before, because we need to talk to more shards to get the result. I think another option we may explore is for the case where the user doesn't care too much about prefix listing, but rather just wants to iterate over all the objects within a bucket. In that case
C
we can implement a simple iterator, something that iterates over each bucket index shard and returns all those records one by one. For a single request, we only need to talk to one bucket index object, which is specified by the user. For corner cases where the number he requested crosses a shard boundary, we need to talk to two. We return each chunk one by one, and that dramatically reduces the cost of bucket listing.
C
D
Yeah, that's exactly it. It's an unordered listing at that point, right? You're getting everything, but not in a defined order. So I guess the tricky challenge, then, is that your position is no longer just an object name, but something that might be somewhat opaque. It's probably going to be a shard ID and then a name.
C
C
Yeah, we can actually hide the information. We can use something like user-defined metadata or a parameter, so that the user identifies a starting point [inaudible], and we can echo back the next metadata, which you need to start with, something like that, because we only need to maintain one other parameter, which is the bucket shard ID.
A
D
I think that makes sense, so let me just make sure I'm understanding. You basically want to add a new list operation, something like "list unordered", where the offset is just an opaque string that in reality is going to be the shard ID and the object name. But I just want to make it undefined from the user's perspective, just to keep it simple, so we can change it later.
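An unordered listing with an opaque offset might be sketched like this. The marker encoding (base64 over a JSON pair of shard ID and last key), the name `list_unordered`, and the in-memory shard lists are assumptions for illustration only; the point is that callers treat the marker as an opaque string.

```python
import base64
import json
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def list_unordered(shards: Sequence[Sequence[str]], marker: Optional[str] = None,
                   max_keys: int = 1000) -> Tuple[List[str], Optional[str]]:
    """Walk the shards one after another and return (keys, next_marker)."""
    shard_id, last_key = 0, ""
    if marker:
        # The marker is opaque to callers; internally it is (shard id, last key).
        shard_id, last_key = json.loads(base64.urlsafe_b64decode(marker))
    results: List[str] = []
    while shard_id < len(shards) and len(results) < max_keys:
        keys = shards[shard_id]  # assumption: sorted within a shard
        start = bisect_right(keys, last_key)
        chunk = keys[start:start + max_keys - len(results)]
        results.extend(chunk)
        if start + len(chunk) >= len(keys):
            # Shard exhausted: continue with the next one (the boundary case).
            shard_id, last_key = shard_id + 1, ""
        else:
            last_key = chunk[-1]
    if shard_id >= len(shards):
        return results, None
    payload = json.dumps([shard_id, last_key]).encode("utf-8")
    return results, base64.urlsafe_b64encode(payload).decode("ascii")


if __name__ == "__main__":
    shards = [["b", "d"], ["a"], ["c", "e"]]
    page, cursor = list_unordered(shards, max_keys=2)
    while True:
        print(page)
        if cursor is None:
            break
        page, cursor = list_unordered(shards, cursor, max_keys=2)
```

A typical request reads from a single shard, and from two only when a page crosses a shard boundary. Keeping the marker opaque leaves room to change its contents later.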
A
A
D
A
C
For our use case, the bucket listing is not used by online services, but for operations, so we don't care about the order; we just want to list all the records in one bucket. In that case, we can use the approach we just talked about to iterate over all the records without caring about order, and that can decrease the load on the system to almost the same as without sharding, I think.
D
C
Yeah, I think the default command does not change. From the user's perspective, he or she still uses the original one, but if he doesn't care about order, he can use something like custom metadata to achieve what he wants to achieve while reducing the load on the system.
D
E
D
I think what the suggestion was is that in some cases you would want the regular list command, which traditionally is ordered, to return something unordered, which is sort of odd and does violate the contract, I guess. So it would have to be, you know, not the default, but
F
E
The application level would have to make no assumptions about that. So for that to be useful, the application would have to be such that it actually doesn't notice the order in which they come back [inaudible]. Will people necessarily know whether their application layer relies on that? Sometimes, if you give them the option [inaudible].
A
E
A
A
A
D
That would be a very good thing, especially if we can wean people off the default. Sorry, this is going back a little bit, but I have a question about the bucket index sharding and the bucket index log. I wonder if there are instances where we'd want them [inaudible] to be stored differently, sharded differently.
C
F
G
D
D
D
C
I think if we let the user specify a shard strategy, it should be pretty straightforward to implement, because when the user creates a bucket, he can specify that he would like a blind bucket, and we can serialize that information into the bucket metadata. For all the bucket-related operations, like put, upload, and listing, we just read that information and decide what we need to do, most likely ignoring the operation and just skipping the bucket index update or listing.
A
D
A
D
C
C
D
Well, very generally, I mean, writing up the blueprint and design, discussing it on the list, and then having a pull request, that's all right. I think generally you want to break it into small pieces, though. I'm looking at the current pull request, and it's just one patch that has a couple thousand lines, or, I don't know, many hundreds of lines updated. So you'd want to break that into a patch that, you know, adds the additional fields to the bucket structures,
D
a patch that will store and retrieve it, a patch that will set it when you create the bucket, a patch that will let you specify it when you create the bucket via the metadata, and then a series of patches that will support it for listing, and then for put and get and whatever. So that it's all, you know, bite-sized chunks that are conceptually separate.
A
D
And then it's just a matter of, you know, cornering Josh or Yehuda on IRC and getting it reviewed. But I think the smaller the pieces, the easier it is to review, because you can do it in pieces, and so that will help a lot.
A
D
No problem. I think everybody's very excited to see this work done, because you guys are obviously not the only people that hit this problem. I think the one thing I guess would be my comment is that as we're doing this, we should have a little bit of an eye towards the future, because someday the goal would be that when you create a bucket it has one shard, and then as it gets big, the gateway
D
will dynamically split it into, you know, four shards and then 16 or something, so the sharding will sort of be auto-tuning, I guess. We don't have to do that now, obviously, but we want to keep that in mind so that we don't paint ourselves into a corner and make it harder to solve that problem later.
G
Yeah, in particular, that's something we want to think about when doing an unordered listing. I was going to bring this up, but you got there first. If the unordered listing is specifying an offset in a shard, and there's a change in the shard number going on while that's happening, I'm not really sure how that would interact.
F
F
F
A
D
E
G
G
E
F
F
F
G
G
That's a good one, too. Unfortunately, I don't remember it anymore. Have you looked at that? I'm sorry, I can't see your name right now. Have you looked at that, Guang?
C
Yeah, I don't think multipart upload breaks anything in terms of the original implementation, because that is not something managed by the bucket index. How the manifest is managed is in another layer, or another object. I don't see anything that is related to the bucket index there.
A
G
D
Well, if it were a problem, let's just assume for a second that we do need to update all the parts when you're doing the final put or whatever to assemble them. That just means that when the parts are placed in the index, when they're sharded, we need to do that based on what the final object name is going to be, which, I don't know if that works [inaudible].
D
D
D
For Swift that wouldn't work, but maybe in Swift the atomicity is handled differently and it's not a problem, because in Swift they're actually first-class objects. It's not like they're hidden or anything. I don't really remember this stuff, but I think they're just regular objects, and then at the very end you put a manifest that says, by the way, this is a meta object that includes all these other ones.