From YouTube: Ceph RGW Refactoring Meeting 2023-07-05
Description
Join us every Wednesday for the Ceph RGW Refactoring meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute
What is Ceph: https://ceph.io/en/discover/
A
B
C
Well, in fact, [unclear] isn't a safe way to describe this. When omap itself was being designed, and other features like it, the RADOS team considered there to be, just as there already is for object data, an absolute max, which will never be significantly changed.
C
That covers both the size of an object and, for omap, a maximum number of omap keys. That's currently 200,000. It's never going to be much larger and it will not be allowed to change, because the design of RADOS replication and recovery depends on putting a fixed upper bound on the amount of data that has to be transferred during recovery.
A
C
That customers are raising the... I'm using it in a mathematical sense. Maybe it's bad mathematician language too, but the point is not really fixed, yet the point can never move far from where it is. The design limit will not move.
A
Right, and a lot of the cases where this has been a problem are places where we were treating omap as an unbounded set rather than partitioning it like, for example, bucket index resharding does.
C
Yeah, and resharding works for certain things, but I, for example, wouldn't consider it a good solution here. Going forward, I've gone through several cycles on this, and I think I understand what I would try next to tackle this.
C
Basically, a two-level omap index managed by a library that deals with the concurrency, with reader, writer, and update consistency, but leaves it all in omap, rather than mixing in the problem of implementing some sort of, you know, commercial off-the-shelf or open source off-the-shelf database technology. That's a whole different thing. I think, on balance, those two approaches have to be split apart.
C
You know, in a world where we were not relying on RADOS for certain things, we could be doing whatever we want for scalable metadata, but we need some strategy on RADOS.
D
C
Well, it is. I am proposing a strategy similar to sharding, but sharding based on hash does not meet all requirements, so some other strategy like range splitting is probably needed as an alternative. That's all I'm actually saying.
B
B
We can also maybe take the FIFO approach.
B
C
I don't think that works in searching and sorting terminology, but, I mean, a technique that's used in CockroachDB is simple range splitting.
C
If we simply came up with a way to allow, for example, a small cache, with LRU management or whatever you choose, at a particular client endpoint to manage a two-level index... you know, in fact, it could be shared.
C
That's the fun part, but I think it is achievable, or worth adjusting RADOS [unclear] to get there if necessary, but largely [unclear], especially with [unclear] and stuff like that. But I mean, you can then have at least 150,000 or, let's say, 200,000. If we take [unclear] at her word, we have 200,000 times 200,000 objects.
C
Per omap range, you have a different omap object supporting it, in addition to the one object that names the group. That is roughly 40 billion objects, which I think is a large enough scale to handle almost anything you want to stick in.
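The two-level, range-split index described above might look roughly like the following. This is only a minimal in-memory sketch under stated assumptions: `TwoLevelIndex`, `capacity`, and the tiny `max_keys` limit are hypothetical names and values chosen for illustration, plain Python dicts stand in for omap objects, and none of the concurrency or consistency handling mentioned in the discussion is modeled.

```python
import bisect


class TwoLevelIndex:
    """Hypothetical sketch: a top-level list of range bounds plus bounded leaves.

    Each leaf stands in for one omap object holding at most max_keys keys.
    The top level (bounds) stands in for the single object that names the group.
    """

    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        self.bounds = [""]   # sorted lower bound of each leaf
        self.leaves = [{}]   # leaf i holds keys in [bounds[i], bounds[i + 1])

    def _leaf_for(self, key):
        # Rightmost leaf whose lower bound is <= key.
        return bisect.bisect_right(self.bounds, key) - 1

    def put(self, key, value):
        i = self._leaf_for(key)
        self.leaves[i][key] = value
        if len(self.leaves[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        # Range splitting: cut the overfull leaf at its median key.
        old = self.leaves[i]
        keys = sorted(old)
        mid = keys[len(keys) // 2]
        self.leaves[i] = {k: old[k] for k in keys if k < mid}
        self.bounds.insert(i + 1, mid)
        self.leaves.insert(i + 1, {k: old[k] for k in keys if k >= mid})

    def get(self, key):
        return self.leaves[self._leaf_for(key)].get(key)

    def list_ordered(self):
        # Leaves are already in key order, so ordered listing needs no merge.
        for leaf in self.leaves:
            for k in sorted(leaf):
                yield k, leaf[k]


def capacity(max_keys_per_object):
    # One top-level object naming up to N leaves, each holding up to N keys.
    return max_keys_per_object * max_keys_per_object
```

With the 200,000-key limit mentioned earlier, `capacity(200_000)` gives 40,000,000,000, which matches the "roughly 40 billion" figure. Note that the sketch does not cap the size of the top level itself; a real design would have to.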
B
Do we need sorting for the buckets?
C
Sure, we need sorting in general, yeah. I mean, if you don't need it, you could use...
C
You know, you could use hash splitting, and maybe this idea can be converged. But if Eric is here, Eric can talk about the complexity of converting the partitioned indexes into an ordered sequence, as is done for bucket listing. It's expensive, and it's been prone to more complex bugs than anyone has a right to, partly because of other tricky semantics that were layered in with special kinds of objects in the range and stuff like that. But seriously, it's expensive.
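To illustrate why ordered listing over hash-partitioned indexes is costly, here is a small sketch. It is not the RGW bucket listing code; `shard_of`, `build_shards`, and `list_ordered` are hypothetical helpers, and sorted Python lists stand in for the per-shard omap key ranges. The point it shows is that a hash spreads adjacent keys across all shards, so even a short ordered listing has to consult every shard and merge.

```python
import heapq
import itertools
import zlib


def shard_of(key, num_shards):
    # Hash sharding: placement ignores key order entirely.
    return zlib.crc32(key.encode()) % num_shards


def build_shards(keys, num_shards):
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_of(key, num_shards)].append(key)
    for shard in shards:
        shard.sort()  # each shard is ordered internally, like omap keys
    return shards


def list_ordered(shards, max_entries):
    # K-way merge across ALL shards, even to return only a few entries.
    merged = heapq.merge(*shards)
    return list(itertools.islice(merged, max_entries))
```

With range splitting, by contrast, the first page of a listing would come from a single partition, which is the trade-off being weighed in this part of the discussion.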
C
[unclear] Does this particular feature, you know, completely solve it? Are we going to mandate it? No. But the general problem space, where you have an ordered sequence that we want to scale, that's real, and the only way to scale in RADOS is to split into independent partitions that will be distributed by the ordinary rules of RADOS. That's the only way you get both.
C
A
So I'm curious, it's not clear to me from the API docs about ListBuckets whether that actually requires ordering.
C
D
D
A
C
Yeah, I mean, I think this is, again, as I said in my email to you, [unclear], Adam. I don't think we've worked from the API on this. You also made the case that we don't have to do this at all, because Amazon says that 100 buckets per user is just fine. Our users don't think so, so it may be that we have to...
C
...that we have to look at some other way to deal with that, because, yeah, paginating a large number of objects... they didn't have pagination, because it assumes that you're not going to have very many buckets per user. I doubt that will be permanent, but I don't think our users and customers want to be there. This has been persistently raised over multiple years, so I think it's relevant.
C
C
This may be a case where, for whatever reason, I think the persistence of people wanting to do this is real. I think Amazon is going to want to solve this at some point. It may not look the same as our solution. I do not flatter myself that they read anything we do, but if they did, maybe we could even socialize it to Amazon.
B
Does pagination impose order? I don't think so.
A
C
D
C
A
D
C
D
C
Where we do look at this, [unclear] is the details, and [unclear] has a lot of write-up in this email thread that I'm on. What's sophisticated is the way we manage quotas and some other stuff. I mean, this user bucket sequence is doing more things than just bucket listing, but those are more like CRUD operations and may not have any ordering constraint. But then again, they're...
C
A
Right, similar to the way that we accumulate bucket stats for objects in a bucket, we also, at the user level, accumulate user stats for each of their buckets. So there are the stats caches, the quota caches in the background, that periodically flush and do writes to this, but I think...
A
C
Different? Well, probably not. No, because the rate limit is as we defined it, and those are entirely transient, but this is for [unclear], you know, persistent data, reliable data, reliable mappings, right?
C
A
All right, so maybe we're discussing kind of longer-term designs around this, but maybe in the shorter term it would help to add some more documentation around the existing user max buckets field, warning about the constraints of raising or removing those limits.
A
B
C
A
Yeah, I mean, in terms of implementation for scaling, I do think the sharding model is what I would look at first, but maybe the first step would be trying to just document all of the constraints, where we do the reads and writes, and which ones we really want to optimize for.
C
A
C
But we're not hitting billions, we're hitting about 500,000. No, that's not true, sorry, 500 million, maybe. Yeah, I know, and [unclear] that's a limit. That's a limit that needs to be surmounted, and it probably shows up elsewhere, but it does work up to a point. As we get there, I can discuss in detail the different cycles going around it.
C
We stopped optimizing at a point where optimizing the linearizing of the hash sharding was ceasing to deliver any benefit but was becoming incredibly complex, and it already involves [unclear], in other words, yeah.
A
A
Right, I could also try to take some of the notes in my email and format them and share them with the list, and we can continue discussions on a design there.
B
Yeah, so I started to look at that. The idea is that currently it requires a restart of the RGW, and I want to avoid that.
B
So Casey, you suggested that I look at the watch/notify mechanism that is used for the period reload, and I've looked at that. It should be quite straightforward and simple to do that.
B
There were two things I wasn't sure about. First of all, what is the object that I need to create for the notify, and whose responsibility is it to create this object? The notifications are coming from radosgw-admin and are received by the RADOS Gateways, and I don't think it makes sense that the RADOS Gateways would create the object. But radosgw-admin creates the object only when somebody wants to add a package, so I wasn't sure about how the mechanism should work.
A
A
A
B
Okay, for some reason, when I call the watch I get ENOENT. Okay, I'll have a look at that. My first thought there was that I don't really need this...
B
...this watch/notify mechanism, and I thought I'd just check periodically on the version of the object and see: if the version changed, then I need to read it and do the installation. I think that, because I already have a background thread, which I need anyway for the watch...
B
...notify mechanism to work. I need that because I need to support the realm reload mechanism, so that I will always have the right RADOS pointer there. The whole thing works in the background; it doesn't work based on Beast requests. So because it works in the background, I need something that supports the whole RADOS re-creation mechanism. So if I put that in some background thread anyway, then I'm not sure I need the watch/notify mechanism. I can just do some simple polling to see if I need to do this reload.
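The polling alternative described here, a background thread that periodically checks the object's version and reinstalls only when it changes, could be sketched as follows. This is an illustrative sketch, not RGW code: `VersionPoller`, `read_version`, and `on_change` are hypothetical names. `read_version` is assumed to be a caller-supplied function that looks up the current store handle on every call (so a realm reload that replaces the handle is picked up) and returns `None` when the object does not exist yet.

```python
import threading


class VersionPoller:
    """Hypothetical sketch: poll an object's version and react to changes."""

    def __init__(self, read_version, on_change, interval_sec=5.0):
        self._read_version = read_version  # assumed: returns None if object is absent
        self._on_change = on_change        # e.g. re-read the list and install packages
        self._interval = interval_sec
        self._last_seen = None
        self._stop = threading.Event()
        self._thread = None

    def poll_once(self):
        version = self._read_version()
        if version is not None and version != self._last_seen:
            self._last_seen = version
            self._on_change(version)
            return True
        return False

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
            # wait() returns early if stop() is called, so shutdown is prompt.
            self._stop.wait(self._interval)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

The trade-off against watch/notify is latency: a change is noticed only at the next poll, in exchange for not having to create or re-establish a watch.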
A
I mean, is this really worth the effort compared to just saying, when you change the packages, restart RGW? The reload versus a restart is not that different in the first place.
B
It is. I mean, a restart really kills the process and starts it all over. This is an outage in the system. My reload gives almost no outage, because I'm not doing the pause/resume reload. I'm just installing a couple of packages on the local disk.
A
B
I don't know, but I'm not using this reload. I need something that is aware of this reload so that whenever I go and read the object and try to install the packages, I will have the right RADOS pointer and not the old one. But other than that, I'm not really doing a reload. I mean, this package reload is just invoking some external command that changes stuff on disk. It doesn't do anything else.
A
B
B
Think about the case where you have a script that relies on some packages, and then you change it to a script that relies on other packages that are not installed yet. Or maybe you install the packages first and then you change the script. Either way, there's going to be some period in which the script and the packages are not going to match. So whether this is because the directory is now locked or busy or something doesn't really matter. I don't think there's a real issue.
B
I mean, there's an issue, but because the whole thing is very non-atomic, you're going to have an issue anyway. So probably the right way to do that is just to remove all scripts, do the install, and then put the new script in, and that's it. And you can do that without any interference to the operation of the RGW.
B
B
Now, while I'm changing the files in this directory, if Lua scripts are trying to read those files because they have dependencies on the external packages there, then they're not going to work. But there's no atomicity between the scripts and the packages installed on disk anyway. So probably the right way to do that is to remove the scripts, install the packages, install the new scripts, and that's it.
B
So I'm not going to pause/resume. The only reason I'm concerned about pause/resume is that the code that installs the packages needs to read the list of packages from some system object, and for that we need RADOS. So it needs the right one, in case somebody called realm reload on the RGW at the same time.
A
C
A
B
A
B
So if you're saying that the notify command should create the object, then I'm just going to investigate and see why it's not happening, and maybe just skip this mechanism.
B
Because if the watch command needs to be retried in the background, then it's really pointless, and I'll just read the object in the background and see if it changed and then do the work, instead of relying on the notifications.
B
A
Notifies are definitely an issue that I saw in the realm reload stuff, and it has special cases for that.
B
Yeah, so I guess if we block radosgw-admin until all the RADOS Gateways have replied, then at least from that admin nobody would resend another notify.
B
Okay, there was another small thing that I've noticed. There is the question of Zipper. Because of the notification, this code is really RADOS-specific, but the code sits above the Zipper line, so I wasn't sure.
A
Yeah, as far as I know, we don't have any analog for watch/notify for DBStore or other backends.
C
Well, there's certainly the intent to create it, but whether we have to put that ahead of other things, I wouldn't say. My goal is to try out, you know, [unclear] as a potential solution for group communications that you would use for that. It can have similar semantics. But we can use other things. We have to introduce something like it, and I think we should look at that.
E
E
D
B
C
E
C
Well, I mean, watch/notify happened to use RADOS because RADOS was there and RADOS would always be there. But, you know, considering that it's actually a notification between all the nodes in the RGW cluster, routing through the backend is acceptable, but by no means required by the concept. The other ways I would do it would not use RADOS, yeah.
C
A
...abstractions to store this list of Lua packages, and the LuaRocks stuff to actually install those is above the line; it just relies on Zipper to read the list.
A
I think it might make sense to extend that existing API to also be able to pass in some observer interface. And if the backend supports notifications, then it could call your callback based on watch/notify for RADOS.
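One way to picture the observer idea suggested here is the sketch below. It is a hedged illustration only and does not reflect the actual Zipper (SAL) API: `PackageListObserver`, `Store`, `NotifyingStore`, `PlainStore`, and their methods are hypothetical names. The shape it shows is that the caller passes in an observer, a backend that supports notifications invokes it on change, and a backend that does not simply reports that, leaving the caller free to fall back to polling.

```python
from abc import ABC, abstractmethod


class PackageListObserver(ABC):
    """Hypothetical callback interface passed in by code above the store line."""

    @abstractmethod
    def on_packages_changed(self):
        ...


class Store(ABC):
    """Hypothetical backend abstraction for the package list."""

    @abstractmethod
    def read_packages(self):
        ...

    def watch_packages(self, observer):
        # Default: backend has no notification support.
        return False


class NotifyingStore(Store):
    """Stand-in for a backend with a watch/notify-like facility."""

    def __init__(self):
        self._packages = []
        self._observers = []

    def read_packages(self):
        return list(self._packages)

    def watch_packages(self, observer):
        self._observers.append(observer)
        return True

    def write_packages(self, packages):
        self._packages = list(packages)
        for observer in self._observers:
            observer.on_packages_changed()


class PlainStore(Store):
    """Stand-in for a backend without notifications."""

    def __init__(self, packages=()):
        self._packages = list(packages)

    def read_packages(self):
        return list(self._packages)
```

A caller would check the return value of `watch_packages` and start a periodic check only when it is `False`, which keeps the backend-specific mechanism below the abstraction line.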
B
E
A
C
A
A
So yeah, you mentioned watch/notify for realm reloading, and eventually that will be abstracted somehow, probably in the config store, which stores the period and realm stuff. But I think that some store backends just won't support notifications and won't implement that. I think that'll be okay, but similarly...
E
B
E
A
If DBStore had any kind of metadata caching, then that would come up for the invalidate messages, but it doesn't cache metadata currently. So Yuval, has this discussion helped? Do you see a way forward?
B
Yeah, I'll investigate a little further on the watch/notify thing, why it's not working for me, and I would create those abstractions, so at least the Lua package notifications will be abstracted.
A
All right, sounds great. Yeah, it might help to look at the use of watch/notify for the metadata cache to see whether it has to actually create things. I assumed that just watching would create the objects. Maybe I'm wrong.
B
I guess there shouldn't be any harm if I try to create the object. I don't want to write anything to the object when I'm just watching, but if I just kind of open the object for writes, that should create it if it doesn't exist, right?
A
B
B
Nope, I don't think so. Really.