From YouTube: Ceph RGW Refactoring Meeting 2023-03-08
Description
Join us every Wednesday for the Ceph RGW Refactoring meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
B
Yeah, so this came up as part of the end-to-end tracing PR, but the topic is more general. The thinking with end-to-end tracing is that tracing itself is not RADOS-specific — the traces and everything around them could be used anywhere — but they need to be used somewhere in the code: they need to be passed through the different RADOS APIs and eventually be serialized into the messages.
B
So there has to be some RADOS code that knows about tracing now.
B
This is work that Omri did. What he did is add the trace to the SAL object, because the object can hold the trace and accumulate all kinds of information into the trace, and eventually the object goes into RADOS and you can take the trace, serialize it, and send it over.
B
As part of a recent refactoring that Casey did, he removed the calls to SAL from RGWRados, which makes sense, because RGWRados is a specific implementation of SAL — why would a specific implementation need to touch the generic APIs that it implements, right?
B
It doesn't need to, because it just implements them, so this makes sense. But I ran into an issue here: I need to touch something that exists only in SAL, because it's not RADOS-specific, but I need to access it from within the RADOS code. So I'm wondering what the approach would be in this specific case, and maybe in the more generic case. Specifically, I can solve it in all kinds of ways.
B
For example, I can do what we did with the logging — with the DPP — and just add the trace to all the APIs.
B
All
the
function,
all
the
function,
definitions
and
let
it
kind
of
trickle
through
all
the
function,
calls
until
it's
being
used
somewhere
in
the
radius
call
I
mean
this
can
be
a
major
change
to
many
many
many
many
functions
and
and
to
be
honest,
I
do
think
that
it
could
make
sense
to
have
a
trace
as
an
attribute
or
a
member
of
the
object.
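The two approaches being weighed here — threading the trace through every function signature, DPP-style, versus carrying it as a member of the object — can be sketched in a few lines. This is a hypothetical Python sketch of the idea (the real RGW code is C++, and the function and class names here are invented for illustration):

```python
# Approach 1: thread the trace through every signature, DPP-style.
# Every intermediate function must grow an extra parameter.
def frontend_op(trace, obj_name):
    return sal_layer(trace, obj_name)

def sal_layer(trace, obj_name):
    return rados_layer(trace, obj_name)

def rados_layer(trace, obj_name):
    # Only here is the trace actually consumed (serialized into the op).
    return f"op({obj_name}) traced by {trace}"

# Approach 2: make the trace a member of the object, so only the code
# that creates the object and the code that serializes the op need to
# know about tracing; intermediate layers just pass the object along.
class Object:
    def __init__(self, name, trace=None):
        self.name = name
        self.trace = trace

def frontend_op2(obj):
    return sal_layer2(obj)

def sal_layer2(obj):
    return rados_layer2(obj)

def rados_layer2(obj):
    return f"op({obj.name}) traced by {obj.trace}"
```

Approach 1 is the "huge amount of work" being described: every function between the frontend and the RADOS call gains a parameter. Approach 2 keeps the signatures unchanged, which is why the trace ended up on the object in Omri's PR.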
B
So there are two reasons for me not to do that: not only is it going to be a huge amount of work, but I also think the trace should be a member of the object. So in this case I'm not sure what the right approach would be. Do we have an equivalent to a SAL object that lives in the RADOS implementation, so that we can just move the trace there? Or maybe you'd take some different approach?
A
Right, so there's rgw_sal_rados, which has a derived class RadosObject, which would have access to that trace. But the issue is passing it from there into all of the RGWRados code, so that it can link up with our calls into librados, right?
B
Okay, so there is a RadosObject that derives from the SAL object, and that already has the trace.
B
So when I reach the librados layer, I just pass the trace. But to get it from SAL through our RADOS code down to librados, I would need to somehow pass the RadosObject.
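The layering being described — a generic SAL object that owns the trace, a RADOS-specific subclass, and a librados boundary where the trace is serialized — can be sketched as follows. This is a hypothetical Python sketch; the real classes are C++ (rgw::sal::Object and its RADOS implementation), and the function names here are invented:

```python
class SalObject:
    """Generic SAL interface. The trace lives here, since tracing is
    not RADOS-specific."""
    def __init__(self, name):
        self.name = name
        self.trace = None

    def get_trace(self):
        return self.trace

class RadosObject(SalObject):
    """RADOS-specific implementation; inherits the trace from SAL."""
    pass

def librados_call(op, trace):
    # At the librados boundary the trace is serialized into the message.
    return f"{op} [trace={trace}]"

def rados_layer_write(rados_obj):
    # The RADOS layer needs the RadosObject itself (not just a plain
    # rgw_obj name struct) in order to reach the trace.
    return librados_call(f"write({rados_obj.name})", rados_obj.get_trace())
```

The point of the sketch is the last function: if the RADOS code only receives a plain name struct, the trace stored on the SAL object is out of reach, which is exactly the problem being discussed.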
A
So I guess you mentioned rgw_obj, a struct which just has the bucket and the object name, but RGWRados does also have some nested classes like Bucket and Object.
B
So I'll probably try to do that. I mean, the reason I mentioned rgw_obj is that the code broke without any merge conflict — it just doesn't compile after the rebase, because we kept the same name. So there's something called an object, and before the refactoring that object was a SAL object.
B
So
if
I
was
doing
object,
dot
get
traced,
then
I
would
just
get
the
trace,
and
now
it's
still
the
object
in
the
in
the
code,
but
it
doesn't
have
a
get
Trace
function
because
object
is
just
an
rgw
out.
So,
as
you
said,
it's
just
a
name.
So
I
guess
it's
not
like
everywhere.
In
the
code
where
we
had
the
cell
object,
it
was
replaced
with
a
radius
object
like
in
many
cases.
It
was
just
replaced
with
the
simpler
struct
that
just
has
the
the
name
of
the
bucket.
B
If not, then the RadosObject is a RADOS-specific implementation of something — it doesn't break the layering if RGWRados uses it, because the RADOS store is the implementation. I mean, the RGWRados class doesn't inherit from the SAL driver or anything anyway, or is it completely separate?
C
No, librgw uses SAL — all the frontends use SAL — so you could take the RGWRados implementation, pull it out, and write something else on top of it.
A
Yeah, I don't think we have a real use case for that, but maybe it would help if I talk about some of the motivation for my recent changes in RGWRados. Okay, linking a separate PR into the agenda: 50148.
A
Well, the RGWRados subclasses for Object and Bucket, which the read and write operations go through, would be a good place to stash a trace that at least the reads and writes would have access to. But you're right that there are a lot of functions in RGWRados that aren't related to those classes and would need to be threaded.
B
I mean, without adding it to the object — I remember it was extremely difficult to do, because it wasn't a straightforward stack of functions calling one another. I can look at that again, but at least this is what Omri said: without putting it into the object, it's going to be quite difficult. But I guess it should be possible — it's all functional; there's nowhere in the code where we store the object in a container, right?
B
Right. I mean, the PR was just to do the most basic implementation — the most basic functionality — just to show that it can work, and that you can have a trace that spans RGW and the OSD together. But from a functionality perspective it's very limited; the whole idea was just to demonstrate that this is possible and working.
B
Right, and this class is being constructed from rgw_sal, right? Or anywhere from the...
C
Yeah, so the biggest users of this for your case would be the SAL object's ReadOp subclass and then the write ops. There are a few other places in RGWRados — sorry, in the RadosStore — that use this behind the scenes.
B
Okay, yeah, I'll have a look. If this is sufficient, then I'll just do that, and later on, if we want to trace more things, figure out the other places.
E
Yeah, this one is pretty quick — I forgot I even put it on the agenda. I remember in the caps PR, one of the functions in that class was calling — I think it was user creation via the admin API — and you mentioned that that class doesn't get called by anybody and should be removed. I was wondering how trivial removing that class would be, and whether there's anything else along those lines that would need to be removed.
E
And yeah, there's no overlap in my mind between the caps PR and this. Maybe I'll go ahead and delete that class, poke around a little more to see what the other classes are in that file, and at least open up a PR for deleting that one.
A
Yeah, I think there are still other uses of stuff in that file in cloud tiering and maybe cloud sync, so it's not all unused, but at least user creation was just PubSub.
A
All it currently uses is the ragweed test suite. It basically starts by installing an older version of Ceph, runs the ragweed prepare step on the older version, then does some upgrades, and then runs the ragweed check step.
A
So
there's
only
it's
kind
of
a
two
by
two
Matrix.
During
the
install
step
we
either
install
Pacific
or
Quincy,
and
then
during
the
upgrade
sequence,
we
either
upgrade
osds
first
or
upgrade
rgw's
first.
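The two-by-two matrix of upgrade scenarios can be written out explicitly. A minimal Python sketch, using the release and ordering names from the discussion:

```python
from itertools import product

installed_releases = ["pacific", "quincy"]
upgrade_orders = ["osds-first", "rgws-first"]

# Each test run picks one cell of this 2x2 matrix: install the old
# release, run `ragweed prepare`, upgrade in the chosen order, then
# run `ragweed check`.
scenarios = [
    {"install": rel, "upgrade_order": order}
    for rel, order in product(installed_releases, upgrade_orders)
]
```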
A
This can uncover some upgrade issues, especially around uploads and multipart uploads, but it's not a ton of test coverage, so I'd like to talk through some options. I feel like some kind of s3-tests coverage would help, but our S3 tests are specific to a Ceph release.
A
So
I
don't
think
that
during
an
upgrade,
we
could
really
guarantee
that
either
the
previous
test
cases
or
the
new
test
cases
would
all
pass
right,
because
sometimes
we
change
test
cases
to
fix
bugs
and
that
bug
is
fixed
on
the
new
release,
but
not
on
the
old
one.
So
the
test
case
is
going
to
fail
on
one
or
the
other
makes.
C
The problem, though, is that s3-tests does the test fixture for every test, so you can't run the setup, then upgrade, and then run the tests — it's all one self-contained thing. So you would do nothing, upgrade, and run the tests, and all you're testing is that the upgrade didn't completely break RGW.
A
Yeah — we get testing of the current release from the rgw/verify and other suites, so...
A
Hey
I'm
not
sure
that
it
would
be
really
testing
anything
interesting
if
we
just
run
S3
test
at
the
end.
C
Because s3-tests doesn't have a setup, right? Each individual test case sets its own thing up: in order to run a test, it sets it up, runs it, and then deletes it; then the next test sets up, runs, and deletes; and the next test sets up, runs, and deletes. So there's no setup stage we can run before we upgrade.
C
Right, so the idea is that you would start the workload, spread across all the RGWs, and then you would start upgrading the RGWs one by one. The workload would continue to be spread across them all, so that in transit some of them are old and some of them are new, and by the end they're all new.
A
Adam, recently you've talked about using warp for testing like this. Do you think that would be a good tool for an upgrade suite like this? Honestly, I don't want to touch COSBench.
D
Yeah, I think so. I've used warp for testing multi-site, for the most part just to get a bunch of objects in to make sure they end up where they're supposed to.
A
But the multi-site tests are not all reliable, so there's not much point in running those currently.
B
You
mentioned
the
Tempest
testing
that
split
into
setup
and
and
test.
A
Yeah, I think that's a good idea. Maybe we could start just with an audit of the test cases that are there. I know that there are a number of multipart upload tests.
A
What
new
Huda
added
support
for
storage
classes?
He
added
test
cases
for
that.
B
Maybe just a question about ragweed: does it assume a running cluster, like the s3-tests do, or does it have a separate mechanism there?
A
The only main difference is that it also needs to be able to import the rados Python module, because it can actually inspect the underlying RADOS layer to verify things.
B
Interesting. What I'm thinking is — maybe even as an exercise — I can take, let's say, this file I have that does all the notification tests. If you look at each of the tests, there's a section at the beginning where you do setup, and then there's a section later on where you actually run the test.
B
I
can
fit
that
into
functions
and
if
I
can
import
those
tests
for
ragweed
I
can
just
execute
them
into
steps,
but
all
the
well,
it's
not
a
lot,
but
there's
some
kind
of
underlying
mechanics
that
assume
that
we
have
some
kind
of
a
cluster
and
we
have
some
some
functions
that
we
can
get
the
the
host
name
and
the
keys
and
all
that
stuff.
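The refactor being proposed — splitting each self-contained test into a setup function and a check function, so a framework like ragweed can run them on opposite sides of an upgrade — might look like this. A hypothetical example (invented test, with a dict standing in for durable cluster state, not an actual s3-tests case):

```python
def prepare_topic(cluster):
    # Before the upgrade: create the notification topic on the old
    # version. Only durable state survives into the check phase.
    cluster["topics"] = {"my-topic": {"endpoint": "kafka://broker"}}

def check_topic(cluster):
    # After the upgrade: the topic created on the old version must
    # still be readable and usable by the new version.
    topic = cluster["topics"]["my-topic"]
    assert topic["endpoint"] == "kafka://broker"
    return True
```

A self-contained s3-tests case would do both halves back to back; the split lets `prepare_topic` run in the ragweed prepare step and `check_topic` in the check step, with the upgrade in between.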
B
Because there's a huge amount of tests in s3-tests — even in the specific notification suite I have, I don't know, maybe 50 tests. Rewriting 50 tests would be a lot of work, right? And it's not only rewriting, it's code duplication, which could be pretty difficult: sometimes there are bugs and you have to change the test because you fixed bugs, or you add some new stuff, so keeping two sets of...
B
If
you
talk
about
S3,
this
is
like
hundreds
of
tests
into
separate
places
that
that's
not
going
to
work
so
I
I.
Can
you
know,
give
it
a
try
and
see
if
this
could
be
kind
of
easily
done?
Also
incrementally
done,
then.
Maybe
that
could
kind
of
rapidly
fix
the
issue.
A
I
mean
I,
don't
know
that
Ragweed
really
needs
full
coverage
of
all
the
features,
but
we
might
look
at
kind
of
obvious
things
that
might
break
over
an
upgrade
and
and
try
to
have
at
least
one
thing
that
you
know
creates
a
topic
on
an
old
version
and
make
sure
that
it
can
still
be
used
after
an
upgrade
things
like
that.
B
Yeah
everything
that
touches
lots
of
metadata
but
I
mean
it's
not
only
identification.
It's
also
all
the
SDS
and
I
am
all
that
stuff.
I
mean
it's.
It's
all
metadata
that
if
we
change
the
structure,
then
you
know
we
might
break
the
the
upgrade
so
and
sure
we
don't
need
to
cover
all
kind
of
identification.
B
We
don't
we
don't
care
if
it's
Kafka
or
mqp
or
whatever,
because
it
doesn't
matter,
but
even
looking
at
SV
and
and
all
that
stuff,
there's
quite
a
bit
of
of
tests
that
are
doing
exactly
what
it
could
could
break
like
setting
lots
of
mid
data
or
kind
of
system
objects,
and
all
that
stuff
and
and
those
exactly
are
the
places
where
we
we
can
suspect
that
things
won't
work.
C
So
I
think
there's
some
overlap,
but
I
don't
I,
don't
necessarily
think
they
all
apply
because
take,
for
example,
life
cycle
right
for
life
cycle.
What
we
want
to
do
is
we
want
to
set
the
life
cycle
config
to
an
upgrade
and
then
check
the
like
cycle
config
to
make
sure
it's
the
same,
but
in
the
S3
tests.
B
What if we change the code in a way that — so, I'll tell you what I did when I did the PubSub cleanup and removal. I changed the structures: I removed lots of metadata from the structures that was completely unnecessary — or is completely unnecessary in the new version — because we don't support PubSub anymore and don't need that information. So I actually broke the structure, but in a way that the code should be able to handle.
A
It's probably worth mentioning that we do have some test coverage through the dencoder testing of our encoded formats. Essentially, each Ceph release we compile a corpus of the existing object encodings, and in the new release we test that the new version can decode the old stuff, which does help, but yeah.
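The dencoder-style check can be sketched as a structure whose encoding carries a version number, where decoding an old corpus entry with new code must still succeed. This is a hypothetical Python sketch in the spirit of Ceph's versioned encode/decode machinery, not the real format:

```python
def encode_v2(name, size, storage_class):
    # New code encodes at struct version 2.
    return {"struct_v": 2, "name": name, "size": size,
            "storage_class": storage_class}

def decode(blob):
    # New code must decode both old (v1) and new (v2) encodings.
    v = blob["struct_v"]
    if v > 2:
        raise ValueError(f"unsupported struct_v {v}")
    out = {"name": blob["name"], "size": blob["size"]}
    # storage_class was added in v2; default it when decoding a v1 blob.
    out["storage_class"] = blob["storage_class"] if v >= 2 else "STANDARD"
    return out

# A corpus entry encoded by the previous release (struct_v == 1):
old_blob = {"struct_v": 1, "name": "obj1", "size": 42}
```

The corpus test catches decode failures like the `unsupported struct_v` case; as the next comment in the meeting points out, it cannot tell you that a decode succeeded but filled in semantically wrong values.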
B
I
mean
that
that
would
find
that
would
find
like
exception
saying
oh
but
I
mean
if,
for
example,
you
decode
and
it
says
it's
an
old
version
and
I
don't
know
I,
don't
support
this
version.
Then
the
code
is
successful.
Right
I
don't
support
this,
but
maybe
this
is
not
what
you
what
you
intended
intended.
Not
only
that
I
mean
you
could
decode
successfully,
but
something
is
wrong
with
the
values
and
you
have
no
way
to
do
that.
A
Where issues show up is where you're interoperating — for example, RGWs and CLS code of different encoding versions — where protocol-level stuff can show up as errors even where the encoding by itself doesn't.
A
So
that's
that's
where
I
think
the
Ragweed
coverage
of
osds
running
different
versions
than
rgw's
is
really
useful
just
to
test
the
CLS
stuff,
but
also
I,
think
testing
mixed
versions
of
rgws.
So
you
know
writing
stuff
on
one
rgw
and
verifying
that
you
can
still
access
it
on
another
rgw
running.
A
different
version
could
also
be
really
useful.
A
Yes, but still, I worry that we've written the test cases assuming that we have the fix — you know, a fix that was in Reef for both RGW and the OSD. So if we're running a test case that assumes both, but in a mixed configuration, then it might fail, and we would kind of expect it to fail.
A
Well,
what
if
the
rgw
fix
relied
on
a
fix
and
and
CLS
also
and
if
you're
running
against
an
older
OSD
that
doesn't
have
it,
then
it
wouldn't
pass
I
mean
you
could
you
could
potentially
run
the
older
version
of
S3
tests
always
but
I?
Don't
think
that
there's
a
a
configuration
where
we
would
guarantee
that
all
of
the
test
cases
would
succeed
so
I
mean
maybe
we
could
start
tagging
things
and
and
whitelisting
failures
that
we
expect
during
upgrades,
but
I
think
that
would
get
complicated,
especially
over
you
know
several
different
releases.
A
I
mean
it's
definitely
worth
thinking
about
that
for
new
features
that
we're
merging
and
I
try
to
incorporate
that
into
a
review.
But
you
know
actually
building
test
cases
into
Ragweed
for
specific
stuff
that
we're
adding
I
think
could
help
catch
a
lot
of
those
kind
of
issues.
E
The other thing I was going to ask is: do we know of any downstream customers or upstream users that have had specific issues between upgrades that we haven't been testing?
A
I
yeah,
definitely
probably
not
so
much
recently,
I
think
we're
doing
a
little
better,
but
some
kind
of
issues
like
that
have
come
up
with
multi-site
reshard.