From YouTube: Ceph RGW Refactoring Meeting 2023-01-04
Description
Join us every Wednesday for the Ceph RGW Refactoring meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A
I want to talk about the replication, our replication consistency guarantees. What I have summarized here is what I think is true, and Casey basically agrees with the most important point. We have this expectation that replication be reliable, and I think that essentially means transactional consistency, or at least apparent transactional consistency, which we currently seem to have.
A
We currently have a two-level system at the back of the bucket index: the bucket index operation and the ingest operation that triggers it are journaled, which is fully transactional, so that if for some reason we can't log the operation, you know, the operation to delete an object, we don't delete it. So that's fine, insofar as it goes.
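As a rough illustration of that ordering, here is a minimal C++ sketch (all names are hypothetical, not the actual RGW code path): the journal write has to succeed before the object operation is applied, so a failed journal write aborts the operation rather than losing it.

```cpp
// Minimal sketch (hypothetical names, not the actual RGW code path):
// the journal write must succeed before the object operation is applied,
// so a failed journal write aborts the operation instead of losing it.
#include <iostream>
#include <string>

struct BucketIndexJournal {
  // Stand-in for the journaled bucket index update; pretend it returns
  // false when the journal entry cannot be written.
  bool log_pending(const std::string& op, const std::string& obj) {
    std::cout << "journal: pending " << op << ' ' << obj << '\n';
    return true;
  }
  void complete(const std::string& op, const std::string& obj) {
    std::cout << "journal: complete " << op << ' ' << obj << '\n';
  }
};

// Journal first; if we can't log the delete, we don't perform it.
bool delete_object(BucketIndexJournal& j, const std::string& obj) {
  if (!j.log_pending("delete", obj))
    return false;  // journaling failed, so the delete is not applied
  // ... apply the actual object deletion here ...
  j.complete("delete", obj);
  return true;
}

int main() {
  BucketIndexJournal j;
  delete_object(j, "photos/cat.jpg");
}
```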
A
That's exactly what people expect from our replication consistency. However, full replication also relies on storing a data log entry at that point, which is best effort.
A
If that entry isn't updated, the change may or may not be replicated. There's a potential problem. It may or may not surface, but there is the potential at that point for a stall in replication, because even though the operation we're replicating has been reliably journaled, we rely on the data log entry to point to the shard that needs to be checked for sync.
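To make the contrast concrete, a small sketch (invented names, not the real code path) of what "best effort" means here: a failed data log append is only logged, the client operation still succeeds, and nothing flags the shard for sync.

```cpp
// Sketch of "best effort" (invented names): a failed data log append is
// only logged; the client operation still succeeds, and nothing tells
// peers that this shard changed, so its replication can stall.
#include <iostream>
#include <string>

// Stand-in for appending to the data log shard that replication polls.
bool append_datalog_entry(int shard, const std::string& bucket) {
  (void)shard;
  (void)bucket;
  return false;  // pretend the write failed (crash, cluster hiccup, bug)
}

void on_object_written(int shard, const std::string& bucket) {
  // The bucket index update before this point was journaled and reliable.
  if (!append_datalog_entry(shard, bucket)) {
    // Best effort: the failure is swallowed. Peers watching this shard
    // never learn it changed, so the bucket's replication can stall.
    std::cerr << "datalog append failed for shard " << shard << '\n';
  }
}

int main() { on_object_written(11, "my-bucket"); }
```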
A
And that, as things are currently set up, is basically irreducible.
A
In some situations it could be due to a bug, but it doesn't have to be; it could be due to some sort of cluster or environmental error, cluster partitioning, various things. Some bucket replication streams could stall, and that stall could be indefinite. Now, it is the case that they're fully recoverable, so someone can run bucket sync run, or further ingest on that shard will definitely continue the replication for it, and that's good, but we...
A
So we have a kind of an algorithmic problem. It seems like we would like to solve this in order to have replication be reliable the way ordinary people think of that. It seems like there are two paths you could go down to do that. The part of the data log operation that we care about is... I mean, never mind, let me say that differently. We can either make what we call the data log part something effectively transactional, which would presumably be more expensive than what we're doing now, or the alternative would seem to be to find some other way to put an upper bound on how long a bucket's replication could be stopped. For example, you could scrub all the buckets and sync in some periodic fashion, and resync anything that has stalled. I think that's sort of the story for me.
A
Related to that, there are other things besides writing into RADOS for the data log that you could undertake. You could do more; you could have a belt-and-suspenders sort of log concept, for example. That's fine, but it doesn't help by itself; it doesn't solve this sort of problem, but it narrows it a lot.
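A rough sketch of the "upper bound" path (all helper names invented, assuming some way to list buckets and detect a stalled sync): a periodic scrub restarts sync for anything stalled, so a lost data log entry can delay replication by at most one scrub interval.

```cpp
// Sketch of the "upper bound" path (all helpers invented): scrub every
// bucket on a schedule and restart sync for anything stalled, so a lost
// data log entry delays replication by at most one scrub interval.
#include <chrono>
#include <string>
#include <thread>
#include <vector>

std::vector<std::string> list_all_buckets() { return {"bkt-a", "bkt-b"}; }
bool bucket_sync_is_stalled(const std::string&) { return false; }
void restart_bucket_sync(const std::string&) {}  // like a manual sync run

void scrub_loop(std::chrono::minutes interval) {
  for (;;) {
    for (const auto& bucket : list_all_buckets()) {
      if (bucket_sync_is_stalled(bucket))
        restart_bucket_sync(bucket);
    }
    std::this_thread::sleep_for(interval);
  }
}

int main() { scrub_loop(std::chrono::minutes(30)); }
```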
D
I mean, I'm not certain if that's a correct description of the issue. Aren't we getting transactionality from RADOS itself? If we do a write, then either the write succeeds or the write fails. Okay, okay, hold on.
D
Right, but if the write fails, there are two things that can happen. Either it's a bug, and we should fix that, and I'd like to know about it, sure. Or... currently in RGW, writes don't time out; things should either return success or they should return an error, because there was an error, if the OSD...
A
There are cases... it doesn't matter exactly what it is; there are other things that can cause it, and it doesn't benefit us, but it certainly is infrequent. I mean, I'm not saying it shames us, but I think we need to bound this problem.
C
So the data log entry would be stored on the source zone, and I think the recovery would be happening on the destination zone, so I think the error repo is the right place to do that. The error repo is what the destination uses to track things that it needs to retry.
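For orientation, a tiny sketch of the role being described for the error repo (a hypothetical shape, not the actual implementation): the destination records failed sync attempts so a background pass can retry them later, independently of the source's data log.

```cpp
// Sketch of the error repo's role (hypothetical shape, not the actual
// implementation): the destination records failed sync attempts so a
// background pass can retry them, independent of the source's data log.
#include <string>
#include <utility>
#include <vector>

struct ErrorRepo {
  std::vector<std::pair<std::string, int>> failed;  // (bucket, shard)

  void record_failure(std::string bucket, int shard) {
    failed.emplace_back(std::move(bucket), shard);
  }

  // retry returns true on success; entries that still fail are kept.
  template <typename RetryFn>
  void retry_all(RetryFn retry) {
    std::vector<std::pair<std::string, int>> still_failing;
    for (auto& [bucket, shard] : failed) {
      if (!retry(bucket, shard))
        still_failing.emplace_back(bucket, shard);
    }
    failed = std::move(still_failing);
  }
};

int main() {
  ErrorRepo repo;
  repo.record_failure("my-bucket", 7);
  repo.retry_all([](const std::string&, int) { return true; });
}
```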
A
Why wouldn't... you know, all zones that can originate changes would presumably do this. I mean, well, it seems to me like the source zone would do this; why would it be the destination?
D
My thought was that we already do have a situation where, when we do resharding and whatnot, after the end of resharding, we just write one entry per bucket shard to every shard in the data log. And I'm wondering if we actually could do it on the source zone just by having a timer, for like every 10 minutes or every 30 minutes or whatever, where the source zone just goes through and does that one-entry-per-shard thing.
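A minimal sketch of that proposal (invented helpers, and the shard count is an assumption): on a timer, the source zone writes one marker entry to every data log shard, the same thing done after a reshard, so a consumer that missed a real entry is still woken up within one period.

```cpp
// Sketch of the timer idea (invented helpers; the shard count is an
// assumption): the source zone periodically writes one marker entry to
// every data log shard, as is done after a reshard, so a consumer that
// missed a real entry still gets woken up within one period.
#include <chrono>
#include <iostream>
#include <thread>

constexpr int kNumDatalogShards = 128;  // assumed; really configurable

// Stand-in for appending a no-op marker entry to one data log shard.
void touch_datalog_shard(int shard) {
  std::cout << "touched datalog shard " << shard << '\n';
}

void periodic_touch(std::chrono::minutes period) {
  for (;;) {
    for (int shard = 0; shard < kNumDatalogShards; ++shard)
      touch_datalog_shard(shard);
    std::this_thread::sleep_for(period);
  }
}

int main() { periodic_touch(std::chrono::minutes(30)); }
```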
A
That's what that means; I mean, all scrub strategies would look like that in some sense. But whatever we come up with has to... it could be done, but it could potentially be expensive. I mean, maybe; I hadn't assumed that it would be happening close to the limit of latency, yeah.
D
But it does seem like doing that thing where we write an entry for every shard, the thing that we do after the end of resharding, actually, you know, on a periodic basis, might be the lowest-effort and least complicated way to do it.
C
So I think I like the scrub strategy the best, where we would only trigger retries on things if we identify that they're behind.
B
So maybe, because anything that you want to write in the data log can fail, right, since the initial assumption here is that something wrong happened, on the destination zone you can say that if you didn't get a record for a shard for a given amount of time, then you'll do the sync for that shard.
B
But instead of doing that for everybody, or instead of the source just, you know, telling it to sync everything at some periodic interval... even if this time is long, there could be lots of shards. But if the destination knows that it didn't see anything for a specific shard for maybe a couple of hours, then...
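A sketch of that destination-side variant (hypothetical names and an assumed shard count): remember when each shard was last heard from, and defensively sync any shard that has been silent longer than the window.

```cpp
// Sketch of the destination-side variant (hypothetical names, assumed
// shard count): remember when each shard was last heard from, and sync
// any shard that has been silent longer than the window.
#include <array>
#include <chrono>

using Clock = std::chrono::steady_clock;
constexpr int kNumDatalogShards = 128;                // assumed
constexpr auto kStaleWindow = std::chrono::hours(2);  // "a couple hours"

std::array<Clock::time_point, kNumDatalogShards> last_seen{};

// Called whenever a data log record for a shard reaches the destination.
void on_datalog_record(int shard) { last_seen[shard] = Clock::now(); }

// Hypothetical hook that kicks off sync for one data log shard.
void sync_shard(int /*shard*/) {}

void check_stale_shards() {
  const auto now = Clock::now();
  for (int s = 0; s < kNumDatalogShards; ++s) {
    if (now - last_seen[s] > kStaleWindow)
      sync_shard(s);  // heard nothing for too long; sync defensively
  }
}

int main() {
  on_datalog_record(3);
  check_stale_shards();
}
```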
C
Yes. So, going back to Sonia's point about not having data log entries: the scrub thing wouldn't touch data log entries.
C
So yeah, I definitely agree that there's a gap, especially around crash consistency here, but I also agree with Adam that we should track down any other reasons that data log writes are failing.
E
We haven't actually looked closely at that, but Casey, at least in the bug which was filed upstream by Bloomberg, they did mention once, I think, that even after a retry it fails when there were some races. I think that got fixed now, but yeah.
D
Yeah, I mean, I don't think we get into the retries-exhausted case. Like, probabilistically maybe it can happen, but I'd be surprised; I suspect it's very, very improbable.
D
Yeah, I think the case that RGW can simply crash after writing an object is a good thing to want to recover from, and I think that actually suggests that we do need a scrub analog specifically.
D
I think I need to rebase... I think I need to not rebase, but I think I need to edit our current neorados pull request, since it does have, let me call it, an incompatibility with the new version of libfmt.
A
Also, review comments from Ilia.