From YouTube: Ceph Crimson/SeaStore OSD Weekly 2020-11-04
A: I think it's safe, regarding the concern that some unrelated client requests could sneak in when the original one is still in flight; they are not likely to interfere with each other. So I think there are two pending issues. I will let Xuehan elaborate on them later on, but one of them I could help out with.
A: That is the recovery of the locked object context. To ensure that the current requests are correctly sequenced, we introduced an object context lock concept, as we did in the classical OSD. But when the PG acting set changes, we need to unlock the object context.

A: So I think probably the right way is to use with_lock primitives, to ensure that the unlock is always called. So I'm trying to refactor the existing with-object-context block to use the with_lock primitive, and that's it.
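The with_lock idea above can be sketched as a scope-guarded helper. Everything here (ObjectContext, with_obc_lock) is an illustrative stand-in, not Crimson's actual API, and real Crimson code would chain Seastar futures rather than use exceptions; the point is only that the unlock runs on every exit path:

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch of an object context with a lock.
struct ObjectContext {
  int lock_depth = 0;   // tracks lock/unlock pairing for the demo
  void lock()   { ++lock_depth; }
  void unlock() { --lock_depth; }
};

// Run `fn` with the object context locked; the unlock is guaranteed
// to run even if `fn` throws, so an error path (e.g. a PG acting-set
// change interrupting the request) cannot leave the context locked.
template <typename Fn>
auto with_obc_lock(ObjectContext& obc, Fn&& fn) {
  obc.lock();
  try {
    auto r = std::forward<Fn>(fn)(obc);
    obc.unlock();
    return r;
  } catch (...) {
    obc.unlock();
    throw;
  }
}
```

The same shape is what a futurized with_lock gives you: the caller never pairs lock/unlock by hand.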
B: Hi everyone. For the past week I've been trying to build my PR regarding EIO handling, and I stumbled across many compilation errors, which suggested to me that perhaps the place I chose to actually do the handling is wrong. Well, in my draft PR I made it in PGBackend, and after trying to build...
B: So, after talking with Radek, Xuehan and Kefu, I think the right move would be to move it to pg.cc, and have a small portion of it in client_request, when we need to restart the op if we return EAGAIN. So yeah, that's pretty much it.
B: So I tried to look yesterday, and my understanding of this state is that it just means that in the next scrub it will try to repair the PG. But I'm not sure if it'll be blocking any further requests.
C: I'll have that written down, and I'll try to have a look.
D: And I'm considering the next tasks, so I have a question: do we need a separate tree for the collection, to store the cnode information?
D: For the collection, because collections store the cnode information in the RocksDB in BlueStore. So what's your plan for storing that information?
A: I think, for enumerating the existing PGs, we need to understand what we have on disk. I don't...
C: Is that what it is? I'd rather not create the ability to create objects that are definitely not RADOS objects. Look, the only purpose, from what you guys are telling me, is so that we know, when we boot up, what PGs exist, yeah? So the question is: why can't we use the existing interfaces already in ObjectStore to do that? Why does this have to be special? What's interesting about this: we have a number of special objects already present in the OSD.
C: All of the PGs have a meta object, which is represented by, like, an empty... just a hash with empty other fields. So I forget exactly how it's constructed, but it's a perfectly normal object.
D: In BlueStore it is the pgmeta object, or something.
C: No, and the reason for that is that you need to be able to split a collection in constant time, or at least in non-linear time. So if a PG split comes in, we have to be able to split a collection into two collections, based on the change in PG seed, very quickly. So it would be extremely bad if every collection had its own tree.
D: Okay, so do you think each PG has one collection? Do we keep this concept?
C: What we would do is simply say: BlueStore uses collections, and the interfaces continue to use them, but SeaStore ignores them, which is pretty much what it already does. Collections only barely exist in BlueStore; for instance, there's no difference in the object namespacing of objects in different collections. The collections aren't a prefix; there's nothing special about them.
C
You
would
still
have
to
use
them.
The
interface
requires
it,
but
you
as
long
as
you
don't
rely
on
them
for
anything,
then
c-store
wouldn't
be
able
to
tell
the
difference
if
we
simply
ignored
the
parameter.
Nothing
would
change
like
what
what
behavior
my
challenge
to
you
is
what
behavior
is
actually
different
because
of
a
collection.
C: Okay, so every thread in the OSD, when it's...
C: It registers itself with a process-wide registry table. Or rather, every worker thread, every time it completes an item, goes back to the registry table and updates its last-alive time; if it goes to sleep, it marks that and goes to sleep, whatever. And then every tick, the OSD goes over this process table and looks for any one of these counters that hasn't been updated for at least whatever the timeout is, 60 seconds or 120 or something, past the threshold.
C
So
the
purpose
of
that
handle
is
so
that
long-running
operations
that
may
genuinely
take
too
long
but
not
really
mean,
but
not
really
indicate
that
the
osd
is
dead.
It's
so
that
they
can
take
this
handle
and
prove
liveness
every
so
often
so,
if
you're
listing
through
a
pt
or
something,
and
for
whatever
reason
you
didn't
want
to
divide
this
up
into
different
items,
you
ping
the
heartbeat,
handle
every
so
many
items
so
that
you
can
prove
liveness.
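The heartbeat-handle pattern described here can be sketched roughly as follows; HeartbeatHandle and process_listing are hypothetical stand-ins for the real Ceph watchdog machinery, which resets a per-thread timeout rather than counting pings:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only (not the actual Ceph API): a long-running listing
// pings a heartbeat handle every so many items so the watchdog sees
// forward progress and does not declare the worker thread dead.
struct HeartbeatHandle {
  std::size_t pings = 0;
  void ping() { ++pings; }  // in Ceph this would reset the thread's timeout
};

// Process `items`, proving liveness once per `batch` items.
std::size_t process_listing(const std::vector<int>& items,
                            HeartbeatHandle& hb,
                            std::size_t batch) {
  std::size_t processed = 0;
  for (std::size_t i = 0; i < items.size(); ++i) {
    // ... handle items[i] ...
    ++processed;
    if ((i + 1) % batch == 0) {
      hb.ping();  // tell the registry this worker is still alive
    }
  }
  return processed;
}
```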
C: It's probably best not to assume a particular value, but, like, 10 seconds would definitely be long enough that you should be pinging the timer.
E: Okay. I think the current scrub call doesn't use it, and I don't see a reason to use it.
A: It covers the BlueStore changes, and also the build changes, and sometimes CI changes; something I'm interested in, and I think I'm responsible for testing them.
C: Yep, I have successfully run some random I/O against the transaction manager, backed by a real disk. I was able to get a couple of tens of thousands of IOPS, which is pretty good for no optimization whatsoever. Subsequently, there are some things I need to fix, like: the journal needs a way to tell the difference between stuff I wrote this time, since I did mkfs, and stuff I did last time. So I have a bunch of bookkeeping to do this week.
C: I should have a pull request for at least a very basic version of this. Kefu, where is the performance testing we're currently running for the perf test? I would like to add this to it. Not the version that writes to disk; I'm just going to have it write to an ephemeral backend, but that way we can test the code path itself from commit to commit, and find out whether we've added instructions.
A: So you're asking where we do that, the Jenkins thing? Oh sure, I will send you a mail regarding the details of where we have the test. But if you want a quick answer: that's in the perf test, under the sepia lab where I was hosting it.
D: Last week I was mainly running the Crimson thrash test and fixing the discovered bugs, and I also tried to resolve the OBC lock order issue, by implementing some kind of sequencer to order the client requests.
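A minimal sketch of the sequencer idea just mentioned, assuming a simple monotonic numbering scheme; RequestSequencer is a made-up name, not the actual Crimson class. The point is that if requests acquire their OBC locks in one total order, two requests can never take the same pair of locks in opposite orders, which is the classic lock-order deadlock:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: each new client request takes the next sequence
// number, and a request may only proceed to take its obc locks when
// every earlier request has completed. This gives a total order on
// lock acquisition.
class RequestSequencer {
  std::uint64_t next_seq = 0;       // seq handed to the next arrival
  std::uint64_t completed_upto = 0; // all requests below this are done
public:
  std::uint64_t start() { return next_seq++; }
  bool may_proceed(std::uint64_t seq) const { return seq == completed_upto; }
  void complete(std::uint64_t seq) {
    assert(seq == completed_upto);  // requests finish in order in this demo
    ++completed_upto;
  }
};
```

A real implementation would park the blocked request on a future rather than poll may_proceed.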
A: I also left some comments on your recovery fixes PR.
D: Yeah, last week, for the number three task, I implemented the typed data recorder classes for replay; I'm currently working on encoding and decoding data. I'm still drafting the enhanced system design, and for the mod code for the Crimson messenger, I proposed possible design changes and rationales in the email and the meeting.
A: I'm not following you regarding the messenger changes. What is it related to?
A: Yeah, I recall we did have some creators, or factories, for creating the messenger which works across cores, right?
C: To correct a few mistakes I made earlier: I'm wrong, BlueStore doesn't expose an op sequencer anymore; that was in the past. These days it maintains an internal mapping from coll_t to the op sequencer object, and it populates that on boot using the set of cnodes. So, Chunmei, those cnode objects serve exactly one purpose: on startup BlueStore scans those keys and uses them to initialize the collection map and the set of op sequencers.
C: So for us to do that, all we would really have to do is create, like, a single block: an array of pgid-to-bits mappings, that is, (pgid, cnode) pairs, and a cnode is literally just the set of bits associated with the pgid. There really isn't a lot to it.
C: Two things. Operations submitted on objects within the same collection are meant to be ordered, but that thing I just said does not require writing it down. The writing-it-down part is so that we can quickly list the set of collections on OSD startup. That part is sort of orthogonal and could be handled another way, but I think for now the laziest way to do it would simply be to add another block type that just lists collections; assume it's bounded, don't worry about it.
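The single-block layout discussed here might look roughly like this; all type names (PgId, CollectionListBlock) are illustrative and the real on-disk encoding would differ. The block is rewritten wholesale on collection create/remove, which is cheap because the set is small and kept in memory:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hedged sketch of an on-disk "collection list block": a single block
// holding (pgid, split-bits) pairs, the cnode payload being literally
// just the bits associated with each pgid.
struct PgId {
  std::uint64_t pool;
  std::uint32_t seed;
  bool operator==(const PgId& o) const {
    return pool == o.pool && seed == o.seed;
  }
};

struct CollectionListBlock {
  std::vector<std::pair<PgId, std::uint8_t>> entries;

  void add(PgId pg, std::uint8_t bits) { entries.emplace_back(pg, bits); }
  void remove(PgId pg) {
    for (auto it = entries.begin(); it != entries.end(); ++it)
      if (it->first == pg) { entries.erase(it); return; }
  }
  // On startup this block is scanned to rebuild the in-memory
  // collection map, mirroring what BlueStore does with its cnodes.
  std::size_t count() const { return entries.size(); }
};
```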
C: That already exists. We use the transaction manager, the same thing as anything else; just use a well-known number, or whatever we're currently using. Not really worried about it.
C: Or we could do something else, I don't really care. It's not like a big deal, because again, its only purpose is to initialize the collections list on startup.
C: There are likely other ways to do that as well; you could write to a file that's specified somewhere in a config directory, for instance. Yeah, because it's not rapidly changing, I think the easiest solution is to just add a new block type for now. Easy.
C: Does that make any sense? Shall we... so the...
C: And the on-disk structures serve completely different purposes. They used to be separate, back before, I think, 2013; most of these interfaces took both a collection and an op sequencer. They were independent, and the collection denoted an actual on-disk folder. But none of that's true anymore. These days we use a shared namespace for all of the objects in the entire OSD; there's no folder structure, and for us there will be, for instance, like, they'll all be in the same onode tree.
C: So, at least at first, for now, we're going to ignore the sequencing part, because all operations will be considered sequenced, so that will be fine. The longer-term thing will be... so that part's fine; the only thing we have to do with collections, I think, is be able to list them.
D: That is what we'll see. Going ahead: in the root block there is a pointer pointing to one block, just to save the collection's information.
D: No, because, if it's not big enough... the number is not big, it's maybe 700.
A: Can we also store the PG list, and put it this way, along with the superblock?
C: It's not, actually; it's just that, for all practical intents and purposes, for testing, it won't be big, because it's really, really stupid to create a huge number of PGs on an OSD. But in real life that does happen occasionally, and it's unacceptable for an OSD to crash under those circumstances. So we have to tolerate large numbers of collections eventually, but we don't have to do it, like, this year. We can do that in January or February, once other stuff is working.
C: I mean, we could also just have it be a special object inside of SeaStore, and have it use the existing... like, once we have omap functionality, then we already have a scalable key-to-value mapping; we don't really have to write a bespoke one, if we do it that way. See what I mean? Like, SeaStore itself internally could have a special, otherwise inaccessible object that it uses for storing this kind of control information.
C: I'm saying that... okay, so imagine that your omap code is already in the tree. So we could modify SeaStore to create a special internal object, that is in the onode tree and everything, that isn't accessible from the OSD, and we could just stick the PG information into the omap section, or the data section, for that matter. I don't really feel that strongly about it, but that would work, yeah.
C: Although the optimization isn't especially important, because again, the numbers here shouldn't be big. So if we have to rewrite the entire collection set every time we add or remove one, I can't say I lose a lot of sleep over that. We are keeping the whole collection set in memory at all times, so it's not expensive, or at least it shouldn't be expensive, to do that.
B: Yes, first thing... one second, let me take a look here.
B: So I guess, first, I just want to clarify: in the classical OSD, we put the PG in the repair state, one...
B: Yes. So, when you get EIO, you call a function called rep_repair_primary_object, and one of the things it does is set the PG state to repair. So my question is: one, why, and two, what are the consequences of putting the PG state in repair?
C: I am not sure; that sounds like a random question. Radek, I think you've had more experience with this bit of code than I have recently.
C: So, offhand, I believe the goal here is: when you see an EIO, you want to trigger a scrub, and then a repair on the object, right? So I suspect that's what's going on. So you'd have to read the classic Ceph code to work out how the scrub then actually happens; that I am not sure about. Does it queue for scrub, or otherwise change the scrub state?
B: No, so it's in the normal I/O path. When you try to do, let's say, read or sparse-read, if you get an EIO from the object store, then you call rep_repair, which changes the state of the PG and then tries to start peering.
B: I'll repeat that again. I said that it is in the regular I/O path. For example, if you try to do read or sparse-read in PrimaryLogPG and you get an EIO, then you call rep_repair_primary_object, which is the function I'm talking about, which essentially marks the object as missing, saves the ops that came through for the object, changes the state of the PG, and starts the peering event; starts peering, essentially, yeah.
C: I see, okay. So this must be the way repair works these days; this is new to me. I suspect what's going on is that, when you queue recovery, when the PG is in state repair, it looks for... I'm going to be adding this somewhere, right. It marks the object missing a little bit above that, which triggers the existing log-based recovery logic to re-recover the object, and then we run into your point.
C: No, what it's going to do is mark that specific object non-present on the OSD. So if another read or write comes in on that object, it will be blocked until that's no longer true. At the same time, it queues a peering event, which causes the PG to go into recovery, which will cause it to recover any missing objects. It will not block I/O on any other object; the OSD is perfectly capable of serving I/Os on other objects that are not the one being recovered.
C: Why? This is an abuse of the flag. The way this is supposed to work is: when you have the repair state set, the next time through, when the scrub starts, it will mark any problematic objects it sees, and then kick off a recovery.
B: Yep, sounds good. What I was just about to ask is... so you're saying that this is just our way to mark the object as missing?
C: Yep... I don't know what's going on... yeah, no, literally as missing. One moment, where does it actually do that? primary_error, I believe; if you follow that primary_error call.
B: Yes, okay, so...
B: I was just saying... Kefu, so does that answer your question from yesterday?
A: Yeah, because, by looking at peering_state.h, I find that the Clean state moves to WaitLocalRecoveryReserved if a DoRecovery event comes in. That's why I thought that marking the PG in an unavailable state would prevent it from serving further requests from clients. So...
C: The larger-scale summary of how this works is that the time during which PGs do not serve I/O is the time during which they do not know which objects are up to date and which ones aren't.
C: This is generally called peering. So, by the time peering completes, we know exactly which objects on the primary are up to date and which ones aren't. So at that point we begin serving I/O, and if we see an I/O on an object that is not up to date, then we block it until recovery completes on that object; but otherwise we continue serving I/O.
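The post-peering behavior just described, serve everything except ops touching objects still known to be out of date, can be sketched as a small gate; DegradedGate is an illustrative stand-in, not the actual missing-set machinery:

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative sketch: after peering, the PG serves I/O normally,
// except that ops touching an object still in the missing set are
// held back until that object is recovered.
struct DegradedGate {
  std::set<std::string> missing;  // objects known to be out of date

  // True if the op may run now; false means "block until recovery
  // of this specific object completes". Other objects are unaffected.
  bool may_serve(const std::string& oid) const {
    return missing.count(oid) == 0;
  }
  void recovered(const std::string& oid) { missing.erase(oid); }
};
```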
B: Only the primary PG can... only the primary can trigger peering, right?
C: You have to be careful with the term "trigger peering", because, strictly speaking, no OSD can. Peering happens when an OSD map comes in that changes the interval set, so actually it's a cluster-wide signal. This is not peering; this is reinitializing recovery, and that is primary-specific state. The primary is free to do this on its own, anytime it wants. I think the question you're...
C
Probably
asking,
though,
is
what
happens
if
an
eio
occurs
on
a
replica
and
the
answer
at
the
moment,
I
believe,
is
that
there's
no
good
handling
for
it,
eios
sort
of
fundamentally
can't
happen
on
a
right.
C
The
same
way,
it
would
mean
the
store
is
dead,
so
that
kind
of
kills
the
osd,
which
sort
of
solves
the
problem,
because
you
know
the
osd
fails
and
we
do
failover,
but
there
isn't
anything
like
this
repair
path,
primarily
because
we
don't
do
replica
reads
in
the
first
place,
but
also
because
the
replica
would
have
to
somehow
propagate
this
information
back
to
the
primary,
which
is
of
course
possible,
but
would
require
quite
a
bit
of
code,
so
it
hasn't
been
implemented.
C: And in a lot of ways that's an acceptable behavior, because mostly disks don't partially fail at this granularity, and it doesn't mean a whole host fails; it just means that specific OSD, and therefore that specific disk, failed, so failover is appropriate. But yeah, it's an area that could be improved. You probably shouldn't try to tackle that as part of this part of your project, though; I would try to replicate the existing behavior.
B: So, Kefu, do you agree that we should... in terms of the actual implementation of the function that I showed you yesterday, with urgent recovery, and the state set, and peering, and the local peering event, do we keep that, the...
C: Well, it's mostly correct, like most of the stuff. That is, it's not as much of a stretch as it looks like: we genuinely don't have a copy of this object at this time, right, because we can't read it. So, right, the recovery path is appropriate here.
C
The
only
part-
that's
a
little
cute
is
using
pg
state
repair
here,
because
that
is
an
abuse
of
the
flag
that
isn't
really
what
that
flag
was
meant
for,
but
for
now
it's
okay.
I
think.
A: To find the urgent recovery code, you could go to client_request.cc. So, I'm posting the...
C: Yes, that does seem more appropriate. You'll have to think a bit to make sure that there isn't other behavior in classic we need, but I don't see any that's obvious to me. You will, of course, still need to handle the part where you add it to the missing set, or this won't work.
C: Yeah, if this is what we do on a read on a missing object, that's the best option here.
B: Yeah, so, I thought that this urgent recovery thing is the equivalent of our waiting-for-unreadable-object queue in the classical OSD. It's...
B: Yes, I understood; it actually does more. It immediately adds the object to the recovery queue. But does it also achieve what we were doing in the classical OSD with the peering after setting...? It doesn't do that.
C: The real question is: why doesn't classic do this? wait_for_unreadable_object totally calls... no, wait_for_unreadable_object calls maybe_kick_recovery, which does exactly what this code, this urgent recovery thing, does. This is definitely a thing in classic too, so I don't really understand why that EIO handler in classic doesn't just call maybe_kick_recovery.
B: We still need to mark the object as missing, with the PG state repair, no?
C: No, you also need to mark the object missing and do any other bookkeeping required to prevent the OSD from crashing. What I'm telling you is that the specific thing classic OSD is doing here is overkill; we don't actually need to transition all the way back to the recovery stage, not since there is apparently code to do a recovery without that.
B: But the second part of my question is... I thought of doing the handling in... okay, wait, let me start this differently. In the classical OSD, this function could actually return EAGAIN if, for example, we are blocked by the PG state: if it's not clean, and it's not backfilling or recovering, then it returns EAGAIN, and what it does in classical is it completely restarts the op.
B: Well, to my understanding, I think it does that because... well, it restarts the entire op because maybe the object was recovered, perhaps, by the time we call the op again, because maybe someone else already triggered recovery for that object.
C: No, it's because this code, when it calls queue peering event, is going to transition us back to the Recovery state in the peering state machine. No, this is not peering; it's the thing... The peering state machine is the name for the entire state machine that describes all of the possible states for the PG; only a small subset of that is actually peering. So the state we go back into is called Recovery, and it's the state we're in when we're doing this thing where we iterate over the missing set and recover all the relevant objects.
C
So,
if
we
didn't
check
for
this-
and
we
went
straight
through
we'd
end
up
recover
we'd
end
up
transitioning
back
into
recovery
already,
even
though
we're
already
in
it.
So
what's
actually
going
on
here
is
it's
saying:
oh
we're
already
doing
recovery
it's
too
complicated
to
handle
this
case.
So
what
we're
going
to
do
is
we're
going
to
block
the
app
until
we're
done,
doing
recovery,
hidden,
eio
again
and
then
trigger
recovery
again
to
handle
it.
A: I'm just now, while you guys are discussing, looking at the code path in PrimaryLogPG, trying to understand whether the EAGAIN comes back to the client. I think the answer is it does.
A: ...and the request will be restarted.
A: It created a shared future, in the central place, that we can wait on, yeah. I mean, that part, certainly.
C: Either way, if you're using the future concept here, then this not-is-clean part, that block-for-clean, would simply wait on a future; that's all that's going on there. So, like I said, this isn't really about the waiting part; the client never sees these EAGAINs. It's just a way of suspending the operation until the condition clears. So, Amnon, is this is-clean condition important, given the change with the urgent recovery?
C: What we are proposing to do here is to use the already existing urgent recovery code: the path we use when we see a read on an object that is not present.
B: All right. Well, I don't think so, in this case. If I understand correctly, in the classical OSD, when we restart the whole thing, it is possible that the PG state changed. Why was it that...
C: So the core difference is that, because this urgent recovery operation is designed to run on an object that is not present while we are doing recovery, it is not a problem to run it while recovery is happening, so we don't need this clean check. The reason why classic OSD is doing this is because this trick, where it transitions back into recovery, wouldn't work properly if we were already in recovery.
A: I think it's more like a map for collecting the objects being recovered.
C: What it's doing is the operation we normally use during the recovery state, without actually waiting for the recovery process to get to it. And the reason why it exists in Crimson, and indeed in classic, is because, immediately after we transition into the acting state, if we get a client read on an object, we would really like to recover that object immediately. We don't really want to wait for the log recovery to just kind of get to it, right? So we just instantly send the messages and are done with it.
C: The reason for that not-clean check is because, if we're not clean, then we're already recovering, and that little trick won't work; it will mess up the existing recovery process, so we simply wait until it ends. In Crimson we don't need to do that, because we can simply use this operation, which is specifically designed to work concurrently with an ongoing recovery.
C: No, and classic OSD doesn't propagate that back to the client. If you go read do_op, or do_osd_ops, one or the other, that return code is caught and dropped, because it means that somewhere in the handling pipeline something stuck this op on a queue somewhere, and it's something we'll deal with later.
C: The op may have had multiple reads embedded in it, and it might be a little bit tricky. Well, or maybe not; maybe it's okay to just treat it like a really long write. But one of the problems, for instance, is that you're holding a read lock on the object, right, when you're doing a read-through. Yes. So you need to drop the read lock to do recovery, and I really think the easiest way to do that... and that's not the only state we have to drop; like, there's an object context we've got to drop.
C: We might have been making modifications to an object context as we went, because we're projecting the result of a read-modify-write, and it's just difficult to reason about precisely what in that state you have to reset after you resume. I'd kind of prefer not to; I'd really rather just start over, especially for something like an EIO, which is meant to be rare. Writing special handling for this seems like wasted work.
C: I think that's going to be the best solution for every op: either the op runs to completion without errors or weird conditions, or, if we find we have to suspend it, we just abandon our current work and start over once the condition clears.
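The abandon-and-restart policy just described can be sketched as a simple retry loop; OpResult and run_op_with_restart are illustrative names only, and a real implementation would suspend on a future until the blocking condition clears rather than loop:

```cpp
#include <cassert>
#include <functional>

// Sketch: an op either runs to completion, or reports that it had to
// suspend (EIO, missing object, ...), in which case all partial state
// is thrown away and the whole op is re-run from scratch.
enum class OpResult { Done, Suspended };

// Returns the number of restarts that were needed.
int run_op_with_restart(const std::function<OpResult()>& attempt,
                        int max_restarts) {
  int restarts = 0;
  while (attempt() == OpResult::Suspended) {
    // discard partial state here (obc refs, projected values, locks)
    if (++restarts > max_restarts) break;  // safety valve for the demo
  }
  return restarts;
}
```

As noted below, the redone I/O mostly hits cached state, so the restart itself is cheap compared to the recovery that forced it.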
C: ...EIO, and even receiving a read on an object we have to do recovery on. Like, I don't really care how long it takes to schedule the op, because in the meantime we spent a bunch of time and energy recovering the object itself. That's going to be the dominant cost, not the relatively small cost of simply redoing, perhaps, an I/O we already did, which, notably, would be to cached state. So it's unlikely to be expensive anyway.
B: Are you saying that it's not really important when we try to do the op on the object again? You're saying we should just schedule it and...
C: For now, I think, Amnon, you should do whatever the current do-recovery user does. But in the longer term, as we go back through and start fixing bugs in this, I want to do this thing where we just reschedule the operation from scratch. I think Xuehan is working on that... or wait, no, this is something we discussed a few weeks ago; it's just work yet to be done.
B: Okay, thank you. I guess that's pretty much it on this matter, unless anybody else has any other questions.