From YouTube: Ceph code walk-through: OSD (async) recovery
Description
Neha Ojha walks us through the OSD recovery code, with an emphasis on the (newer) async recovery mode.
A
All right, so hi everybody. I am Neha, and for those of you who do not know me, I work on the RADOS team. Today I'm going to try to give a code walkthrough on recovery, particularly focusing on asynchronous recovery, which we implemented for Mimic. Before I delve into the code, I just thought it would be helpful for everybody... Let me just quickly share my screen.
B
C
A
Right, I'm just going to go over a couple of slides. I know slides are boring, but I still wanted to cover a couple of them, because I think it makes more sense and gives background about why we wanted async recovery in the first place. The motivation behind async recovery is that recovery in Ceph has been a synchronous process, which implies that it blocked writes to an object that was pending recovery.
A
The problem with this, obviously, is that it increases write latencies and affects availability overall. We wanted to get rid of this problem, and the solution we came up with was not to block writes on objects that are only missing on non-acting OSDs. I want to focus on the fact that the whole idea of async recovery takes a lot from backfill, which is already a concept present in Ceph.
A
So the idea is to perform recovery in the background on an OSD out of the acting set, just like backfill, and use the PG log to determine what needs recovering. This is one of the most important slides. I think we are going to see these concepts in the code as well, but it's just important to remember that these are the three factors that determine when we perform async recovery. As I mentioned earlier, async recovery targets are OSDs that are just taken out of the acting set based on the following conditions.
A
A
Now, we can always tune this parameter, use it to our benefit, and perform experiments to determine what a good value for it is. We now have the hook to change this parameter and allow users to decide what they want this value to be, but for now we have set it to 100. And the third one, which I feel is the most important one: we always want to maintain min_size replicas. Only as long as we have this can we even think of performing async recovery.
A
A
Before I talk about how all this looks: the most important pieces of code lie in the osd directory, and the files of interest to us are PG.h, PG.cc, PrimaryLogPG.h and .cc, and a little bit of PGBackend.h and .cc. The backend-specific implementations are in ReplicatedBackend.cc and ECBackend.cc. I think these are mostly the important files where you'll see all the latest changes that have been implemented. So let's first go to PG.cc.
A
When I spoke about selecting some OSDs as async recovery targets, this whole thing happens in the choose_acting function. This is a function that was already there, and there's a clear comment about what it does: it calculates the desired acting set and requests a change with the monitor if it differs from the current acting set. A by-product of this function is that, while selecting the acting set, it also used to segregate out backfill targets in the process, and we will take a look at how it was doing that.
A
I think it's important that we focus on this function, because this is where all the things happen, and then we can see what comes next after we select async recovery targets: how does it actually ensure that we do not block IO to that object? So, coming back to choose_acting: this is a function in PG.cc, and it takes in the authoritative log shard ID and a couple of other things that are not important right away. So let's just go down it.
A
If you look at this section, you see a calc_replicated_acting and a calc_ec_acting function. These two functions basically select acting OSDs and also select backfill targets out of the entire set of OSDs that we provide to the function. The parameters that we pass here are the auth log shard, the size, the acting set, the up set, and the up primary, and the important bit here is that we have a want variable, a want_backfill, and a want_acting_backfill. These are the three that get populated as a result of this function. Let's, for simplicity...
A
A
A
So this is the place where we are segregating out backfill OSDs, based on this particular condition, and that's pretty much it. By the end of this function we have three things. Let me just go back. Yeah, we have these filled out: the want set, which is going to be the proposed acting set; want_backfill, which is the backfill targets; and want_acting_backfill, which is a combination of both. After this... okay.
A
When I kept saying proposed, I meant that there is this function here which checks whether, after making these selections, we can still recover and whether we are greater than or equal to min_size or not. The name says exactly what the function does, so I'm not going to go into the details of the function.
A
Now we come to the part that has been newly implemented for async recovery. The outcome of the calc_ec and calc_replicated functions is now passed through these two new functions. One is called choose_async_recovery_ec and the other is the replicated one. If you look at it, we just pass the want set that was returned earlier, and we also expect this want async recovery set to get populated if there is a chance of OSDs getting selected as async recovery targets.
A
Now let's go and see what choose_async_recovery_replicated does. We already know what this function takes in. The idea here is that we are going to create a set of pairs of an int and a pg_shard_t. This int is basically the cost to recover for that particular shard. So the first bullet that I mentioned in the slides maps to this. Now I'm going to talk about how we populate it.
A
Even before we can populate the set, we first try to eliminate any stray OSDs and any OSDs that are not up. We found that if we end up selecting any of these kinds of OSDs, it becomes hard for the OSD to get back into the acting set, and then we basically can't recover that PG completely. So we just ensure that if either of these two conditions is met, we will not select such OSDs as async recovery targets. So far, so good.
A
There are two versions that we find out. One is the auth_info last_update version, which is the version of the authoritative log, and the other is the candidate version. When we are looping through all these OSDs, we check the last_update version of that particular OSD's PG log, and we find the difference, which is what we call approx_entries: the auth version minus the candidate version. This is the cost to recover.
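The cost described here is a plain subtraction of log versions. The block below is a minimal sketch of that arithmetic, not the Ceph code: `ShardInfo` and `approx_recovery_cost` are hypothetical names, and the clamp to zero for a shard that is not behind is an assumption added so that the unsigned subtraction cannot wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-shard PG info discussed in the talk.
// Only the version of the last log update matters for this sketch.
struct ShardInfo {
  std::uint64_t last_update_version;
};

// Approximate number of log entries the candidate is behind the
// authoritative log. Returns 0 when the candidate is not behind
// (an assumption of this sketch, to avoid unsigned wrap-around).
inline std::uint64_t approx_recovery_cost(const ShardInfo& auth,
                                          const ShardInfo& candidate) {
  if (candidate.last_update_version >= auth.last_update_version)
    return 0;
  return auth.last_update_version - candidate.last_update_version;
}
```

With an authoritative version of 250 and a candidate at 100, the cost under this sketch is 150 log entries.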
A
Once we have this cost, what we check (this is coming to the second bullet that I mentioned earlier) is whether this cost is greater than osd_async_recovery_min_pg_log_entries, which was 100 in our case. That is when we end up selecting, or even considering, that OSD for async recovery, and we insert it in this set called candidates_by_cost. By the end, when this finishes, we have a new set that has the cost to recover and the shard in it.
A
After this, we take out the OSDs that have a higher cost to recover, but this condition here ensures that we do not go below min_size. We keep checking that our want set, which is going to be our acting set, is still larger than the min_size of the pool. As long as that is met, we can go ahead and insert the OSD in the async recovery set that we were asked to populate earlier.
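The selection steps described so far (filter by the cost threshold, then move the costliest shards out of the want set while it stays above min_size) can be sketched as below. This is an illustration under stated assumptions rather than the function from PG.cc: `ShardId`, `candidates_above`, `Selection`, and `choose_async_recovery` are hypothetical names, and the stray and not-up filtering mentioned in the talk is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using ShardId = int;  // hypothetical stand-in for pg_shard_t
using Candidates = std::set<std::pair<std::uint64_t, ShardId>>;  // (cost, shard)

// Keep only the shards whose recovery cost exceeds the threshold.
Candidates candidates_above(const std::map<ShardId, std::uint64_t>& cost_by_shard,
                            std::uint64_t min_log_entries) {
  Candidates out;
  for (const auto& kv : cost_by_shard) {
    if (kv.second > min_log_entries)
      out.insert(std::make_pair(kv.second, kv.first));
  }
  return out;
}

struct Selection {
  std::set<ShardId> want;            // proposed acting set
  std::set<ShardId> async_recovery;  // shards recovered in the background
};

// Walk the candidates from the highest cost down and move each one out of
// the want set, stopping as soon as the want set is down to min_size.
Selection choose_async_recovery(const std::set<ShardId>& want,
                                const Candidates& candidates_by_cost,
                                std::size_t min_size) {
  Selection sel;
  sel.want = want;
  for (auto it = candidates_by_cost.rbegin();
       it != candidates_by_cost.rend(); ++it) {
    if (sel.want.size() <= min_size)
      break;  // never shrink the acting set below min_size
    if (sel.want.erase(it->second) > 0)
      sel.async_recovery.insert(it->second);
  }
  // Every shard removed from want ended up as an async recovery target.
  assert(sel.want.size() + sel.async_recovery.size() == want.size());
  return sel;
}
```

For example, with costs of 10, 150, and 500 for shards 1, 2, and 3, a threshold of 100, a want set of four shards, and a min_size of 3, only shard 3 is moved to the async recovery set in this sketch.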
A
A
Beyond this, we do pretty much what we used to do earlier: request a change if want is not equal to acting. By the end of it, we will see that we have our backfill targets, our new... one sec, yeah, this is where we have them. We have the new backfill targets, the async recovery targets, and we also have the one we used to call want_acting_backfill, which is now, sorry, acting_recovery_backfill, because this set is now catering to both recovery and backfill.
A
So after this, what is important to focus on? I spoke about the part where we have already selected some OSDs on which, if there are objects that are not recovered yet, we can still afford to do IO and not block on them. Now I'm going to talk about where we implement this part of not blocking the IO. That goes into PrimaryLogPG.cc. In PrimaryLogPG.cc there's this function called do_op, where any op starts getting processed, so it just gets the op reference.
B
A
What we have done is change this function a bit, so that if an object is only missing on an async recovery target, then this function is not going to return true. How we do it is here. You can see we iterate over all the acting_recovery_backfill shards and try to find each one in the peer_missing set. Before I even go into that: peer_missing is another data structure.
A
C
A
It tries to find that particular peer in the peer_missing set, and when it finds that there is an entry in the peer_missing set for that peer, it checks whether that peer is an async recovery target, and if so it just continues and lets this function proceed further. At the end, if it doesn't find anything, it returns false. So basically, if the object is only missing on an async recovery target, this will return false.
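The check described above can be sketched as a loop over the shards that skips async recovery targets. This is a simplified illustration, not the PrimaryLogPG code: `write_must_wait`, `ShardId`, and `ObjectId` are hypothetical names, and the primary's own missing set and the backfill handling are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using ShardId = int;           // hypothetical stand-in for pg_shard_t
using ObjectId = std::string;  // hypothetical stand-in for hobject_t

// Returns true if a write to `oid` has to wait for recovery, that is, if
// some shard that is NOT an async recovery target is missing the object.
bool write_must_wait(const ObjectId& oid,
                     const std::set<ShardId>& acting_recovery_backfill,
                     const std::set<ShardId>& async_recovery_targets,
                     const std::map<ShardId, std::set<ObjectId>>& peer_missing) {
  for (ShardId peer : acting_recovery_backfill) {
    auto pm = peer_missing.find(peer);
    if (pm == peer_missing.end())
      continue;  // no missing information tracked for this peer
    if (pm->second.count(oid) == 0)
      continue;  // the peer already has the object
    if (async_recovery_targets.count(peer) > 0)
      continue;  // missing only on an async target: do not block the write
    return true;
  }
  return false;
}
```

In this sketch, an object missing only on shard 2 does not block the write when shard 2 is an async recovery target, while the same object missing on a regular acting shard does.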
A
Once we have done all the checks, it now goes to execute_ctx, where we execute the context of the op that is coming in for processing. Just doing a quick recap: we now have async recovery targets, and we now know that we are not blocking on an async recovery target. But we also need to ensure that we have adequate information about that particular object in the peer_missing set on the primary, so that the primary can eventually recover this particular object.
A
A
B
A
At this point no log operation has happened so far, but we want to ensure that we have enough information about the log entry that is getting processed. So we iterate over all the async recovery targets and check whether the peer_missing set has an entry for that object and whether the peer is missing that particular object. It's basically asking: is it an async recovery target? Yes. And is it missing that particular object?
A
A
Yeah, add_next_event takes in the log entry, tries to find the object for that log entry in the missing set, and this is the actual part that does it. So this is the missing set. It creates an entry for that object and adds an item with the current version of the log. Since this is a new element, it stores the need version, the have version, and whether this object is a delete or not. So these three pieces of information get stored in the missing set.
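The three fields stored per missing object (need version, have version, delete flag) can be illustrated with a small map-backed structure. This is a simplified sketch: `MissingSet`, `MissingItem`, and `LogEntry` here are hypothetical types, and the real add_next_event in Ceph handles more cases than the two shown. Leaving `have` at 0 for a new element is this sketch's reading of "empty".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using ObjectId = std::string;  // hypothetical stand-in for hobject_t

struct LogEntry {
  ObjectId oid;
  std::uint64_t version;
  bool is_delete;
};

// What the missing set remembers about one object.
struct MissingItem {
  std::uint64_t need;  // version we must recover to
  std::uint64_t have;  // version we currently hold (0 = none, an assumption)
  bool is_delete;      // whether recovering means deleting the object
};

class MissingSet {
 public:
  // Record that the object touched by this log entry is now missing
  // at the entry's version.
  void add_next_event(const LogEntry& e) {
    auto it = items_.find(e.oid);
    if (it == items_.end()) {
      MissingItem item;
      item.need = e.version;
      item.have = 0;
      item.is_delete = e.is_delete;
      items_[e.oid] = item;
    } else {
      // Already missing: only the target version and delete flag move.
      it->second.need = e.version;
      it->second.is_delete = e.is_delete;
    }
  }

  bool is_missing(const ObjectId& oid) const { return items_.count(oid) > 0; }
  const MissingItem& item(const ObjectId& oid) const { return items_.at(oid); }

 private:
  std::map<ObjectId, MissingItem> items_;
};
```

A second log entry for the same object only advances `need`, which is what lets the primary keep writing to an object that an async recovery target has not caught up on yet.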
A
Now let's see what we are doing next. Similar to peer_missing, we also have this missing_loc.add_missing. This is again a function on a data structure. There's a needs_recovery_map, and this goes and adds the entry in the needs_recovery_map. I'm not going to go into too much detail; it's a little complicated there. But the idea is basically to ensure that all the data structures that we require for recovery are up to date about this particular object and the log entry.
A
So it basically updates the object: the need version, the have version, which is empty, and whether it's a delete or not. This is the information that we store. Now, moving ahead. Okay, this might sound confusing, but there is another data structure called missing_loc, which is kind of the opposite. missing_loc is a data structure that has correct information about a particular object: it has information about all the OSDs that have up-to-date information about a particular object.
A
Finally, we're at the end of this function. From here we just go and call submit_transaction and send it all this information: the object ID, the stats, the version, and a bunch of other information. We'll see how we use this later. Since this is getting called on the PG backend, we now need to go and look at backend-specific code. For simplicity, again, I'm going to go to just the replicated case and see how this works.
A
Okay, so in submit_transaction we do a few things that are just specific to executing that operation, and the important piece is this generate_transaction. There are a couple of indirections here, but in each function we end up doing some more incremental amount of work that is required for processing the op. There is nothing specific to async recovery in this function, so I'm just going to go to generate_transaction. In generate_transaction...
A
B
B
A
It tries to encode the log entries, goes through all the acting_recovery_backfill shards, and individually calls a generate_subop function on each of them. This message, wr, is what is going to get populated when generate_subop returns. To generate_subop we again pass on all the information that it requires to generate that sub-op. This is where we will see changes that were made for async recovery.
A
Here we're just trying to create... so the MOSDRepOp, the rep op, is a replica op that we are trying to create here, using all the information that we passed to this function. But the important bit is this check that we are doing: parent should_send_op. If it returns false, then we just create an empty transaction and encode it into get_data. But if should_send_op is true, then we have the actual op transaction, and this op_t is what we brought in.
A
That is what we pass in and what we encode into get_data. Now, this should_send_op is where we made our change. Let's just go and see; that's again in PrimaryLogPG.cc. should_send_op is where we decide whether we should send a complete op or only the log entries to that particular OSD. If you look at it, it's a pretty simple function that just takes the peer and the object. It first checks whether the peer is the primary.
A
If it is the primary, we always send the complete op, so there's no question about not sending the op. If not, then we go ahead and first check for backfill targets. This was already there. We check these two conditions: if the object ID is beyond last_backfill_started, then we know that it is a backfill target, and we say that issue_repop is going to ship just an empty op transaction to that OSD.
A
What we have also made this function do is check for async recovery targets. We first check whether the peer that we are trying to send this op to is an async recovery target or not. When we find out that it is, we next check whether that object is missing on that particular peer or not. If both these conditions are true, then we say that we are just going to ship an empty op transaction to that OSD.
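The decision described for should_send_op can be sketched as three checks in order: primary, backfill target, async recovery target. The block below is an illustration under assumptions, not the function from PrimaryLogPG.cc: `PgView` and `should_send_full_op` are hypothetical names, object IDs are compared as plain strings, and only the last_backfill_started comparison named in the talk is modelled for backfill targets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using ShardId = int;           // hypothetical stand-in for pg_shard_t
using ObjectId = std::string;  // hypothetical stand-in for hobject_t

// Hypothetical snapshot of the primary's view of the PG.
struct PgView {
  ShardId primary;
  std::set<ShardId> backfill_targets;
  ObjectId last_backfill_started;
  std::set<ShardId> async_recovery_targets;
  std::map<ShardId, std::set<ObjectId>> peer_missing;
};

// true  -> ship the complete op transaction to `peer`
// false -> ship an empty transaction (log entries only)
bool should_send_full_op(const PgView& pg, ShardId peer, const ObjectId& oid) {
  if (peer == pg.primary)
    return true;  // the primary always gets the complete op

  // Backfill target and the object lies beyond what backfill has reached.
  if (pg.backfill_targets.count(peer) > 0 && oid > pg.last_backfill_started)
    return false;

  // Async recovery target that is still missing this object.
  if (pg.async_recovery_targets.count(peer) > 0) {
    auto pm = pg.peer_missing.find(peer);
    if (pm != pg.peer_missing.end() && pm->second.count(oid) > 0)
      return false;
  }
  return true;
}
```

An async recovery target that already has the object still receives the complete op in this sketch; only objects it is missing get the log-only treatment.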
A
Beyond this, we're just filling in other information that will be required for us to send to the replicas, and we just return the message. So I think with this, we have ensured that we have updated all the data structures on the primary, and we have ensured that we send the right kind of information to the replicas, so that each replica can also determine that it's an op being processed on an async recovery target.
A
B
A
Whenever an op comes through handle_message, it checks whether the message that came in is of type MOSDRepOp, which is a replica op, and it ends up calling do_repop. Let's see the implementation of this function. It is trying to extract information from that op and process the operation, but what I'm trying to focus on here is this.
A
If you see, there is a dout here which has been added due to async recovery. I'm just going to focus on what this is saying. It says: I'm a replica, get my log, get my missing set, and see what the items are. It's just printing the missing items that I have. It's just added for debugging purposes, but this is the data structure that we end up updating for async recovery. So let's go and see where we do that.
A
Yeah, so this is a new flag that we have added, but that's not really important. What we try to do is get a pg_missing_tracker_t, pmissing, which is going to get all the local missing objects for that particular peer. Now, it may be an async recovery target or not, we don't know yet, but we check pmissing.is_missing with the object ID, that is, whether that missing set is missing that particular object.
A
B
A
A
Okay, add_next_event on the missing set. If you remember, we did an add_next_event earlier for the primary case as well, when we said that add_next_event is going to take this log entry, find out what object we are trying to update, and update the missing set for it. I can go back to it. Yeah. So it does the same thing for the replica.
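The replica-side step can be sketched as "if I am already missing this object, fold the incoming log entries into my local missing set". The names below (`LocalMissing`, `track_entries_if_missing`) are hypothetical and the structure is reduced to a map from object to needed version; it only illustrates the control flow described in the talk.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using ObjectId = std::string;  // hypothetical stand-in for hobject_t

struct LogEntry {
  ObjectId oid;
  std::uint64_t version;
};

// Reduced local missing set: object -> version the replica needs.
struct LocalMissing {
  std::map<ObjectId, std::uint64_t> need;

  bool is_missing(const ObjectId& oid) const { return need.count(oid) > 0; }
  void add_next_event(const LogEntry& e) { need[e.oid] = e.version; }
};

// On a replica that receives a log-only op: when the object is already
// missing locally, record each new log entry so that later log-based
// recovery knows the version to fetch. Returns whether anything was tracked.
bool track_entries_if_missing(LocalMissing& missing, const ObjectId& oid,
                              const std::vector<LogEntry>& entries) {
  if (!missing.is_missing(oid))
    return false;
  for (const LogEntry& e : entries)
    missing.add_next_event(e);
  return true;
}
```

A replica that is not missing the object leaves its missing set untouched, since in that case it would have received and applied the complete op.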
A
So this is where we have ensured that all the information about the missing sets on the async recovery targets is up to date. What this helps us with is log-based recovery: all the information in the missing set is there. So when log-based recovery happens, it is going to get all the right kind of information that it requires to recover that particular PG. Before I go to the next part...
A
I know the EC part may be a little complicated, but I will try to go through a little bit of it and show you the parallels between both of them. The should_send_op function is also used by the EC backend, and if you see here, it ends up populating a should_send variable. ECSubWrite is pretty much like the rep op that we saw; ECSubWrite is the parallel of that. So when we create this sub-write op, we send this should_send information with it.
A
handle_sub_write is again a parallel of do_repop, so handle_sub_write performs similar functions to that function. Here what we see is that we check if the op is backfill or async recovery. So it's pretty much doing the same thing: it checks whether this is a backfill or async recovery target and sends the empty transaction, so it ensures that it doesn't write the whole transaction. It just writes the log entries.
A
The next part is here. If you look at it, these lines of code are pretty much exactly the same as what we did for the replicated case. It's just going to get the local missing for that EC shard, and it's going to see if it is a missing object. If it is, then it goes and calls add_local_next_event on that particular log entry. And this is pretty much what we need to know about how the backend-specific implementation of async recovery works.
A
So now let's just jump to the part where, let us say, log-based recovery has completed. We also need to ensure that the OSDs that we selected as async recovery targets get a chance to become regular acting OSDs when all those conditions that I mentioned earlier are no longer true, meaning that the OSD is not missing any object and has all the up-to-date information. That happens basically in...
A
A
needs recovery is false, and we make these extra calls to choose_acting to allow it to go and empty out async recovery targets if we don't need them anymore, or maybe select some other OSDs as async recovery targets because they are better candidates for async recovery. This is where, before we actually go to the Recovered state, we make another call to choose_acting, and we ensure that the OSDs that were only selected as async recovery targets can get back into the acting set. I think with this I have mostly covered
A
how async recovery works in general. What I would like to understand here, maybe I'll stop sharing my screen, is also from the others. When we spoke about going through the recovery code: the recovery code on its own is pretty deep, and I don't think it can be covered in an hour. So what I have as a suggestion is to also go and cover a bit of the recovery improvements that we've done.
B
A
I think with Luminous, where we allowed recovery ops to be throttled. That's something I worked on and I'm very familiar with, so it could make sense for me to go over that part of the code as well. So is everybody okay with that? Or we can also do more questions about async recovery at this point.
A
Recovery on a particular OSD basically begins with this do_recovery function, which is called using PGRecovery. PGRecovery is the recovery class, and run is the function that we basically call to start a particular recovery op. So we just go and say: OSD, do_recovery on this particular PG, and we pass in a few other things that it requires for recovery. Now let's go and see what do_recovery does.
B
A
Okay, so this comment kind of explains what the osd_recovery_sleep option is for. This is the option that I was talking about, which helps you throttle recovery ops based on your requirement. The comment explains pretty clearly that when the value of osd_recovery_sleep is greater than zero, recovery ops are scheduled after this amount of time from the previous recovery op.
A
A
A
So at this point we're just creating, or defining, a callback. We're not actually calling the callback; we are defining a callback and saying that when this callback gets executed, do_recovery is going to wake up at a particular time and requeue recovery ops. What it ends up doing is taking a sleep lock that we have defined, and since it has already slept by that time, we set this recovery_needs_sleep variable to false. We call queue_recovery_after_sleep and we pass it...
A
A
We need to actually determine what the recovery schedule time is going to be, that is, when the next recovery op is going to happen. We compute the time based on what the time is now. If we do not have an earlier scheduled time, we just start from now, but if we do have an earlier recovery schedule time, then we just add the recovery sleep amount of time to the previous time and schedule the next recovery op. Now we know what callback to execute and when to execute it.
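The scheduling rule described here can be sketched as a small pure function. This is one reading of the talk, stated as an assumption: the base is the later of the previous schedule time and the current time, and the sleep interval is added to that base. `next_recovery_time` and `Millis` are hypothetical names, not identifiers from the OSD code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Compute when the next recovery op should run.
// previous_schedule: time the previous op was scheduled for (0 if none yet)
// now:               current time on the same clock
// recovery_sleep:    configured gap between recovery ops (must be positive)
Millis next_recovery_time(Millis previous_schedule, Millis now,
                          Millis recovery_sleep) {
  assert(recovery_sleep > Millis(0));
  // With no usable earlier schedule (or one already in the past), start
  // from now; otherwise space the op out from the previous schedule.
  Millis base = std::max(previous_schedule, now);
  return base + recovery_sleep;
}
```

The returned time is what the timer callback from the previous step would be registered for, so the thread doing recovery never has to sleep itself.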
A
This is the part that ensures that we do not block. This is also an asynchronous operation: we do not block while we are sleeping. It's basically going and adding events, and whenever it is time for one to execute, it is going to go and execute that callback.
A
B
A
It has a few variables called work_in_progress and recovery_started. These are just used to keep track of the state that these recovery ops are in, and, as the name suggests, if recovery has started, then work_in_progress is going to be true. These are basically for tracking, and for ensuring that we don't keep calling the same function again and again as long as one recovery op has been queued.
A
A
A
Moving on. Okay, we've been looking at this get_missing, the data structures that we were trying to update for async recovery. This is basically doing the same thing: it looks at the primary's PG log, calls a get_missing function on it, and gets the missing set for that particular OSD, the primary. Then it calls num_missing on it, so it's basically trying to see how many missing objects there are, and it is also trying to find how many unfound objects there are.
A
The next thing we check is whether the number of missing objects is zero. That means we are all up to date, and then we can go ahead and say that last_complete and last_update on that particular peer are the same. We are not missing any information, so we can just say our last_complete is the last_update. But what if that is not true? Let's see what happens. If num_missing is equal to num_unfound, we are basically saying that all the missing objects are unfound objects.
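The two checks described above can be sketched as one small function: advance last_complete when nothing is missing, and report whether every missing object is unfound. `PgInfo` and `begin_recovery_pass` are hypothetical names used only for illustration, and versions are reduced to plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced PG info: just the two versions discussed above.
struct PgInfo {
  std::uint64_t last_update;
  std::uint64_t last_complete;
};

// Start of a recovery pass on the primary.
// - With nothing missing, the PG is complete up to its last update.
// - Returns true when all missing objects are unfound, which is the case
//   in which the walkthrough goes on to recover the replicas first.
bool begin_recovery_pass(PgInfo& info, std::uint64_t num_missing,
                         std::uint64_t num_unfound) {
  if (num_missing == 0)
    info.last_complete = info.last_update;
  return num_missing == num_unfound;
}
```

Note that zero missing and zero unfound also satisfies the equality, so a fully up-to-date primary moves on to its replicas as well.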
A
Then we go ahead and say that we are going to recover the replicas. But before we go and look at what recover_replicas does, I am also going to go ahead in this code. This is the first chance where we go and try to recover replicas, and the started variable basically determines whether we have started recovery on the replicas or not.
A
If not, then what we try to do is recover the primary, because, as the comment mentions, we still have missing objects that we should grab from replicas. So we will now make an attempt: because recovery has not started on the replicas, we will try to recover the primary. If that is also not true, we go and do this.
A
This says that if we have not started recovery, meaning this condition was not fulfilled, and num_unfound is not equal to get_num_unfound, where num_unfound is the value that we got from get_num_unfound earlier, then in the meantime the state of the missing sets has changed, and that's why we want to go and recover the replicas again. Now let's go and see what recover_replicas does.
A
From the PG backend, it just opens a recovery op, which returns a recovery handle. The next thing is acting_recovery_backfill, which, as I mentioned earlier, is basically all the acting OSDs, the recovering OSDs (recovering here means the async recovering OSDs), and the backfill targets. We check whether this is...
A
B
A
That's what this whole code block is going to do. We try to find that particular peer in peer_missing, and we ensure that we are not at the end of the peer_missing set. nm is the number missing on that particular peer, and it pushes back, as a pair, the number missing and that particular peer. It's similar to candidates_by_cost, where we had a cost and the OSD that we were trying to add to a set.
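Building the list of (number missing, peer) pairs can be sketched as follows. The talk goes on to say that this list is then sorted in ascending order of missing objects, so the sketch includes that sort. `ShardId`, `ByMissing`, and the shape of the inputs are assumptions made for illustration; skipping the primary is also an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using ShardId = int;  // hypothetical stand-in for pg_shard_t
using ByMissing = std::vector<std::pair<std::size_t, ShardId>>;

// Collect (num_missing, peer) for every non-primary shard that has an
// entry in the missing map, then order peers by fewest missing objects.
ByMissing replicas_by_num_missing(
    const std::set<ShardId>& acting_recovery_backfill, ShardId primary,
    const std::map<ShardId, std::size_t>& num_missing_by_peer) {
  ByMissing out;
  for (ShardId peer : acting_recovery_backfill) {
    if (peer == primary)
      continue;  // assumption: the primary is recovered separately
    auto pm = num_missing_by_peer.find(peer);
    if (pm == num_missing_by_peer.end())
      continue;  // nothing tracked for this peer
    out.push_back(std::make_pair(pm->second, peer));
  }
  // Pairs compare on the count first, so this sorts ascending by missing.
  std::sort(out.begin(), out.end());
  return out;
}
```

Putting the count first in the pair is the same trick as in candidates_by_cost: the default ordering of the pair does the ranking without a custom comparator.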
A
It's just doing it for the number of missing objects and the peer. Next, we sort this in ascending order of missing objects, so we know which peers are missing the fewest objects. Once we do that, let's see what we do here. Okay, so we are going to iterate over all the replicas that are in replicas_by_num_missing, and we are first going to...
A
peer_info is another data structure that we maintain for all the peers. If you look at it, it's basically a mapping from the shard to the PG info for that particular peer. Here again, we need to ensure that there is an entry for that peer in peer_info as well. Once all this is met, we get the num missing for that particular peer, which is m_sz here, I guess.
A
A
We are just going to say that we are going to recover the oldest first. If you go here, we are iterating over the missing set in the reverse order, which is the rmissing, and we are again checking for a bunch of things here. Let's see what we are checking. missing_loc is again a data structure I talked about, so you can make a call to that function and find out whether it is an unfound object or not.
A
If it is an unfound object, we just log it as unfound and skip over it right away, so we continue. Next, we check whether this object is past last_backfill or not. If that is the case, then we check whether it is present in the recovering map. So it's basically going to check whether it's past last_backfill, and if so, that means we do not need to do anything for it here.
A
A
We error out and say that the object is beyond last_backfill: the object was added to the missing set for backfill, but it is not recovering, and that's an error condition. But if we do find it in the recovering map, that means we know that this object is already recovering and we don't need to perform recovery again. So we just continue again.
A
Let's see what this does. Okay, before doing the recovery, we need to check whether that object has been deleted or not. If the object has been deleted, we use this function called prep_object_replica_deletes to do recovery. It's a little different from another function called prep_object_replica_pushes, where we actually try to push all the updated information for that object. When it is a delete, we just need to ensure that we delete it from all the other replicas.
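The per-object decisions walked through in this part of the talk (skip unfound objects, skip objects already recovering, then choose between a delete and a push) can be sketched as a small classifier. `Action` and `plan_replica_recovery` are hypothetical names, the inputs are reduced to plain sets, and the last_backfill check is deliberately omitted from this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

using ObjectId = std::string;  // hypothetical stand-in for hobject_t

enum class Action {
  SkipUnfound,     // no known location for the object yet
  SkipRecovering,  // a recovery op for it is already in flight
  Delete,          // propagate the delete to the replica
  Push             // push the up-to-date object to the replica
};

// Decide what to do for one object that a replica is missing.
Action plan_replica_recovery(const ObjectId& oid,
                             const std::set<ObjectId>& unfound,
                             const std::set<ObjectId>& recovering,
                             const std::set<ObjectId>& deleted) {
  if (unfound.count(oid) > 0)
    return Action::SkipUnfound;
  if (recovering.count(oid) > 0)
    return Action::SkipRecovering;
  if (deleted.count(oid) > 0)
    return Action::Delete;  // corresponds to the "replica deletes" path
  return Action::Push;      // corresponds to the "replica pushes" path
}
```

The order of the checks mirrors the order in the walkthrough: the two skip cases come first, so deletes and pushes are only prepared for objects that can actually be recovered right now.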
A
That's why we have a different function for deleted objects. Then there are again a bunch of other checks that we do, and this is the important part. This is where we actually send the object, the need version of that object, the handle (the recovery handle), and the work_started variable, which is again going to go one level down and inform us whether prep_object_replica_pushes has started any work for the replica recovery or not.
A
I think we are out of time, but how do we want to go about it? There's a lot of code that we want to go through. There's the prep object pushes, then there is start recovery, and there's a bunch of other things that we can go through. So how do we want to do it? Do we want to do it as part of a separate talk, or...?
B
B
B
A
D
B
A
The idea is that, since async recovery targets are not part of the acting set, we can still afford to write to them and not have up-to-date copies on them. Because they are not part of the acting set, they will not immediately participate in returning the right information about those objects. So we can afford to delay recovery on those objects using async recovery.