From YouTube: Ceph Crimson/SeaStore OSD Weekly 2020-09-23
A: So, let's start. Last week, as always, I've been working on the structure test fix, and what I showed was that when the interval changes while we still have a pending request waiting for an object context, the request being executed just bails out and keeps retrying, but the pending ones managed to be scheduled before the interrupted one. So we have an out-of-order response; I'm trying to reproduce it.
B: Wait, why would that happen?
A: Because I think the PG interval changed, so... yeah.
C: I see. So I guess what you need is for the interruptible thing, when that request wakes up and observes that the interval has changed, to drop the lock and re-queue itself, right?
A: This whole thing is, you know, written in a repeat loop.
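As a rough sketch of that repeat loop (a Python model for illustration, not Crimson's actual Seastar/C++ code; `PG`, `IntervalChanged`, and the lock methods are all invented names): a request that sees an interval change drops its lock and goes around the loop again instead of completing out of order.

```python
class IntervalChanged(Exception):
    """Raised when the PG interval changes under a waiting request (assumed signal)."""

class PG:
    """Toy stand-in for a placement group: tracks which objects are locked."""
    def __init__(self):
        self.epoch = 1
        self.locked = set()

    def lock_object(self, oid):
        self.locked.add(oid)

    def unlock_object(self, oid):
        self.locked.discard(oid)

def run_request(pg, oid, do_io):
    """Run do_io() under the object lock; on interval change, drop the
    lock and retry from the top (the 'repeat loop' from the discussion)."""
    while True:
        pg.lock_object(oid)
        try:
            return do_io()           # may raise IntervalChanged mid-flight
        except IntervalChanged:
            continue                 # re-queue: go around the loop again
        finally:
            pg.unlock_object(oid)    # the lock is dropped on every path
```

The `finally` clause is the important part: whether the IO completes or is interrupted, the lock is released before the request either returns or re-enters the loop.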
C: Okay, so let's think about the sequence of operations that happens to an IO. The first thing you do is put yourself in a queue. Then you come off the queue: it's your turn to run. So you look at the message, you figure out what the object ID is, and you read that object context off disk, which is state in memory that you're holding a pointer to, a shared pointer or whatever.
C: Then you try to take the rw-state lock on the object state, or whatever this new tri_mutex thing is, right? So for a read, at least, this sequence is pipelined. We might have multiple reads occurring on the same object, and there could be, like, ten reads on the very same object blocked waiting for a write to complete.
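A toy model of that per-object rw state (a deliberate simplification, not the real tri_mutex; all names are invented): many reads may hold the lock together, but reads arriving while a write is in progress park themselves in an implicit queue behind it.

```python
from collections import deque

class RWState:
    """Simplified per-object read/write state with an implicit wait queue."""
    def __init__(self):
        self.readers = 0
        self.writer = False
        self.waiting = deque()   # requests blocked behind the current holder

    def try_read_lock(self, req):
        if self.writer:
            self.waiting.append(req)   # pipelined read blocks behind a write
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        self.readers -= 1

    def try_write_lock(self, req):
        if self.writer or self.readers:
            self.waiting.append(req)
            return False
        self.writer = True
        return True

    def write_unlock(self):
        """Release the write lock; blocked requests get to retry."""
        self.writer = False
        woken = list(self.waiting)
        self.waiting.clear()
        return woken
```

This is exactly the "ten reads blocked on one write" situation: they all sit in `waiting` until `write_unlock` wakes them, which is also why cancelling only the in-flight write is not enough.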
C: So let's say that then the PG epoch changes. It's not enough for the write that's currently in progress to be canceled; all of those reads have to be canceled too.
A: Hold on, we don't have the write request in flight when we have read requests.
A: Right, because they already have the active PG, and... yeah.
C: An error signal, whatever: it's going to use that signal to drop its lock and re-queue itself. Then the next read is going to get its lock, observe that the PG epoch has changed because it got the error signal, drop its lock, and re-queue itself, and so on up the chain. Oh, so they're all going to re-queue; it's not just the write.
C: No, the p... no, no, no! The OSD goes back in the... so right now, in classic OSD, there's an actual queue, right? What we do is re-queue the op into the central OSD processing pipeline: the message itself, the original message. We drop all of the in-memory state for that request. For Crimson it's a little more complicated, because we don't have a queue like that; we have an implicit queue in the form of the pipelines.
C: So what you should probably do is drop absolutely all of the state associated with the pipeline and go back to the beginning: the very first thing where you pick up the PG lock and start from there, or where you pick up the PG and start from there. Remember, after a peering state change it might not even be the same PG.
A: But in the case of an acting set change, we can assume that it's just the acting set that changed. It does not imply that the primary changed.
C: If the primary changed, though, you would have to go through literally every check the primary does and make sure that it's invariant over an interval change, and they're not. After an interval change, recently written objects will typically be degraded; that's normal, because the replicas may not have seen the most recent IO, so the primary is going to have to re-recover the current state of that object back to them. Which means writes have to block, which means those reads needed to block earlier in the chain: they needed to block on wait-for-degraded-object or whatever, I'm telling you.
C
Too
many
different
ways:
this
can
be
difficult
one
day
in
the
future.
We
might
choose
to
be
smarter
about
this,
but
yeah.
It's
not
easy
to
do
this
correctly.
It
would
be
simpler
and
more
effective
and,
in
the
common
case,
exactly
as
performance
to
simply
re-queue,
in
that
case,
mostly
you're,
going
to
have
to
perform
the
same
checks
in
the
same
order
anyway.
So
this
even.
C
The
primary
is
clearly
the
same,
but
think
of
all
the
things
that
aren't
necessarily
the
the
same.
The
object
version
could
have
changed.
The
current
object
context
on
disk
could
have
changed.
It
doesn't
seem
like
that's
true,
but
it
actually
is
because
after
appearing
after
appearing
the
pg,
authoritative
pg
log
may
have
changed.
C: So let's say that when a PG interval change happens, there are 100 IOs outstanding on a PG. The primary has persisted them; none of the replicas has seen them yet. So when the OSD map goes out, changing the acting set but not the primary, the primary will be the only OSD with those 100 log entries.
C: So during peering we're going to compare those log entries to our peers'. With a replicated PG, we're going to decide that all of those objects are degraded, because the primary has a copy that no one else has, so before we can do anything to them they have to be re-replicated over. And you'll notice the wait-for-degraded check actually does happen on a read.
C: It can situationally happen on a read request, depending on whether it's RW and whether it's write-ordered. There is a lot of detail in the librados flags that changes the way read and write ops are ordered. I really, really, really do not think it is a good idea to recheck all of these conditions again just to avoid what is, frankly, not an expensive operation, which only happens during peering anyway.
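The peering decision described above can be sketched like this (the data shapes are invented for illustration and are not Ceph's actual log structures): an object is degraded when the primary's log records a version that some peer in the acting set has not seen, and IO to that object must wait until it has been re-replicated.

```python
def degraded_objects(primary_log, peer_logs):
    """primary_log / each peer log: mapping of oid -> latest version seen.
    Returns the oids the primary must re-replicate before serving IO."""
    degraded = set()
    for oid, version in primary_log.items():
        # Degraded if any peer is missing this version of the object.
        if any(peer.get(oid, 0) < version for peer in peer_logs):
            degraded.add(oid)
    return degraded

def must_wait(oid, primary_log, peer_logs):
    """The wait-for-degraded check: even a read may have to block here."""
    return oid in degraded_objects(primary_log, peer_logs)
```

In the 100-outstanding-IOs scenario above, all 100 objects would land in the degraded set, since only the primary's log has their latest entries.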
C
So,
by
the
time
we're
even
talking
about
this,
we've
gone
to
the
trouble
of
doing
two
entire
message
round
trips
to
the
rest
of
the
acting
set,
doing
two
entire
commit
cycles
and
re-upping
all
of
the
primary
state
just
having
to
redo
a
couple
of
lines
of
code
in
the
op
processing
pipeline
is
small
potatoes.
C: As though it were a brand new message right off the wire. You just have to be very careful about the order in which you do that, because they need to go back into all of those pipeline stages in the same order. It's semantically identical to the order in which we call enqueue_op during an interval change; in classic OSD you'll notice we're very careful about which queues we look at and in what order.
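A minimal sketch of that ordering constraint (structure and names are assumptions, not Crimson's actual pipeline types): draining the pipeline stages from the latest stage back to the earliest re-queues ops in their original submission order, with per-request state dropped so each op re-enters as if it were a fresh message.

```python
from collections import deque

def requeue_on_interval_change(stages, central_queue):
    """stages: list of deques, earliest pipeline stage first.
    Ops furthest along the pipeline were submitted earliest, so the
    latest stage must be drained first to preserve submission order."""
    for stage in reversed(stages):
        while stage:
            op = stage.popleft()
            op.pop("state", None)      # drop the op's in-memory state
            central_queue.append(op)   # back to the central queue, like
                                       # a brand new message off the wire
```

This mirrors the classic-OSD care about "which queues we look at and in what order": getting the drain order wrong would reorder ops that the client expects to stay ordered.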
C: But I mean, if you look at the implementation, that's literally what it is. It uses the Seastar lock, which internally is just a linked list of tasks.
C: Yeah, like I said, if you go look at classic OSD you'll notice we're very careful about the order in which we call enqueue_op on all the various wait queues. You're doing the same thing.
D: Yeah, I'm still developing the tree and doing the balance test case. For the initial error I made, I don't have a Crimson cluster, but I saw that someone has already applied a PR to solve the problem, so I will try it and see if it is fixed.
C: Yep. I've been working on finishing up the PR for basic garbage collection in SeaStore. So at this point it should be able to run indefinitely doing a random IO workload without running out of space. That's good, because it will clean up the used space behind itself, right, so it should use about the amount of space that it's actually supposed to use, and not just, sort of, all of it.
C: It has two tunables. One is a target free space to maintain: if it gets to that amount of free space, say 20% or something, it will garbage collect aggressively to maintain that amount of space, or prevent IOs from happening. The other parameter is a ratio of live to unavailable space.
C: So if we have two megabytes of segments that are unavailable because we've written to them, but of those two megabytes only one megabyte is actually live, and the threshold is set to 50%, it'll start garbage collecting them even if the disk is relatively empty. This way we won't get to 20% available space and suddenly garbage collect the entire disk.
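The two tunables can be sketched as a simple predicate (the names and default values here are assumptions for illustration, not SeaStore's actual option names), using the 2 MB written, 1 MB live, 50% threshold example:

```python
def should_gc(free_bytes, total_bytes, live_bytes, written_bytes,
              free_target=0.20, live_ratio_threshold=0.50):
    """Decide whether to start garbage collection (illustrative policy)."""
    # Trigger 1: free space has fallen to the target floor (e.g. 20%),
    # so collect aggressively rather than let IOs be blocked.
    if free_bytes / total_bytes <= free_target:
        return True
    # Trigger 2: too little of the written (unavailable) space is live,
    # e.g. 2 MB written but only 1 MB live against a 50% threshold,
    # so start reclaiming even when the disk is relatively empty.
    if written_bytes and live_bytes / written_bytes <= live_ratio_threshold:
        return True
    return False
```

The second trigger is what spreads the cleaning out over time, instead of deferring all of it until the free-space floor is hit at once.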
C: No, that's actually an entirely different feature. Right now it just chooses whichever one has the most data blocks; we can be smarter about that later. No, this is like: when an IO comes in, we have to decide how much garbage collection work to do, right? And if we only... let's say it's a four-terabyte disk, but only 100 gigs are in use.
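One way to picture that per-IO decision (the formula is purely illustrative, not SeaStore's actual policy): scale the garbage-collection work charged to each incoming IO with device utilization, so a mostly-empty four-terabyte disk does almost no cleaning per write.

```python
def gc_work_per_io(used_bytes, total_bytes, max_units_per_io=4):
    """Units of GC work to do alongside one IO, proportional to how
    full the device is (illustrative pacing, invented constants)."""
    utilization = used_bytes / total_bytes
    return round(max_units_per_io * utilization)

TB = 1 << 40
GB = 1 << 30
```

With 100 GB in use on a 4 TB disk, utilization is about 2.4%, so the pacing rounds down to no GC work per IO; a half-full disk pays for cleaning as it writes.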
B: Last week I was debugging the interruptible future, and I think I'll submit the PR as soon as these parts are delivered.
C: Thank you. How is the review on part one of the dirty extent write going now?