From YouTube: Ceph Crimson/SeaStore Meeting 2022-06-22
Description
Join us weekly for the Ceph Crimson/SeaStore meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A
All right, let's get started. I'm back from vacation today and I'm catching up on PRs. I got through about a third of them today, so another day or two.
C
We can also see the benefits from the improved GC policy: reduced write amplification, reduced conflicts, and improved overall performance. And I think generational GC can also be a base step toward implementing device tiering. I will finalize the PR this week for review. I also got a clue about how to evaluate the SSD-internal write amplification, and I will try to do some evaluation later. That's all.
D
Hi, yeah, so I sent a PR; it's still in draft state. I sent the PR number in the chat. There are around five commits in it; most of it is cleanup, plus a couple of corrections to the calculations in the make-metadata function. Apart from that, I'm working on... so what I was seeing was one issue: when I started it up, it would try to, you know, read all these...
D
I mean, segments, and then try to write the header and then write the tail. The tail was getting written at the end of the segment, but that will not really work with ZNS, because it has to write at the write pointer. So I made the changes, so my plan is to ensure that when we close the segment, we write this tail information, right.
D
So when we close the segment, you write the tail at the write pointer, and when you try to read the segment, you read it from the write pointer minus whatever that offset is, the 4K offset, and get the tail information from there. So that is what I'm trying to do. I'm done with the write path, and on the read path, I'm seeing that there is, at least as of now... I couldn't find an easier way to get the write pointer in the read path. When we do a read-segment-tail, that function, basically, in the segment manager, in that path there is no easy access to that segment's write pointer. So I'm currently looking at adding new functions to get that write-pointer information.
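A minimal sketch of the tail-at-write-pointer scheme just described, assuming a 4K tail block; all names and types here (zone_info_t, the append helper, the in-memory stand-in for the device) are illustrative, not SeaStore's actual API:

```cpp
// Sketch: on a ZNS device the tail must go at the write pointer, and is
// read back from write_pointer - TAIL_SIZE rather than the segment end.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

constexpr uint64_t TAIL_SIZE = 4096;  // assumed 4K tail block

struct zone_info_t {
  uint64_t write_pointer = 0;  // next writable offset, as ZNS reports it
  std::vector<uint8_t> data;   // stand-in for the device
};

// Append-only write at the write pointer (the only legal write on ZNS).
void append(zone_info_t& z, const std::vector<uint8_t>& buf) {
  z.data.resize(z.write_pointer + buf.size());
  std::memcpy(z.data.data() + z.write_pointer, buf.data(), buf.size());
  z.write_pointer += buf.size();
}

// Closing a segment: write the tail at the write pointer; on a real
// device a zone-finish would follow.
void close_segment(zone_info_t& z, const std::vector<uint8_t>& tail) {
  append(z, tail);
}

// Reading the tail back: it lives immediately before the write pointer.
std::vector<uint8_t> read_tail(const zone_info_t& z) {
  uint64_t off = z.write_pointer - TAIL_SIZE;
  return std::vector<uint8_t>(z.data.begin() + off,
                              z.data.begin() + off + TAIL_SIZE);
}

int main() {
  zone_info_t z;
  append(z, std::vector<uint8_t>(8192, 0xab));           // segment payload
  close_segment(z, std::vector<uint8_t>(TAIL_SIZE, 1));  // tail of ones
  std::cout << "tail byte: " << int(read_tail(z)[0]) << "\n";  // prints 1
}
```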
D
So yeah, that's where I am at. And I just had one question for the team: in the block segment manager, do we cache the segment information, all the segments' information, or is it something that we calculate or read every time, like the write pointer?
D
Okay, okay. So in ZNS we can reopen a closed zone; there is the option to transition from the closed state to the open state. But it is not much of a benefit, in the sense that both a closed zone and an open zone are considered active resources in the spec.
D
So there's no real benefit in actually closing the zone versus finishing the zone. Actually, when you finish the zone, that zone is not available to write unless you reset it, sorry.
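A toy model of the accounting point made here: both open and closed zones count against the device's active-resources limit, so closing (rather than finishing) a zone frees nothing. The enum and helper are illustrative only:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

enum class zstate { empty, open, closed, finished };

// Per the ZNS spec's accounting, open and closed zones are both "active".
size_t active_zones(const std::vector<zstate>& zones) {
  size_t n = 0;
  for (auto s : zones)
    if (s == zstate::open || s == zstate::closed)
      ++n;
  return n;
}

int main() {
  std::vector<zstate> zones{zstate::open, zstate::closed, zstate::finished};
  std::cout << active_zones(zones) << "\n";  // prints 2: closing freed nothing
}
```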
A
That's what I meant. We don't bother with close because it's pointless. I meant finish.
A
You're right, closed zones don't seem to have any purpose. So sorry, what I meant was: we only track the write pointers for zones that are not yet finished, that is, the relatively few zones that are open for write.
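A small sketch of that bookkeeping, assuming a map keyed by zone id that only ever holds the handful of zones open for write; zone_tracker_t and its methods are hypothetical names, not the actual segment-manager interface:

```cpp
#include <cstdint>
#include <map>
#include <optional>

using zone_id_t = uint32_t;

struct zone_tracker_t {
  // Only zones open for write get an entry; finished zones are dropped,
  // so the map stays small regardless of how many zones the device has.
  std::map<zone_id_t, uint64_t> open_write_pointers;

  void on_open(zone_id_t z) { open_write_pointers[z] = 0; }
  void on_write(zone_id_t z, uint64_t len) { open_write_pointers[z] += len; }
  void on_finish(zone_id_t z) { open_write_pointers.erase(z); }

  std::optional<uint64_t> write_pointer(zone_id_t z) const {
    auto it = open_write_pointers.find(z);
    if (it == open_write_pointers.end()) return std::nullopt;
    return it->second;
  }
};

int main() {
  zone_tracker_t t;
  t.on_open(3);
  t.on_write(3, 4096);
  t.on_finish(3);  // finished zones no longer carry a tracked write pointer
  return t.write_pointer(3).has_value() ? 1 : 0;
}
```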
D
Okay, so for zones which are open, you track the write pointer. And one more question I had was: for the Crimson OSD, I mean in SeaStore, what is the default write... segment manager? Do we use the ZNS segment manager in normal cases, or is the ZNS segment manager specific only to the ZNS use case?
A
We're not sure yet. For ZNS devices, obviously, we'll be using the ZNS segment manager. For non-ZNS devices it will depend: there are flash-based devices that actually prefer append-only write workloads, so in that case it will likely make sense to use the block segment manager; for faster devices it will make sense to use the random block manager that we're still working on.
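As a sketch of that decision, purely illustrative (the enum and factory below are invented for this example, not SeaStore's configuration code):

```cpp
#include <iostream>
#include <string>

enum class device_kind { zns_ssd, append_friendly_ssd, fast_nvme };

// Backend choice per the discussion above: ZNS devices get the ZNS segment
// manager, append-friendly flash the block segment manager, and fast NVMe
// the (still in-progress) random block manager.
std::string pick_backend(device_kind d) {
  switch (d) {
    case device_kind::zns_ssd:             return "ZNS segment manager";
    case device_kind::append_friendly_ssd: return "block segment manager";
    case device_kind::fast_nvme:           return "random block manager (WIP)";
  }
  return "unknown";
}

int main() { std::cout << pick_backend(device_kind::fast_nvme) << "\n"; }
```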
D
All right. So currently, if I want to run SeaStore on a regular SSD, not a ZNS SSD, which segment manager is used? Because the segment manager has these three implementations: one is the ZNS one, one is the block one, and one is the ephemeral one, right?
D
Right, okay, yeah. I'm saying that all of those calls are getting wired to the ZNS segment manager calls. That was my understanding. So, okay, if I use a regular device, then it should all go to the block segment manager calls, right? Yeah.
B
I'm working on the PR according to the comments, and on fixing the list-objects issue; I've now root-caused the issue and am working on the PR.
A
Oh, Aravind, Yingxin linked two places in the segment cleaner where we actually do maintain a written-to position. But I think what you're asking about is within the segment manager, which is a little bit different.
E
Another thing is the long story of GCC 11 for Crimson builds in CI/CD. Well, Crimson builds; I'm still working on the problem with the Python bindings compilation. Internally it uses Python's distutils, and the GCC instantiated from there unfortunately goes to the wrong assembler program, an older version, one that is not the default system one and which doesn't support one of the options. Right now I'm putting more debugging into the code of Python's distutils just to understand what the reason is. That's me.
F
Last week I was mainly trying to fix several bugs I observed when I restarted a Crimson OSD, one of which is this one; I'm pasting it in the chat window. And I also did some further performance analysis of the GC process.
F
The result shows that, first... I ran the test for two hours, and it seems that the whole test procedure can be split into three phases.
F
The first is when the disk space is not used up and there's no space-reclamation process running. The second is when the space-reclamation process has joined in and the disk space is used up. And the third phase starts about 50 minutes after the start of the test, after which the time taken by the space-reclamation process overruns all other GC cycles in terms of duration.
F
I think the journal-trimming cycle duration also drops across those three phases, and I think the reason is that, as the test runs, the number of journal-trimming cycles per unit time drops; I observed a very similar curve pattern between the journal-trimming cycles and the journal-trimming durations for the second and the third phase.
F
A lot of background merge operations came up during the space-reclamation cycle, and the time they take made the space-reclamation duration longer.
F
There are several other observations. The first is that, although the journal-trimming time drops as time goes on, the number of extents committed by journal-trimming transactions actually increases.
F
The second extra observation is that most of the journal-trimming process's time is spent on committing the transaction, and the transaction-commit time also occupies a significant part of the whole space-reclamation duration.
F
So I think if we can implement some kind of pipeline between the GC cycles, maybe we can make the GC more likely to be scheduled to run, and so the I/O throughput curve can be smoother; right now the curve is very, very rough.
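A toy sketch of the pipelining idea: overlap the "prepare" stage of GC cycle N+1 with the "commit" stage of cycle N instead of running each cycle strictly start-to-finish. This is purely illustrative; SeaStore's GC would express this with Seastar futures rather than std::async, and the stage names are made up:

```cpp
#include <future>
#include <iostream>

struct work_t { int cycle; };

work_t prepare(int cycle) {   // stand-in: read segments, pick extents to move
  return work_t{cycle};
}

void commit(const work_t& w) {  // stand-in: commit the rewrite transaction
  std::cout << "committed cycle " << w.cycle << "\n";
}

int main() {
  const int cycles = 4;
  std::future<work_t> next = std::async(std::launch::async, prepare, 0);
  for (int i = 0; i < cycles; ++i) {
    work_t w = next.get();
    if (i + 1 < cycles)  // start preparing the next cycle...
      next = std::async(std::launch::async, prepare, i + 1);
    commit(w);           // ...while committing this one
  }
}
```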
F
When the I/O is blocked, the journal-trimming transactions' commit time is significantly lower than when the I/O is not blocked. I haven't been able to dig into it, but I think there may be some extra optimization points in that part. I also did some CPU profiling of the GC process.
F
It seems that after our previous work there is almost no pure I/O waiting time for the Crimson OSD, and the CPU is kept busy. So I think the next step...
F
If we want to increase the I/O throughput, we need to reduce the work done by the GC processes. And it seems that, right now, the most significant part of the CPU time used by the GC processes is that, at various phases, SeaStore needs to get LBA nodes or backref nodes from the cache, even if they are already in memory.
F
I set the size of the cache LRU to about one gigabyte, so most of the LBA nodes and the backref nodes are in memory. And I think this means that getting LBA or backref nodes from the cache might be an overhead, and if we can save it, we can significantly reduce the workload of the GC processes. That's basically all the perf-analysis results.
A
Just a couple of comments. We absolutely do have to add the object data blocks to the cache when we're doing GC; it's for correctness reasons: while the I/O is in progress, any future I/Os that come in on that block need to be able to see the projected future state. That part's mandatory. You're correct that it does pollute the cache, which means we shouldn't be adding it to the LRU, but I believe that optimization is already implemented.
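A toy sketch of the distinction being made: extents read for GC must still be inserted into the cache index (so in-flight state stays visible to future I/Os) but should not be promoted into the LRU, where they would evict hot client data. The types and field names here are stand-ins, not SeaStore's extent cache:

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

struct extent_cache_t {
  std::unordered_map<uint64_t, int> index;  // offset -> extent (stand-in)
  std::list<uint64_t> lru;                  // eviction order, hot at front

  void insert(uint64_t off, int extent, bool from_gc) {
    index[off] = extent;    // always indexed: future I/Os must see it
    if (!from_gc)
      lru.push_front(off);  // only client-driven reads become "hot"
  }
};

int main() {
  extent_cache_t c;
  c.insert(0, 42, /*from_gc=*/true);     // indexed but kept out of the LRU
  c.insert(4096, 7, /*from_gc=*/false);  // a client read: LRU-promoted
  return c.lru.size() == 1 ? 0 : 1;
}
```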
A
The larger observation here is that indeed we are CPU-bound now, so that's good news. The data structures underpinning the cache and the LBA tree are very, very simple; the cache in particular is just using an intrusive map, there's nothing clever going on there. So that's where I would probably start: I would probably start by investigating how much we can improve the cache data structure itself, as well as the way we're doing lookups in the LBA tree. Like, I think we're doing linear scans; we're not even doing binary searches on the blocks, unless that got improved already. So there's a lot of space for improvement there.
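To illustrate the in-node lookup being discussed: a linear scan over a node's sorted keys versus std::lower_bound, which binary-searches the same array. The node layout below is made up for the example, not the LBA tree's actual on-disk format:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iostream>

struct node_t {
  std::array<uint64_t, 256> keys{};  // sorted keys of one tree node
  size_t used = 0;
};

// O(n): what a linear scan over the node's keys looks like.
size_t lookup_linear(const node_t& n, uint64_t key) {
  for (size_t i = 0; i < n.used; ++i)
    if (n.keys[i] >= key) return i;
  return n.used;
}

// O(log n): binary search over the same sorted key array.
size_t lookup_binary(const node_t& n, uint64_t key) {
  auto it = std::lower_bound(n.keys.begin(), n.keys.begin() + n.used, key);
  return size_t(it - n.keys.begin());
}

int main() {
  node_t n;
  for (size_t i = 0; i < 256; ++i) n.keys[i] = i * 8;
  n.used = 256;
  std::cout << lookup_linear(n, 100) << " == " << lookup_binary(n, 100) << "\n";
}
```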
If I were you, I would start by making a micro-benchmark for looking up a block already in cache: pre-populate the cache with a bunch of, or pre-populate SeaStore with a bunch of, LBA nodes that are already present, then measure the time it takes to do a thousand random lookups or something, and then profile that.
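A minimal sketch of that suggested micro-benchmark against a stand-in cache (a std::map keyed by offset); the real target would be SeaStore's intrusive extent cache, and the sizes and names here are illustrative assumptions:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <random>
#include <vector>

int main() {
  // Pre-populate the stand-in cache with "LBA nodes" at 4K-aligned offsets.
  std::map<uint64_t, int> cache;
  constexpr uint64_t nodes = 100000;
  for (uint64_t i = 0; i < nodes; ++i)
    cache.emplace(i * 4096, 0);

  // Pick a thousand random offsets that are guaranteed to be present.
  std::mt19937_64 rng(42);
  std::uniform_int_distribution<uint64_t> dist(0, nodes - 1);
  std::vector<uint64_t> offsets(1000);
  for (auto& o : offsets) o = dist(rng) * 4096;

  // Time the thousand random lookups; this is the region to profile.
  auto t0 = std::chrono::steady_clock::now();
  uint64_t hits = 0;
  for (auto o : offsets)
    hits += cache.count(o);
  auto t1 = std::chrono::steady_clock::now();

  std::cout << hits << " hits in "
            << std::chrono::duration<double, std::micro>(t1 - t0).count()
            << " us\n";
}
```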