From YouTube: Ceph Crimson/SeaStore Meeting 2023-03-08
Description
Join us weekly for the Ceph Crimson/Seastore meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A: Okay, all right, let's see. For me this week: I was on PTO last week, and I'm going to be working on classic scrub, and then Crimson scrub, for the next several weeks. Yingxin, how's it going?
C: I have some updates from a colleague; he hasn't joined today. The fine-grained cache is available for review, and he is also working on the max allocation size; there was a bug blocking him, and it's not clear yet what is going on with the max allocation size feature. My own work this week is mostly about reviewing the LBA tree optimization: I will take a look at the code proposal this week. I also plan to allocate more effort to the multiple messenger work.
E: Hey, sorry, I do not have any major updates.
E: Yeah, so what I was saying is: for the last couple of weeks, all the Crimson images that I am getting from Sam have this particular issue where, when we are trying to configure the cluster and add the nodes, it throws an exception. I've already raised a ticket for this and spoke with Nedson. He informed me that this is not an underlying Crimson issue, but rather an issue with the Reef build itself, and we are not yet sure.
A: Weird. Yeah, he also mentioned something about it; it must be a problem there.
A: Okay, Kevin?
G: Last week I was working on the system modifications, and all the modifications are done. I ran the rados bench write/read/rebuild tests for the SeaStore system install, along with the corresponding unit tests, and it all works, so please review. Okay, that's all.
A: Okay, Rocky?
B: Sorry. Okay, this week I was mainly modifying the LBA pointer, following Yingxin's suggestion; that's all. I also assisted my colleague Johnson in writing the design document for the non-volatile data cache.
A: Does anyone else have anything else to discuss before we continue on to talk about the hot data non-volatile cache?
A: All right: do you want to put a link to the document in the chat?
B: Okay. Okay, sorry.
A: No worries.
B: The purpose of this document is to describe a mechanism that we propose for implementing the hot data cache function in SeaStore, in which we cache data that are frequently accessed, or were frequently read in the past. That's the purpose of this document.
B: As for the design principles: first, I think that to cache data in the hot tier, we need to utilize the locality of the application's access pattern on the data. In Ceph, the upper-layer applications' logical address space is partitioned into objects at the RADOS level, and those objects are hashed onto a set of PGs, which are scattered across the cluster.
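The hashing step just described can be sketched with a toy model. This is not Ceph's actual placement code (which hashes object names with ceph_str_hash, maps them through stable_mod, and then through CRUSH); the hash function and the pg_num value here are stand-ins purely to illustrate the scattering effect:

```python
import zlib

PG_COUNT = 128  # an assumed pg_num, purely for illustration

def object_to_pg(object_name: str) -> int:
    """Toy stand-in for RADOS placement: hash an object name onto a PG.
    The point is only that consecutive object names land pseudo-randomly
    on different PGs, so cross-object locality is lost."""
    return zlib.crc32(object_name.encode()) % PG_COUNT
```

Mapping a run of consecutive object names through `object_to_pg` shows them landing on many different PGs, which is exactly why locality across objects does not survive placement.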
B: So the locality of access across RADOS objects is somewhat lost. In SeaStore, the locality we can see is on the logical address space of extents within the same onode, so we think we need to construct this cache based on the logical address space within the same object. That means the cache lines can't cross the boundaries of RADOS objects.
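The boundary rule just stated can be illustrated with a minimal sketch. This is hypothetical code, not SeaStore; the 64 KiB line size is an invented placeholder. Because every cache-line key carries the object id and an offset aligned within that object, a line can never span two RADOS objects:

```python
CACHE_LINE = 64 * 1024  # assumed cache-line size, for illustration only

def cache_lines(object_id, offset, length):
    """Return the cache-line keys covering [offset, offset + length) of a
    single object. Keys are (object_id, aligned_offset), so lines are
    always contained within one RADOS object."""
    start = (offset // CACHE_LINE) * CACHE_LINE
    end = offset + length
    keys = []
    while start < end:
        keys.append((object_id, start))
        start += CACHE_LINE
    return keys
```

A read that touches two objects would be issued as two calls, one per object, which is how the "no line crosses an object boundary" invariant is kept.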
B: That's the main principle. In SeaStore, we think there are two types of data. One is the metadata of SeaStore itself: it contains the LBA extents, the backref extents, and the onode extents. The other is data from upper-layer applications, like object data and omap data. We think the access patterns of these two types of data are quite different, because data accesses are issued by upper-layer applications and should show obvious locality.
B: The metadata may not be the same. The reason is that the metadata are all B-trees, and the keys of these B-trees are basically either physical addresses, or logical addresses created based on the hash of RADOS objects. So we think that, although extents within the same RADOS object will have contiguous logical addresses, the extents of different objects won't.
B: An RBD image can have several, or tens of, terabytes, while RADOS objects are relatively small: I think basically 32 megabytes at the largest. So one RBD image can have hundreds of thousands of RADOS objects, and if only, say, five percent of its space is hot, that hot space will still contain tens of thousands of objects, which means tens of thousands of LBA leaf nodes will be accessed equally frequently.
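The "tens of thousands" figure can be checked directly. The 10 TiB image size below is an assumed example; 32 MiB is the maximum object size mentioned above:

```python
TiB = 2 ** 40
MiB = 2 ** 20

image_size = 10 * TiB      # an assumed example RBD image size
object_size = 32 * MiB     # the largest RADOS object size mentioned
hot_fraction = 0.05        # "only, say, five percent of its space is hot"

total_objects = image_size // object_size        # objects backing the image
hot_objects = int(total_objects * hot_fraction)  # objects in the hot 5%
```

This gives 327,680 objects in total and 16,384 hot ones, so a comparable number of LBA leaf nodes would all be similarly hot.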
B: This is the main difference in the access patterns of metadata and data. Based on this assumption, we think that perhaps we should cache and evict data based on the data's heat: load data from the cold tier to the hot tier, and evict it from the hot tier to the cold tier, according to how hot it is. Meanwhile, we can rely on the current tiering machinery to evict metadata, and we think the metadata doesn't have to be loaded back to the hot tier.
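The asymmetric policy described here, heat-driven movement for data but eviction-only tiering for metadata, might be sketched like this. This is illustrative logic, not SeaStore code, and the thresholds are invented:

```python
PROMOTE_HEAT = 4  # assumed threshold: accesses before promotion to hot tier
EVICT_HEAT = 1    # assumed threshold: at or below this, demote to cold tier

def next_tier(kind, tier, heat):
    """Decide which tier an extent should occupy. Data extents move in both
    directions based on heat; metadata is never promoted by reads and is
    left to the existing tiering/cleaning machinery."""
    if kind == "metadata":
        return tier
    if tier == "cold" and heat >= PROMOTE_HEAT:
        return "hot"   # promote frequently read data from the cold tier
    if tier == "hot" and heat <= EVICT_HEAT:
        return "cold"  # evict data that has cooled down
    return tier
```

The point of the sketch is the asymmetry: only the `kind == "data"` path ever returns a promotion.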
B: I think the metadata can come back to the hot tier on rewrite or mutation. Those are the main principles of our design. I don't know if there are any questions about this.
A: Oh, I'm getting a huge echo from you. It might just be that your microphone is picking up your speakers. But anyway: it's still the case that LBA leaves that are close together in the LBA tree are more likely to be accessed together, because in general they will belong to the same data, portions of the same objects. So I actually think the LBA tree is going to have just as much locality as the data blocks it references, most of the time.
A: I'm not super fond of having a separate system for data blocks. I'm okay with having a separate policy, but we already have a tiering system, and, I don't know if Yingxin wants to jump in here and disagree, but I would be more inclined to find a way to phrase this in terms of the existing tiering design.
C: Let me share my thoughts. The current tiering design, with the cold tier and hot tier, is based on generations, and the goal of generations is to minimize writes, the overall write amplification during cleaning; in other words, to make cleaning as efficient as possible. It is not designed to accelerate access to frequently accessed or frequently read extents in the hot tier, because I thought that if we extend the volatile memory cache enough, it can cover this case.
C: So I think Xuehan's proposal is to also add this responsibility to the hot tier, so that we can promote the frequently accessed extents to the hot tier.
A: There would be this extra in-memory structure for speeding up eviction from the hot tier, but that may not even be necessary; you can pursue these two things independently.
A: Look into this different eviction strategy. I'm a bit skeptical of it, because I think it's going to use quite a bit of memory, and reconstructing it on startup strikes me as mildly expensive, but I'm prepared to be convinced once you have more numbers; it's possible that once you've implemented it...
B: I have one question. The eviction of data is supposed to be based on the data's heat, right? We probably want to avoid evicting hot data to the cold tier, so...
A: So that's going to absorb some of the locality; it would be interesting to measure how much the other piece is. There are strategies other than an explicit linked list of addresses: you can use time-decaying Bloom filters to do a sort of lossy estimate of whether extent ranges have been recently read, and there are other tricks like that. You lose them on startup, of course, but that's not necessarily a big deal.
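A time-decaying Bloom filter of the kind mentioned can be built from two plain Bloom filters that rotate on a timer, so membership fades after one to two epochs. This is an illustrative sketch only; the filter size, hash count, and hash choice are arbitrary, and a real implementation would tune all three against the false-positive rate:

```python
import hashlib

class DecayingBloom:
    """Two rotating Bloom filters: inserts go to 'current', queries check
    both 'current' and 'previous'. After two rotate() calls an entry is
    forgotten, giving a coarse, lossy notion of 'recently read'."""

    def __init__(self, bits=1 << 16, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.current = bytearray(bits // 8)
        self.previous = bytearray(bits // 8)

    def _positions(self, key):
        # Derive `hashes` independent bit positions from one digest.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        for i in range(self.hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.current[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        def hit(filt):
            return all(filt[p // 8] & (1 << (p % 8)) for p in self._positions(key))
        return hit(self.current) or hit(self.previous)

    def rotate(self):
        """Called on a timer tick: discard the oldest epoch."""
        self.previous, self.current = self.current, bytearray(self.bits // 8)
```

Keying entries by extent range (for example `"objid:offset"`) gives the lossy "was this range read recently?" estimate at a small, fixed memory cost, with the caveat from the discussion that the state is lost on startup.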
A: Anyway, what I'm pointing out is that there's more than one way to do the eviction part and the heat-estimation part, using more or less memory to get more or less accuracy, and I'd encourage you to think about that when you're doing the design and try to make it pluggable, so that we can change it later.
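Making the eviction and heat-estimation part pluggable, as suggested, could look like a narrow policy interface that the cache calls into. The names here are hypothetical; an exact LRU is shown as one possible plug-in, trading more memory for exact ordering, while something like the Bloom-filter estimate could implement the same interface with less memory and less accuracy:

```python
from abc import ABC, abstractmethod
from collections import OrderedDict

class EvictionPolicy(ABC):
    """Minimal interface the hot-tier cache would program against."""

    @abstractmethod
    def touch(self, key):
        """Record an access to `key`."""

    @abstractmethod
    def victim(self):
        """Return the best candidate to evict, or None if empty."""

class LRUPolicy(EvictionPolicy):
    """Exact LRU: one OrderedDict entry per cached extent."""

    def __init__(self):
        self.order = OrderedDict()

    def touch(self, key):
        self.order.pop(key, None)
        self.order[key] = True  # re-insert at the most-recent end

    def victim(self):
        return next(iter(self.order), None)  # least-recently used
```

Because callers only see `EvictionPolicy`, the policy can be swapped later without touching the cache itself, which is the "make it pluggable" point from the discussion.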
B: I'm sorry, I didn't hear the last sentence.
A: Think about the system it'll be running in. Any RBD block device that isn't sitting behind fio will be in a virtual machine, probably running Linux, and that Linux will have a page cache as its own read cache. So you may actually find that, in real life, it's improbable that you get successive reads on the same block, because the first read is going to load it into the virtual machine's cache.
B: Yes, but the scenario that we are considering is one where we put the Ceph processes and the application processes all on the same machine, and that will make memory very expensive: a single machine may have several hundred gigabytes of memory, but a lot of applications will have to share it. So the memory that the page cache can use could be very limited.
B: No, I'm not saying SeaStore's memory. I'm saying that, since the memory that can be utilized by the page cache, or any kind of cache, is very limited, we will need the hot tier to serve as a second-level cache. Is that right?