From YouTube: Ceph Performance Meeting 2020-07-30
Description: No description was provided for this meeting.
A
All right, let's see. So it's been a couple of weeks since the last meeting, but it's also the summer, which means things are a little slow.
A
I think, if I remember right, it's avoiding a second iteration through the list or something, so that's fantastic. And then there's another one about D3N cache changes from upstream. I'm not quite sure what that is, but it looks like it's being reviewed and tested, so that's good.
A
Let's see, we had a couple of PRs closed in the last two weeks. Two of them are from majianpeng related to BlueFS. Those are both, I think, minor improvements in different ways. I've got a pull request for the MDS that merged.
A
I had opened that, I think, last week, but it's basically just a fix for cache trimming when you have lots of subtrees. Previously we were iterating over the entire list of subtrees just to calculate the number of subtrees, but in reality we already stored that in a variable, so we didn't need to redo that work, and the fix just avoids redoing it.
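A minimal sketch of that kind of fix, with hypothetical names standing in for the real MDS structures; the idea is just to reuse a count the container already maintains:

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for the MDS cache, not the actual Ceph code.
struct MDCacheSketch {
    std::map<int, int> subtrees;  // stand-in for the real subtree map

    // Before: an O(n) walk over every subtree just to produce a count,
    // repeated on every cache-trim pass.
    std::size_t count_subtrees_slow() const {
        std::size_t n = 0;
        for (auto it = subtrees.begin(); it != subtrees.end(); ++it)
            ++n;
        return n;
    }

    // After: the container already tracks its size; reuse it in O(1).
    std::size_t count_subtrees_fast() const {
        return subtrees.size();
    }
};
```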
A
So it's actually a fairly substantial improvement when it happens, but it may not happen that often. Anyway, that got merged, and then there was another PR that merged regarding cephadm that creates OSDs in parallel, rather than creating them one at a time. So, as you might suspect, that's a fairly substantial improvement for cephadm.
A
There was only one other PR I saw that got updated, and that was another one from majianpeng, about enabling the RocksDB pipelined write. He had responded to a question I had regarding performance.
A
All right. Well, maybe before I get into the work that Radek and I have been doing on bufferlist ring buffers: are there any topics that folks would like to bring up this week?
A
All right, well then. Radek, would you like to explain your ring buffer and what you've been doing?
B
Well, we're actually working on that together; I can go over the...
C
Those allocations are actually an offspring of the fact that the MDS doesn't use the denc framework; that was initially developed for BlueStore. The MDS uses the old encoding stuff, which means that there is no preprocessing, there is no reservation pass for the buffer list that would allow allocating memory in one single go. As a result of that, the bufferlist from time to time really needs to go to tcmalloc, and tcmalloc turned out to be extremely costly.
C
That would allow us to allocate a big chunk of contiguous memory and actually fragment it over the buffer::raw instances. The buffer::raw instances are the ones responsible for storing the physical data in a bufferlist.
C
The current... go ahead. Okay. We are managing the allocation at the moment using an extremely dumb, extremely simple idea: a circular buffer, a ring. It's a ring because we need to deal with one pretty nasty thing about bufferlist.
C
It allows deallocations to be performed from a separate thread, a different one than the one that made the allocation, and because of that we get some extra complexity. But we're still trying to fight it, and what we got is actually a pretty interesting drop in CPU usage: the md_submit thread of the MDS, the dedicated thread responsible for doing the journaling, actually drops significantly.
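As a rough illustration of the scheme Radek is describing, one big contiguous chunk bump-allocated from the owning thread, with releases allowed from other threads; every name here is hypothetical and this is not the actual patch:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: carve one big contiguous chunk up front and hand
// out slices by bumping a pointer, so small appends stop paying for
// individual tcmalloc calls. Frees may arrive from a different thread
// than the allocations, so the release side is atomic.
class RingArena {
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;                  // bytes handed out since last wrap
    std::atomic<std::size_t> released_{0};  // bytes freed back, any thread

public:
    explicit RingArena(std::size_t capacity) : buf_(capacity) {}

    // Called from the owning (allocating) thread only.
    void* allocate(std::size_t len) {
        if (head_ + len > buf_.size()) {
            // Can only wrap once everything handed out has been released.
            // A real implementation would fall back to the regular
            // allocator here, and would have to handle releases racing
            // with this reset.
            if (released_.load(std::memory_order_acquire) != head_)
                return nullptr;
            head_ = 0;
            released_.store(0, std::memory_order_release);
        }
        void* p = buf_.data() + head_;
        head_ += len;
        return p;
    }

    // May be called from any thread, e.g. a consumer freeing a buffer
    // that some other thread allocated; this is the nasty cross-thread
    // case that adds the extra complexity mentioned above.
    void release(std::size_t len) {
        released_.fetch_add(len, std::memory_order_acq_rel);
    }
};
```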
C
But we are going further with testing. Mark made a lot of runs showing it, and they showed that a huge, really huge amount of memory flows through the rings.
C
It goes and allocates in almost 4K units, and that might be really wasteful in scenarios where somebody appends just a few bytes. With the bufferlist... we're trying to estimate the waste there.
A
I put in the chat window a link to a spreadsheet that we're looking at. It shows what Radek had mentioned about the data flowing through the ring; this is kind of what he was talking about.
A
Basically, when you look at the mdtest easy results, it's like a five-minute test, and by the time you get down to the hard write results, all the tests together have been running in aggregate for maybe about 25 minutes or so. And we're seeing sometimes up to hundreds of gigabytes of data flowing through the rings, and this is just for the metadata journaling in the MDS.
A
It's not even, you know, data flowing through the OSDs or anything; it's just that portion for the MDS. So it's a little crazy. One thing, Radek, that I'm thinking here is that there's this tension, right, between what we're doing, where we're making these bigger allocations up front to fill in for the bufferlist, allocations that don't necessarily get used, versus...
A
But I wonder if this benchmark isn't really showcasing the real behavior very well, because maybe when you have all that memory wastage and you have to go back to the tcmalloc central cache, you can't use the thread cache anymore. Maybe that's work that we're not really capturing in the benchmark.
A
Yeah. So even though in the benchmark it looks much better to be doing the upfront, larger allocation that we can then use for future things, maybe in reality it's not as good as we think.
C
Yeah, that doesn't make a huge amount of sense. I believe it might be really worth spending some time trying to implement dynamic growing of the allocation unit, depending on the history: for instance, on the data stored inside the buffers.
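To make that concrete, a purely hypothetical sketch of what growing the allocation unit based on history could look like; nothing like this exists in the tree, and all names and constants are made up:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical: size the next backing allocation from a running average
// of recent append sizes instead of a fixed ~4K constant.
class AdaptiveAllocUnit {
    double avg_append_ = 256.0;              // arbitrary starting guess
    static constexpr std::size_t kMin = 64;
    static constexpr std::size_t kMax = 4096;

public:
    void record_append(std::size_t len) {
        // An exponential moving average is a cheap form of "history".
        avg_append_ = 0.9 * avg_append_ + 0.1 * static_cast<double>(len);
    }

    std::size_t next_alloc_len() const {
        // Leave room for a handful of "typical" appends, clamped so a
        // stream of tiny appends no longer burns a full 4K unit each time.
        auto want = static_cast<std::size_t>(avg_append_ * 8);
        return std::clamp(want, kMin, kMax);
    }
};
```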
A
I was really surprised at how, when we're making allocations the traditional way, by using the...
A
However we calculate the alloc length here, the ceph buffer alloc unit size... okay, so then it makes sense: we're doing, like, roughly 4K allocations. But when we don't do that, it drops all the way down to 190... sorry, yeah, 182 bytes.
C
Yeah, those are good questions. What I can recall from the history of bufferlist is that many, many years ago we introduced a concept called the append buffer.
C
It was an optimization to amortize the cost of allocations made during appending to bufferlists, and I think the big alloc size was actually defined there.
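A minimal sketch of the append-buffer pattern as just described, with hypothetical names and a placeholder constant standing in for the real alloc size:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch, not the Ceph implementation: keep one partially
// filled chunk around so a run of small appends pays for one allocation
// instead of one allocation per append.
class AppendBufferSketch {
    std::vector<std::uint8_t> chunk_;
    std::size_t used_ = 0;
    static constexpr std::size_t kAllocSize = 4096;  // the "big" constant

public:
    void append(const void* data, std::size_t len) {
        if (used_ + len > chunk_.size()) {
            // Amortize: one upfront allocation serves many small appends.
            // (A real bufferlist keeps the old chunk referenced by the
            // list; here we simply start a fresh one.)
            chunk_.assign(std::max(kAllocSize, len), 0);
            used_ = 0;
        }
        std::memcpy(chunk_.data() + used_, data, len);
        used_ += len;
    }
};
```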
B
But I don't know why; I have no idea why it's so big.
C
Maybe we should also re-evaluate the allocation policy for small appends.
A
And if it was just memory, right? If all you were doing was wasting memory, okay, fine: that's a trade-off between CPU and memory, fine. But because of the way that our allocation patterns work, and because we're multi-threaded, it's not just wasting memory. Now it's also wasting CPU: you gain CPU, but you waste CPU too.
C
Yeah, to allocate them and free them from different threads, that's going to be costly. And I bet it might actually be a typical pattern: let's say one thread makes some processing, makes the encoding, but finally those buffers are freed in, let's say, the messenger workers handling the output.
C
But on the other hand, I cannot recall anything like that from when I was tracing the OSD last time. Okay, it was a long time ago, but I haven't noticed things like that; I haven't seen anything like the MDS case.
E
That was something Matt was interested in... that was one of the things that he was interested in looking at: just how much cost we were paying for all the atomic increments and decrements.
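For context, the cost in question comes from atomic reference counting on the underlying buffers; a generic illustration of the pattern (not Ceph code) where a single atomic read-modify-write can dominate a profile:

```cpp
#include <atomic>

// Generic illustration: every copy/destroy of a refcounted buffer does
// an atomic read-modify-write, and on a contended cache line that one
// instruction can account for most of the cycles perf attributes here.
struct RefCounted {
    std::atomic<int> nref{1};

    void get() { nref.fetch_add(1, std::memory_order_relaxed); }

    bool put() {
        // The atomic decrement-and-test being discussed; returns true
        // when the last reference is dropped and the buffer can be freed.
        return nref.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```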
A
Yeah, I was just thinking the same thing, right. You were saying that when you were just annotating the raw TLS destructor, you were seeing 50% of the time spent in that single atomic exchange.
C
In allocate, okay: 58% of the cycles burned in the raw TLS allocate was on that. Pasting it... oh.
C
Oh, the self time is smaller than in the case of the raw TLS create; in this testing it was looking like five percent of cycles.
A
Okay, I just posted the perf call graph that I gathered from looking at basically the cache ring implementation, Radek's cache ring implementation. And this is the case when we are...
A
...just requesting what we need, rather than rounding up to whatever that constant is, or whatever.
C
Okay, I said that... okay, this was the five percent, and it was from the append bench test, not from the other bench, the one that is supposed to exaggerate the refilling path.
A
Well, I suppose we've probably exhausted all the things we've got so far, right, Radek? I think so, yeah. All right. Well, that's it for now, guys. We're still working on this; we don't have a lot to show yet, I think, but hopefully we'll make progress and be able to get at the long-term goal, right. The real goal here is to be able to make bufferlist, and specifically encoding and decoding, faster without actually having to change all of the existing encoding/decoding over to denc.
A
So I don't know, we'll see. Maybe we'll have progress, maybe not, but that's the goal.
A
Yeah, all right. Well, anything else, guys? Or is that it?