From YouTube: 2020-05-07 :: Ceph Performance Meeting
A
Alright, let's get started. Okay, new PRs. In the last two weeks I only saw two; it's possible I missed one, though. The first is from Adam, and that's for the FIFO type for the datalog. Adam, I very much enjoy your comments; don't ever, ever lose them. I don't actually know what this does, Adam. What's this PR?
B
So we have been under the ruthless domination of omap for quite a while, and we've been trying to use less of it in RGW, because I think we're probably the heaviest, biggest consumer, especially in things that are essentially linear. It doesn't really make sense to use a key-value store to emulate a queue, and so this is basically just implementing a segmented queue on top of RADOS objects.
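The segmented-queue idea can be sketched in a few lines. This is a toy illustration of the concept only, not Ceph's actual cls code; a plain dict stands in for the RADOS objects:

```python
# Toy sketch of a segmented queue: entries append to a tail segment and
# whole segments are dropped on trim, so no key-value store is needed to
# emulate a FIFO. A dict stands in for RADOS objects, keyed by segment
# number; this is an illustration, not Ceph's actual implementation.

class SegmentedQueue:
    def __init__(self, entries_per_segment=2):
        self.entries_per_segment = entries_per_segment
        self.segments = {}   # segment number -> list of entries ("objects")
        self.head = 0        # oldest live segment
        self.tail = 0        # segment currently being appended to

    def push(self, entry):
        seg = self.segments.setdefault(self.tail, [])
        seg.append(entry)
        if len(seg) == self.entries_per_segment:
            self.tail += 1   # next push starts a fresh "object"

    def entries(self):
        out = []
        for n in range(self.head, self.tail + 1):
            out.extend(self.segments.get(n, []))
        return out

    def trim_to(self, segment):
        # Trimming is cheap: delete whole head segments, no per-key deletes.
        while self.head < segment:
            self.segments.pop(self.head, None)
            self.head += 1

q = SegmentedQueue()
for i in range(5):
    q.push(i)
q.trim_to(1)          # drop segment 0 (entries 0 and 1)
print(q.entries())    # [2, 3, 4]
```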
A
All right, also the other big new PR (so I guess we have two big new PRs, wonderful): Sam's initial SeaStore work, and Keith, who has been reviewing that. I don't know what state it's in, but just having it there is a really big milestone; really, really excited to see what we see with that work.
A
All right, well, we had a bunch of stuff closed this week. There was a bug fix for the monitor to make sure that the priority cache manager invocation is also regularly calling balance and tune_memory, to make sure that we're properly releasing freed memory and also regularly rebalancing things as we should be. So that's good; that should improve the memory balancing and memory consumption of the monitor. Kefu had a PR that increased the standard deviation variance in the Crimson...
A
It's the... oh, this one for increasing the 4k min_alloc_size for spinners, from Igor, merged. That was okay to merge now because of his other work merging with the hybrid allocator and also the deferred big writes PR. So that's really good. We'll get lots of testing on all of that over the next couple of months on master.
A
Ma Jianpeng's PR: this doesn't actually change the OSD threads per shard anymore. This one instead fixed a bug where we were not properly sending notifications to all threads to wake up, I believe, if I remember; that's what he changed. In any event it improves performance when you have, like, random read workloads. So that's really good.
A
Adam's sharding PR merged; that's a big one. I think we're still waiting on his tool to be able to convert OSDs from the old format to the new format, but having that PR there means now we can get a couple of other PRs that have been waiting implemented, including the double cache PR fix and also the age-based binning, and then we'll also have to change that tool for doing the OSD conversion.
A
Since the double cache PR will cause an on-disk format change, just like the column family sharding would, hopefully we can roll all that up for Pacific. And the last one that merged is this one from Casey to use iterators for the comparison operators in bufferlist. We had a couple that were updated.
A
Some discussion on my PR for the MDS: you add a new expected-files flag, or directory xattr, rather. Just discussion about how to implement that, and a kind of general feeling on whether this is a good route to go or not. The objecter PR had another review and, I think, another update in the last two weeks. Thank you, Leah. Oh, fantastic, very good.
A
This BlueStore write lock: I think we're all a little scared of it. I think Jianpeng went through and figured out why it was failing in... I forget which tool it was, but there was a failure that I think he diagnosed and figured out. But I think we're all... well, at least personally I'm still a little scared of messing around with the locks in BlueStore at the moment.
A
Okay, and then Igor's memory reduction PR for the onode: that one was rebased and it's now going through the testing branch, so hopefully that gets merged. That's exciting! That's a nice reduction in the onode size, I think about a 20% reduction in memory, right, and also a little bit of a performance increase because of it. And that was it for the last two weeks from what I saw.
C
Yeah, sure, more discussion and a presentation, but I just wanted to share some test results from some experiments I've done. The goal of this was to see how much variation exists in our smithi nodes; particularly, if we run the same performance test on the same smithi machine over and over again, how much variation exists.
C
Yeah, I guess the two metrics that I initially looked at were the IOPS and average client latency across our 10 runs of librbd FIO performance. Overall, in terms of IOPS, I am pretty satisfied; I think the variation is not too high. As you can see, the average varied by about 196; most trials have similar numbers, or acceptably close numbers. For latency I still think there is a bit of variation, especially if you look at trial 9.
C
Comparatively far from the average value, that is. There are a couple of other graphs: on the left there is a plot (thanks to shredder and his visualization tool) where we were able to plot the 95th to 99.99th percentile latency information that FIO collects, and, as you can see, that variation also shows in the 99.99%ile results of trial 9.
A
This looks great. I'm really glad to see that on an individual node we're seeing... I mean, looking at this, the variation looks to me like it's well under 5%, if I'm just running it through my head properly. Yeah, this looks really good. I suspect that if you add the network in we'd see a lot more variation, but that's, you know, a future endeavor; just doing single node, this isn't bad.
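One way to put a number on "well under 5%" is the coefficient of variation (standard deviation over mean) across trials. The IOPS figures below are made up for illustration, not the actual smithi results:

```python
# Coefficient of variation (stddev / mean) puts one number on the
# run-to-run spread of a benchmark. The per-trial IOPS figures are
# invented for illustration only.
from statistics import mean, pstdev

iops_per_trial = [5150, 5230, 5190, 5080, 5210, 5170, 5240, 5120, 4990, 5200]

cv = pstdev(iops_per_trial) / mean(iops_per_trial)
print(f"mean={mean(iops_per_trial):.0f} IOPS, CV={cv:.2%}")
```

A CV comfortably below 0.05 would match the "under 5%" eyeball estimate above.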
A
You know, unfortunately, that's kind of the case: we're at the easy case, right? Single node is where we can, yeah, we can get good stuff like this. The benefit that teuthology brings us is being able to do the big multi-node tests, and that's where we probably need to figure out how we're going to do that effectively. But this is a good start. Yeah.
C
But as long as we have this fine, then that's when we should start looking at the second step, I guess. So I think I'm just going to do a few more experiments, for one on some more machines, and then maybe talk to David Galloway about narrowing down some of these machines and do multi-node testing as well. At this point it's a matter of me just, like, trying to lock smithi machines and, like, run these tests over and over again; it's a pretty manual job at the moment.
A
The reason I asked about it was that the IOPS is around, like, 5,200, and these should be either Intel P3700 NVMe drives or Optane drives, depending on which specific smithi node we're talking about. That number is quite a bit lower than what we can get with a single OSD. Are these writes, or are these reads? Yeah.
A
Yeah, yeah, I suspect that we're probably CPU limited, which means that probably things like RocksDB compaction... the NVMe is probably fast enough to kind of absorb the extra IO workload, but we'll just have extra CPU overhead, which should slow it down. But that might change kind of the variance that we see in the results over time; like, it might actually be smoother, because they're not seeing, you know, the disk backing up at all.
A
NVMe drive... you might actually have a little bit more on smithi, if I'm thinking about that right, because it would be basically, on incerta, five real physical cores at 2.3 GHz per OSD, and on smithi it would be four real physical cores at 3.5 GHz. I think on smithi you actually get a little more.
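The comparison above is just aggregate clock per OSD. Sketching the arithmetic (ignoring IPC and turbo differences, which the speaker also glosses over):

```python
# Back-of-the-envelope CPU budget per OSD from the numbers above:
# aggregate clock (cores x GHz) as a rough proxy for available CPU,
# ignoring IPC and turbo differences between the two machine types.
budgets = {
    "5 x 2.3 GHz": 5 * 2.3,
    "4 x 3.5 GHz": 4 * 3.5,
}
for label, ghz in budgets.items():
    print(f"{label}: {ghz:.1f} aggregate GHz per OSD")
# 14.0 aggregate GHz edges out 11.5, matching the guess that the
# 4-core machine gets a little more per OSD.
```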
A
Remember last summer, when I was doing the cache work, I was doing a lot of testing on the OSDs, and I thought that on incerta I had gotten up to about 35 to 40 thousand 4k random write IOPS with some of that work that was then merged. So if now we're seeing, you know, more like 18 or 19 thousand, we've introduced a regression, so yeah, we should look into that. That would be a good idea. Yeah.
C
You can go there; so that's some early experiments I had done across different smithi machines, and this is... I just saw the IOPS and I got discouraged, and I decided to just stay on one. I was just comparing: it's the same workload profile, but running on different smithi machines, and I just looked at the IOPS; there's a lot of variance.
C
And whatever it is, it's worse than on a single machine, right? So there's, yeah... that's also part of the reason why I'm saying that I want to do the same kind of experiment on a different smithi machine, to just see what those results look like. It's possible those might not be as impressive as the first one. Yes.
A
So this thing right here, the RocksDB log parser: we can run that against the log. When we create the logs, we can even just delete them if they're too big to keep, but they shouldn't be a problem for us, because we should be running at the default log levels anyway. Ooh, yeah, what level are we running at in those tests?
A
Oh, fantastic. So we will want to make sure that we get the RocksDB logs. There's probably really low overhead, so it shouldn't cause a lot of performance impact, but we should get that, and then we can run this thing against it, and this will give us all these statistics about, like, you know, how many compaction events there were, what size they were.
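As a rough sketch of what such a parser pulls out: assuming the structured `EVENT_LOG_v1` JSON lines that recent RocksDB versions write into their LOG file, compaction events can be counted like this. This is an illustration only, not the actual tool, and the sample lines are synthetic:

```python
# Sketch of pulling compaction statistics out of a RocksDB LOG file.
# Assumes the structured "EVENT_LOG_v1 {...}" JSON lines that recent
# RocksDB versions emit; the real parsing tool surely does more.
import json
import re

EVENT_RE = re.compile(r"EVENT_LOG_v1 (\{.*\})")

def compaction_events(log_lines):
    events = []
    for line in log_lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ev = json.loads(m.group(1))
        if ev.get("event", "").startswith("compaction"):
            events.append(ev)
    return events

# Synthetic LOG lines for illustration only.
sample = [
    '2020/05/07-12:00:01 7f0 EVENT_LOG_v1 {"event": "compaction_started", "job": 3}',
    '2020/05/07-12:00:02 7f0 some unrelated log line',
    '2020/05/07-12:00:05 7f0 EVENT_LOG_v1 {"event": "compaction_finished", '
    '"job": 3, "total_output_size": 1048576}',
]
evs = compaction_events(sample)
finished = [e for e in evs if e["event"] == "compaction_finished"]
print(len(evs), "compaction events,",
      sum(e["total_output_size"] for e in finished), "bytes out")
```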
A
The OSD logs, for that first one, the RocksDB log parser tool. And then this thing, I think, if I remember (it's been a while since I used it), will just take the output of the perf counter dumps: you give it, like, two files, you know, one from a run, or more files, and then it will take those output dumps and just walk through them, looking at what's different between them, almost kind of like a diff; not exactly, but more or less.
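The "almost like a diff" walk over two counter dumps might look like this minimal sketch. The nested dicts mimic perf-dump style JSON and the counter names are invented for illustration; this is not the actual tool:

```python
# Minimal sketch of the "almost like a diff" idea: walk two counter
# dumps (nested dicts, like perf-dump style JSON) and report which
# numeric values changed. The counter names here are invented.

def diff_dumps(before, after, path=""):
    changes = {}
    for key in sorted(set(before) | set(after)):
        here = f"{path}.{key}" if path else key
        a, b = before.get(key), after.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            changes.update(diff_dumps(a, b, here))
        elif a != b:
            changes[here] = (a, b)
    return changes

before = {"bluestore": {"kv_flush_lat": 10, "write_big": 3}, "osd": {"op_r": 100}}
after  = {"bluestore": {"kv_flush_lat": 12, "write_big": 3}, "osd": {"op_r": 180}}
print(diff_dumps(before, after))
# {'bluestore.kv_flush_lat': (10, 12), 'osd.op_r': (100, 180)}
```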
G
Okay, my question is: I really want to understand the plan for how we get into the perf CI, because I added it to the perf suite there, to our Jenkins. I don't want to... but as I mentioned last time, I think you said it's just the very beginning of our perf CI. I really want to know what our plan is.
A
So, Kefu, I think... just my understanding is, the idea here is, for Jenkins, we do tests looking at performance regressions of specific PRs, and the goal would be fast-running tests. We iterate through lots of them, and hopefully we can test every single PR that comes in with a performance flag or a label, and have different test suites that run based on the labels, so, like, rgw or...
C
What do you think?
A
There are a couple of advantages to doing so. Then CBT knows about the topology of the disks, so it can do things like... and it controls the OSDs, so it can do things like start them with Valgrind or do some other kind of wallclock-trace type stuff, which maybe we'd want to be able to add as a feature. When a PR is coming in and you see something weird, maybe you want to be able to tell Jenkins, you know: do it this way, run it with...
F
Yes, so what you can do is use local binaries with an existing cephadm deployment, and that needs built packages. Either you just kind of replace the binaries in the container with local development builds (packages or, sorry, binaries) and you can get up and running with the usual cephadm tooling, so you've kind of got another cluster just like a normal user would.
F
He has an initial implementation right now, but it still requires a bunch of copying of binaries around, so it's not ideal from a timing perspective. The approach being worked on is instead to kind of mount your build directory into the container and replace all the binaries with symlinks to the ones in your build directory. Essentially it's trying to avoid any data copying.