From YouTube: 2019-10-03 :: Ceph Performance Meeting
A: Igor had some tests on hard drives that at first looked really good, and then I think he decided he needed to redo some of them, so I'm not sure yet what the latest on that is. But once he's done some of his testing, maybe we'll see whether we still think that a 4k min_alloc size for hard drives is a good idea or not. For now this is at least a placeholder PR to start tracking this stuff, so we've got it there. Yeah.
A: The first two here are kind of related to each other and have to do with the OSD op queue. I have not actually been following that really closely, but I think Sam was fairly involved in some of these discussions, especially since he's looking at some of this quality-of-service effort that we've been trying to get over the fence for the last couple of years. So those both merged.
A: This write-around cache one that merged... did that just get closed, or was that put in the wrong place? It's still not closed... no, okay, it is closed. Okay, so potentially that needs to be reopened or something at some point. But anyway, that's that. Lots of updated stuff.
A: But the thing that I'm a little bit worried about with the existing filtering that's there, and potentially with the filtering that we want to add, is that it means we have to decode things in the cls code inside the OSD, and that's actually showing a fair amount of wall-clock time being spent when we're doing a bucket listing. Though Casey had mentioned the possibility at one point... oh, you're here, Casey. I think you had mentioned...
C: Yeah, the filtering in Eric's PR is just based off of the keys, so it doesn't need any decoding there. The only other decoding, or the only other filter I know of, is for versioned objects, where it just needs a flag, so potentially we could separate the flag from the rest.

A: Cool.
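A minimal sketch of the key-only filtering Casey describes, with hypothetical names (this is not the actual cls_rgw code): keep the omap values opaque and match on the key alone, so nothing has to be decoded during the listing.

    // Hypothetical sketch: filter bucket-index omap entries by key
    // prefix without ever decoding the encoded values.
    #include <map>
    #include <string>
    #include <string_view>

    std::map<std::string, std::string>
    filter_by_key_prefix(const std::map<std::string, std::string>& omap,
                         std::string_view prefix) {
      std::map<std::string, std::string> out;
      // lower_bound jumps to the first candidate key in O(log n).
      for (auto it = omap.lower_bound(std::string(prefix));
           it != omap.end() &&
           std::string_view(it->first).substr(0, prefix.size()) == prefix;
           ++it) {
        out.insert(*it);  // the value is copied as-is, never decoded
      }
      return out;
    }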
A: Inlining immutable small objects within the onode. We talked about that a little bit at CDM yesterday. Igor has another PR that does something similar but at the blob level. I don't know if that's actually in this listed PR, or if that got closed at one point, but I think Sage is maybe more in favor of Igor's proposal versus actually inlining stuff right in the onode. So anyway, we'll see how that goes.
D: Just a couple fixes, so it now passes our RBD tests cleanly, and I fixed a couple bugs shown up by the CephFS testing. This morning I found one more, which I'm hoping will be the last one, but we shall see. And I'm still covetous of reviews, or an approval. You don't even have to look at it; I won't even complain. Approve!
E: I got distracted by NUMA... oh.

A: Yeah, are you looking at that bug about it? Yeah.
E: I set it on just one thread, and it was super... At first I tried to make it so we set it really early on in the startup process, so that all the child threads would inherit it. But that was really hard, because you basically have to set everything up to find out what device and what network addresses you're going to use and all that stuff, and then tear it all down again, and then set it and spin it all up again. It's just sort of stupid.
E: So instead I just look in /proc and I iterate the threads, and I just set it on every one. Wow, seems to work. Except that... if I set the affinity and then I get the affinity, it gives me back the right thing, but if I look in /proc, in the sched file, whatever it is, there's a field in there called numa_preferred_nid, and that doesn't get set until later.
E: I think there's something in the kernel scheduler that looks at NUMA faults and then sets that dynamically. So after I do some workload, a bunch of them are ready to move over, but not all of them, because they're probably idle or something. I'm not sure if it's the ideal solution, but... I don't know how it works.
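A minimal sketch of the approach described above, assuming standard Linux APIs (this is an illustration, not the actual Ceph patch): walk /proc/self/task and apply the CPU mask to every thread.

    // Sketch: pin all existing threads of this process by iterating
    // /proc/self/task, since sched_setaffinity() takes a single TID.
    // Error handling trimmed for brevity.
    #include <dirent.h>
    #include <sched.h>
    #include <sys/types.h>
    #include <cstdlib>

    static void set_affinity_all_threads(const cpu_set_t* mask) {
      DIR* dir = opendir("/proc/self/task");
      if (!dir)
        return;
      while (struct dirent* de = readdir(dir)) {
        if (de->d_name[0] == '.')
          continue;  // skip "." and ".."
        pid_t tid = static_cast<pid_t>(std::atoi(de->d_name));
        sched_setaffinity(tid, sizeof(cpu_set_t), mask);
      }
      closedir(dir);
    }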
E: Okay... and then I noticed that there's another function called set_mempolicy, or something like that, that basically controls whether it tries to keep the process memory allocated on the same NUMA node, or memory attached to the same NUMA node, or nearby, or whatever. And so I think we should be setting that.
E: But that one doesn't let you choose which thread; it's only the current thread. So for that one I think we have to adjust the policy at startup, and then basically only set it once. I think it'll be okay, because the policy is, like: try the local node first, and if you can't, then do a remote node. I think that's the one we want, and that should be okay even if we don't set any NUMA affinity; it'll just sort of tend to make memory more local.
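For reference, a sketch of what that might look like with the set_mempolicy(2) syscall; MPOL_PREFERRED allocates from the given node when it can and falls back to other nodes otherwise (illustrative, not the Ceph code).

    // Sketch: prefer allocations from one NUMA node. The policy applies
    // only to the calling thread, which is why it has to be set early,
    // before the worker threads are spawned.
    #include <numaif.h>  // set_mempolicy(), MPOL_PREFERRED; link with -lnuma

    int prefer_numa_node(int node) {
      unsigned long nodemask = 1UL << node;
      // MPOL_PREFERRED: try this node first, silently fall back to
      // other nodes if it is out of memory.
      return set_mempolicy(MPOL_PREFERRED, &nodemask, 8 * sizeof(nodemask));
    }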
A: Okay, I think that's the stuff in this list here. Jianpeng has a mutex contention PR that theoretically should improve something, which he's been looking at, but there was no performance improvement from it, and somebody has to go through and make sure it's actually correct. I think he was hoping maybe I could, but I don't think I have time to look at it, and I'm probably not even the best person to anyway. So there's that. Igor has an update for his framework for intelligent DB space usage; I can't remember exactly what he did.
G: And so Adam continues... he's doing more performance benchmarking with the database, and now he's adjusting it to be able to handle that. There's a piece that maybe, Mark, you could help with at some point: I would try to test more with hsbench, so we can get more testing on the listing side. There might be some impact, in fact, on time to list across multiple shards.
A: Yeah, yeah. I think I will attempt to thread the needle and do that on the new Intel nodes that they donated to us, so that we can simultaneously get that testing done and show that we're actually using them and not letting them languish. So I think we should be able to do that.

G: That sounds great.
A: Also, speaking of which: with some of the previous hsbench tests, we noticed that there was this kind of instability during rgw writes, where the performance would oscillate a lot, and when I put RocksDB on a RAM disk, that went away; it was much more stable. So it was really the database workload that was causing that oscillation.
G: And there are still, like, one or two remaining issues, I think, but Adam's looking into them, and in parallel with that trying to get it ready for ingesting.
A: Yeah, yeah, absolutely, we can do testing there. And yeah, Matt, I mean, if you guys want to do testing on it: it sounds like he's able to get through at least, like, eight hours' worth of testing fine, I don't know, maybe even longer at this point. So I mean, it sounds like it's at least in a stable enough state right now that you guys could look at it and play with it too, if you wanted.
F: Yeah, I think we'd subscribe to that, because I think it's critical-path work, isn't it? So I'm excited about it and wanted to get at it early. You know, I want to figure out early whether we can operationalize it, if it's going to be successful, and figure out where we're going; or, if we're going to run into roadblocks with it, we'll at least know that.
A: Yeah, I hear what you're saying. We need to do something like that; hopefully we'll get to it soon.
A: Yeah, I'll probably try to work it in on the new nodes, once we get those into kind of a good configuration state. Right now the bonding setup on them isn't working properly; it's all going through one NIC, and we've got to fix that. And then I also suspect we probably want to do some OS-level tuning on them.
A: From just some initial runs there, they're kind of not as fast as I was hoping they'd be, but I suspect this is largely due to CPU-related things rather than disk-related things. So looking at C-states and P-states and hyper-threading and turbo boost and all these different policies is, I think, probably the first order of business, some of them anyway. So yeah, I'm going to mess with all that, but once we have that worked out, I think that could be the next thing we start testing.
A: Okay, so let's see... oh yeah, I mentioned we do have the new Intel nodes up and running, and as part of that I'll give a shout-out to Kefu, because we've got CentOS 8 running on them and he figured out in about a day how to get stuff building on them. Also a shout-out to Ken Dreyer for packaging up a bunch of stuff that we needed, including tcmalloc and other things. I think with just a little bit more work we'll have it all installed.
A: So, okay, that's the new Intel nodes; ongoing work with those. Okay, Sage: this is an incomplete list, but I tried to start writing down all the different performance-related things that have been kicking around in my head that I want to try to get fixed for Octopus, or at least improved; not necessarily fixed, but just made better to the extent that we can. So this is not complete, and feel free to add to it.
A: We've got a couple of minor things that we've done already. We updated RocksDB, and that helped a little bit. We added multithreaded compaction, which was a small change, but we had to test it to make sure it was okay to do; I think we feel it's safe now, which it may not have been in the past, so that's good. And then Adam's sharding stuff is really the other big thing here, right? Yeah, yeah.
A: So there will be a lot of work that goes into testing that, I think, and making sure it all looks good, but that's kind of what we're looking at right now. It sounds like, for RocksDB, that's probably it, for the moment at the very least, and maybe for good, since the RocksDB folks don't really want to entertain the idea of merging it upstream. Yeah.
A: Okay, BlueStore caching improvements. This was surprisingly ripe for work: the cache refactor had a pretty big impact on performance, bigger than I actually expected while I was working on it. So the refactor itself is done; that's merged. That was pretty much just moving trimming into the tp_osd_tp threads, making trimming happen on add rather than periodically in the mempool thread, and then also splitting the buffer and onode caches so that we aren't contending on the same lock on updates.
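A rough sketch of the lock-splitting idea, with nothing taken from the real BlueStore types (names here are hypothetical): give the onode cache and the buffer cache independent locks, and trim on add rather than from a periodic thread.

    // Hypothetical illustration: two caches with separate locks, so an
    // update to one never contends on the other, and trimming happens
    // inline on insertion instead of from a periodic background thread.
    #include <list>
    #include <mutex>
    #include <string>

    template <typename T>
    struct CacheShard {
      std::mutex lock;   // protects only this cache
      std::list<T> lru;  // most recently used at the front

      void add(T item) {
        std::lock_guard<std::mutex> l(lock);
        lru.push_front(std::move(item));
        trim();  // trim on add, not periodically
      }
      void trim() {
        const size_t max_items = 1024;  // placeholder limit
        while (lru.size() > max_items)
          lru.pop_back();
      }
    };

    // Separate instances: touching buffers never takes the onode lock.
    CacheShard<std::string> onode_cache;
    CacheShard<std::string> buffer_cache;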
A: That was the majority of what that covered. Work in progress: the onode double-caching fix, which will go in after Adam's sharding refactor, and then the cache age binning, which I have been holding off on until we also get the onode double-caching fix in. So there's kind of a chain of dependencies there, and really the cache age binning...
A: Once we have the double-caching fix, then the question will be... I think the primary benefit of the age binning will be if you're seeing workloads that alternate between something like RBD and RGW, where for a while you have rgw traffic and you really want omap cached, but then later on maybe there is no rgw traffic, it's just RBD traffic, and then you want to move back from, like, a mixed-cache scenario to only really caching onodes.
A: You can set up those bins in different ways, and then you can go through and say: okay, this cache has some really, really hot stuff, and it has a lot of really hot stuff in it, but this other cache has stuff that's, like, three days old in it. And then you can start making decisions about, well, how do I prioritize these things? What do I want to do with them?
A: It lets you start making decisions based on that, rather than just saying: well, we have two caches, and we know that there's this much stuff in this cache and this much stuff in that cache, but we don't know anything about them. We don't know anything about the age, the relative age, of the things in them.
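A toy sketch of the age-binning idea as described (names and bin boundaries are made up for illustration): count each cache's entries per coarse age bin, so prioritization can compare relative age across caches instead of only sizes.

    // Toy illustration of age-binned cache accounting.
    #include <array>
    #include <chrono>
    #include <cstdint>
    #include <deque>

    using Clock = std::chrono::steady_clock;

    struct BinnedCache {
      static constexpr int kBins = 4;         // <1min, <1h, <1day, older
      std::deque<Clock::time_point> entries;  // insertion times, standing in for real entries
      std::array<uint64_t, kBins> bin_counts{};

      static int bin_for(Clock::duration age) {
        using namespace std::chrono;
        if (age < minutes(1)) return 0;
        if (age < hours(1))   return 1;
        if (age < hours(24))  return 2;
        return 3;
      }

      // Recount entries into bins; a real implementation would update
      // the counts incrementally rather than rescanning.
      void rebin(Clock::time_point now) {
        bin_counts.fill(0);
        for (auto t : entries)
          ++bin_counts[bin_for(now - t)];
      }
    };

    // Prioritization can now ask which cache holds more recent data,
    // not just which one holds more data.
    bool hotter(const BinnedCache& a, const BinnedCache& b) {
      return a.bin_counts[0] > b.bin_counts[0];
    }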
A: Yeah, I mean, it was working; I've got it up here, rebased to master. But the question is: do we want to require an OSD redeploy to enable it? Whereas if we piggyback on Adam's stuff, we can use his conversion tool to handle it, since it requires not just a format change...
F: But that's extreme. The much more common case is the traditional one mentioned, you know, 64K or something like that, but maybe a range slightly below that. But, wait, wait, wait: we do know this; this is a large user...
E: Anyway, okay, yeah, all right. Well, I have a lot of stuff on the list now for BlueStore, so... sure.
A: Here someone's proposing disabling it entirely, which should be easy, but maybe not palatable for us.
A: Seems good to me. I mean, if we wanted to, we could make it so that it targets a certain number of PGs per pool as kind of the upper-level optimum, right? Like, if someone sets a thousand, maybe you don't really want to give it a thousand. I guess right now they can, up to the limits, but you know, maybe fifty is good enough, and we'd prefer that be, like, the upper limit of what an individual pool gets, or something. I don't know.
E: Rather than the upper limit, it's the lower limit that's the tricky one, because that limits you; it affects your performance. There's a config option for it, but it's basically just set at four by default, and it might make more sense to have that be dynamic, based on what...
E: I don't think so. I think we need to shift our thinking so that we're not thinking about PGs; we're thinking about what the user actually wants. So when we create a pool, we shouldn't think about how many PGs it gets; we should be thinking about what level of parallelism the user wants and how much data they expect to put in it, and then derive the PG count from that.
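A back-of-the-envelope sketch of that derivation (the function and constants are illustrative, not the pg_autoscaler's actual logic): take the larger of the parallelism target and a capacity-based count, then round up to a power of two.

    // Illustrative only: derive a pool's PG count from user intent
    // (desired parallelism, expected data) instead of asking for PGs.
    #include <algorithm>
    #include <cstdint>

    static uint32_t round_up_pow2(uint32_t v) {
      uint32_t p = 1;
      while (p < v) p <<= 1;
      return p;
    }

    uint32_t derive_pg_count(uint32_t parallelism_wanted,  // expected concurrency
                             uint64_t expected_bytes,      // data the user plans to store
                             uint64_t target_bytes_per_pg) // how large one PG may grow
    {
      uint64_t by_capacity =
          expected_bytes / std::max<uint64_t>(target_bytes_per_pg, 1);
      uint64_t want = std::max<uint64_t>(parallelism_wanted, by_capacity);
      // PG counts are conventionally powers of two; keep a small floor.
      return round_up_pow2(static_cast<uint32_t>(std::max<uint64_t>(want, 4)));
    }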
E: Yes, so the mkfs could be a little bit smarter about doing discards or write-zeroes instead of actually writing literal zeros. But maybe writing literal zeroes is actually better, I don't know; you could argue it either way. But I think if you have performance issues creating an empty file system, then you're thinking about the wrong thing. Yes, you're going to have performance issues once you actually start to use it; the creation of the file system is insignificant in comparison to using it.
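For context, a sketch of the two alternatives being weighed, using standard Linux block-device ioctls (illustrative, not the actual mkfs code): BLKDISCARD is fast but does not guarantee zeroed reads on every device, while BLKZEROOUT guarantees zeroes and lets the device do it efficiently when it can.

    // Sketch: clear a region of a block device without write()ing
    // buffers of literal zeros.
    #include <linux/fs.h>   // BLKDISCARD, BLKZEROOUT
    #include <sys/ioctl.h>
    #include <cstdint>

    // range[0] = byte offset, range[1] = byte length.
    int discard_range(int fd, uint64_t off, uint64_t len) {
      uint64_t range[2] = {off, len};
      return ioctl(fd, BLKDISCARD, range);  // fast; contents undefined after
    }

    int zero_range(int fd, uint64_t off, uint64_t len) {
      uint64_t range[2] = {off, len};
      return ioctl(fd, BLKZEROOUT, range);  // guaranteed zeroes
    }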
E: I think, again, I would just defer to the XFS guys. I mean, if they add the option to do the efficient write-zeroes, then great, we'll use it. But it's not always what you want, because if you're using a thin-provisioned system, then you actually don't want the journal to be thin-provisioned too, since running out of space will hit you in a more painful way. But it turns out that doesn't really matter for us, so I think we would use it. But whatever, anyway...
B: You know, I've been doing some testing comparing performance on all SSDs, and I've finally made the jump from our Luminous-based release to Nautilus, and there are two outstanding things that I noticed. One: 64k performance dramatically improved; that's the good news. The bad news is I'm seeing a rather dramatic drop in large sequential transfers. This is CephFS, so I haven't traced it back to whether it's just CephFS or if I'm seeing it in RBD and...
A: It might be worth getting a blktrace on the new one, though, you know, you'd have to redeploy. But if you could get a blktrace when you're doing a big write, you can see if you're getting, like, unaligned writes, or if there's anything else weird going on that results in, like, a thirty or fifty percent performance reduction. Yep.
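(A usage sketch for that suggestion, assuming the standard blktrace tooling and a hypothetical device name: run something like `blktrace -d /dev/nvme0n1 -w 30 -o trace` while the large write is in flight, then inspect the output with `blkparse trace` to check request offsets and sizes for unaligned writes.)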
B: It's interesting, and it's interesting the effect that they're having in this environment. So anyway, I'll update you guys as I get more data. Cool.