From YouTube: 2018-FEB-08 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
B: If that's the case, and I fully agree that static isn't the way to go, unfortunately, then it would at least bring the boilerplate down to a minimum; still putting it as a member would be acceptable. However, we could face another issue with that: we have an atomic inside, which means by default it's not movable, so that could be a problem, I think.
A: Let's see, the only other interesting pull request recently: there's Mark's that changes the output streams when not debugging; I didn't look at that closely. Kefu had an idea about restructuring something with a return value, but this shaves off like 5% of the CPU time, or wall clock time, for one of these threads.
A: All right, that context thing looks fine in the end; I'd mark that as needs-qa. Let's see, there's the indirection layers for the sharded op work queue. I think the tiny vector stuff all looks fine. The bit I'm worried about is that it's changing the sharded op work queue, and that's getting totally ripped up in my refactor branch. Given that, I think we should wait on that part: maybe just merge tiny vector, and I can roll that change into my pull request for you afterward, sure.
A: We'll come back to that one. I think those are the main things. The pull request that changes the collection handle stuff merged; that should be a no-op as far as performance goes, but I'm working on the one that rips out the totally unreadable stuff and completions in FileStore.
A: It's eliminating a bunch of callbacks, though we'll see. I'm hopeful that it's going to be a noticeable speedup on BlueStore, but we'll see; I'll do some testing, running it. Yeah, I'm cleaning up the branch and testing it. So once it's working, we'll do a last batch of tests on FileStore and BlueStore and see what the delta is.
A: And it's a percentage, yeah. And I think we can also make it better. Right now, when there's an update, the observer handling basically calls every observer, but I think in there we can make a map: if you're observing these specific keys, make a list per key of the observers that care about it, so we only call the observers that matter. That sort of thing.
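A minimal sketch of the per-key dispatch idea described above, assuming a simplified observer interface (the names `Observer`, `ObserverMgr`, and `handle_conf_change` are illustrative, not Ceph's actual `md_config_t` API): observers register for specific keys, and applying a changeset notifies only the observers registered for the keys that actually changed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Illustrative observer: records the keys it was notified about.
struct Observer {
  std::vector<std::string> seen;
  void handle_conf_change(const std::string& key) { seen.push_back(key); }
};

// Instead of calling every observer on any update, keep a map from
// config key -> observers that care about that key.
class ObserverMgr {
  std::map<std::string, std::vector<Observer*>> by_key;
public:
  void subscribe(Observer* o, const std::set<std::string>& keys) {
    for (const auto& k : keys)
      by_key[k].push_back(o);
  }
  // Notify only the observers registered for keys in the changeset.
  void apply_changes(const std::set<std::string>& changed) {
    for (const auto& k : changed) {
      auto it = by_key.find(k);
      if (it == by_key.end())
        continue;
      for (Observer* o : it->second)
        o->handle_conf_change(k);
    }
  }
};
```

The key names in any usage (e.g. `osd_max_backfills`) are just examples; the point is that an update touching one key no longer walks the full observer list.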
A: I mean, it's better, given that right now the config calls every observer when there's any update. If we only call the observers affected by a key, the new way would be better, because we wouldn't call that big function that tests whether every possible key is in the changeset and then calls them, or whatever. It would be much better.
A: I mean, yeah. Sorry, if that's the case, I would just take a look at the config code that calls the observers and update that, because I think we can make it call only the observers that are affected by the changed keys, and that will eliminate that concern. I think right now the implementation is just the naive, sloppy thing; that's why.
D: In the meantime, I have been observing how the get actually works under a profiler, and so on, and I came to the conclusion that get_val is slow not only because of the traversal over the std::map, but also because of normalizing keys, even if they are already normalized. So I've made a short PR to tackle that, and in my...
A: If there's like a static... instead of that, I would actually lean toward making the callers that have unsanitized config option names explicitly call the normalize function; then get_val doesn't have to normalize, right? It's assumed that it's always getting a pristine, well-formed thing, and it asserts. Yeah, it's a little bit more work, because we have to look at the callers and decide whether each one is a pristine source or not.
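A sketch of the division of responsibility proposed here, under assumed names and an assumed canonical form (lower-case with `-` folded to `_`; `normalize_key`, `is_normalized`, and this `get_val` are not the real `md_config_t` signatures): callers holding unsanitized option names normalize explicitly, and the hot-path lookup only asserts that the key is already pristine instead of re-normalizing on every call.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Assumed canonical form: lower-case, '-' replaced by '_'.
std::string normalize_key(const std::string& in) {
  std::string out = in;
  for (char& c : out)
    c = (c == '-') ? '_'
                   : static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return out;
}

bool is_normalized(const std::string& k) {
  for (char c : k)
    if (std::isupper(static_cast<unsigned char>(c)) || c == '-')
      return false;
  return true;
}

// get_val no longer pays for normalization on the hot path; it just
// asserts the caller handed it a pristine, well-formed key.
std::string get_val(const std::map<std::string, std::string>& conf,
                    const std::string& key) {
  assert(is_normalized(key));  // callers must pre-normalize
  auto it = conf.find(key);
  return it == conf.end() ? std::string() : it->second;
}
```

The trade-off is exactly the one raised in the call: each call site has to be audited once to decide whether it is already a pristine source or needs the explicit `normalize_key`.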
B: It would be, because I observed that on the write path the decode pipeline of today's x86 CPUs is involved more and more, for about 55% of delivered micro-ops, which means similarly that we are almost not using the decoded micro-op cache at all. And I'm not surprised, because the write path, OSD and BlueStore, is basically long sequential code, so it has no chance to fit inside the micro-op... would you call it the uop cache?
B: Also, because I can see a lot of i-cache misses and also iTLB misses, I've been thinking about pipelining the write path. I know it's controversial, because each stage adds some latency, some additional work, because of passing the data to the next stage, etc. However, at the moment we are running with IPC so low, around 0.4 or lower, that if we could improve it, maybe we could also squeeze out some latency there.
B: It's a very small part of the instruction cache, basically around six percent of the instruction cache size. I'm pretty sure, well, I guess, that a lot of instruction cache and iTLB eviction events come from jumping into the kernel each time we issue a read. At the moment there is no batching, absolutely no batching; we are using, most of the time, one pread for every read operation. I'm trying to put in some kind of additional stage solely dedicated to reads.
B: Async read stuff: trying to compose the incoming IO contexts into a graph that could be put into the kernel using one single io_submit call. However, it will be done in a separate thread, so having Seastar in place would be very, very useful here, I think, because it has dedicated threads for interaction with the kernel. Am I correct? Yeah.
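A sketch of the batching idea being discussed: read requests are accumulated and handed off in one submission instead of issuing one pread per operation. Real code would use Linux AIO's `io_submit` (or io_uring) from a dedicated thread; to keep this self-contained and buildable without `-laio`, the "submit" step below just services the whole batch with `pread` calls, so only the queue-then-flush structure is the point, not the syscall. All names here are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <vector>

struct ReadReq {
  off_t offset;
  size_t len;
  std::string result;  // filled in when the batch completes
};

class BatchedReader {
  int fd;
  std::vector<ReadReq*> pending;
public:
  explicit BatchedReader(int fd_) : fd(fd_) {}

  // Queue a read instead of issuing a syscall immediately.
  void queue(ReadReq* r) { pending.push_back(r); }

  // One "submission" for the whole batch. With libaio this would be a
  // single io_submit(ctx, n, iocbs); here we loop with pread so the
  // sketch stays dependency-free. Returns the number of requests served.
  size_t flush() {
    for (ReadReq* r : pending) {
      std::vector<char> buf(r->len);
      ssize_t n = pread(fd, buf.data(), r->len, r->offset);
      if (n > 0)
        r->result.assign(buf.data(), static_cast<size_t>(n));
    }
    size_t n = pending.size();
    pending.clear();
    return n;
  }
};
```

The win the speaker is after is that N queued reads cost one kernel transition instead of N, which is what cuts the i-cache and iTLB eviction traffic from repeated user/kernel switches.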
C: That's where I think it'll be interesting to see, once we've converted more, right. There was a scheme, or we've just started on that, for measuring these things more exactly. Okay, so I think at that point it'll be more interesting to look at these kinds of micro-optimizations; it's absolutely unclear to me whether what we're entering right now is going to have the same kind of profile that we will see later.
B: The profiling shows we have a lot of micro-operations delivered by the microcode sequencer instead of the normal decoders, which means the front-end throughput is very, very limited. I'm trying to find the source of them. It could be that the divisions we have in the pipeline, if we have them on the write path, are responsible for that; however, there is no direct mention that division goes through the microcode sequencer on x86, on Skylake, for instance.
B: Agner Fog's manual says that each division requires around 10 micro-operations, and I can find some sources claiming that on x86 each operation that takes more than 4 micro-ops needs to go through the sequencer, and we do have some divisions across the write path. I'm eradicating them right now; we'll have a pull request today or tomorrow, and then we'll rerun and repeat the profiling session.
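For context on why eradicating those divisions helps: integer division is one of the few common x86 instructions expanded by the microcode sequencer rather than the simple decoders, so on a hot path it throttles front-end throughput. The usual fix when the divisor is a known power of two is strength reduction to a shift and mask; the function names below are illustrative, not from the Ceph tree.

```cpp
#include <cassert>
#include <cstdint>

// Generic division/modulo: on x86 this decodes through the microcode
// sequencer (roughly 10 uops per Agner Fog's tables for recent cores).
inline uint64_t slot_div(uint64_t v, uint64_t divisor) {
  return v / divisor;
}

// If the divisor is known to be a power of two, a shift and a mask
// give the same results with cheap single-uop instructions.
inline uint64_t slot_shift(uint64_t v, unsigned log2_divisor) {
  return v >> log2_divisor;
}
inline uint64_t slot_mod_mask(uint64_t v, uint64_t pow2_divisor) {
  return v & (pow2_divisor - 1);
}
```

Compilers already do this when the divisor is a compile-time constant; the manual rewrite matters when the divisor is a runtime value that the code can arrange to be a power of two.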