From YouTube: 2018-FEB-01 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A: It looks like the worst case on writes is like a 1 to 2 percent regression. In my tests I couldn't really see it. On my box the tests saw a small regression, but the general thinking is that it's worth it to get rid of all the other stuff, and we'll eliminate the callbacks in the ObjectStore code and whatever; it'll simplify things.
A: The part that I forgot was that there's also now a check, on every read call, for whether there is an in-flight write on that object, and so we have to also benchmark the small reads on RBD and see what the impact is there. So Radoslaw starts doing that, I think, today or tomorrow, or whatever; it's late today. I want to make sure, but I think it'll be fine. It's a hashmap lookup, and so it should be, should be fine.
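The check being described can be sketched roughly like this. This is a hypothetical simplification, not the actual OSD code; `InFlightWrites` and its method names are made up for illustration. The point is that the cost added to every read is a single hash lookup:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Illustrative sketch of a per-object in-flight write check: reads consult
// a hashmap keyed by object id, so the read path only pays one hash lookup.
class InFlightWrites {
  std::unordered_map<std::string, int> writes_;  // object id -> pending writes
public:
  void note_write_started(const std::string& oid) { ++writes_[oid]; }
  void note_write_finished(const std::string& oid) {
    auto it = writes_.find(oid);
    if (it != writes_.end() && --it->second == 0)
      writes_.erase(it);
  }
  // Called on every read: true if the read must be ordered against an
  // in-flight write on the same object.
  bool has_in_flight_write(const std::string& oid) const {
    return writes_.count(oid) != 0;
  }
};
```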
A: Okay, we need to make sure, but that's exciting; that's going to pull out a whole bunch of random cruft. The interface cleans up, so there's only, like, one method now to queue a transaction, and the transaction is applied to the in-memory cache synchronously before that call returns, so you can read back the result. All the annoying flavors of that are gone. It doesn't actually remove the callback yet, because there are a handful of cases where I think we still need it.
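The simplified interface described here can be sketched as follows. This is an illustrative toy, not the real ObjectStore API: `MiniStore` and `queue_transaction`'s exact shape here are assumptions. The key property is that the mutation is applied to the in-memory state synchronously, so a read issued immediately afterwards sees the result, while durability happens later:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: one method queues a transaction; the writes are
// applied to the in-memory cache before the call returns, and persistence
// is handled asynchronously (here just recorded in a pending list).
class MiniStore {
  std::map<std::string, std::string> cache_;             // in-memory state
  std::vector<std::pair<std::string, std::string>> wal_; // pending commits
public:
  struct Transaction {
    std::vector<std::pair<std::string, std::string>> writes;
    void write(std::string oid, std::string data) {
      writes.emplace_back(std::move(oid), std::move(data));
    }
  };
  // The single queueing method: apply to cache now, persist later.
  void queue_transaction(Transaction t) {
    for (auto& w : t.writes) {
      cache_[w.first] = w.second;  // visible to reads immediately
      wal_.push_back(w);           // would be flushed asynchronously
    }
  }
  std::string read(const std::string& oid) const {
    auto it = cache_.find(oid);
    return it == cache_.end() ? std::string() : it->second;
  }
};
```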
A: There's this OSDriver thing that's used by the snap mapper, and I also noticed it's used by one other thing; I can't remember what.
A: The other thing going on is that Radoslaw has this big pull request that adds async read support to BlueStore and then also changes a bunch of stuff in the OSD to make it work, but it's sort of running up against the complicated flow of control through the OSD right now, so I'm kind of thinking that we should wait until we map out what we want to do to restructure and clean that up first.
C: What is the current state of the OSD asynchronous infrastructure? We are reusing parts that were written mostly for recovery. In some places they are simply slow; we are doing a lot of unnecessary work, and I'm afraid we can even see an effect where the boost we got from going asynchronous, even for a random write... sorry, for a random reads scenario, could in some situations be offset by, for instance, executing the context twice.
C: The typical case is a sparse read. As implemented in master for asynchronous reads for EC pools, it basically gets all the data and then wraps it up together with a header consumed by RBD. It's very, very far away from what ObjectStore will do for BlueStore. We have patches bringing sparse-read support; all we need to do is clean up the infrastructure there.
A: I'm kind of thinking that we should try to extract everything that can be extracted from there and do it separately. Okay, like the retrying: the retrying is one thing where I'm not sure it's going to help, but we can tune it. We can make it parameterized by the number of retries, so you just set it to one until it actually helps. Yep.
A: And I think what I'm kind of imagining is that, once you have just the skeleton of the call chains, what we really need to do is write a bunch of annotations that say, like, this is where we prepare the log event, and this is where the version gets decided, and so on, because I think that's the part that's complicated. You move all that around so that we have a simpler flow, and one that can be more easily and elegantly composed into all the different combinations we need.
F: Yeah, I was waiting until you folks were done discussing whatever you normally do, because I'm not usually here, but Alex Calhoun and the perf and scale team are running our RBD tests with BlueStore, and we're hitting a weird result that I was hoping maybe you could give me some ideas on what to look at. Sorry, I don't have the data ready to throw up on the screen here.
F: Maybe I could, but the problem is that at 64 KB there's a sudden spike, an increase, in our, you know, RBD throughput with BlueStore, and it's too high. It seems like... because we're getting, like, 500 writes per OSD, which just isn't possible, you know. So I'm trying to understand what's going on here.
A: Then all those will just be laid out sequentially, and so you'll just go fast. And below that, if you're overwriting existing data, then it has to overwrite a previous allocation, and so it's going to do a data journal, and then it's going to asynchronously go and update, overwrite, the previous allocation.
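The write-path distinction being described can be sketched as a decision function. This is a hedged simplification, not BlueStore's actual code; the 64 KB threshold and the names here are illustrative assumptions (the real cutoffs are configurable and the real logic considers more factors):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: writes to unallocated space get a fresh sequential
// allocation and go fast; small overwrites of existing data are journaled
// first (a "deferred" write) and applied to the old allocation
// asynchronously; large overwrites go straight to the device.
enum class WritePath { NewAllocation, DeferredOverwrite, DirectOverwrite };

WritePath choose_write_path(bool overwrites_existing, uint64_t len,
                            uint64_t deferred_threshold = 64 * 1024) {
  if (!overwrites_existing)
    return WritePath::NewAllocation;      // lay out sequentially, no journal
  if (len < deferred_threshold)
    return WritePath::DeferredOverwrite;  // journal the data, apply async
  return WritePath::DirectOverwrite;      // big overwrite: write in place
}
```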
F: I think we did. I'm still looking; I've got to sit down with him and look at his data again, but I think there were some problems with memory management, where we had done some things that didn't make sense. So we'll get back to you on that. Okay, that's...
B: I think that sounds like a good idea to me. The one thing we want to consider is: in the kind of end state for this seastar work, are you going to end up having, like, one object store shard per OSD, or more than one?
A: I think we can't, because... unless we get... that's for, like, replicating all the OSD storage and so on. But the main problem that I keep running up against is that the snap mapper is an index of clone objects by snap ID, and it doesn't shard, because it's updated as part of a transaction for a placement group, and the...
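The index being described might look roughly like the following. This is a hypothetical sketch of the shape of the data, not the real SnapMapper (which persists this mapping via the object store, inside the same transaction as the PG update, which is exactly why it can't be sharded independently):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Illustrative sketch: an index from snap ID to the clone objects that
// belong to that snapshot. In-memory only here; the real thing is
// persisted transactionally alongside PG metadata.
class SnapIndex {
  std::map<uint64_t, std::set<std::string>> by_snap_;  // snap id -> clones
public:
  void add_clone(uint64_t snapid, const std::string& clone_oid) {
    by_snap_[snapid].insert(clone_oid);
  }
  // Used, e.g., by snap trimming to find every clone pinned by a snapshot.
  const std::set<std::string>& clones_for(uint64_t snapid) {
    return by_snap_[snapid];
  }
};
```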
A: But if that's the case, then, if it's one OSD and you split a PG and the PG jumps to a different shard, is that the same thing as if you have four OSDs and you split, and that PG is going to move to another... it's going to move to another OSD no matter what, I guess, right? So it doesn't matter.
F: I was just going to comment that I've been running on Skylake with, like, 60-some cores, and you burn up just about all of them running with 4 NVMe SSDs, and some of that is, you know... we've got to sort out how much is really the device, and how much is really networking, and how much is the context switching that we're doing. This is luminous. But do you think that... one of the questions I had for you is:
F: ...do we think that the answer for that is basically going to be, like, DPDK or RDMA or some alternative? Yeah, exactly. And are we looking at that? Has anybody measured, as sort of the answer to that question, how many cores per SSD with those transports? Or do we not need to, because we already know from profiling that it's somewhere else?
A: I mean, you're always using a namespace, right? It's sort of the NVMe equivalent of a partition, and I think it's just built in, because internally the drive is doing its own log-structured thing, and so I think it's just built into all of that: the mapping layer that maps the logical block offset into the physical pages. It's just part of that whole implementation, so I wouldn't expect there'd be any overhead, actually.
A: Yeah, this is sort of my lingering concern with how well we can shard, because, ideally, if you have multiple shards and they're doing transaction commits, it'd be nice if they were, like, using independent queues on the device and could commit independently, instead of having to go through the same log or something, having any shared data structures there, right?
A: But it means that, like, if you're doing a read you might need to consult all shards; you have to make sure all shards are committed, or something, I don't know. Yeah, it'd be weird; they're not totally independent data structures. But maybe that's what we want anyway, because maybe the in-flight write processing and the journal are all sharded, but on read your view of what's committed is always a shared, read-only type of view, because, again, data can migrate between shards when you split.
A: It seems like it might come down to how we design the implementation of that back end, whether it matters, because all the other stuff doesn't matter, right? All the other stuff we can reshard at any time: just restart the OSD with a different number of shards, and we can divvy up the PGs differently. It's only where it meets the on-disk layout that there's potentially a persistence problem.
A: Yeah, yep, yeah, hopefully so. I mean, the thing with the log-structured approach is that the whole file layout is the journal, right? So, in order to be able to reshard without rewriting all the data, it has to be the case that you can read data from journals committed by other shards.
A: I mean, it'll be mixed, because you'll have, you know, these twelve PGs in shard one, and then we split, and now you have eleven and a half PGs in shard one, and half of that PG is now in another shard. So for a given log segment... I don't think it's going to be much of a problem, because the whole point of the log structure is that once it's written it's read-only, and so sharing stuff is easy to deal with. But it's just something we need to think about.
C: And it was used in some critical paths. At the moment there is a PR ongoing that switches back to the old legacy infrastructure, but the open question is how we want to address it in the future. Yesterday I started sketching some very, very small and stupid utility class for providing caching for a particular configurable while still observing for changes, but I'm not sure whether that's the direction we want to follow. Maybe just the legacy way is the option; not sure.
A: I think... yeah, so the approach I've been taking for now is just keeping, or reverting to, the legacy stuff if it matters, because we don't have a better solution, and the legacy thing is not that bad. As far as making something better, the fundamental reason why it is this way is that we need to make sure the config update is atomic, and so I think there are two options. We can make, like, the trivial wrapper thing that you're talking about, which just does the observer but has a cached value.
E: There's no slowdown in the fast case; depending on the ordering there's a fence there, exactly, right, but still, that's faster than what we're doing now. But we need a general approach, though. So I like this. There's a whole thread, you know, of ideas that Branislav and I had been talking about. We should put it somewhere, and we should do a group of them together.
A: So that's, I think, what we're talking about here. The problem is that to set up an observer for every random thing that you want to watch is, like, 20 lines of boilerplate, and it's super tedious and it's easy to get wrong. So what we want is a class where you just basically declare it, you give it in the constructor the thing that you're observing, and it registers the observer for you, right?
E: I have to drop for another meeting, but I just wanted to say: if we wrap this one up, we should be attacking it as a group of them, or with some way to get at the whole. Yeah, yeah, I think so.
G: If it's atomic-based or whatever, then it needs to be something that can change while something else is happening. Even if it's, like, grabbing a lock, that can happen in between operations, but not during an operation, because... Yep, that makes sense. So...