From YouTube: Ceph Performance Meeting 2018-11-29
The way to approach this is to be careful with the investment of complexity in bufferlist: show good gains before we make it much harder, and simplify the API whenever possible. We've been doing a little bit of both, but with Crimson, don't use bufferlists at all; try to start with the Seastar primitives over there, I think. That's the way. That said, I think it's going to be a real challenge. There's going to be a ton of code rework if we just start to rip out bufferlists in the current code.
In Crimson it will be a huge pain. In Crimson we wanted to exchange maybe not more than a few percent of the codebase, basically only those parts that are essential for performance. Rewriting the machinery that takes care of rebalancing and cluster health, well, that would be rewriting from scratch.
I think it's going to be there; certainly we can't get rid of it. But when we write the messenger primitives, the messenger stuff, I would try to do that in terms of Seastar buffers where possible, and on the IO path, for example, try to do that in terms of Seastar buffers too, in the performance-critical parts, wherever we can.
I suspect so. I'm not really sure; I'm not really following the Crimson stuff right now, and I'm not the person to talk about it, so I defer to the people who are actually doing the work here. But my general sense is that where possible we should try to use the Seastar buffer primitives, and only use bufferlist where we need to for compatibility with all the other code. Okay.
Yeah, so one of the choices I had made with the Seastar messenger was to redo the messenger and connection types but reuse the existing message encode/decode, which is tied to bufferlist. If we want to change that, then we basically have to duplicate the encode/decode for the messages that we want to support, or...
Make a set of wrappers, like seastar_encode and seastar_decode, that fall back to something that does the bufferlist path; but if you specialize them, you can write a native encoder. With something like that in place, we can selectively optimize the handful of messages that we think matter.
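The wrapper idea can be sketched in C++: a generic template routes any message through its legacy encoder, and a specialization for a hot message type writes the bytes natively. All names here (`seastar_encode`, `OsdOpHeader`, `Heartbeat`, `legacy_encode`) are hypothetical stand-ins, not the actual Crimson or Ceph API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Generic fallback: route any message through its legacy (bufferlist-style)
// encoder. Specializations below can bypass this for hot message types.
template <typename Msg>
void seastar_encode(const Msg& m, std::vector<uint8_t>& out) {
  m.legacy_encode(out);
}

struct OsdOpHeader {          // hypothetical hot message type
  uint32_t op;
  uint32_t flags;
  void legacy_encode(std::vector<uint8_t>& out) const {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&op);
    out.insert(out.end(), p, p + sizeof(op));
    p = reinterpret_cast<const uint8_t*>(&flags);
    out.insert(out.end(), p, p + sizeof(flags));
  }
};

struct Heartbeat {            // not specialized: takes the fallback path
  uint64_t stamp;
  void legacy_encode(std::vector<uint8_t>& out) const {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&stamp);
    out.insert(out.end(), p, p + sizeof(stamp));
  }
};

// Specialization: write the fields directly, skipping the generic machinery.
template <>
void seastar_encode<OsdOpHeader>(const OsdOpHeader& m,
                                 std::vector<uint8_t>& out) {
  size_t off = out.size();
  out.resize(off + sizeof(m.op) + sizeof(m.flags));
  std::memcpy(out.data() + off, &m.op, sizeof(m.op));
  std::memcpy(out.data() + off + sizeof(m.op), &m.flags, sizeof(m.flags));
}
```

Unspecialized messages keep working through the fallback, so only the handful of messages that matter need native encoders.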
I wonder, I mean, if we actually went through and audited the code that uses bufferlist, can we do it at that level, rather than trying to make something, Eric, that does a fallback, or that can, you know, selectively optimize? I guess it feels like some of the complexity right now is because we use bufferlists for everything.
Yeah, it's a lot harder if we try to change everything to be in terms of buffers, or try to change specific things to be in terms of bufferptrs instead of bufferlists, for example. I think that's harder, because everything right now is written in terms of bufferlists; it's all sort of generic.
My feeling is that the other problem with omap, which might be low-hanging fruit, is just that it's getting decoded into a map, and then re-encoded, and then decoded into a map again, and I think we can probably bypass a lot of that work. I am fiddling with the interfaces so that when something comes in over the wire, already encoded in a sequential buffer, we should be able to pass it straight down to the objectstore layer and then copy it into the actual buffer there, hopefully without allocating a red-black tree.
Well, that doesn't make it uninteresting; I think that the general bufferlist primitives are important in their own right. Yeah, I had a change that tried to change how the omap buffers were handled to use a flat map, because it's constructed in sorted order, but I hit a few problems and didn't pursue it. Yeah.
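The flat-map idea exploits the fact that omap entries arrive already sorted, so the structure can be built with plain appends instead of per-node tree allocations. A minimal sketch, with `FlatOmap` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Minimal flat-map sketch: keys are appended in sorted order (as omap
// entries come off the wire), so construction is O(n) appends with no
// red-black-tree node allocation; lookup is a binary search.
class FlatOmap {
  std::vector<std::pair<std::string, std::string>> kv_;
public:
  // Caller guarantees keys arrive in strictly ascending order.
  void append_sorted(std::string key, std::string val) {
    assert(kv_.empty() || kv_.back().first < key);
    kv_.emplace_back(std::move(key), std::move(val));
  }
  const std::string* find(const std::string& key) const {
    auto it = std::lower_bound(
        kv_.begin(), kv_.end(), key,
        [](const auto& p, const std::string& k) { return p.first < k; });
    if (it != kv_.end() && it->first == key) return &it->second;
    return nullptr;
  }
  size_t size() const { return kv_.size(); }
};
```

The trade-off mentioned in the discussion is that this only pays off when construction really is in sorted order; out-of-order inserts would force shifting.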
I thought we started to do some of this, but I would actually try to avoid unpacking it at all: maybe have a function that will just walk the encoded structure to make sure it's valid, to do a validity check, but not actually unpack it at all, and then pass that through.
Totally agree. Anyway, I guess coming back to where we started, I think my general feeling is yes, this hypercombined bufferlist is complicated, but I have a feeling it's going to be worth it.
Yeah, sure. We had a user from the UK who was running on a bunch of HPC gear, and they were seeing this huge, dramatic drop in performance with stuff deployed on LVM devices. We went through and did a whole bunch of analysis, and we don't have it 100% nailed down yet, but it's looking very, very similar to another bug that was discovered with XFS on top of LVM devices with Intel NVMe drives, where, when IOs cross certain boundaries, it results in this performance degradation.
It kind of has to do with the way that these NVMe devices have internal RAID and distribute IO internally to the flash cells. The gist of it is that I think it's only on older versions of the kernel; they've done some changes since. There have actually been a lot of changes in the DM layer since the 3.x series; 4.x looks a lot different, and then there's a patch for 3.10 that is different from what's in 4.x, but I linked to it in the etherpad here.
The only reason I wanted to bring it up here is because now that folks are starting to use ceph-volume for deployment, they could potentially get bitten by this if they're using, you know, certain NVMe drives and are also on older kernels. So if you see a huge sequential read performance drop, this is probably the first thing you should look at, I guess, especially if you're using LVM or ceph-volume for deployment. So it's more of a public service announcement than anything, but that's what to look for.
We know that a lot of the cost of using bufferlist comes from the sharing behavior: I mean refcounting, memory fencing, all that crazy stuff. We can do nothing about that if somebody, for instance, mixes C-string appends with bufferlist appends, with bufferptr appends, and so on; I don't see any possibility of avoiding the inflation of _buffers, and I don't see any way to avoid the atomic operations. But if somebody uses only bufferlists, without imposing the sharing behavior, well, we can basically eradicate all the atomics. I made a pull request.
Adam, I posted a link to it. At the moment it's a work in progress, and it's built on top of the hypercombined thing, but there are two interesting changes from the sharing point of view. One is on creation of the first bufferptr that points to a freshly created buffer::raw. At the moment, if you create a buffer::raw and create the first owner, the pointer owning a given raw instance, even in that situation you have an atomic operation.
You are atomically increasing the nref member of the raw from 0 to 1 using an atomic, but just with some extra bit of information in the type system we can avoid that. The second thing is decreasing: taking out an instance of bufferptr where we have only one user of a particular buffer. There we can avoid an atomic operation too, I think, when you have, let's say, an instance of bufferlist that has one bufferptr pointing to one buffer.
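Both optimizations can be sketched in a simplified refcount: initializing the first owner with a plain relaxed store instead of an atomic read-modify-write, and, on release, checking for sole ownership with a relaxed load before falling back to an atomic decrement. `Raw` and `Ptr` are toy stand-ins for `buffer::raw` and `bufferptr`, not the real classes, and the fast path relies on the standard argument that a sole owner cannot race with anyone on its own count:

```cpp
#include <atomic>

// Toy refcounted buffer illustrating the two atomic-avoidance tricks.
struct Raw {
  std::atomic<unsigned> nref;
  Raw() { nref.store(1, std::memory_order_relaxed); }  // (1) plain store, no RMW
};

struct Ptr {
  Raw* raw;
  explicit Ptr(Raw* r) : raw(r) {}           // takes over the initial reference
  Ptr(const Ptr& o) : raw(o.raw) {
    raw->nref.fetch_add(1, std::memory_order_relaxed);  // sharing begins here
  }
  Ptr& operator=(const Ptr&) = delete;
  ~Ptr() {
    // (2) fast path: if we are the sole owner, no other thread can touch
    // nref concurrently, so we can free without an atomic decrement.
    if (raw->nref.load(std::memory_order_acquire) == 1 ||
        raw->nref.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete raw;
  }
};
```

The pull request discussed here is more involved (it encodes ownership in the type system), but the cost being attacked is the same: uncontended atomic RMWs on every create/destroy.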
In the Seastar world, it seems to me like you'd almost want to be copying your memory into whatever local memory you have for the shard. It's operating on a single core, and at that point you've already got some preallocated memory that you've written into, and you're never doing any kind of crazy stuff with it, right? You're just operating on it locally.
Like having an explicit handoff: the message is owned by the messenger, then it's owned by the OSD, and then given back to the messenger, or whatever, if that handoff is explicit. Right now it's a message ref; who knows where the references are in the system. But if we had a more rigid ownership transfer, then maybe a lot more would be possible.
I suspect what's going to happen is they're going to implement a minimal IO path that's basically from scratch, and that's not going to include anything from PrimaryLogPG, ReplicatedBackend, or whatever; it's going to be pretty minimal and fresh. That'll give us an idea of what the new structure should look like, and then there's going to be a painful process of either refactoring the current OSD and PrimaryLogPG code or re-implementing it.
Exactly right, but I think that sort of the core flow through the OSD for IO, and also for recovery, the data path part, is going to need to be rewritten, and hopefully simplified and streamlined and whatever in the process, and that's going to be hard. But I think once we have a minimal implementation, we can sort things out. Of course, as they're doing that, there's going to be a lot of code that needs to be moved around, and we need to figure out how to factor all that stuff.
Related to that, at this stage I still think it would be a really good idea to just implement some kind of PG that doesn't do log-based recovery at all, that just does nothing, right? Yeah, just to see: can we plug something like that in, and then kind of re-abstract things so that PrimaryLogPG is one kind of thing that can implement it?
Well, just an observation that we are using bufferlist, I mean especially the append method of bufferlist taking a bufferlist or a bufferptr and doing the shallow copy. A shallow copy, I mean just doing the refcounting on the appropriate buffer instead of going with a memcpy. Well, I bet there is a threshold under which the memcpy will be much more performant, and still, I am pretty sure we just cannot enforce deep-copy behavior on every bufferlist append.
At the moment we have quite similar logic inside contiguous_appender. The constructor of contiguous_appender has an extra parameter called deep; by default it's false, but in BlueStore, when encoding the extent map, we are forcing our contiguous_appender to actually do a deep copy of each bufferptr or bufferlist that is passed to it.
Maybe we could try to adapt a similar behavior in the bufferptr- and bufferlist-taking variants of bufferlist::append: just go with a memcpy under some predefined size, like 128 bytes or something like that. It would impose some extra logic just to check whether it's worth doing the memcpy or not, so it would impose some small extra cost.
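The threshold idea can be sketched as follows; `MiniList` and `SHALLOW_THRESHOLD` are hypothetical, and `std::shared_ptr` stands in for a refcounted `buffer::raw`. Small appends are memcpy'd into an inline tail (no refcount traffic, no extra `_buffers` node); large ones are shared by reference, as `bufferlist::append` does today:

```cpp
#include <memory>
#include <vector>

// Hypothetical tuning knob from the discussion: below this size, copying is
// assumed cheaper than the refcount/fencing cost of sharing.
constexpr size_t SHALLOW_THRESHOLD = 128;

struct MiniList {
  std::vector<char> inline_tail;  // small appends are memcpy'd here
  std::vector<std::shared_ptr<std::vector<char>>> shared;  // big ones shared
  size_t length = 0;

  void append(const std::shared_ptr<std::vector<char>>& buf) {
    if (buf->size() < SHALLOW_THRESHOLD) {
      // deep copy: no refcount bump, no new node in the list
      inline_tail.insert(inline_tail.end(), buf->begin(), buf->end());
    } else {
      // shallow copy: just bump the refcount, as append does today
      shared.push_back(buf);
    }
    length += buf->size();
  }
};
```

This is the runtime-check variant; the counter-argument raised later in the meeting is that explicit copying/non-copying APIs at the call sites would avoid paying for the check at all.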
Radek, my selfish hope is that if it turns out to be really useful, then maybe it means we can do things more explicitly rather than doing a runtime check: things that aren't going to use bufferlist in a way that is very often large could then just explicitly use something that doesn't memcpy, rather than, you know, doing the check at runtime. That makes sense.
The main variant of append, for appending buffers to a bufferlist, would do the memcpy, and we could try to profile the code across as many scenarios as possible to detect those cases where we would copy jumbo buffers, and switch them to something like an explicit shallow append of bufferlists.
Obviously, though, it feels a little bit off-track. I mean, we want to avoid atomic operations, and you should avoid atomic operations, because these are cases where, in a critical IO path, you worry about things that are not tiny; for the tiniest operations the CPU is fast enough that this is efficient.
If you just created the problem with atomics, then copying isn't the way around it; that's expensive too, and it has an impact on cache lines.
With a non-temporal copy, if you know that you aren't going to use the copied data soon, you can instruct the CPU to completely bypass the caches to avoid the pollution.
Well, a non-temporal copy would be interesting if you can figure out a way to get the proper instructions emitted in a UNIX environment, which I don't think is too realistic to do, but I challenge you to do it. But I still don't think that's the solution to removing atomics in code where there isn't actual sharing.
Well, I would expect that it's pretty rare that we're appending very small buffers anyway, right? Usually you're appending a char* and a length, because you're doing it in encode. How often are you appending sixteen bytes because you have a 16-byte bufferptr? There aren't that many of those.