From YouTube: 2016-OCT-16 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A: It's been about two weeks since we last had a meeting, and there's been a whole lot of movement in the world of pull requests. One of the new things that has recently shown up that is kind of interesting: Allen has his PR for implementing slab containers in mempool. The current mempool support is in master, and this is going to be really, really important going forward, I think, both in terms of reducing memory fragmentation and in identifying where we are kind of doing dumb things.
A: We already kind of know that we've got a lot of places in the code where we're creating and deleting lots and lots of objects, so part of this will be to let us track that down better, and then also generally just to get an idea of where our memory is going: how much we are spending on different data structures, allowing us to better manage how much memory we are using in different places, and to restrict it in a way that is easy for the user.
A: So rather than having all these different tweaks, where you can change this buffer and that buffer, and the third buffer does something with some element size count, the goal with a lot of this is to say: here's an amount of space that the user specifies, maybe one gigabyte of RAM or something, and then how do we divvy that up between all these different buffers in a sane way? So that's that.
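A minimal sketch of that budget-splitting idea, using a hypothetical PoolBudget helper rather than Ceph's actual mempool API:

    // Hypothetical sketch: divide one user-specified memory budget across
    // pools by weight, instead of exposing a separate tunable per buffer.
    #include <cstddef>
    #include <map>
    #include <string>

    struct PoolBudget {
      size_t total_bytes;                      // e.g. 1 GiB chosen by the user
      std::map<std::string, double> weights;   // relative share per pool

      size_t limit_for(const std::string& pool) const {
        double sum = 0;
        for (const auto& [name, w] : weights) sum += w;
        return static_cast<size_t>(total_bytes * weights.at(pool) / sum);
      }
    };

    // Usage: PoolBudget b{1ull << 30, {{"cache", 3.0}, {"buffers", 1.0}}};
    // b.limit_for("buffers") -> 256 MiB out of the 1 GiB budget.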
A: Let's see what else... We have a couple of different BlueStore things; this one is about refactoring BlueStore's submit-transaction path. I think that would be good; I think that will lead to some performance improvements going forward, which would be good. There's a really interesting one here where Haomai has started implementing RDMA for the async messenger. Previously a lot of this kind of work had been done in XioMessenger, and development on that has been a little bit slower lately.
A: It remains to be seen what the benefits and downsides of each approach are, but personally I think it's great to have people working on things that they're motivated to work on, so it will be really exciting to see where this ends up, and also to see where XioMessenger ends up as work continues on that. So if you're interested in RDMA, it might be worth checking out what he's doing and how that compares to Xio.
A: Let's see, what else... There's also one for parallel transaction submission in BlueStore; that's kind of dependent on these other things that Sage is working on, refactoring the sync/submit transaction path. A lot of different stuff got closed.
A: In that case, the good news, though, is that that's really the only case where we're seeing a regression from the async messenger. In all the other cases that we've tested so far, it's on par or maybe marginally faster, so that's good news there.
The other nice thing, too, is that the async messenger should be a lot easier on the memory allocator, so I suspect that with the async messenger, memory allocator tuning is probably going to be less of a concern. We may actually be able to relax the defaults and reduce memory usage a little bit, so there's a silver lining there.
What else do we have going on... There's the PR that adds a PG fast-info attribute to reduce per-IO metadata updates; that was really fantastic. Basically, for every single IO we are gathering all these statistics, and we are actually recording them to leveldb or RocksDB, depending on the backend. It turns out that was a lot of data, like seven hundred bytes for every IO. So imagine you're doing 4k IOs: that's a fair amount of overhead.
A: So basically this just reduces the update frequency for the attributes that don't really need to be written on every IO, and that brings things down to around 200 bytes from 700 bytes; every once in a great while I see a larger one, but it's much better overall. With BlueStore, anyway, I think we were seeing something like a fifteen or twenty percent performance improvement from doing that. I haven't tested FileStore recently to see how much it improves, but I suspect that we're seeing some improvement there as well.
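As a rough sanity check on those figures (my arithmetic, not numbers from the call), the stats overhead relative to a 4 KiB write:

    // Stats bytes written per 4 KiB client IO, before and after the change.
    #include <cstdio>

    int main() {
      const double io_bytes = 4096.0;
      std::printf("before: %.0f%% extra\n", 100.0 * 700.0 / io_bytes); // ~17%
      std::printf("after:  %.0f%% extra\n", 100.0 * 200.0 / io_bytes); // ~5%
      return 0;
    }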
A: There's also the encode/decode work, and that hopefully should improve things fairly dramatically. I'd forgotten exactly what it did when I looked earlier today, and I was surprised that I didn't see any difference in the RocksDB compaction behavior, but Somnath reminded me that we're not actually changing the actual encoding scheme, so we're still using things like varint there.
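For context, varint here means LEB128-style variable-length integer encoding; a minimal standalone encoder looks like this (an illustrative sketch, not Ceph's denc implementation):

    #include <cstdint>
    #include <string>

    // Emit 7 bits per byte, low bits first; the high bit of each byte says
    // "more bytes follow", so small values stay small on disk.
    std::string encode_varint(uint64_t v) {
      std::string out;
      while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7f) | 0x80));
        v >>= 7;
      }
      out.push_back(static_cast<char>(v));
      return out;
    }
    // encode_varint(300) -> bytes 0xac 0x02, two bytes instead of eight.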
A: The RCU locking work is progressing well. I was kind of reading through the discussion on that, and it looks like people are pretty happy with how it's shaping up. I'm hopeful that it will yield some really good improvements, but we'll just have to see how quickly it stabilizes.
A: Then the ZetaScale integration: the SanDisk guys have been working furiously on trying to make ZetaScale more performant in some of the obscure ways that BlueStore wants to use it, and I think they're getting pretty close. I heard recently that they now have 4k random I/O behavior in ZetaScale beating BlueStore... sorry, BlueStore with ZetaScale doing 4k random IOs is now beating RocksDB with BlueStore, so we'll want to see where they go with that.
A: But it sounds hopeful; it sounds like they've got something internally that's doing really well. And then, yeah, I haven't seen too much on this other stuff, any updates like on the PMEM device for BlueStore or any of these other ones, I guess, so we'll kind of have to see how that goes.
Okay, that's basically it for pull requests. Are there any questions or comments on any of those?
B: I have a question. (A: Oh, sure.) Can you hear me? So it's about RDMA; I just want to make sure I understand what's being done. It sounds like RDMA support is being added to the async messenger, where before it was in XioMessenger, and the problem was that XioMessenger couldn't really interoperate with TCP.
A: Theoretically, libxio can use TCP behind the scenes as well. I think the interesting questions that will come up with this are things like how zero-copy is implemented and whether there are any advantages to one or the other there. We've hit some scaling issues related to the way that, basically... I'm forgetting now exactly what it is, but I think if you have multiple OSDs in one box, you can run into scalability issues with the way that libxio does RDMA. I'm probably mangling it, but anyway, there are a lot of interesting technical details that might indicate which way is the better way to go in the end. So personally I think it's really good that both are being worked on; this stuff is complicated enough that I think there's room for multiple ideas here. So yes, that's my take.
A: Somnath, I don't know whether you guys have kept updated on the stand-ups we've been doing with ZetaScale recently; would you be able to give a short overview of what you guys have been working on?
C: Yep, can you hear me? (A: Yes.) Basically, yeah, ZetaScale has gone through a lot of changes since the older version, back in the NewStore days, when we actually started integrating, and with that we can do short writes that beat RocksDB pretty comfortably.
So now, basically, we have introduced all the sharding and everything in BlueStore, and that is actually adding extra writes on the underlying data store.
C: That is not hurting RocksDB, because RocksDB is eventually coalescing all those things, but it is hurting ZetaScale, because it's an extra write and it's touching the btree in between.
We are actively working right now to optimize that part, and it's taking time; it seems it is not trivial. But we are pretty close and we are working on it, so hopefully something will come up soon.
C: The advantage of ZetaScale is that we don't have to deal with these compactions and all those things: we have predictable behavior on the disk that we don't have in the case of RocksDB, because we don't know the exact compaction behavior and how much it will write for a given workload. Obviously compaction will go higher the more data I am writing, and in the steady state, what is the compaction ratio? It's very difficult to actually analyze.
C: The reason is that it also depends on the amount of data in RocksDB: if you write, say, 100 GB versus 8 terabytes, or, with the 16-terabyte drives, if I fill up most of the disk, the behavior will be completely different. So we are trying to characterize that as well, in parallel, and comparing with ZetaScale. So yeah, a lot of work; hopefully we will get some results.
A: Do you guys think that you'll have test results that you can share publicly anytime soon? I remember someone had mentioned, I think, that you guys were beating RocksDB in your latest version now, yes?
C: So that's what we're gathering data for; at least our aim is to have that. Now we're doing a bigger data set and writing a lot of metadata, so it's actually in a steady state: making BlueStore with RocksDB reach a steady state, and also making BlueStore with ZetaScale reach a steady state. That means the 4 MB objects.
C: Hopefully sometime soon we can present it in the performance meeting. One thing for sure is that if you run RocksDB and BlueStore continuously... even, I have some results from a 10-hour write run, but even then you will see that it's not stable. It's continuously, basically, yeah, slowly but surely going down towards the steady state. So what is the steady-state number? We don't even know today.
A: Okay, so for the people that aren't super familiar with BlueStore here: basically, the min_alloc_size is controlling...
A: Essentially, the end result of changing it is that you change how much data ends up going into the metadata store: the number of blobs that get recorded increases dramatically as you decrease the min_alloc_size. But you also then don't have to do the write-ahead (WAL) writes.
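For intuition, a quick sketch of the blob-count side of that tradeoff, assuming a 4 MiB object (illustrative numbers, not measurements):

    #include <cstdio>

    int main() {
      const unsigned object_bytes = 4u << 20;       // one 4 MiB object
      for (unsigned min_alloc : {4096u, 16384u})    // 4k vs 16k min_alloc_size
        std::printf("min_alloc_size=%u -> up to %u blobs\n",
                    min_alloc, object_bytes / min_alloc);
      // Smaller units mean more metadata per object, but small overwrites can
      // land directly in place instead of being deferred through the WAL.
      return 0;
    }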
C: One thing for sure, Mark, is that for ZetaScale the performance will be hurting with a min_alloc_size of 16k, because in the case of RocksDB, the way we are doing it is that we are just doing the WAL write, and right after that we are deleting that key, so it really doesn't go down to the SST files and end up impacting compactions, right? For ZetaScale, we checked, and we know that...
C
Okay,
that
extra
for
that
4k
right,
the
double
right
basically
is
going
so
and
then
the
leaf
node
we
have
so
that
will
actually
impact
the
performance
badly.
So
we
have
to
forget
a
skill
we
have
to
work
with,
at
least
in
the
first
cut.
We
have
to
go
with
the
middle
of
size
of
4k
and
we
will
try
to
improve
that.
Okay,
us.
A: We kind of expected that in the RBD case, large buffers would help a lot by allowing IOs to the same object to be coalesced, and in the other case we expected that large buffers might actually hurt, because now you've got a bunch of data that all has to be compacted at the same time, and compactions would take longer. Those were the assumptions we made. What it turns out is that that's only sort of the case.
A: You can see that, like in this RBD case where you've got small buffers: basically there are 32 MB buffers and we've got up to 32 of them, but the big thing here is that min_write_buffer_number_to_merge is one. That basically means that every time you fill one of those 32-megabyte buffers, it will go and write it out, and so you have these small files in level 0 that then get compacted, and the compactions shouldn't take as long; and in that case, yeah, you see that actually it is pretty fast.
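For reference, the knobs being discussed map roughly onto RocksDB options like this (an illustrative sketch, not the exact bluestore_rocksdb_options string used in these tests):

    #include <rocksdb/options.h>

    rocksdb::Options small_buffer_options() {
      rocksdb::Options opts;
      // The "small buffers" case: up to 32 memtables of 32 MiB, each one
      // flushed on its own as soon as it fills.
      opts.write_buffer_size = 32 << 20;           // 32 MiB per memtable
      opts.max_write_buffer_number = 32;           // up to 32 of them
      opts.min_write_buffer_number_to_merge = 1;   // flush each individually
      // The "large buffers" alternative with the same aggregate budget:
      //   opts.write_buffer_size = 256 << 20;     // 4 x 256 MiB
      //   opts.max_write_buffer_number = 4;
      return opts;
    }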
A: Well, it's like twice as fast as in the other cases, oddly enough. With the 4k min_alloc_size... sorry, that was with the 16k min_alloc_size. In the 4k min_alloc_size case, where in theory we have more metadata to deal with but we don't have write-ahead-log writes leaking into level 0, it's actually taking a long time. That was unexpected, but there it is.
A: The big thing that you'll see here, though, is that in this case, where we have small buffers that get written out really quickly, where one fills up and we start writing it out, the amount of data that gets compacted from level zero to level one is huge compared to the other cases.
A: The unfortunate part is that right now there doesn't seem to be any good way around it. It kind of looks like we will need to allow large buffers: actually, using four 256-megabyte buffers as opposed to 32 32-megabyte buffers improved performance by like 2x, and it was the same in every test here.
A: All we really see is a reduction in the number of compactions, and in fact the average compaction time is lower when we use larger buffers, probably because there's so much less data to deal with; the write amp is just generally lower, so everything goes better. In this case, again, we see that performance is better with larger buffers, so essentially reducing the write amp is trumping everything; that's the big effect that we see. We also tested universal compaction, and universal compaction is kind of interesting; it helps in some cases.
C: Can we go back a second on that, on your data? So you are saying that, okay, with 16k min_alloc_size, 32 MB buffers, and eight of them, right, in the first one you are getting 242 MB per second, and with 16k min_alloc_size... 16k min_alloc_size with 256 MB buffers, you are getting 235 MB per second. So it seems that, okay, a small buffer with more of them is basically more suitable, no?
A: Well, it's a good question. I mean, if you look down at the RADOS bench numbers below it, arguably the other case is actually a little bit better in that picture, so it may also just be that we hit a boundary there that's different, I guess. I guess there are actually a couple fewer compactions in the 16k case with four large buffers; there were like two fewer compactions than in the other one, but...
A: I have that one; I could run it quickly, I just hadn't, because Sage was kind of uninterested in the 32-small-buffer case with eight buffers, or was it... but...
A: It may be that that actually is a little bit better; I'm not opposed to considering that as an option as well, if it makes logical sense. The gist of it, though, is that the numbers are pretty similar. There's enough variability here that the 32 small buffers with min_write_buffer_number_to_merge set to eight might be marginally better, but it's close, and it's the same with RADOS bench.
A: You know, maybe the alternative, the four large buffers with min_write_buffer_number_to_merge set to one, is marginally better, but again it's really close. The takeaway, though, is that in this case it does not look good to have small buffers and try to flush them really quickly after one fills up; that seems to be the pathologically bad case. That's the one where, yeah...
C: And one more thing. One more thing is that recently we've been doing some profiling on that, and a lot of RocksDB things are popping up. So probably with that number of buffers, if it is eight or thirty-two, the things it has to do, the merges and all those things, may end up costing a lot of extra CPU cycles. Do you have any data on that? Like, is there any CPU-cycle difference between one or four and eight buffers?
A: I have... well, this literally just finished like an hour ago, so I couldn't copy and paste all the data yet so that we could show it for the meeting. So yeah, I'll take a look at it and hopefully present some interesting things there.
A: We still see that the buffer size, basically how much data you are allowed to have outstanding in aggregate in buffers before you flush, is the predominant factor that determines performance. But having said that, it looks like universal compaction handles smaller buffers better in the 4k object-creation case than level compaction does; it's a pretty big improvement in this kind of worst-case scenario compared to level compaction, and we do see a little bit of a performance increase for these big-buffer cases as well.
A: Interestingly, here, though, in the 4k min_alloc_size case, one of the things is that there's a lot less compaction traffic: it's about two-thirds the amount of traffic of the other case, so write amp is going down, basically, which is what universal compaction is supposed to do. So it is decreasing write amp; there are fewer compactions; the total amount of output data is much smaller, you know, two-thirds the size. I guess that's good; the performance is no different.
A: So that's interesting; essentially, I guess that means that these NVMe devices have enough throughput still available to them that the amount of write traffic into the device doesn't really matter; we're being bounded by something else. So maybe, as we improve BlueStore, we might actually see this universal compaction configuration start to pull ahead, because we're writing out less data. But the downside of universal compaction is that it increases space amplification.
A: Potentially you can see up to a 2x increase, temporarily, in the amount of storage required during compaction, and read amp may be bigger as well. So we'll just have to see how that plays out. I think for right now I'm still going to recommend that we stick with either the four 256-megabyte buffers or this kind of 32 32-megabyte-buffer case, and not do universal compaction.
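The universal-compaction variant being compared would be selected roughly like this (again a sketch, with the same assumed buffer sizes as above):

    #include <rocksdb/options.h>

    rocksdb::Options universal_options() {
      rocksdb::Options opts;
      opts.write_buffer_size = 256 << 20;
      opts.max_write_buffer_number = 4;
      // Universal compaction trades lower write amplification for higher
      // space amplification (transiently up to ~2x during a full compaction).
      opts.compaction_style = rocksdb::kCompactionStyleUniversal;
      return opts;
    }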
A
There
are
some
other
limitations
of
the
universe,
compaction
that
I
think
you
can
only
have
100
gigabytes
of
data
total,
so
we
may
need
to
shard
rocks
TV
be
fully
deep.
Did
that.
But this kind of 4k min_alloc_size case is really worth watching, because with less data being written, as I said, it might actually start pulling ahead at some point. So basically, this is what we've been looking at with RocksDB.

I know it's a lot of data, but the hope here is that once we understand this well, we'll have better insight into how all of this should be tuned and, as we change RocksDB and change BlueStore, which things are going to be important going forward. So, any questions on this, I guess, before I give up the screen?
B: Yeah, so I'm not really going to present anything today; I just wanted to see if it was interesting to people and mention a particular problem I ran into. So: we're testing with a thousand hard-drive OSDs across 29 servers, and we're basically trying to integrate that with OpenStack on 20 compute nodes and, you know, get some performance data for that.
B
The
integrated
stack
and
one
thing
we
one
thing
I,
you
know:
we've
got
a
starting
to
get
performance
data
for
Steph
that
you
may
be
able
to
share
if
people
are
interested
at
some
point.
But
the
the
thing
that
concerns
me
a
little
bit
is
that
I
ran
into
a
problem,
tracker
and
I'm,
going
to
try
to
dig
that
up
for
you
right
this
second
and
post
it
in
the
chat
window
and
see
what
people
think
so,
here's
the
tracker.
So
it
started
out
as
I
noticed.
B
There
was
a
problem
running
radis
bench
with
CBT,
and
it
was
just
basically
that
the
number
of
file
descriptors
defaults
of
2024
and
that
had
to
be
increased
for
a
raid
oz
bench
to
work
reliably
and
I
thought.
That
was
that
was
it
and
then
later
on,
I
started
running
cbt
fio
tests
and
lo
and
behold,
I
read
in
the
same
problem,
and
so
what
happens?
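For reference, a process can inspect and raise its own descriptor limit with the standard POSIX calls; a generic sketch (not what CBT or librbd actually does):

    #include <sys/resource.h>
    #include <cstdio>

    int main() {
      rlimit rl{};
      getrlimit(RLIMIT_NOFILE, &rl);     // soft limit is typically 1024
      std::printf("soft=%llu hard=%llu\n",
                  (unsigned long long)rl.rlim_cur,
                  (unsigned long long)rl.rlim_max);
      rl.rlim_cur = rl.rlim_max;         // raise soft limit to the hard cap
      if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        std::perror("setrlimit");
      return 0;
    }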
B
Is
it
gets
an
error
back
saying
can't
create,
you
know,
can't
create
the
file
descriptor
for
the
socket
and
but
it
doesn't
error
out
it
just
kind
of
hangs,
and
what
concerns
me
about
that
is
that
you
know
if
it
just
hangs.
People
aren't
going
to
mediately
understand,
you
know
what's
going
wrong,
and
so
my
question
to
you
is,
you
know:
does
this
you
know?
Is
this
a
bug
in.
B: Well, that already happens, so there's no problem there; the problem is the applications that use librbd. For example, later on we ran into the fact that with OpenStack, the libvirt/KVM processes that are implementing guests don't increase it, and that was causing problems, which we fixed.
But the point is, it doesn't show up clearly, you know, when this is going wrong; librbd doesn't seem to really clearly log the problem. And I was wondering whether anybody else has hit this, and whether there's something that needs to be done differently, or whether it's just a bug in these applications, or what.
B: I guess that's small, but I mean, the point is we tried to choose a value that was somewhat like what you might expect from a real application, so we're basically trying to scale out the number of volumes rather than cram as many IOs as possible through a single volume. Yeah.
B: No, it's not, because, first of all, when you increase the file descriptor limit the problem goes away, and the second thing is the TCP thing you're talking about has to do with recycling ports, and that's if you have an application that's constantly connecting and closing and connecting and closing.
B: Anyway, but I mean, overall things are going pretty well. There was one test where we got up to like 42 gigabytes a second, you know; it's just kind of a nice little eye-popping thing for me.
B
Getting
up
to
like
there
was
one
test
where
retest
random
read
where
we
got
to
like
140,000
die
offs.
You
know
which
is
not
for
you
folks,
it's
probably
pretty
boring,
because
you're
working
with
all
SSD
configurations
and
that
sort
of
thing.
But
you
know
it's
not
bad-
for
heart,.
B: I would love to do that. What we could do is try to schedule you in, because unfortunately we're competing with a lot of folks for this hardware, so...
A: Ben, a question about your earlier question about the ulimit. So again, I'm sorry, I'm not super familiar with all this, but does each TCP socket require a file handle? Is that true?
B: You are correct. Basically, when you close a TCP socket, the connection hangs around for a while; it's one of those TIME_WAIT states that you're talking about, and then after some number of seconds, depending on the kernel parameter for it, it recycles the connection. That can be adjusted, but I don't think that's what's happening here, because I don't think librbd is constantly recycling TCP connections. I could be wrong, but my understanding is it's opening them and keeping them open.
A: I would assume... I would assume that it's doing... okay, so you've got like a thousand OSDs or whatever, right? Presumably, if you're doing random IO, you're blasting stuff off in all these different crazy directions, and I would assume, I guess, that it's closing the connections, so it's not keeping connections to a thousand different servers open from every single client all the time. Maybe that's incorrect, but I would suspect that it is closing them at some point.
A: What I was wondering is that even if it closes them, presumably all these file handles are being left open for a while; that's what I'm wondering, whether that's why you have to kick it up so high. Even though it's not keeping, you know, a bunch of them open concurrently, maybe that's why you're running out of file handles.
B: Right, I agree. I just didn't notice that kind of TIME_WAIT buildup, but I wasn't really paying attention to it at that point, so I'll take a look. Yeah.
A
That
just
increasing
them
our
file
handles
fixes
it
for
you,
but
you
may
also,
if
this
is
what,
if
my
suspicions,
I
guess
is
correct,
if
you
reduce
or
even
eliminate
the
recycling
are
the
keeping
it
around
I
guess
not
the
recycling
to
keeping
it
around
I
wonder
if
that
might
also
improve
the
situation,
but
anyway,
that's.
That
was
my
thought.
All
right.
Thanks
a
lot
all
right,
but
we
are
out
of
time
guys
any
last
minute
comments.