From YouTube: 2019-10-31 :: Ceph Performance Meeting
A: The core standup is running over, so we might get a couple more people coming in, but they were still in the middle of discussing bugs and such, so it might be a little bit. I'll get started on PRs here. Oh, there's this new RBD cache replicated write log. I think it's part of a previous one that had been submitted earlier, but I think Jason wanted to have it broken into smaller parts, so this is apparently the first part of the previous one.
A: Let's update the status here. I actually have not gotten through all the old reviews, but I think I got through most of the ones that matter. So, let's see what has been updated. My PR for pinning onodes in a separate list in the onode cache in BlueStore was tested, approved, merged, and then we unmerged it. We reverted it because it was breaking a couple of tests that I'd missed.
A: It's not clear to me why it's broken, though I asked Josh to take a look at it as well, just to make sure we get another set of eyes on it. I am a little suspicious that maybe we have some broken behavior somewhere else, and this is just exposing it; but one way or another, we need to figure out what's going on before we try merging it again.
A: Let's see, I've got a PR for increasing the default number of RGW bucket shards. We can discuss this more later on, but my recommendation, I guess, would just be: let's test it and see what happens.
A: Let's see what else. MDS auto-tuning of the MDS cache memory limit needs to be rebased again; that work is still ongoing, and Patrick continues to review it. Adam's RocksDB sharding PR: I tested that a couple weeks ago, and Adam is also doing a bunch of testing on it. Oh, and Adam, you're here. So maybe after we get through the PRs, if you want to talk a little bit about what you've been seeing and testing, that'd be great.
A: Let's see who else we have here. Fix broken in-use calculations in BlueStore? I don't remember what that is, but I think it was buggy, so it's being fixed. Oh, you're here, so... I don't actually know anything about that one. Is there anything interesting going on there? What does it even fix?

A: Let's see, Adam's PR is continuing to move along. When I just looked at it, it looked like it was passing with two approvals, so that's exciting, I think. Maybe Josh also took a look at that and approved it, in addition to other people who have been taking a look, so that's really good. Hopefully that gets merged soon.
A: Fantastic, that's great to hear. All right, let's see: the io_uring IO engine in BlueStore. We did have some discussion a while back about that, and maybe about refactoring a bunch of the code that's related to AIO in BlueStore, but that's going to be a lot of work, and I don't think anyone has time to do it now. So the gist of it is...
B: An additional idea is about making collection listing asynchronous, because the current implementation might block for a while under some circumstances, and I have actually seen how this costs us OSD timeouts. The design of collection listing is to be improved, in my opinion. Anyway, the part which implements prefetching is ready for review; the part about asynchronous collection listing maybe needs more discussion.
C: A specific shard is chosen by calculating a hash, and this solution was extensively tested by me for performance. There is a presentation that was given on Saturday this week in Poland, and the results are as follows: it seems not to give any significant reduction in latency or tail latency. There are some test cases where a tail-latency reduction can be observed, but in general, latency became somewhat worse.
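The shard-selection step described here can be sketched as follows. This is an illustrative model only: the hash function and shard count are assumptions, not the PR's actual code.

```python
import hashlib

# Illustrative sketch of hash-based shard selection: hash the key and
# take the result modulo the shard count. The hash choice (md5) and
# shard count (7) are assumptions for illustration.
def pick_shard(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# 1000 keys spread across 7 shards; every result is in [0, 7).
shards = [pick_shard(f"object-{i}", 7) for i in range(1000)]
```

Any stable hash works here; the point is only that placement is a pure function of the key, so no per-shard state is needed to route a lookup.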
C: On the positive side, we have much better write amplification and much shorter compaction times: compaction times around two times shorter, sometimes even more, and write amplification up to two times better. But beside that, I would rate its performance as unsatisfactory. Still, I simply refuse to acknowledge that this sharding solution would somehow fail for RocksDB when it was so successful in other cases.
C: I have to confirm that. If so, then a second attempt at sharding could be to actually invoke multiple instances of RocksDB, but that would be hugely difficult, as additional sharding of BlueFS, or something else, would have to be done.
So, basically, that's it. From my point of view, only the first of the three pull requests is actually useful: the one that integrates column families into our KV database and allows us to properly assign...
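As a rough illustration of what the column-family integration enables: keys in a KV store carry a prefix, and the integration lets each prefix be assigned to its own column family. A minimal, hypothetical model follows; the prefixes and family names are made up for illustration, not Ceph's actual ones.

```python
# Hypothetical prefix-to-column-family routing. The prefixes and
# family names below are illustrative, not Ceph's actual schema.
COLUMN_FAMILIES = {
    "O": "onodes",   # object metadata keys
    "M": "omap",     # omap data keys
}

def column_family_for(key: str) -> str:
    # Route a key to a column family by its prefix; unknown
    # prefixes fall back to the default family.
    prefix, _, _ = key.partition(".")
    return COLUMN_FAMILIES.get(prefix, "default")
```

Keeping each key class in its own family means a compaction in one family (say, omap) never rewrites data belonging to the others, which is where the write-amplification benefit discussed below comes from.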
A: I think it sounds like the results I saw when I was testing match what you are seeing. I can link those again; I think you've seen them already. It's not higher performance, and not significantly worse, but maybe a little tiny bit worse, and then much, much lower compaction overhead and much lower write amplification. Still good, I think, so worthwhile.
A: One thing I've been a little worried about with this is that I've seen some opinions, let's say, on various mailing lists and such, that when you have a large (and somewhat undefined) number of column families, you can potentially hit some locking issues with the write-ahead log. I'm wondering...
A: There are some scenarios that we saw from customers where they were seeing like 20- or 30-second compactions, and it looked like maybe the write-ahead log was starting to throttle because it couldn't keep up: the compactions were so slow that the write-ahead log was getting into this slowdown state, and then the ingest rate for writes was really low. I was curious if your PR may help in that situation.
C: Well, what you explained could be an explanation for sometimes having operation latencies of 13 seconds or even more, I guess. In some tests I might have had that, but I did not observe it directly; I mean, I did not have a toolkit set up to actually confirm finding it. Okay.
C: And one more note about why compaction in that case may produce a smaller amount of data: if you have some pool that holds just omaps, and you add to it a lot, and there is a trigger for compacting omaps, then the compaction doesn't really touch any other objects in any other column families. So that would also have an impact on reducing write amplification. Yeah.
A: Alright, let's move on. The next thing I had was the BlueStore trim update. I guess I already covered it: the gist is that we broke master briefly, and I'm trying to figure out why. It doesn't appear that there's anything really obviously wrong with the PR; I am still suspicious that maybe we're exposing some other bug in BlueStore. Hopefully we'll have time to take a look at it, and then we can figure out what's going on.
A: So, moving on from that: RGW bucket sharding and shard counts and all this stuff. We don't have Eric, but Casey, you're here. I've got a PR where I was advocating that, once we have the bucket-listing efficiency PRs merged, we up the number of shards. I'm advocating for a very high number; Casey, I think you were advocating for something a little bit more conservative, I guess.
A: So, to me it seems like those are the two things: the improvement that you get in single-bucket write throughput, versus the performance penalty of what it does to bucket-listing throughput. Does that seem reasonable?
A: Yeah, so I guess the trade-offs are: with more shards, listing slows down to some extent, which we probably should retest to see how it is now, and potentially the creation and deletion of buckets themselves is slower. The benefit, potentially, with more shards would be faster single-bucket write throughput and faster single-bucket object deletion. Does that sound right?
E: Yeah, but I feel like one thing that we're not considering here is how we scale with, or adapt to, the workload with resharding. Our tunable there is basically the hundred thousand keys per shard before we look to split, and I think I'm very interested in looking at that and figuring out if it's giving us the right scaling. So when we're talking about picking a good default for new buckets, I think we just want to fit that into the model for how we adapt, and make sure that the curve, I guess, looks right. That's kind of why I'm hesitant to add a ton of shards: because then it's going to take forever before we start resharding and adapting to the workload.
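The tunable Casey mentions is the objects-per-shard threshold that dynamic resharding checks (100,000 keys per shard by default). A hedged sketch of such a trigger follows; the growth policy here is illustrative, not RGW's actual algorithm.

```python
# Hedged sketch of a dynamic-resharding trigger: split when the
# average keys per shard exceeds a threshold (the transcript's
# 100,000-keys-per-shard default). The "aim for half-full shards"
# growth policy below is illustrative, not RGW's actual algorithm.
MAX_OBJS_PER_SHARD = 100_000

def needs_reshard(num_objects: int, num_shards: int) -> bool:
    return num_objects > num_shards * MAX_OBJS_PER_SHARD

def suggested_shards(num_objects: int) -> int:
    # Ceiling division: enough shards that each holds roughly
    # half the threshold after the split.
    return max(1, -(-num_objects * 2 // MAX_OBJS_PER_SHARD))
```

The point Casey makes falls out of this model: if the default shard count is already very high, `needs_reshard` stays false for a very long time, so the adaptive path never exercises.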
H: I don't know; we're also looking at other strategies to represent bucket indexes, but in terms of this, that might be correct. I apologize for missing the first part of this, and I'm not going to blather a lot, but I'd like to talk more with Adam in this context and learn more about whether that's a likely assertion: first see if we're actually seeing such a behavior, and whether we have a way of investigating it. It's not novel; it might be true, I don't know.
H: Separately, yes, yeah. Well, I think we're on the right track here. I do believe we need a different type of growth curve for sharding. It appears you probably want to jump to sharding quite quickly, but not necessarily very wide, and then grow less frequently but by more.
I: Yeah. So, hey, this is Prasad from Flipkart. On the topic of numbers: we've had clusters with buckets that were statically sharded to 32 shards and had a cap of 1 million objects per bucket, and that has served us quite well. But on another cluster, which was running Luminous with dynamic sharding, we had this hundred-thousand-objects-per-shard limit, and we have seen occasional slow requests coming in.
I: Frankly, that's because the total number of objects itself was high; but no matter who does a delete, there used to be a lot of tombstone entries in RocksDB, and so that one lakh (100,000) itself seemed not quite helpful. So to me, 1 million divided by 32, which comes to roughly about 30k objects per shard, seems like a nice number. I don't think the 100,000 that we have addresses what needs to be addressed. Okay.
H: That's something you should be observing, I think, when you use, for example, a non-prime number of shards, which we adjusted upstream just recently. That wasn't the intent of the original design; it just used modular arithmetic, so it shouldn't be doing that. But again, I think you could see a significant imbalance even then.
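The imbalance with non-prime shard counts can be seen with a small sketch: if key hashes happen to fall on a stride that shares a factor with the shard count, plain modular placement collapses onto a few shards, while a prime shard count avoids the alignment. The stride-based key pattern below is an illustrative worst case, not RGW's actual hash.

```python
from math import gcd

# Keys generated on a fixed stride model a pathological hash pattern:
# under modular placement they occupy only num_shards / gcd(stride,
# num_shards) distinct shards.
def shards_hit(stride: int, num_shards: int, n: int = 1000) -> int:
    return len({(i * stride) % num_shards for i in range(n)})

# Stride 8 into 32 shards: only 32 / gcd(8, 32) = 4 shards are used.
# Stride 8 into 31 (prime) shards: all 31 shards are used.
```

This is why the shard counts floated later in the discussion (7, 17, 31) are primes: gcd(stride, prime) is 1 for any stride the hash might produce, so no aligned pattern can collapse the distribution.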
I: We were wondering, and I'm probably missing something basic, pardon me for that, but we were thinking: why is there no per-PG RocksDB instance? Why is it that we had to have one big RocksDB for all the PGs on a given OSD, so that no matter who does a delete, mapping to whichever PG, it affects the primary?
A: When you do a write, you are able to have the entire transaction hit only the write-ahead log as a single operation, a very simple transaction, instead of spreading out over lots of RocksDB instances and lots of logs, maybe over a hundred, with potentially lots of random I/O happening. You hit one log with all of the things necessary for the operation.
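The single-log argument can be modeled simply: with one KV store per OSD, a transaction touching several PGs is one atomic log append, rather than one append per PG-local store. A toy model, not Ceph code:

```python
# Toy model contrasting one shared WAL per OSD against one WAL per PG.
class WalLog:
    def __init__(self) -> None:
        self.appends = 0     # number of physical log writes issued
        self.records = []    # committed operations

    def commit(self, ops) -> None:
        self.appends += 1          # one append per transaction,
        self.records.extend(ops)   # regardless of how many PGs it touches

# Shared store: a two-PG transaction costs a single append.
shared = WalLog()
shared.commit([("pg.1", "put", "a"), ("pg.2", "put", "b")])

# Per-PG stores: the same work costs one append per PG.
per_pg = {"pg.1": WalLog(), "pg.2": WalLog()}
for pg, op, key in [("pg.1", "put", "a"), ("pg.2", "put", "b")]:
    per_pg[pg].commit([(pg, op, key)])
```

The shared log turns what would be scattered small writes into one mostly sequential stream, which is the random-I/O point being made above.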
I: The BlueFS partition, we moved to db.slow, and then it still spilled over, and it was just too heavy, with a lot of slow requests coming in. Then we had to keep adding more and more OSDs, making the RocksDB instances much thinner, and the compaction times were really bad on a per-device basis.
H: Yeah, it simply was untenable, and for that reason, when those index buckets were created, it would simply hit a wall, as you'd expect.
H: You know, they got access assurance by constructing object names with a Unix-like shard-name/prefix layout, and you were ensured, or it was strongly implied, that you get shard affinity by doing this.
H: Well, they ran into issues; I believe they ran into issues similar to ours. But in the end the idea was to make it user-visible, so that users could construct workloads around it. They wanted some type of guarantee, or partial guarantee, for users who were probably asking: how do we get guaranteed parallelism for a workload?
H: I think transparency in what we're doing is the way to go; I don't think we're going to be perfectly okay using this for sharding. Using a simple hash-space selector to place everything is a great way to get uniform distribution, but it has the issues it has, and I think we're going to be moving away from that.
H: It may always be a good strategy for some workloads, I don't know which ones, but other things we're looking at doing include the splitting-based approach, and there's the B+-tree approach, for which we've got a working design document.
A: Yeah, in my ideal world there would be no splitting for most people: we could get to the point where we have a default number of shards that doesn't significantly impact any performance at all, and is fast enough that most people don't care about having more shards.
H: Yeah. It's not so much the shard structure, especially after Eric's improvements to scaling and enumerating things, which were unexpectedly good; that may short-circuit the need for a different way of doing splits that aren't fully autonomic. But, as he said, people can still say, well...
H: Sorry: the distribution is supposed to work; the question is where it's fluctuating. I don't know if the latter is all that common, to be honest, but I think there's a common workload, and let me hear about it, where we're creating a new bucket, we have a stream, things are flooding in, and we're getting to the natural shard count you'd think we should be at, at any rate, for the steady-state size of the bucket.
H: If listing is much, much better than I predicted, that means that if you started with eight shards, or sorry, seven shards, then things are probably pretty good for a long run to come; you'd have room to get to multiple millions of objects, and that would be tolerable.
A: My hope (and I know Casey and Eric have expressed a lot of concern with this) is that when we actually start testing some of this, we might see that the impact of even more shards on bucket listing isn't very significant; we might be able to get up to 17 or even 31 and have very little impact, I think.
H: Yes, that's probably pretty common, and there are some other weird ones, but yeah. I thought maybe it would be useful to actually measure such distributions. Some of Eric's work is relevant: he was looking at large buckets and collected some upstream information. If other upstream folks have posted publicly viewable or shareable bucket listings, this could be useful to us.
H: Both; we've seen both, and people have applications where you can infer things about why they would do each one. Sometimes it's just random reasons, and other times I think it's that one is going to be faster than the other, and maybe it is, or it isn't. We also have workloads that arguably construct and destroy buckets with the same name frequently, producing data sets, or recycle data sets that are like snapshots being replaced by new information.
H: Absolutely it does, because we're basically just taking a hash, computed from the name of the object, modulo the shard count, and that places it in the right shard. So...
A: Right. Well, if I have time this week, which I'm hoping I will, though we'll see how this other stuff goes: Matt, I'm going to attempt, I think, to run some tests using Eric's PR and look both at the bucket creation and deletion times, along with listing times, and then write throughput and deletion throughput within a bucket, with different prime shard counts, just to kind of get an idea of where we're at. Okay.
H: Starting at a bigger number: okay, I don't think you're going to notice much difference, but I think it's going to behave, as you say, better as we scale up. If we could find a better curve in general, better reshard points, yeah, I think that'd be good.
H: I mean, yeah, we should go with what the numbers say. What did you see? Because I think you tested already; you tested a variety, right? Like two shards, around eight, around 32, and so forth. If we didn't see what they're describing there, like you say, well, yeah, let's talk offline, but I don't know what their concern is. Yeah.