A
So hello, everyone, and welcome to another Ceph Tech Talk — we're here for the month of October. We pushed this one over just a little bit to make time for the speaker, and so right now we're going to be hearing about scale testing with Red Hat's Ceph, with — let's see, how many is that — 10 billion plus objects?
B
All right, so thanks, Mike. Yes, it is insane, and it is actually 10 billion plus objects. So yeah, hey guys, hello, everyone. My name is Karan Singh and I'm a Senior Solution Architect in Red Hat's Cloud Storage and Data Services business unit, and I do lots of stuff in my daily activities.
B
So before I go to this one: this is not yet another round of performance testing that we have done with Ceph. I mean, the team to which I belong, we do lots of performance testing on Ceph and OpenShift Container Storage. So this is not yet another one, in that in this testing we specifically wanted to test Ceph object storage with not one, not two, not even five, but ten billion objects in the Ceph system. So this was the actual vision.
B
We
started:
okay,
let's,
let's
stop
this
testing
once
we
hit
the
10
billion
mark
because
we
wanted
to
go
to
10
billion.
That
was
the
intention
of
this
car
or
this
this
project.
But
that's
that's
why
this
is
not
yet
another
performance
testing
for
us.
It
was
predetermined
that
we're
gonna
build
the
cluster
until
10
billion,
so
this
is
a
very
rare
view
of
the
cluster.
So
all
the
folks
who
are
gonna
view
this
view
this
youtube
or
or
join,
live
in
this
session.
It's
a
stuff
status
output.
B
We
have
10
billion
objects
into
the
surf
system
and
you
can
expect
that
the
pools
the
pools
are
are
near
full.
My
osd's
are
full
blue
store,
is
spilling
lots
of
data
onto
spinners,
and
you
know
a
lot
scrubbing
and
deep
scrubbing
everything
right.
So
this
is
pretty
standard
command,
but
the
output
is
not
standard.
So
that's
the
reason
I
put
it
here.
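For anyone who wants to pull the same view from their own cluster, these are the standard commands behind that screenshot; a minimal sketch — nothing here is specific to this lab:

```bash
# Cluster-wide health, capacity and scrub/recovery activity
ceph status        # same as: ceph -s

# Per-pool and raw capacity usage, to spot near-full pools
ceph df

# Per-OSD fill levels, to spot full or near-full OSDs
ceph osd df
```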
B
So
it's
the
time
when
we
hit
10
billion,
so
it
captured
this
the
screenshot
and
if
you
go
to
the
cef
metrics
dashboard,
it
also
explains
the
same
story
like
my
bucket
redox
gateway
data
pool
is
having
you
know:
10
billion
objects
into
the,
so
we
kept
on
like
fifth
of
the
26th
may
we
started
this
testing
and
then
there
is
a
bump
in
here.
I'm
gonna
explain
this
this
bump
in
here
and
then
you
know
we
kept
on
adding
more
objects
until
we
reach
10
billion.
B
So
this
is
again
a
rare
view
from
one
of
our
cluster
that
we
tested
in
this
lab.
So
one
would
ask
like
hey:
why
did
he
chose
10
billion?
Why
not
7.5
or
something
right?
So
this
year
in
february
2020
we
have
tested
redexa
storage
to
one
billion
objects
and
the
the
results
are
already
published
on
redhat
dot
com,
slash
blog,
slash,
storage,
you
can
you,
can
just
google
that
that
link
right
now
and
so
yeah
we
last
the
last
testing
this
year
was
on
one
billion.
So
we're
like
hey.
B
What
should
we
do
next?
Should
we
do?
Should
we
do
something
more
interesting
with
sef?
So
that's
why
I
decided.
Okay,
let's
go
with
10
this
time
and
one
other
thing
which
came
to
us
like
okay,
you
know
other
object,
storage,
solutions
and
systems.
B
They
aspire
to
scale,
to
billions
of
object,
that
to
one
day
all
right
so
but
seth
can
do
it.
Today.
Seth
is
a
10
year
old,
10
year
old
matured
technology.
It
can
definitely
do
do
10
billion,
but
you
know
somebody
has
to
test
it.
So
that's
that's
why
we
started
with
this
okay,
let's
test
10
billion
and
put
the
numbers
out
in
the
community
and
and
for
the
customers.
B
Another
motivation
was
that
we
could
see
a
lot
of
attractions
on
from
customers
around
data
lake
use
cases
for
object,
storage
and
when
I
say
data
lake,
it's
you
know.
Big
data
workloads,
they're
gonna,
put
they're
willing
to
put
big
data
workloads
on
object,
store,
which
means
they
end
up.
Writing
lots
and
lots
and
lots
of
object.
So
somebody
have
have
to
test
that
and
see.
Does
it
really
perform
good
or
are
there
any
implication
with
when
they're
gonna
write?
B
You
know
lots
of
objects
into
the
subsystem,
so
that
was
also
one
of
our
motivation,
and
this
is
this
is
the
the
motto
of
our
team
in
which
I
work.
So
we
try
to
educate
and
motivate
the
communities
we
work
into
the
customers
we
work
day
in
day
out
and
our
partners,
so
we
want
to
educate
the
field
and
the
community
and
customer
with
the
rich
data
set
and
backed
by
empirical
evidence.
So
that
was
another
one
of
the
reason
we
chose
to
go
with
with
an
even
number
like
10
billion.
B
When
you
say
executive
summary
to
me
so
yeah
it.
It
might
look,
you
know
too
flashy
for
you,
but
actually
it
I'm
not.
You
know,
I'm
gonna
explain
each
and
every
term
but
which
I'm
gonna
call
in
this
executive
summary,
so
red
hat,
storage
or
saf
in
general
has
delivered
a
deterministic
performance
for
both
small
objects
and
large
object
workloads.
So
there
is
no.
You
know
there
is
no
marketing
term
here.
B
We
just
put
it
here
in
front
of
you
that
yeah
we
got
some
numbers
which
are
cool
for
both
kind
of
object,
uses
storage,
use
cases
because
in
production
you
will
have
a
mixed
variety
of
workload.
It
could
be
small,
it
could
be
large
depending
on
the
use
case
you're
using,
but
for
across
the
board.
We
saw
a
deterministic
performance
from
storage
system,
understanding
that
the
scale,
so
what
is
scale
for
us.
B
So
again,
we
have
ingested
10
billion
plus
objects
into
the
system,
and
we
have
we
have
tried
to
retrieve,
if
not
all,
but
most
of
them
are
during
the
read
read
test
so
and
all
of
these
10
billion
objects
were
across
spread
across
a
hundred
thousand
plus
buckets
into
into
the
surf
system,
and
each
of
the
buckets
were,
were
you
know,
configured
to
store
a
hundred
thousand
objects.
B
All
of
this,
the
data
that
we
have
crunched
into
this.
They
it
was
spread
across
318
spinning
devices
backed
by
36
nvme
devices
for
blue
store
metadata
overall,
getting
close
to
five
petabyte
of
raw
capacity
into
the
system,
and
it
took
us.
You
know
several
days
because
you
know
it
will
take
time
to
write
this
many
objects,
but
over
over
close
to
500
unique
test,
runs
to
achieve
this
10
billion
mark
into
the
system.
B
So
that
was
the
scale
for
us
understanding
the
hardware
and
the
software
inventory
in
in
the
lab.
So
I'm
thankful
to
intel
and
seagate
for
providing
us
the
necessary
equipments
for
for
this
testing.
So,
overall
we
had
six
red,
xf
storage
nodes.
B
Each
of
the
nodes
were
equipped
with
53
16
terabyte
spinning
devices
using
seagate
jbods
e4e106,
the
for
the
jbar
that
we've
used
for
this
six
intel,
qlc
39
devices
with
three
7.6
terabyte.
They
were
used
for
blue
store
and
then
intel
xeon,
gold,
processors
and
some
memory,
or
maybe
lots
of
memory,
and
then
some
standard
networking
in
place.
So
this
was
my
setup
with
respect
to
the
clients.
We
had
six
clients,
so
you
can
see
like
one
to
one
mapping
with
clients
and
osg
nodes
from
the
software
part
of
things.
B
We
have
used:
rel,
8.1,
red
hat,
sap,
storage,
4.1
and
all
the
daemons
were
containerized,
the
osd
monitor,
managers
and
android
gateway,
they're,
all
containerized.
We
did
something
special,
I
mean.
We
know
that
from
from
our
last
one
billion
testing
that
if
we
deploy
multiple
redox
gateways
per
osg
node,
it
delivers,
it
tends
to
deliver
better
performance.
So
this
time
we
we
went
with
two
redox
gateway
instances
or
the
containers
on
each
node,
which
means
total
12,
redox
gateway,
end
points
we
had
in
the
system.
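Red Hat Ceph Storage 4 deployments of this era were typically driven by ceph-ansible, which has a knob for running more than one RGW container per host. The talk doesn't show the lab's playbooks, so treat this fragment as an assumption about how such a layout could be expressed, not as the actual lab configuration:

```bash
# Hypothetical ceph-ansible group_vars fragment (illustrative only):
cat >> group_vars/all.yml <<'EOF'
# Two containerized RADOS Gateway instances per RGW host,
# i.e. 12 endpoints across the 6 nodes described in the talk
radosgw_num_instances: 2
EOF
```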
B
All
the
testing
was
based
on
ec
4,
4
plus
2
coding,
and
everything
is
here
you
see
here
is
based
on
s3,
s3,
access
modes
and
again
hundred
thousand
objects.
In
each
bucket
for
workload
generation,
we
chose
cosbench
with
six
drivers
and
12
workers
and
64
for
threads.
So
we
I
didn't.
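For context, this is roughly what setting up an EC 4+2 data pool for the RADOS Gateway looks like; a minimal sketch, with an illustrative profile name and PG count rather than the lab's exact commands:

```bash
# Define a 4+2 erasure coding profile: 4 data chunks + 2 coding chunks
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host

# Create the RGW bucket data pool on that profile (PG count illustrative)
ceph osd pool create default.rgw.buckets.data 4096 4096 erasure ec42
ceph osd pool application enable default.rgw.buckets.data rgw
```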
B
We
came
up
with
this
number
after
some
rounds
of
initial
tests
so
that
we
can,
we
can
know
what
would
be
the
right
workload
that
we're
going
to
apply
to
the
system,
because
once
we
are
once
we
are
set,
we
don't
want
to
change
any
other
thing
into
the
setup.
So
we
will
keep
everything
constant
and
just
measure
the
performance.
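For anyone who hasn't used COSBench: workloads are described in XML files submitted to a controller. Below is a minimal sketch of a small-object fill stage in that format; the endpoint, credentials, and bucket/object counts are illustrative placeholders, not the lab's actual workload files:

```bash
cat > fill-small.xml <<'EOF'
<workload name="fill-64k" description="small-object ingest (illustrative)">
  <storage type="s3"
           config="accesskey=KEY;secretkey=SECRET;endpoint=http://rgw.example.com:8080" />
  <workflow>
    <workstage name="prepare">
      <!-- 64 KB objects, 100k objects per bucket, as in the talk -->
      <work type="prepare" workers="64"
            config="containers=r(1,100);objects=r(1,100000);sizes=c(64)KB" />
    </workstage>
  </workflow>
</workload>
EOF

# Submit the job to the COSBench controller
sh cli.sh submit fill-small.xml
```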
B
So
this
is
how
the
software
inventory
and
hardware
inventory
look
like
this
is
the
the
lab
architecture
so
again,
six
six
f
nodes,
and
then
we
had
a
melon
x
switch
here,
25
2
into
25,
which
is
50
gig
pipe
for
for
the
for
the
cluster
and
the
storage
network
and
standard
10
gig
management
code
for
for
basic
internet
and
management.
So
nothing
complex
here,
pretty
standard
lab
setup.
B
With
respect
to
the
workload
selection,
we
chose
to
go
with
a
64
kilobyte,
which
represents
the
small
objects
and
128
megabyte,
which
represents
the
large
object.
So
again,
coming
back
to
my
executive
summary,
we
have
tested
for
both
small
and
large
object.
Workloads
because
you
never
know
your
customer
or
or
a
user
would
be
running.
You
know
variety
of
different
workloads.
It
could
be
you
know
small
text,
files
or
log
files
or
images
or
videos-
or
it
could
be.
B
From
access
pattern
point
of
view,
we
had
a
pretty
mix,
100
get
100
port
and
then
a
combination
of
get
post
put
list
and
delete
operation
with
respect
to
s3
in
capacity
of
70,
25
and
5.,
so
pretty
mixed
workload.
Here,
we've
also
done
a
degraded
testing
like
try
to
fail
manually
intentionally,
fail
one
device,
one
spinning
device
and
then
success
spinning
device,
and
then
you
know
one
entire
node
failure,
because
we
also
wanted
to
test
at
this
scale.
B
What
will
happen
if
I'm
failing
one
node
or
six
spindles,
or
maybe
an
entire
node
from
a
cluster
which
is
like
you
know,
eighty
percent
filled
up
or
seventy
percent
filter.
So
we
wanted
to
know
how
does
the
performance
change
compared
to
steady
state?
So
that's
also.
We
have
incorporated
in
a
test
plan,
so
this
is
the
first
graph
for
you
guys.
B
This
is
a
small
object
performance
and
the
key
metrics
that
you
should
be
looking
here
is
operations
per
second,
so
I'm
going
to
help
you
understanding
this
graph.
Now
let
me
move
this
thing
here.
So
you'll
see
from
the
blue.
The
blue
bars
here
represents
the
objects
ingested.
The
redox
objects
into
the
subsystem
or
the
stuff
pool,
we
start
from
zero
objects.
Until
we
we
reach
to
the
top
of
the
peak,
which
is
10
billion
or
10
000
millions.
B
Due
to
the
course
we
measured,
both
s3
get
and
s3
put
performance.
I
also
have
numbers,
for
you
know
the
mixed
workload,
but
it
will
make
my
graph
look
too
confusing.
So
that's
why
I
just
have
not
plotted
here.
So
the
red
line
here
represents
the
s3
put
performance,
and
the
golden
here
represents
the
kit
performance.
So
let's
go
over
this
one
by
one.
B
Just bear with me. Overall you'll see, right from zero objects until the very end — until the 10 billionth object — the S3 write performance, the PUT performance, was a pretty straight line, which is, you know, close to deterministic performance. With respect to GET —
B
There
are
some
interesting
things
happening
along
the
way
we
we
wrote
the
objects
into
the
system,
so
we're
going
to
go
into
the
details
of
each
and
every
you
know,
ups
and
downs
in
this
course,
but
yeah
overall
overall,
we
if
I
average
out
this
number
right
from
zero
objects
until
10
billion
10
billion,
I'm
getting
somewhere
around
close
to
17
800
at
s3
put
operations
and
28
800
s3
get
operations,
so
it
is
operations
per
second.
So
it
is.
B
It
is
pretty
decent
number
from
from
object
storage
at
this
scale
right,
if
you
ever
reach
out
and
if
we
normalize
these
numbers
for
spinning
media
or
per
device
right
which
will
help
you
in
some.
You
know
calculations,
I'm
going
to
explain
that
also
later.
So,
if
I
just
divide
this
number
by,
you
know
the
number
of
spindles
I
have
like
318,
I
I'll
come
to
this
number
like
okay,
60
s3,
put
operations
per
second
from
a
single
pinning
device
and
90s3
get
operations
from
a
single
spinning
device.
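The arithmetic behind those per-device figures is just the cluster-wide average divided by the spindle count; a quick sketch:

```bash
# Cluster-wide averages from the talk, normalized over 318 HDDs
awk 'BEGIN {
  printf "PUT: %.0f ops/s per HDD\n", 17800 / 318;   # ~56, i.e. roughly 60
  printf "GET: %.0f ops/s per HDD\n", 28800 / 318;   # ~91, i.e. roughly 90
}'
```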
B
So
again,
this
is
not
the
the
actual
performance
of
your
of
your
spinning,
spinning
media,
which
you
know
the
vendors
will.
Will
call
out
like
hey,
I
can
do
like
you
know
whatever,
like
100
iops
per
second
or
whatever
right,
but
this
is
the
s3
workflow
performance,
so
this
is
completely
different.
What
you
get
on
the
internet,
so
this
is
the
s3
workload
performance
all
right,
so
this
is
the
graph
number
one.
Now,
let's
go
into
the
details
of
these
these
clips
in
this
graph
right.
Let's
try
to
understand
technically
what
happened.
B
So
this
is
the
section
of
the
performance
graph.
So
the
first
thing
which
happened
in
our
cluster,
which
we
observed,
was
at
around
close
to
one
billion
objects.
When
my
system
was
at
one
billion
objects,
I
suddenly
saw
you
know
performance
going
down.
When
I
looked
into
the
system,
I
could
see
a
lot
of
deep
scrubbing
going
on,
which
is
a
standard
safe
operation
staff
will
try
to
protect
your
data
by
using
deep,
scrubbing
and
scrubbing
time
to
time,
but
we
saw
a
lot
of
you
know
deep
scuffing
going
on
into
the
system.
B
So
at
this
point
we
have
chose
not
to
disable
entirely
the
deep
scrubbing
part,
because
deep
scrubbing
is
something
good
right.
So
you
don't
want
to
disable
that
point,
but
what
I
did
is
I
reduced
the
rate
of
deep
scrubbing,
which
is
which
is
not
cheating
right,
I'm
just
you
will
you
will
do
this
in
production
as
well?
You
will
reduce
the
rate
of
your
deep
scrubbing.
So
that's
what
I
did
at
this
point.
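The talk doesn't list the exact knobs used, but Ceph exposes several well-known options for throttling scrubs rather than disabling them; a hedged sketch of the usual candidates:

```bash
# Limit concurrent scrubs per OSD (1 is already the default on most releases)
ceph config set osd osd_max_scrubs 1

# Sleep between scrub chunks to reduce contention with client I/O
ceph config set osd osd_scrub_sleep 0.1

# Don't start new scrubs when the host load is already high
ceph config set osd osd_scrub_load_threshold 0.5

# Optionally confine scrubbing to off-peak hours
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6
```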
B
So
the
performance
came
back
up
after
like
one
or
two
tests
on
so
this
explains
the
the
drop
in
here,
which
is
which
is
attributed
to
deep,
scrubbing
effect
into
the
system,
and
then
you
know
again
my
performance
restored
and
kind
of.
I
I
went
ahead
and
the
next
thing
which
we
observed
was
at
around
5
billion
objects
like
50
of
my
cluster
thing
and
what
we
observed
here
is.
You
know
there
was
a
power
outage
in
in
the
lab
where
we
hosted
all
the
equipment.
B
So
it
was,
it
was
a
long
power
outage.
It
was
like
up
to
48
hours
and
then
after
48
hours
we.
So
what
happened
is
what
the
power
outage
right.
So
all
the
ups
powers
were
were
gone,
so
all
the
six
ceph
nodes,
all
the
client
nodes
they
they
abruptly
got
powered
off
like
somebody
pulling
the
the
cables
right
this
power
power
outage.
So
imagine,
like
you,
know,
in
a
safe
cluster
or
a
storage
system.
B
So once the power got restored, we powered on all the nodes in, you know, whatever order — I had six nodes, so we just powered them on one by one — and once the OS came up and all the Ceph services came up — all the containers, not pods, the containers came up — after, like, you know, 30 minutes to an hour, once all the PG peering completed, my cluster was back to normal, just after, you know, a short while.
B
Without
I
doing
some,
you
know,
repairs
or
doing
some
you
know
disc
replacements
or
whatever
right.
So
it
was
a
magical
moment
to
me.
A
self
cluster
with
5
billion
objects
suddenly
lost
power.
We
bring
it
up
and
all
of
a
sudden
everything
was
calm
and
the
storage
was
again
serving.
You
know
the
objects.
So
at
this
point
we
observed
a
performance
drop
in
in
the
get.
The
reason
to
this
is
that
you
know
you
know
guys
know
that
theft
uses.
B
So
all
of
all
of
a
sudden,
my
my
my
heart
caches
in
into
the
memory
got,
you
know,
got
flushed
out
and
then
you
know,
then
you
know
this
is
the
core
reason
for
for
the
outage
here.
Well,
especially
for
the
for
the
get
get
performance.
B
Remember
the
memory
flushing,
because
it's
non-volatile,
so
all
the
pre-prepared
caches
they
just
vanished
away.
But
for
forget
we
haven't
seen
anything
like.
If
you
see
just
compare
this,
there
is
just
a
minor
blip
in
here,
but
then
the
performance
for
the
for
the
get
restored.
Just
after
we
we
did.
The
test,
so
this
was
the
second
event
which
happened
for
us.
The
third
event
in
the
testing
happened
at
around.
You
know
when
we
reached
to
a
very
high
critical
level
of
capacity
spatial
capacity
usage
in
the
system.
B
So
at
this
point
my
system
was
like
you
know,
seven
to
eighty
percent
filled
up.
We
could
have
choose
to
not
write
anything
after
at
around
8
billion.
But
again
our
goal
was
to
hit
the
10
billion
mark
and
see
what
happens.
So
we
kept
on
writing
data
into
the
subsystem,
though
it
is
not
advised
that
in
any
storage
system
you
should
not.
You
know,
fill
fill
the
cluster
to
its
throat
right.
B
You
should
leave
some
ban
some
capacity
available
in
your
storage
systems,
but
we
have
not
followed
it
because
we
want
to
choose
to
go
to
10
billion,
so
we
could
see
a
significant
drop
in
performance
which
is
attributed
to
you
know
a
spatial
capacity,
high
utilization
of
the
spatial
capacity,
as
well
as
a
combinatory
effect
of
filling
over
of
the
blue
store
metadata
from
nvme,
fast
storage
into
spinning
devices.
B
So
the
guys
who
are
familiar
with
ceph
and
how
booster
works
at
certain
level
blue
store
tend
to
move
data
from
flash
onto
the
slower,
slower
tier
available
which
causes
performance
implication.
We
know
that
already.
So
there
was
no.
You
know
no
surprises
here
so,
but
a
good
point
is
that
the
ports,
the
puts
haven't,
got
impacted
by
even
by
by
the
spatial
capacity
as
well
as
data
movement.
However,
we
see
a
significant
drop
in
the
get
performance,
so
these
were
three
major
events
happened.
B
While
we
were
ingesting
data-
and
here
are
some-
you
know
some
graphite
graphs
for
you,
so
that
you
guys
can
you
know-
relate
this,
so
this
was
deep,
scrubbing
effect
going
into
my
system,
and
if
you
see
my
this
is
the
read
and
write
ratio
of
of
each
and
every
spinning
devices
there
are
318
of
them.
You
can
see
these
are
my
test
runs
going
on,
but
all
of
a
sudden
I
started
to
see
lots
of
lots
of
read
because
system
is
doing
deep,
scrubbing
and
describing
is
a
is
a
read
intensive
operation.
B
So
I
could
see
a
lot
of
lots
of
deep
slaving
going
on
and
at
the
same
time
the
performance
goes
goes
down
because
the
discs
were
busy
in
doing
something
else.
So
this
was
a
deep
scrubbing
affirmation
from
grafana
graphs,
and
here
are
some
more
graphs
from
grafana,
which
explains
the
the
blue
store
spilling
effect.
So
you
guys
know
that
blue
store
uses
rocks
to
be
and
rocks
to
be
uses
level
style
compaction.
B
We
have
a
graph,
we
have
a
blog
in
here
which
explains
this
this
thing
in
great
detail,
so
the
rockstar
has
multiple
multiple
layers
so
level
zero
is
in
memory
and
then
goes
on
to
level
one
until
until
until
you
know
too
many
levels,
but
a
portion
of
it,
if
it
can
store
this
data
onto
flash
rocks,
will
be
prefers
to
do
that.
If
you
have,
if
you
have
big
enough,
you
know
db
of
a
blue
store
database
sizes,
so
roxy
we
will
try
to
put
that
on
on
flash.
B
But
if
it's
not
able
to
write
on
flash,
if
you
are
limited
by
the
capacity
of
the
flash,
it
will
go
and
dump
that
on
to
spinning
media
because
it
has
to
put
it
somewhere.
So
in
our
system,
until
l4
caches,
we
were
managed
to
get
or
we
were
managed
to
store
the
data
onto
opt
onto
qlc
intel
qlc
devices.
B
Until
then
the
performance,
what
was
pretty
good
as
the
level
five
hit,
which
is
you
know,
which
was
like
2.56
terabyte
of
data
for
every
osg
device.
It
cannot
fill
in
that
on
the
flash
media
or
the
flash
flash
partition
we
had
for
blue
store
database,
so
it
has
to
move
the
data
on
to
spinning
devices.
So
if
you've
seen
here
so
this
is
the
metric
called
as
f
blue
fs
slow
use,
byte,
which
demand,
which
tells
you
that
how
much
data
is
moving
on
to
the
slow
tier.
B
So
if
you
can
see
this,
the
testing
all
started
at
26
there
was.
There
was
absolutely
no
data
movement
until
this
time
and
soon
after
you
know
after
fifth
of
july
or
june,
I
started
to
see
lots
of
data
moving
going
into
the
these
spinning
devices,
which
explains
that,
yes,
there
was
a
data,
blue
store,
spillover
effect
which
came
into
place
and
caused
cause
cost
of
performance
degradation
which
is
expected
so
no
surprises
as
well.
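If you want to watch for the same spillover on your own cluster, the counter the speaker is describing is exposed both per daemon and via the Prometheus exporter; a quick sketch:

```bash
# Ask one OSD for its BlueFS counters; a non-zero slow_used_bytes means
# RocksDB/BlueFS data has spilled from the fast device onto the slow (HDD) tier
ceph daemon osd.0 perf dump bluefs | grep -E 'slow_used_bytes|db_used_bytes'
```

The same counter is what Prometheus scrapes as `ceph_bluefs_slow_used_bytes`, and recent Ceph releases also surface the condition directly as a `BLUEFS_SPILLOVER` health warning in `ceph status`.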
B
Okay-
and
this
is
the
graph
which
explains
about
the
latency,
so
how
much
time
does
it
took
for
for
the
system
to
you
know
to
respond
so
on
an
average?
If
I,
if
I
do
the
average
right
from
zero
object
until
10
10
billion
like
half
a
second
of
latency
from
the
system
right,
you
could
say
that
you
know
it
is.
It
is
too
much,
but
for
for
an
object,
storage
system
it
and,
depending
on
the
application
that
you're
using
it
depends
what
latency
you
want
to.
B
You
know
build
your
system
with
so
for,
like
you
know,
for
some
some
latency
intensive
system,
you
want
to
bring
it
down
and
there
are
mechanisms
to
to
bring
it
down.
That's
not
a
problem
and
with
respect
to
the
get
get
latency
on
an
average
of
27
milliseconds
of
get
get
latency
that
we
that
we
observe
from
the
system.
B
Yeah, no big surprises. These two peaks that you see in here are attributed, again, to the power outage and the deep-scrubbing effect going on. But overall, again, you know, a pretty straight line across the board, right from zero until 10 billion.
B
The
next
graph
comes
is
the
so
the
last
one
was
small
object.
The
next
one
is
large
object.
What
will
happen
if
I
write
128
megabyte
object
into
the
subsystem?
Definitely
by
by
the
laws
of
mathematics.
We
cannot
ingest
this
given
capacity
system
with
up
to
10
billion
objects
and
each
object
having
128
megawatt,
because
we
don't
have
enough
storage
available
right.
B
So
we
started
again
with
with
you
know:
zero
objects
into
the
system
and
being
just
as
close
to
like
18,
18
or
19
ish
million
objects,
because
the
object
size
were
were
massive
128.
B
during
this
course
of
testing.
We
also
measured
100
percent
get
100
put
numbers.
So
if
you
focus
on
the
on
the
red
line
here
right
from
very
start
of
the
test
until
we
reach
to
the
last
pretty
pretty
straight
line
right,
not
not
supe
super
super
straight,
but
pretty
pretty
deterministic
line
here.
If
you
see
the
the
put
numbers
they
are,
they
are.
You
know
super
super
straight.
If
you
see
these
numbers,
however,
we
actually
we
missed
to
run
several
rounds
of
get
numbers
in
the
text
test
cycle.
B
So
it
was
a.
It
was
a
problem
at
the
test
test
plans
that
we
built
up.
So
we
missed
right.
We
missed
some
of
the
test
cycles
here,
but
since
we
can't,
we
can't
go
back
in
time
and
do
it
redo
it.
So
we
just
you,
know,
started
at
this
point.
I
realized
okay,
oh
oh,
we
are
not
capturing.
One
person
gets
so,
okay,
it's
not
too
late.
Let's
start
it
now,
so
on
average
10.7
gigabytes
per
second
of
s3
put
bandwidth
and
11.6
gigabytes
per
second
for
s3
get
again.
B
This
is
average
out
right
from
zero
object
until
to
the
very
last,
and
if
I
normalize
these
numbers
again
with
my
total
number
of
spinning
media
into
the
system,
I'm
I'm
capturing
close
to
34
megabytes
per
second
of
s3
workload.
It's
again,
it
is
not
the
performance
of
the
bare
performance
of
your
media,
which
could
be
you
know
close
to
120
150
megabytes
per
second,
which
is
advertised
performance,
but
this
is
actual
workload
performance,
so
these
numbers
typically
help
you.
B
So the left one here is the large-object degraded testing. First of all, with no outage in the system — the steady-state performance — we were getting close to 12 gigabytes and 10 gigabytes per second, respectively, for GET and PUT; all good here. Then, in the next round, we intentionally, you know, stopped six OSDs and waited for Ceph to throw those out. So after, you know, 600 seconds, Ceph threw out all six of my failed OSDs — which I had intentionally failed — and at the same time I was running COSBench to measure the performance.
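That 600-second wait is Ceph's default grace period before a down OSD is marked out and recovery begins; a sketch of the relevant knob and of how such a failure can be simulated — the OSD id and unit name below are illustrative and vary by deployment:

```bash
# Default is 600 seconds: how long a down OSD stays 'in' before being marked out
ceph config get mon mon_osd_down_out_interval

# Simulate a drive/OSD failure on one node
systemctl stop ceph-osd@12

# Watch the OSD go down, then out, and the PGs recover
ceph osd tree
ceph -s
```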
B
How
does
cause
bench
reports
the
performance
once
I'm
missing?
Six
tries
from
the
system,
so
you
can
see
this
right.
There
was
like
a
very
minimal
effect
from
from
12
I
moved
to
10
and
then
from
from
10
I
moved
to
9.6
so
which
is
you
know,
which
is
not
not
not
a
a
huge
drop
in
in
performance,
given
that
I'm
I'm
losing
some
storage
as
well.
B
So
if
you
see
this
one,
so
two
percent
of
storage
failure
resulted
in
into
six
and
eight
percent
of
you
know
performance
which
is
expected
because
you
have
low.
You
have
less
number
of
devices
underneath
which
used
to
work
right
so
in
in
the
in
the
third
iteration
of
the
same
test.
What
we
did
is
we
just
pulled
off
one
node,
one
node
containing
53
spinning
devices.
So
at
this
point
we
have
lost.
You
know.
B
17
percent
of
my
my
total
capacity
and
costbunch
is
trying
to
write
the
data
set
or
data,
and
then
I'm
measuring
the
performance.
So
there
was
a
a
decent
performance
drop
of
21
percent
and
25
percent,
which
is
again
expected
because
you
are
running
with
low
low.
You
know
worker
workhorses,
so
yeah
it
is.
It
is
decent
right.
It's
not
it's
not
too
bad,
I'm
not
losing
like
50
percent
of
the
battery,
or
let's
say
you
know,
or
even
even
lower
than
that,
but
it
is.
It
is
expected
with
respect
to
small
object
testing.
B
We
saw
a
similar
test,
a
similar
number
performance.
So
the
first
one
first
block
is
the
steady
state
everything
going
good
and
then
we
failed
six
devices
and
again
there
is.
There
was
no
huge
performance
drops
as
compared
to
you
know,
delivering
the
same
performance
with
with
large
options.
B
We
did
not
have
time
in
the
lab
to
execute
the
third
third
round
of
that,
so
we
don't
have
a
data
for
for
the
last
round,
but
you
know
you
got
an
idea
right,
a
subsystem
which
is
eighty
percent
filled
up
and
then
I'm
pulling
out
one
entire
note
from
the
system
and
trying
to
measure
the
problem
we
are.
Actually
you
know
we're
actually
trying
we
were.
We
tried,
you
know
very
hard
on
on.
B
Here
are
some
based
on
based
on
what
we,
what
you
guys
saw
on
the
performance
here
are
some
guidance
that
you
can
draw
from
from
this
like:
okay,
okay,
this
is
all
good,
but
how
can
this
help
me
designing
my
my
next
awesome
set
cluster,
because
I
have
got
a
requirement
that
I
need
to
build
a
cluster
that
can
deliver
x,
operation
per
second
or
maybe
a
cluster
that
can
deliver
y
gigabytes
of
of
per
second
s3
workload.
How
should
I
size
it?
How
many
nodes
should
I
go
and
buy?
B
How
many
spindles
should
I
would
I
be
requiring
for
this
workload?
So
again,
there
is
no
silver
bullet.
This.
This
mechanism
can
help
you
to
come
up
with
a
ballpark
number
so
again
pointing
it
back
to
my
the
numbers
which
we
so
average
numbers
divided
by
the
total
total
number
of
spinning
devices
that
we
have
so
we
can
get
close
to.
You
know
660,
ops
per
spinning
device
and
34
megabytes
of
put,
so
you
can
do
the
math
from
here.
B
Like
okay,
for
example,
let's
say
if
you
need
to
build
if
you
need
to
build
a
cluster
with
with
you
know,
with
that,
can
deliver
3.4
gigabytes
of
your
per
performance.
So
what
you're
going
to
do
is
you
can
just
go
and
buy?
You
know
100,
100
osds,
so
typically
that
should
that
should
give
you
a
cluster
with
3.4
gigabytes
per
second
of
of
put
output
bandwidth.
So
this
is
how
those
averaged
out
number
can
help
you
in
sizing
your
your
next
big
awesome,
stuff
cluster.
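A tiny calculator for the rule of thumb being described — the per-device constant is the average from this one test, so treat it as a ballpark input, not a guarantee:

```bash
# Ballpark HDD OSD count for a target S3 PUT bandwidth,
# using the ~34 MB/s-per-spindle average observed in this test
target_gbps=3.4
awk -v t="$target_gbps" 'BEGIN {
  per_hdd_mbps = 34;                      # observed average, this lab only
  printf "~%d HDD OSDs for %.1f GB/s of PUT\n", (t * 1024) / per_hdd_mbps, t;
}'
```

For 3.4 GB/s that lands at roughly a hundred spindles, matching the example in the talk.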
B
The
sample
size
was
too
small,
because
this
was
based
on
just
one
performance
testing.
So
don't
take
these
numbers,
as,
as
you
know,
the
the
most
accurate
to
the
most
perfect,
but
this
can
help
you
with
at
least
the
ballpark
number,
and
this
is
all
the
given
is
that
this
is
all
based
on
four
percent
of
flash
capacity
that
you
should
be
using
for
bluestora
per
osd
device.
B
The
next
one
is
some
recommendation
is
that
you
could
go
if
you
want
to
get
some
more
performance
from
subsystem,
just
go
and
use.
You
know
multiple
instances
of
of
ceph
redux
gateway
on
each
fosg
node.
That
can
give
you
the
more
performance,
which
means
it
will
gonna
actually
add
on
mode
loaded
into
the
osd
back-ends.
So
you
can
get
some
more
performance
from
here.
B
The
third
one
is
going
with
a
decent
blue
store,
blues
or
flash
sizing
so
for
what
we
have
observed
and
what
we
are
recommending
to
our
customers
and
others
are
use
four
percent
of
flash
for
bluestora,
which
will
help
you
in
most
of
your
use
cases
like
flog
file,
block
and
object
right.
So
this
is
typically
a
good
good
starting
point.
If
you
don't
know
what
is
your
actual
use
case,
gonna
look
like
right.
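Applying that four-percent rule to the drives in this lab, as a quick sanity check:

```bash
# 4% of a 16 TB HDD as the BlueStore DB/WAL flash allowance
awk 'BEGIN {
  hdd_tb = 16;
  printf "%.0f GB of flash per OSD\n", hdd_tb * 1024 * 0.04;   # ~655 GB
}'
```

That is in the same range as the roughly 750-800 GB of NVMe per OSD these nodes ended up with (six 7.68 TB NVMe devices shared across 53 OSDs per node), which comes up again in the Q&A below.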
B
If
you
know
the
use
case,
then
fantastic,
you
can
probably
lower
it
or
maybe
increase
that,
depending
on
your
on
use
cases,
but
this
is
usually
the
the
idle
recommendation
that
we
do.
B
You
could
also
increase
max
bytes
for
level
base
which
is
default
to
256
megabyte,
which
means,
if
you
are
allocating
four
percent
of
your
of
a
blue
store
metadata
device
for
for
osds
for
each
osd.
B
You
can
actually
bump
up
this
number
slightly
so
that
you
can
get
most
out
of
your
your
flash
capacity,
which
means
you
will,
you
will
add,
and
you
will
write
in
more
data
onto
of
rocks
db
into
until
you
hit
until
you
hit
the
limit
of
rocks
db
compaction,
moving
the
data
from
from
spinning
from
flash
onto
spinning
devices.
So
typically
you
can.
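This knob lives inside BlueStore's RocksDB option string. A hedged sketch of what bumping it could look like — the full default option string varies by release, so in practice you would edit or append to the existing value rather than copy this line verbatim:

```bash
# Inspect the RocksDB options BlueStore currently passes down
ceph config get osd bluestore_rocksdb_options

# Illustrative override: raise max_bytes_for_level_base from 256 MB to 512 MB
# (keep the rest of your release's default option string intact)
ceph config set osd bluestore_rocksdb_options \
  "compression=kNoCompression,max_bytes_for_level_base=536870912"
```

Note the change only takes effect when an OSD (re)opens RocksDB, and, as the speaker says, it trades flash headroom against the point at which spillover kicks in.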
B
You
can
play
around
with
this
with
this
tunable
by
the
way,
all
the
testing
that
I've
done
like
it
was
kind
of
you
know,
based
on
the
default
setting,
except
some
minor
settings
like
this
one
and
some
also
described
in
the
paper
I'm
going
to
show
you
like
you
know,
objector
in
flights,
and
you
know
those
kind
of
things,
but
I
haven't
tuned
the
theft
to
the
to
the
to
the
to
the
last
turn
available,
because
it's
it's
too
too
difficult.
B
The
people
out
there
who
are
still
using
rpm
based
things
or
rpm
based
services
in
in
yourself
in
a
subsystem
you
guys
can-
can
rely
on
ceph
and
and
using
containerized
storage
demons.
Like
all
these
storage
components,
all
the
stuff
components
can
can
run
on
containers
and
they
are
pretty
stable.
It's
been
there
since
last
two
and
a
half
or
three
years
I
would
say
most
of
the
customers
that
we
have.
They
are
using
contrary
storage
demons.
They
are
rock
solid.
B
With
this
testing
we
ingested
10
billion
objects.
We
failed
nodes
that
that
filled
up
at
80,
80
percent
full
filled
ratio.
We
have
not
seen
any
problem
with
respect
to
continuous
storage,
yeah
go
and
go
and
use
a
co-located
csd
contrast
for
siemen.
This
will
also
reduce
your
you
know.
Footprint
like
you,
don't
necessarily
need
dedicated
machines
for
mons
or
dedicated
machines
for
osds
and
managers
and
and
what
not
right
you
can
just
go
and
buy.
All
all
of
them
are
like
same
nodes.
B
You
can
just
go
and
have
six
nodes,
let's
say
and
just
co-locate
everything
you
should
be
good
to
go
if
possible,
go
with
a
decent
size
of
osg
memory
target
by
default.
It
is
six
six
five
or
six
gigabytes,
but
yeah.
If
you
have
availability,
you
can
go
with
some
some
decent
osg
memory
target
for
osd,
which
is
still
not
not
too
too.
I
mean
memory.
Memories
are
cheap,
cheap
these
days,
so
you
can
go
and
get
some
more
dents
into
the
system
and
get
some
more
numbers.
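For reference, this is the option being described; the 8 GiB value below is just an example, not a recommendation from the talk — check your release's default rather than assuming:

```bash
# Check the current per-OSD memory target
ceph config get osd osd_memory_target

# Illustrative bump to 8 GiB per OSD, if the nodes have RAM to spare
ceph config set osd osd_memory_target 8589934592
```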
B
So
these
are
some
sizing
guidance
based
on
our
study-
and
here
are
these
some
here's,
the
summary
so
overall
we
have
achieved
deterministic
performance
at
scale
for
both
small
and
large
object,
workload
sizes
and
before
we
hit
any
any
saturation
limits,
like
you
know,
blow
store
spilling
from
from
nvme
devices
to
spinning
devices,
and
then
we
that
we
hit
that
you
know
utilization
capacity
utilization
problems
because
we
did
not
had
enough
free
capacity
in
the
system
to
you
know
to
keep
it
to
a
decent
field
level.
B
All
my
systems
were
like
at
the
end
of
my
testing.
They
were
like
95ish
percent
filled
up.
I
did
not
add
any
any
capacitor
remaining
so
yeah
until
we
hit
any
resource
saturation,
we
got
some
some
fantastic
numbers
at
scale,
right
from
zero
objects
until
10
billion
and
same
for
my
failure,
mode
scenarios,
pretty
good
numbers-
and
undoubtedly
this
is
not
the
limit.
B
I
I'll
reiterate
this.
This
is
not
the
limit
of
staff.
This
is
what
we
tested
in
our
labs,
so
I
would
I
would
you
know
I
know
people
like,
like
cern
and
other
other
off-our
customers.
They
have
huge
stuff
clusters
which
have
already
have
you
know:
multi-deca
billion
objects
into
the
system,
so
so
yeah.
This
is
not
a
limit.
This
is
just
the
tested
maximum
and
I
hope
that
this
will
help
you
give
some
more
confidence
on
staff
staff
is,
is
really
robust.
B
So yeah, you can download the full report of this performance testing at this URL.
B
So I will now go and see if there are any messages in the chat — or, Mike, if you have anything for me that I can answer. Okay: RADOS objects, or S3? So these are RADOS objects, Anthony.
B
Oh,
thank
you
and
then
is
there.
Everything
has
the
report
has
to
get
the
same
s3cmd.
Can
you
please
tell
me
version
of
a
ccmd
port
okay,
so
we
have
not
used
sdcmd
because
s3
cmd
is
not.
I
mean
to
me
it's
not
designed
for
scale.
I
would
say
at
this
scale,
if
you,
if
you
just
do
s2cmd
ls,
then
your
terminal
gonna
hang
for
so
many
hours,
because
I
I
actually
did
that
on
another
bucket.
B
It
was
taking
so
much
so
much
so
much
to
because
I
had
like
hundred
thousand
bucks
buckets
in
my
well,
not
hundred
thousand
ten
thousand
right
yeah.
I
guess
10
000
buckets
in
my
system.
It
takes
a
lot
of
time
to
to
list
out
so
so
yeah.
We
have
not
used
s3.
We
have
been
using
cause
bench
to
write
the
data
into
the
system
and
cos
bench
uses
aws,
s3,
sdk
official
sdk,
to
write
to
the
system,
which
is
pretty
pretty
fast.
B
With
respect
to
measuring
the
performance
we
have
not
measured
using
s3cmd,
ls
or
all
those
kind
of
fancy
graphenock.
It
was
easier.
We
just
went
to
minus
s
and
we
just
relied
upon
the
stuffed
metrics
coming
out
from
the
grafana
and
prometheus,
and
it
was
the
redox
object
stored
into
the
subsystem.
C
This
is,
this
is
anthony.
What
was
the
min
allocation
size
for
blue
store?
Was
that
still
set
to
64k
because
of
hdds.
B
Tracks
that
no,
we,
I
think
we
we
had,
I
cannot
remember
I
mean
it's
been
long.
I
did
the
testing,
but
I
can
I
mean
exactly
the
numbers
are
there
on
on
the
paper,
but
it
was
exactly
same
what
we
have
in
the
on
the
fix
in
the
upstream
community
because
yeah
I
mean
you
know.
While
we
know
that
you
know,
the
community
has
told
us
that
okay,
we
should
be
adjusting
the
block
sizes
so
that
we
can
get
better
from
the
system.
B
Otherwise
we
will
end
up
losing
more
capacity,
because
you
know
I
guess
that
was
16.
If
I
recall
correctly,
if
any
of
these
f
expert
is
available
here,
I
guess
the
fix
that
we
have
in
upstream
is
like
16
kilobyte
for
both
spinning
and
and
hdds.
I
guess
so,
but
yeah.
That
was
the
number.
I
think
we
we
went
with
16
or
four
four
yeah
I
need
to.
I
don't
dig
that
number
up.
It's
not
at
the
top
of
my
head
right
now.
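For anyone wanting to check this on their own cluster, the options in question are below. Defaults have changed across releases (64 KB for HDDs on older releases, 4 KB on newer ones), and the value is baked in when an OSD is created, so query rather than assume:

```bash
# What newly created OSDs would use:
ceph config get osd bluestore_min_alloc_size_hdd
ceph config get osd bluestore_min_alloc_size_ssd

# What a specific existing OSD was actually built with:
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
```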
C
For
your
index
for
your
index
pool
how
many
shards
did
you
have,
did
you
have
utter
restarting
turned
on.
B
Yes,
we
had
our
auto
recharging
turned
on
we,
we
have
not
tuned
anything
on
that
part
of
things
and
again
we
we've
also
not
created
a
index
pool
on
flash,
because.
B
— no, we don't need that now, since BlueStore. So yeah, auto resharding was turned on in the Ceph system.
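The feature in question is RGW dynamic bucket index resharding, governed by the rgw_dynamic_resharding option (on by default since Luminous) together with rgw_max_objs_per_shard — whose default of 100,000 objects per shard happens to match the per-bucket object count used in this test. A quick sketch of watching it at work:

```bash
# Buckets whose index has exceeded the recommended objects-per-shard limit
radosgw-admin bucket limit check

# Resharding operations queued or in progress
radosgw-admin reshard list
```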
C
Okay,
interesting,
okay
and
you
in
one
of
your
slides,
you
sort
of
implied
that
the
the
size
of
the
partition
that
you
used
for
the
wall
and
db
external
device
was
300
gigabytes.
But
there
was
a
url
to
a
prior
presentation
about
a
1
billion
object
trial
that
described
an
80
gig
partition
size.
So
I
wanted
to
be
clear
about
which
you
actually
used
here.
B
So
in
here
we
had,
I
didn't,
do
the
math
here
real
quick,
because
it
was
like
I
guess
we
had
like
800
gigabytes
per
ost,
yeah
close
to
close
to
internet,
because
we
had
like
pretty
pretty
big
size
of
nvme.
B
So we had, like, six NVMes in the system, and each NVMe was close to eight terabytes.
C
There was a lot of discussion on the list when the upstream documentation started recommending four percent, because of the, you know, the BlueStore level sizes that you mentioned.
B
It's
close
to
750
or
800
gigs,
but
you
know
we
have
not
seen
because
because
we
we
hit
this
okay.
So
let
me
toss
my
screen
on
this
one.
This
is
a
pretty
interesting
point.
I
guess
so.
If
I
go
back
here
actually
you
know
we
have
not
used
a
lot
of
because
most
of
my
I
was
hitting
l4
into
my
flash.
My
l5
was
like
2.56
terabytes
and
I
did
not
have
this
many
this
much
this
much
unique
capacity
available.
B
So
actually,
even
though
I
had
like
800
or
750
gigs
available,
I'm
actually
using
you
know
close
to.
If
you
see
this
graph
like
seven
two,
two
third,
two,
eighty
gigabytes
off
of
dbs.
C
There's
been
some
discussion,
I've
seen
some
some
preliminary
prs
about
sharding
roxdb
so
that
we
don't
have
the
stair
step.
You
know,
you
know
proxy,
be
where
we
can
more
efficiently
use.
You
know
sizes
that
aren't
you
know.
You
know
power
of
10
multiple.
B
You
can
increase
that
yeah,
that's
a
workaround,
you
can
you
can?
There
are
a
lot
of
tunables
that
you
can
tweak
in
that
sizes
that
you're
describing
here.
B
Two
enables
yeah
with
the
rocks
tv
there
are.
There
are
ways
you
can
just
change
that
you
know
multiplier.
You
know
you
can
just
change
that
by
your
own,
but
yeah.
That's
that's.
A
All right, well, thank you for taking the time and sharing this information with us; the recording will be posted shortly today. Let's see — I think the only thing I have in terms of announcements: the Ceph newsletter should be going out today, and Outreachy news as well — we're looking for mentors.
A
We
have
projects
already
set
up,
and
otherwise
that's
it.
Thanks
everybody
for
joining
us
and
have
a
great
day
have
a
great
night
and
we'll
see
you
all
next
time.