Description
Presented by: Danny Abukalam
Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
A
Okay, yeah: improving CosBench for Ceph benchmarking. At SoftIron we spend a lot of time doing benchmarking. It typically falls into two categories: we have baseline performance testing of our nodes, which we just do on a regular basis, and then we have specific benchmarking that we do on a customer-by-customer basis.
A
Coming back to CosBench, I'm actually talking about the second thing. Obviously Ceph is tunable and configurable for different workloads, so quite often we're trying to help our customers understand how to configure and architect a Ceph cluster for whatever they're trying to do.
A
CosBench started off very much as a research project, a research paper and that whole shebang, but it very quickly graduated into a project with lots of users, and I think the reason for this is that it was built to be modular: it made use of Java's OSGi framework to break down the different parts, such as the controller, the driver, the web UI, the scheduler and the API plugins.
A
So why does it keep coming back over and over again? Well, there are lots of benchmark tools out there, although not very many are designed specifically for the purpose of benchmarking an S3 object store, or even any storage behind HTTP. In terms of open source tools, fio is really powerful; I'm sure many are familiar with it.
A
It has lots of bells and whistles, it's great for benchmarking block storage in general and has an RBD backend. It's pretty mature, it's been around for 20 years, it's very well documented, and fio also has an HTTP backend which you can use for S3 benchmarking. There's also a tool called hsbench ("hotsauce bench"), which I think came out of the Ceph community as well, and it's a fork of another popular S3 benchmarking tool.
A
There are tools supporting file, block, object and a number of other custom interfaces, but in spite of all of this, we keep coming back to CosBench. So why is this the case?
A
It's a benchmark in the true sense of the word. We might be able to get three percent more out of another tool, or five percent out of something else, but while we have tools that we use to squeeze every byte per second that we need out of the cluster, sometimes that's less important than having consistency across different systems. So we keep coming back to it.
A
People use CosBench, and that gives it value. Another good thing about CosBench is that it has support for lots of different proprietary storage technologies. Because these were added over the years, it means that you can use one XML file and benchmark many different storage clusters using those vendor-contributed plugins.
A
In terms of architecture, CosBench has a controller-driver architecture, which not all benchmarking tools have, and that means you can scale it fairly easily to very large clusters. It's also fairly straightforward to figure out if your driver nodes are becoming a bottleneck for your storage cluster: whether you need more driver nodes, or whether you've hit the limit in terms of generating load. Then lastly, it also supports RADOS.
A
I think someone contributed a RADOS plugin a few years ago, and that still works really well, so we can use CosBench to benchmark both S3 and RADOS.
A
This is what the CosBench workflow usually looks like. You have an XML file (CosBench only ever runs one workload file at a time), and you feed the XML into the controller node, which you can do via the web interface or the command line. CosBench will then run the different work stages that you specify sequentially, fan the work within each work stage out to your drivers in parallel, and these will then go and do the HTTP calls.
A
Typically those are HTTP gets and puts, and the cool thing about this as well is that CosBench has the ability to address multiple RADOS gateway endpoints. You can give it a number of HTTP endpoints, and if you structure your workload correctly then you don't actually have to worry about load balancing, because it will balance the load across your endpoints for you, so that's fairly good as well.
A
This is what a CosBench workload looks like. You can see at the beginning I've specified the storage type, S3, and given it an access key, a secret key and my endpoint; in this case it's a very basic single-endpoint workload file. Then you have different work stages. Typically you'll have some pre-benchmark work stages which prepare things: create the bucket and write the data, so that you have some data to read and write from. Then you'll have your benchmarking work stages; here I've got a hundred percent write, and I've got a 70/30 read-write mix. That's another thing which not all benchmarking tools can do: mixed read-write workloads. Some do, but not everything does, so being able to add a ratio for that kind of mixed benchmark matters. Then, after you've done your main work stages, you have a couple of post-benchmark stages that essentially clean up, deleting the data and then disposing of the bucket as well.
A
I've
actually
skipped
in
this
work.
I've
skipped
the
creation
of
the
data,
because
I
have
my
right
work
stage
first,
but
you
need
to
be
careful
with
that,
sometimes
because
you
typically
read
faster
than
you
write,
and
so,
if
you,
if
you
do
a
time-based
workload
or
you
can
sometimes
end
up
with
an
overflow,
so
it's
used.
This
is
quite
lazy.
It's
much
safer
to
just
have
a
right
data,
but
if
you're
careful
you
can
get
rid
of
them.
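A workload file of the kind described here might look roughly like the following sketch. The endpoint, keys, object counts and sizes are placeholders, and the exact selector syntax is best checked against the CosBench documentation:

```xml
<workload name="s3-demo" description="prepare, mixed read/write, clean up">
  <storage type="s3"
           config="accesskey=AKEXAMPLE;secretkey=SECRET;endpoint=http://rgw.example:7480"/>
  <workflow>
    <!-- pre-benchmark stages: create the bucket, write some data -->
    <workstage name="init">
      <work type="init" workers="1" config="containers=r(1,1)"/>
    </workstage>
    <workstage name="prepare">
      <work type="prepare" workers="8"
            config="containers=r(1,1);objects=r(1,1000);sizes=c(4)MB"/>
    </workstage>
    <!-- the actual benchmark: 70% reads, 30% writes for 5 minutes -->
    <workstage name="mixed-70-30">
      <work name="mixed" workers="16" runtime="300">
        <operation type="read" ratio="70"
                   config="containers=c(1);objects=u(1,1000)"/>
        <operation type="write" ratio="30"
                   config="containers=c(1);objects=u(1001,2000);sizes=c(4)MB"/>
      </work>
    </workstage>
    <!-- post-benchmark stages: delete the data, dispose of the bucket -->
    <workstage name="cleanup">
      <work type="cleanup" workers="8" config="containers=r(1,1);objects=r(1,2000)"/>
    </workstage>
    <workstage name="dispose">
      <work type="dispose" workers="1" config="containers=r(1,1)"/>
    </workstage>
  </workflow>
</workload>
```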
A
This is typically what we do when we're benchmarking with CosBench. I've taken out some of the other peripheral nodes that we use, like management interfaces and storage management.
A
So I've just got the Ceph roles here: the OSDs, the monitors and the RADOS gateway. You can see we have out-of-band management, but at the bottom you can see it's all plugging into the same switch. You can have a separate S3 network, or you can do whatever you like, but we typically have a separate S3 network, and then our CosBench controllers and drivers plug in to the relevant network.
A
And then you can use the right IP based on either HTTP or RADOS. The other thing is that typically we do bonding. I tried to do the diagram with bonding, but it was a lot clearer when I didn't, so I thought I'd just keep it this way; you can bond the NICs, and it's the same thing essentially.
A
CosBench has a web interface. It's not that great. It's okay sometimes to just watch the cluster and see how things are going, but I tend not to use it. I tend to use Grafana and Prometheus to watch what's happening on the cluster, and a really useful command is ceph osd pool stats, so I can see what Ceph is reporting and check that it matches up with what I'm seeing on the other side, and get some validation there. Other than watching what's going on, you can also kick off workloads and generate workload files from the interface, but it's more effort than it's worth in my opinion, so I just don't bother with the web interface at all. I tend to just submit workload files using the command line; I'm not a big fan of the interface, really.
A
So what sucks with CosBench? It turns out quite a lot of things. Firstly, there's no build system, so you can't really build it easily from source today. If you want to build it, you have to manually construct a development environment in Eclipse and figure out how that works.
A
It distributes all the built binaries in the repo, which is obviously always bad. The Java it uses is now fairly ancient, partially because it hasn't seen much development in many years, but it also means that you have to faff about even just to run it, let alone build it.
A
You have to muck around with getting an old Java, which can sometimes be fun on modern distros. So yeah, the build system is basically non-existent, and actually I think it's a lot more difficult getting it to build on Linux than it is on Windows; I don't even know if we've managed to get it to build on Linux before with the Eclipse kind of strategy.
A
That's what I mean: the build system is not great. The project is also kind of dead, so it's unmaintained. While there are some forks that add vendor plugins and a few other things, there hasn't really been real development on the core of CosBench in a long time, maybe more than five or six years. The last release was more than five years ago, and if you try to run the last release, it actually doesn't work.
A
You have to run the penultimate release; there's something wrong with the last release, I can't remember exactly what. So there are all these different traps as a user: trying to use CosBench for the first time, you fall into many of them, and it can be quite fun.
A
Obviously there's no build system, so how are we going to distribute anything? Basically, if you want to install it on your nodes, it's just a very manual process. I imagine that people in the community that still use CosBench have probably written their own scripts to do this in an automated way, but everyone has to figure this out themselves.
A
So there's no package, no package management, no Docker, no nothing. And not only that: just to get the project to run, you have these bootstrappy batch scripts that it relies on, and they're pretty flaky and not very reliable. So that's another issue that kind of sucks. Then I think the most annoying thing, but also the most easily solved thing, is the workflow. No one really likes writing XML, or at least no one I know does, and constructing workload files is really a pain to do manually. And once you've got a bunch of XML files that you're happy with, on top of that you're going to have to parse the results from the runs, and because you have a lot of drivers, you have to go in and figure out what the aggregate result is by adding up all of the results from your different drivers. CosBench kind of has something that does this in the web interface, but sometimes it's there and sometimes it's not; it tends to just vanish, and then you have to do it manually. So it's very unreliable.
A
So you end up getting a calculator out and summing results. Once you've done that ten times, you get kind of irritated; once you've done that a hundred times, well, it's not great. So another pain point is the workflow.
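The manual aggregation being described, adding up per-driver numbers to get a cluster-wide figure, is the sort of thing a small script handles. A minimal sketch, assuming each driver has produced a CSV with a Throughput column (the column name and file layout here are hypothetical, not the exact CosBench log format):

```python
import csv
import glob

def aggregate_throughput(pattern):
    """Sum the per-driver throughput rows matched by a glob pattern.

    Assumes one CSV per driver with a 'Throughput' column; adapt the
    column name to whatever the actual CosBench logs contain.
    """
    total = 0.0
    for path in glob.glob(pattern):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                total += float(row["Throughput"])
    return total
```

With a file per driver, a single call replaces the calculator session: aggregate_throughput("results/driver*.csv").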
A
So we have an internal branch of CosBench, and over time we've tried to make our lives easier just dealing with it. What progress have we made? I think the biggest thing is that we've actually introduced a build system.
A
One of my colleagues spent quite a bit of time getting it to build with Maven, so we can now throw Eclipse out the window: from Linux, just type mvn and build and we're off. That's really great, and I think a big change that we've found pretty useful.
A
On top of that, we've also packaged CosBench for Debian. SoftIron's OS is a fork of Debian, so I'd imagine that our packages would work on any Debian system, maybe even Ubuntu, though we haven't tried it. We build these with GitLab CI, and when you install the packages they place the various pieces of CosBench in conventional system directories, so you have configuration in /etc.
A
I think we have some stuff in /var/lib, and so it's all a lot more tidy. We also have systemd services, so you can start, stop and restart the various CosBench services, the driver and the controller, relatively easily with a single command. Once you've edited the configuration file in /etc to specify your driver and controller nodes, you just literally run the systemctl start command and you're good to go. And then we've also done some work to make it easier to generate CosBench workload files.
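The systemd integration being described could look something like the unit below. This is a sketch of the idea, not the actual packaging: the unit name, paths and script names are placeholders.

```ini
# /lib/systemd/system/cosbench-driver.service  (hypothetical layout)
[Unit]
Description=COSBench driver
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# node roles and ports configured once in /etc
EnvironmentFile=/etc/cosbench/driver.conf
ExecStart=/usr/share/cosbench/start-driver.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With something like this installed, starting a driver node really is a single command: systemctl start cosbench-driver.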
A
From our standard benchmarking stack we can specify a workload, and it will go away and build the XML that we want. It doesn't support everything, but it does some basic things. We also have some scripting that helps us parse the CSV files that CosBench logs, so the results gathering as well: we have some scripts for generating the workloads going in, and then we have some scripts for pulling out the results on the other end. Actually, after we did this, I realized that the Ceph benchmarking tool, CBT, has, I think, done some of that as well. I think it supports CosBench as a backend, but I haven't actually looked at it in anger, so I'm not sure if those scripts might be better or our parsing stuff is better; it's definitely worth exploring. And then the other thing is that we've got some slightly better documentation as well.
A
All the CosBench docs are in this PDF, this long academic kind of scroll, 30 pages or so, and it's also not very clear in some cases what the different variables do and what the syntax is like when you're building an XML file; it's not immediately obvious.
A
So now you can clone the repo, build and run CosBench, or install the package, then get started and make a basic workload file. That's a lot easier as well. So we have this internal branch with a few of these changes. There's still more that we can do, I think, to make CosBench a bit easier, but I definitely find that some of the changes we've already made make it a bit easier.
A
We want to tidy up this branch and make it available in the open, ideally at some point in the next month or so, so that other people can take advantage of the efforts that we put in, if it's useful to people. If people are still using CosBench in anger, then that's great; maybe somebody else will find this useful as well. We'll probably only have packages for Debian to begin with.
A
If other people want packages for other distros, maybe we can look at it, or maybe they could quite easily build their own packages and contribute; that would be welcome. We're just generally interested to see who else is still using CosBench, and if there are others within the Ceph community that are still using it, it would be great to collaborate and figure out how we could work together.
A
So that's a summary of what we've improved with our CosBench branch. I welcome any questions.
B
Yes, we can hear you, okay. I just wanted to understand one thing. In my testing I generally use CosBench for performance analysis of one of the S3 products which Red Hat has, namely NooBaa. One challenge which I faced with CosBench is that the data stream which it generates is actually a completely non-random data stream; it's like 100 percent dedupable.
B
So is there any thought process where you are trying to do an input stream where we can, as a user, specify, okay, 50 percent dedupable, or 25 percent dedupable, or 75 percent dedupable? Any thoughts on that one.
A
So your question is: if you want to test the benefits of deduplication in NooBaa, I guess, with CosBench, because the input stream is reproducible every time, it makes it very difficult to benchmark and see that. Yeah, that's right.
A
Have you looked at some of the variables within the CosBench PDF for how you actually generate the data? There are a number of different knobs there. I'm not sure if you could achieve what you want today with CosBench, but I suspect you might be able to.
B
Okay, so what I do generally is this: there's another CosBench which is available, which was actually edited by Nexenta. What they have done, probably, is edit the input stream to generate completely random input data. So I use that, but ideally I would want to have a feature which will allow me to select any percentage, like 50 or 20 percent, as you were mentioning. I will go back and check in the CosBench documentation if there is any way to specify workloads in a certain way which does that.
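The feature being asked for, a data stream with a user-chosen dedup ratio, could be sketched roughly as follows. This is purely illustrative of the idea; it is not something CosBench provides, and a real implementation would live in the data generator of the driver:

```python
import os

def make_stream(size, dedup_ratio, block=4096):
    """Build a buffer where roughly `dedup_ratio` of the blocks are
    copies of one repeated block (dedupable) and the rest are random.

    A sketch of the requested feature, not CosBench behaviour: a
    dedup_ratio of 1.0 reproduces the fully dedupable stream the
    question describes, 0.0 gives fully random data.
    """
    repeated = b"\x00" * block          # the block that deduplicates
    n_blocks = max(1, size // block)
    n_dup = int(n_blocks * dedup_ratio)
    out = bytearray()
    for i in range(n_blocks):
        out += repeated if i < n_dup else os.urandom(block)
    return bytes(out[:size])
```

A 50-percent-dedupable 40 KiB object would then be make_stream(40 * 1024, 0.5); a smarter generator would also shuffle the duplicate blocks so they are not all contiguous.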
A
Okay, that's interesting. Maybe offline, if you send me your workload file and also point me to the Nexenta fork you talk about, because I don't think I've seen that before; if you're happy to, I'd be happy to look at this and see if I have any observations or thoughts.
A
Yeah, do you use that with the CBT tools, or do you have your own stack?
C
I think both. The perf and scale team at Red Hat right now has their own setups and automation, and there must be many others. There's certainly a lot of folks that would like to see something more lightweight and easier to deal with, but we do agree that once you have it going, it can do a good job.
C
Chris Blum of Red Hat wrote a Golang translation of little parts of it, and I've never fully evaluated it, but I don't think that has gone anywhere. And MinIO wrote their own sort of client-driver setup, called Warp, and I've been meaning to have someone look at that, but I haven't got any feedback yet from anybody that does regular benchmarking.
A
Yeah, we've also written our own Go benchmarking tool internally, which I mentioned briefly. It does a lot more than CosBench does in terms of interfaces, and it has a similar driver-controller mechanism, which is pretty cool. Another thing that we might look at doing is open sourcing that at some point; we typically use it very extensively internally when we're not using CosBench.
A
I did see the gosbench blog post as well; it looks very interesting. I also know Mark, markhpc, Mark Nelson I think, has done a lot of work in this space as well. I think he's one of the main guys behind CBT, right?
A
Well, there are no more questions. Thank you very much.