From YouTube: CDS Jewel -- Non-Functional Tests
B
Sure, hi everybody. I guess I'll first quickly introduce myself: I'm a PhD student at UC Santa Cruz and a summer intern at Red Hat, working on the project that I'm going to describe, and I would really appreciate any feedback that you have. I'm going to use a slide deck that I prepared for this. I think I can share.
B
Benchmarking is kind of a combination of the two. In integration testing, you go through all the setup of deploying and configuring, and then test that everything works; in benchmarking, you deploy, configure, and then benchmark. Non-functional testing, the way it's defined by Wikipedia, is a test of a requirement that specifies criteria that can be used to judge the qualities of a system, rather than specific behaviors.
B
Another way to describe it is as a statement of the form "the system shall be", followed by a quality: for example, the system shall be scalable, the system shall be performant, and so on. This type of testing is useful in the sense that it tests a property of the system as a whole, but the main problem is that it's hard to quantify. How do you quantify precisely, in order to determine that your system scales or that your system performs in a certain way?
B
So that's the main problem we are addressing as part of our research at UC Santa Cruz, and we're trying to apply some of the ideas that we've generated. The main goal, or one possible approach we want to use to address this, is what I mentioned before: basically, you merge integration testing with benchmarking by gathering performance metrics, and then, on the output of those performance metrics, you make assertions and validate those assertions on top of that data.
B
You're basically defining tests over the benchmark output data, and those assertions become a test; you can think of them as a regression test, so it will fail or pass depending on what the output of the benchmark looks like. There are some challenges. The first one is that the hardware is non-deterministic, of course, and we want to address that; the second one is that we need a way to specify these tests.
B
That is, regardless of what the underlying machine is. It's not perfect, but it can work for many cases, and that's one of the things we want to try: to see how good cgroups are as a way of bounding the non-determinism. The second challenge is that we need a way to specify tests, and in our project we have a validation language that we came up with. Basically, you start from an output file with performance metrics.
B
You can define these types of assertions over it. For example, if you're measuring scalability and you have the cluster size, the performance of the raw devices, the performance of Ceph, and whether or not the network is saturated, you can specify this type of validation statement. We then have a validation engine that runs it; basically, it's a yes-or-no answer, a boolean function of whether or not the output data complies with that validation statement.
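A minimal sketch of what such a validation engine could look like, in Python (the metric names, sample data, and the 90% threshold are all made up for illustration; the project's actual validation language may differ):

```python
# Minimal sketch of a validation engine: evaluate a boolean assertion
# over a table of benchmark output metrics. Metric names are invented
# for illustration only.

def validate(rows, assertion):
    """Return True iff `assertion` holds for every output row."""
    return all(assertion(row) for row in rows)

# One row per benchmark run: cluster size, raw device throughput,
# observed Ceph throughput, and whether the network saturated.
rows = [
    {"size": 2, "raw_mbps": 210.0, "ceph_mbps": 200.0, "net_saturated": False},
    {"size": 4, "raw_mbps": 420.0, "ceph_mbps": 401.0, "net_saturated": False},
]

# "Unless the network is saturated, Ceph throughput shall be within
# 90% of the raw device throughput."
ok = validate(rows, lambda r: r["net_saturated"] or r["ceph_mbps"] >= 0.9 * r["raw_mbps"])
print(ok)  # True for the sample data above
```

The engine just answers yes or no: the test passes only if every row of benchmark output satisfies the assertion.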
B
So, using these two things, Docker on the one hand and this validation language on the other, we want to bring non-functional testing to Ceph. In particular, the steps we need to go through are: first, deploy Ceph on Docker so we can configure cgroups dynamically; then run benchmarks; and then validate these assertions over the output of the benchmarks.
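Those steps could be sketched roughly as follows (the function names here are placeholders for the real deployment, cgroup, and benchmark tasks, not actual teuthology or CBT APIs):

```python
# Rough sketch of the deploy -> constrain -> benchmark -> validate
# pipeline described above. All steps are toy stand-ins.

def run_pipeline(deploy, configure_cgroups, benchmark, assertions):
    deploy()                      # e.g. bring up Ceph containers
    configure_cgroups()           # e.g. set per-container resource limits
    results = benchmark()         # e.g. run rados bench, collect metrics
    # Every assertion must hold over the benchmark output for the
    # non-functional test to pass.
    return all(check(results) for check in assertions)

passed = run_pipeline(
    deploy=lambda: None,
    configure_cgroups=lambda: None,
    benchmark=lambda: {"throughput_mbps": 95.0},
    assertions=[lambda r: r["throughput_mbps"] > 50.0],
)
print(passed)  # True
```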
B
We're going to focus on RADOS initially, so we can wrap it up in a three-month summer project. As for the particular set of tasks, the list of tasks... oh yeah, sorry. I first looked at how to do this using either teuthology or the Ceph Benchmarking Toolkit, CBT. There are some pros and cons to using each, but I ended up deciding on teuthology, just because there are more people working on it, more eyes looking at it.
B
So our plan is to add a Docker task. It basically leverages an orchestration framework called Maestro, a Python framework that orchestrates the deployment of multi-host Docker systems. Initially we would just pull from the Docker registry, without having to build the images. This Docker task would deploy Ceph and configure what resources each container has available to it. Then there's the rados bench task that's already in the ceph-qa-suite, which, as far as I know, we don't need to rewrite.
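For illustration, Docker already exposes cgroup-backed flags for constraining a container's resources, so a deployment task could build its `docker run` invocations along these lines (the image name and the limits below are hypothetical):

```python
# Sketch of how a Docker task might constrain each container's
# resources via Docker's cgroup-backed flags. This only builds the
# argument list; actually running it would require a Docker host.

def docker_run_args(name, image, cpus=None, memory=None, blkio_weight=None):
    """Build a `docker run` argument list for a resource-limited daemon."""
    args = ["docker", "run", "-d", "--name", name]
    if cpus is not None:
        args += ["--cpuset-cpus", cpus]                 # pin to specific cores
    if memory is not None:
        args += ["--memory", memory]                    # cap RAM
    if blkio_weight is not None:
        args += ["--blkio-weight", str(blkio_weight)]   # relative block-I/O share
    return args + [image]

print(docker_run_args("osd0", "ceph/daemon", cpus="0-1", memory="2g", blkio_weight=500))
```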
B
We'd then also add an Aver task for the Aver framework: basically a wrapper around Aver that points it at the output of rados bench along with a validation statement, and determines whether or not the validations hold. Then the last thing is to write validation statements for all these properties that we would like to observe. That's pretty much all the slides I have. Do you have any questions or comments?
C
B
Yes. So you have a particular version or release, and then, on a set of hardware setups, different clusters, you ascertain that these properties hold on each of those. You run a scalability test and it holds on multiple clusters. So it's like a new layer of testing: you have unit testing, you have integration testing, and now you have this type of testing, which is, I would say, more high-level.
D
B
C
So then one follow-on question would be: what validations have we identified as things you want to do as part of this project? Because I imagine it would be difficult to cover everything with a lot of guarantees. I'm guessing there are some measurements, yeah.
B
D
B
It depends, because you need a benchmark, right? We already have a benchmark, I mean rados bench: if you just repeat the same benchmark on multiple of these configurations, you have the scalability test. For the performance test, rados bench can also be used. But for availability we don't have a corresponding benchmark.
C
B
So, for example, rados bench shows you the throughput over time, and you want to observe that the throughput does not go out of a specific range. You say: OK, the throughput of the system should be within ninety-five percent of the raw performance, for example. The raw performance might be determined by the number of devices that you have available times the capability of each, or something like that, or you can actually run...
B
...maybe a dd task, a distributed dd, that obtains the raw performance. So your validation would be independent of the size; actually, this statement is specifying that. Regardless of the size, I would expect the throughput of Ceph to be within ninety percent, give or take, of the raw performance. Go ahead... I'm sorry, sorry.
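As a sketch of that check (the numbers are illustrative; the raw figure would really come from a measurement such as the distributed dd run mentioned above):

```python
# Illustrative "within N% of raw" validation. Raw performance is
# estimated as device count times per-device capability; in practice
# it could instead be measured directly.

def raw_performance(num_devices, mbps_per_device):
    return num_devices * mbps_per_device

def within_fraction(observed, raw, fraction):
    """True iff observed throughput is at least `fraction` of raw."""
    return observed >= fraction * raw

raw = raw_performance(num_devices=8, mbps_per_device=110.0)    # 880 MB/s
print(within_fraction(observed=810.0, raw=raw, fraction=0.9))  # True: 810 >= 792
print(within_fraction(observed=700.0, raw=raw, fraction=0.9))  # False: 700 < 792
```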
E
B
E
That makes sense. So one way you could test things like availability is: if you have the time-series latency and throughput information from rados bench, you could correlate that with events like taking an OSD down. By the way, teuthology already has machinery for doing all of that, in ceph_manager I think, and it's extensive. Most of our testing involves running a random thrasher in the background that kills OSDs, that sort of thing, so there are utility methods for doing that. That's...
E
That sort of thing. So you could write a task that manually performs a very specific manipulation on the cluster and logs the time at which it happened, and then later you would be able to look at the log before and after that event and verify that the constraints held. Is that sort of where you're going? Yes.
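A toy version of that correlation, assuming you have (timestamp, throughput) samples plus the logged time of the fault (all names and numbers here are illustrative):

```python
# Correlate a throughput time series with a fault event (e.g. an OSD
# killed at a logged timestamp) and check that throughput recovered
# within a window after it.

def recovered_after(samples, event_t, window, min_mbps):
    """True iff some sample within `window` seconds after `event_t`
    is back above `min_mbps`."""
    after = [v for (t, v) in samples if event_t < t <= event_t + window]
    return any(v >= min_mbps for v in after)

# (timestamp, throughput) pairs; the dip at t=12 is the fault.
samples = [(10, 100.0), (11, 98.0), (12, 5.0), (13, 40.0), (14, 95.0)]
print(recovered_after(samples, event_t=12, window=5, min_mbps=90.0))  # True (t=14)
print(recovered_after(samples, event_t=12, window=1, min_mbps=90.0))  # False
```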
E
I've been thinking about this for a while, actually. As you start doing this, you can ask around [inaudible].
D
F
So one thing that may be useful regarding regression testing, if you're going down the performance regression testing road: Ben England wrote a simple script for going back and basically just looking at performance regressions between different sets of imported data. I think it just uses JSON as input, but it may be something you'd be interested in for post-processing your run data with.
F
If you look in the chat window, there's basically just a simple Python script that Ben England wrote. This is what they use for Gluster, actually, for doing their regression testing. It's nothing elaborate, basically just a basic script, but it's what we'll probably be using for CBT for doing regression analysis. I don't know if it's useful to you or not, but it might be something that you at least want to look at. Awesome.
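The core of such a regression check, comparing two JSON result sets, might look something like this (this is my own sketch, not the script mentioned above, and the metric names are made up):

```python
import json

# Minimal regression check between two benchmark runs stored as JSON:
# flag any metric that dropped more than `tolerance` versus baseline.

def regressions(baseline_json, current_json, tolerance=0.1):
    """Return the metrics that regressed beyond `tolerance`."""
    base = json.loads(baseline_json)
    cur = json.loads(current_json)
    return [m for m in base
            if m in cur and cur[m] < (1.0 - tolerance) * base[m]]

base = '{"write_mbps": 100.0, "read_mbps": 200.0}'
cur = '{"write_mbps": 85.0, "read_mbps": 198.0}'
print(regressions(base, cur))  # ['write_mbps']: 85 < 90% of 100
```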
A
E
B
E
You'd have to give it a threshold, or a 99th percentile. Exactly. OK, so I strongly recommend that whatever procedure is used to build those thresholds should also be automated. Basically, we would like to be able to point it at a new set of hardware. I know cgroups is supposed to remove the hardware dependency, but it won't really; it'll just...
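One way to automate threshold generation, sketched under the assumption that you can afford a few calibration runs on the new hardware (the slack margin here is an arbitrary choice):

```python
# Derive a pass/fail threshold automatically from baseline runs on
# new hardware, instead of hand-picking a number per machine.

def make_threshold(baseline_runs, slack=0.15):
    """Lower bound: the worst observed baseline run, minus a slack
    margin to absorb run-to-run noise."""
    return min(baseline_runs) * (1.0 - slack)

runs = [480.0, 455.0, 470.0]        # MB/s from three calibration runs
threshold = make_threshold(runs)    # 455 * 0.85
print(threshold)
print(430.0 >= threshold)           # a later run at 430 MB/s passes
```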
D
E
B
E
F
D
B
I mean, the only thing... well, I assume that there's a SQL driver. This isn't Go; well, it is implemented in Go, so what I'm assuming is that there's a driver for CSV that speaks SQL, which I believe exists, something like that. So yeah, whatever you can plug SQL into will be supported.
E
B
So once you have this information, about cgroups and what your host looks like, you have contextual information for a particular run, and then the output, and you know that that data is valid. Then, when you move to a different setup and something breaks, you would like to find the root cause.
E
So one thing that might be valuable: each teuthology job generates a summary YAML. For this performance testing, you probably also want to dump all the information you possibly can about the hardware and the cgroup configuration, is that what you're getting at, so that later on, when you see the failure, you can get as much information as possible. Yes, you can, yeah.
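A minimal sketch of capturing that run context next to the results (the cgroup limits here are passed in as illustrative inputs rather than read from the system):

```python
import json
import platform

# Capture run context (host details and cgroup settings) alongside
# the benchmark output so failures can be diagnosed later.

def run_context(cgroup_limits):
    return {
        "machine": platform.machine(),   # e.g. x86_64
        "system": platform.system(),     # e.g. Linux
        "cgroups": cgroup_limits,        # limits used for this run
    }

ctx = run_context({"memory_bytes": 2 * 1024**3, "cpuset": "0-1"})
print(json.dumps(ctx, indent=2))
```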
B
E
I kind of just assumed... unless cgroups are simply much better at constraining I/O throughput than I think they are, there is no way we're going to be able to come up with thresholds that are aggressive enough to actually catch regressions while being conservative enough not to trip on different hardware. I'm not sure that's a worthwhile design goal. I think the design goal should be to make sure it's transparent and simple to generate new thresholds for new hardware.
E
Just so you know, in addition to rados bench, there's a tool called smalliobench, I think in ceph-tools, that actually outputs one JSON line per I/O, so you can get exact latency information on every single I/O. I wrote it because rados bench is not very good for that sort of thing. You may want to look into it; it may be less tedious to work with than rados bench.
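Post-processing that one-JSON-line-per-I/O output into latency percentiles could look like this (the field name "latency_ms" is assumed for illustration; the tool's actual schema may differ):

```python
import json

# Parse one-JSON-object-per-line I/O records and compute a
# nearest-rank latency percentile over them.

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(round(p / 100.0 * len(ordered))) - 1))
    return ordered[k]

lines = [
    '{"op": "write", "latency_ms": 1.2}',
    '{"op": "write", "latency_ms": 1.9}',
    '{"op": "write", "latency_ms": 40.0}',
    '{"op": "write", "latency_ms": 1.5}',
]
lats = [json.loads(line)["latency_ms"] for line in lines]
print(percentile(lats, 50))  # 1.5
print(percentile(lats, 99))  # 40.0
```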
E
Yeah, well, I mean, rados bench also has other properties that are not attractive: it only writes out full objects and it moves on to new objects, it doesn't ever overwrite objects, so it's not a good proxy for RBD, for example. smalliobench writes out a large pool of objects, of the number and size you specify, and then performs a pattern of configurable-size writes and reads against them. You may find that to be a more flexible tool.
E
I think it's in... well, it's in the Ceph project. I think it's in the ceph-tools package, which is already installed by default in teuthology; it's part of the Ceph bundle of stuff. I don't think there's a rados task for it. Sorry, I mean I don't think there's a ceph-qa-suite task for it, but that's easy to write; it's just a wrapper. I mean, all the rados bench one does is invoke rados bench; there's not much to it. You'll...