Description
This talk will show how we created a framework to benchmark service meshes: how to create large use-and-throw clusters, how to pipeline metrics into persistent storage, how to choose the right metrics to get a holistic view of the performance of the mesh, how we (ab)use Grafana charts to get around the limitations of a time series database, the tweaks to the wrk2 tool needed to get the job done, and so on.
A
Awesome, I hope you all can hear me all right. So thanks, Nisha, for the introduction, and hello everyone, whatever time of day it is for you; I hope you are all doing well, and thanks for coming to our talk.
A
First of all, I'd like to thank Hannah for setting the stage for us, because now attendees have an understanding of what a service mesh is, and Thilo and I can build on top of that. It's a really good segue into the performance aspect of service meshes. So yeah, next slide.
A
So we've already been introduced, but still: I'm Suraj Deshmukh. I work on Lokomotive, the Kubernetes distribution from Kinvolk, and my co-speaker is Thilo. He is Director of OS and Security and leads the Flatcar Container Linux team; Flatcar Container Linux is a new avatar of CoreOS Container Linux. On to the next slide, and a little bit of introduction of my employer, Kinvolk.
A
We do software right from the kernel up to the application level, with Kubernetes in between, of course. In one line, we like to say we are the Linux and Kubernetes experts. And, like Nisha said, we'll take questions at the end. Next slide.
A
I'll talk about the agenda first because, as Simon Sinek likes to say, let's start with why. So we'll look at why we have chosen a certain way to collect metrics and the rationale behind it, everything from sample size to statistical spread, and then we'll go on to look at the implementation: the engineering aspects of how we set up this framework and how we collect the various metrics from the various clusters.
A
And finally we will end on a practical note by looking at a demo of how it all fits together. So at the end of the talk you will have learned how we built the benchmark framework and what we learned from it. If you feel like using it, you'll get a good idea of how, and if you want to build something similar, you'll also get some inspiration.
A
So from here, over to Thilo.
B
Thank you, Suraj. So let's start with some theory, the rationale behind the benchmarks that we did. The metrics that we'll be looking at indeed cover three of the four golden signals.
B
So thanks for bringing this up, Hannah. The overall goal of our benchmark is to determine the cost of operating a service mesh. It is generally a comparative benchmark, so we are looking at the differences between different service meshes, and we're looking at regular use cases, so there won't be any cluster overload.
B
The data we collect focuses mostly on request/response latency which, as Hannah already raised, is the one thing you can't easily compensate for by just throwing money at your cluster.
B
We'll also look at CPU and memory usage of both the control plane of the service mesh and the sidecars. And in our benchmarks we have control metrics; those are there to make sure that we really don't run into overload situations. So we'll be looking at the request/response error rate of our load generator, and we'll be looking at CPU and memory usage of the application under load and of the benchmarking tool, and whether those saturate the limits of the nodes they run on, which would obviously not be a regular use case.
B
Sample size and statistical spread are something I've seen ignored in quite a few benchmark results that have been published out there. We run our benchmarks on clusters that, like many clusters, actually run in an infrastructure-as-a-service environment that our IaaS provider serves us from their data centers, and we have limited control over this environment.
B
We could have noisy networks, basically neighbors that do a lot of network traffic, which impacts latency, buggy top-of-rack switches and things like that, and we want to identify those and basically exclude them from the results. And then there are variations that we need to include rather than exclude, to cover the statistical spread of the data that we're collecting.
B
There may be some variety in the servers and the network equipment that we're using: we don't know if all of the hardware runs the same firmware versions, we don't know about hardware revisions, and that is just diversity that the environment introduces. You just can't escape it; it's always there if you use data centers you don't host yourself, so you need to make sure to have it covered in your benchmarks.
B
So we basically repeat the same runs multiple times, both on the same cluster one after the other, as well as on multiple clusters that have the same hardware specs, and we basically try to gauge the statistical spread that we're seeing and that we need to cover in the data.
B
So if you look at a chart like this, it's neat, you see latency, right? But this chart is a lie, because it's a single snapshot; it doesn't tell you anything. What you want to see is at least the range of values that your tests run into.
B
Otherwise you may even be looking at an outlier, but in any case you will have no idea about the minimum/maximum spread of the data that you're collecting, and that's particularly important when you're comparing service meshes that are supposed to add minimal latency. All right. Another thing that we're doing: we want our benchmark to be user-experience-centric. Hannah had this great animation in her talk where you could see the inside of your cluster, of your microservices environment, and you could see requests and responses, each with individual latencies, going back and forth.
B
This is not what we want to measure. We like to take the position of the user, and the user always uses the whole application at once. Your application consists of individual microservices, and we want to cover that in our benchmarks: a single user action fed into your application, which consists of multiple microservices, will cause many microservice endpoints to be called.
B
So the benchmark we're going to run is user-centric and will basically interface with your cluster as one big application, instead of just covering individual endpoints' latencies; that's very important to us. And then we have a very specific way of measuring latency overall, and it also factors into the user-centric side of things. There's this developer, Gil Tene, who can explain it a lot better than I can, and there's a YouTube talk of his that you should watch; he coined the term "coordinated omission".
B
That's discarding data in a way that basically feels like it's not there, but in fact it usually is user-impacting. Taking coordinated omission into account allows us to reflect the user experience of wait time when looking at latency, so instead of measuring requests per second individually, we actually measure committed RPS over time.
B
To give you an example: if you have a user action that causes 100 endpoint requests in your cluster, and your application commits to, on average, 100 RPS across services, that's 10 milliseconds per request on average, so the user expects one second of wait time. But if one of those hundred requests stalls for one second and all the others complete in 10 milliseconds, then the user will see 200 percent of the wait time they expected, and we don't think that many traditional ways of measuring latency actually reflect that.
B
If you look at the example in terms of statistics, we have 99 requests that complete in 10 milliseconds and one request that takes a second, and this is roughly what it looks like. If you take the average on a per-request basis, you get 20 milliseconds on average; you get 5 milliseconds at p25, because that's below the 10 millisecond average; p50 is obviously 10 milliseconds, because that's what the application guarantees; p75 is 15 milliseconds, and even p99 doesn't really show the stall.
B
But if you instead look at it over time, we see 980 milliseconds' worth of requests with 10 millisecond response times on average, and then we see one second that has a thousand-millisecond response time. If you factor that into the equation, you see that your average latency, if you measure over time and not over individual requests, is about 500 milliseconds, because you spent two seconds on something that should have taken you one second, and the latency is reflected much better in the percentiles of what you're measuring.
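To make the arithmetic of that example explicit, here is a sketch of the calculation (assuming the request mix from the example above: 99 requests at 10 ms and one at 1000 ms):

```latex
% Per-request average: every request counts once, so the single stall barely moves it.
\bar{L}_{\mathrm{req}} = \frac{99 \cdot 10\,\mathrm{ms} + 1 \cdot 1000\,\mathrm{ms}}{100} = 19.9\,\mathrm{ms} \approx 20\,\mathrm{ms}

% Time-weighted average: each request is weighted by the wall-clock time it occupies,
% so the one-second stall dominates the roughly two seconds of total time.
\bar{L}_{\mathrm{time}} = \frac{99 \cdot (10\,\mathrm{ms})^2 + 1 \cdot (1000\,\mathrm{ms})^2}{99 \cdot 10\,\mathrm{ms} + 1 \cdot 1000\,\mathrm{ms}} \approx 507\,\mathrm{ms} \approx 500\,\mathrm{ms}
```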
B
So that's the approach that we're taking. And how do we fix that when measuring latency, on the technology level? Well, to get started we need to feed the expected, or committed, requests per second into our benchmark. If you commit to 100 requests per second, that's one request every 10 milliseconds: we expect the first request to go out at 0 milliseconds, the second at 10 milliseconds, the third at 20 milliseconds.
B
So if one of those has more than 10 milliseconds of latency, then the succeeding request will not go out in time, and that again is easy to map in software. Our fix is that instead of measuring latency from the point in time where the request actually goes out, you start measuring latency at the point in time where the request should have gone out, and that gives us this great time focus and a very, very user-centric view of things.
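As a rough illustration of that fix, here is a minimal sketch of schedule-based measurement. This is not the actual wrk2 patch; the target URL and the rate are made up, and GNU date with millisecond output is assumed.

```bash
#!/usr/bin/env bash
# Minimal sketch of coordinated-omission-aware measurement (not the wrk2 code).
TARGET="http://app.example.com/endpoint"   # hypothetical endpoint
RATE=100                                   # committed requests per second
INTERVAL_MS=$(( 1000 / RATE ))             # one request every 10 ms

start_ms=$(date +%s%3N)
for i in $(seq 0 99); do
  intended_ms=$(( start_ms + i * INTERVAL_MS ))   # when this request SHOULD go out
  now_ms=$(date +%s%3N)
  if (( now_ms < intended_ms )); then             # ahead of schedule: wait
    sleep "$( awk -v ms=$(( intended_ms - now_ms )) 'BEGIN { print ms / 1000 }' )"
  fi                                              # behind schedule: send immediately
  curl -s -o /dev/null "$TARGET"
  done_ms=$(date +%s%3N)
  # Latency is counted from the intended send time, so a stall in an earlier
  # request also inflates the latency recorded for the requests it delayed.
  echo "request ${i}: $(( done_ms - intended_ms )) ms"
done
```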
And with that, let's hand over to Suraj for the implementation part.
A
Sure, thanks Thilo. I think we can start from the next slide, yeah. This slide shows what the benchmarking setup looks like. We have one controller cluster; this is all Lokomotive, which is Kubernetes behind the scenes, and all the logos that you see behind are components that get deployed. So the controller cluster has a Prometheus deployed, with storage backed by OpenEBS; this storage is quite large, so that it can store all the metrics coming from the various clusters.
A
The controller cluster is going to help us visualize all the metrics we have scraped. Down below you see all the leaf clusters that have been deployed from the controller cluster. Each leaf cluster has various components: OpenEBS again, to back the storage of Prometheus, plus Contour and MetalLB.
A
They help you expose your application over the internet, because we deployed on the Packet cloud, and the right way to expose an application over the internet on Packet is using MetalLB. And the endpoints on the leaf clusters are then scraped by the controller cluster.
A
We also have ExternalDNS in use here, so that when Prometheus is deployed on these leaf clusters, a DNS entry is made for it in AWS, so that the controller cluster knows where to scrape from. And Linkerd and Istio, as you can see, are the two service meshes we benchmarked; they are deployed as needed as the tests progress.
A
We also have metrics-server; there is no logo for it, but it is needed because Istio uses the Horizontal Pod Autoscaler and autoscales as the load increases. We will later also see how our controller cluster learns about the various Prometheus instances running on the leaf clusters. So yeah, over to the next slide.
A
This whole thing that happens at the root level, on the controller cluster, is all done by one Helm chart, the orchestrator Helm chart. So, apart from OpenEBS and Prometheus, like I said before, this chart is also deployed on the controller cluster, and it does various things. It has multiple jobs: they download Helm, they download the charts as well, and Terraform, which is needed by the Lokomotive deployer, lokoctl.
A
It also builds lokoctl, because we were also experimenting with it, so we could always give it a commit and it would build lokoctl. All these binaries and Helm charts and everything are available in one volume, so the orchestrator application has access to them.
A
Now, the orchestrator Helm chart has one Golang application, which actually runs multiple Kubernetes jobs, and these jobs go and create the leaf clusters. If a job reports a failure, it is started again, because it's a Kubernetes job, and since it is backed by a volume we don't lose any manifest or configuration that was used for creating a cluster.
A
If there was a failure in the job, that used to turn out to be very cumbersome, so we don't delete anything now: even after jobs are completed or failed or whatever, the configs stay there, and someone can always go back using a debug pod that is always running and then, say, delete the cluster, or whatever is needed. And, like I said, we have this one volume which has all the binaries and configs and everything, so it is shared across all the jobs. And finally, about the scripts.
A
This is all backed by bash scripts right now, for the jobs that start leaf clusters, and none of these scripts were baked into Docker images, because any time you want to make a change you don't want to build a Docker image, push it to a registry, pull it again and test it.
A
The best way to do this kind of thing is to create a ConfigMap from those scripts and mount it as a volume, so any time you make a change it's only a helm upgrade and the thing starts again. That's one of our learnings.
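A minimal sketch of that pattern (resource, file, and chart names here are made up, not the ones from the orchestrator chart): the scripts live in a ConfigMap that the job mounts, so a change is just an edit plus a helm upgrade, with no image rebuild.

```bash
# Package the benchmark scripts into a ConfigMap instead of a container image.
kubectl create configmap benchmark-scripts \
  --from-file=scripts/ \
  --dry-run=client -o yaml | kubectl apply -f -

# A job (normally templated in the Helm chart) mounts the ConfigMap and runs a script.
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: create-leaf-cluster
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: runner
          image: alpine:3
          command: ["sh", "/scripts/create-cluster.sh"]
          volumeMounts:
            - name: scripts
              mountPath: /scripts
      volumes:
        - name: scripts
          configMap:
            name: benchmark-scripts
EOF

# After editing a script, re-render and restart via a Helm upgrade of the chart
# that templates the ConfigMap and the jobs (release/chart names illustrative).
helm upgrade orchestrator ./orchestrator
```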
So yeah, over to the next slide, thanks.
A
So, like I said, every job is backed by this orchestrator application: it runs Kubernetes jobs which deploy the leaf clusters, and then it deploys ExternalDNS, Prometheus, Grafana and all that, except for Istio and Linkerd, because we do those later when we start the benchmark runs. And these jobs are running on the controller cluster, the root cluster.
A
It is while deploying these child, or leaf, clusters that we do the registering part: this is where the root Prometheus gets to know about the child Prometheus instances, and that's how the root Prometheus scrapes from the children.
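One way such a registration can look on the root Prometheus is a federation scrape job pointing at the leaf's Prometheus. This is only a sketch: the job name, DNS name, and the way the additional scrape config is delivered are illustrative, with the DNS record being the one ExternalDNS creates for the leaf Prometheus.

```bash
# Append a federation scrape job for a freshly created leaf cluster to the
# root Prometheus, delivered here as an additional-scrape-configs Secret.
cat <<'EOF' > leaf-1-federation.yaml
- job_name: leaf-cluster-1-federate
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
      - '{job=~".+"}'
  static_configs:
    - targets:
        - prometheus.leaf-1.example.com:9090   # DNS record created by ExternalDNS
EOF

kubectl -n monitoring create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml=leaf-1-federation.yaml \
  --dry-run=client -o yaml | kubectl apply -f -
```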
Then, during the benchmark runs, we also deploy the Pushgateway before starting the runs. Let's see what the benchmark runs look like; over to the next slide, thanks.
A
As you can see, these are three nested for loops. For every requests-per-second value, like 500, 1000, 1500, we run the loop five times, and we run it for three types of service mesh: Linkerd, Istio, and no service mesh (which we still call a service mesh in the code). So every time, the service mesh is installed first; if it's bare metal, I mean if there is no service mesh, then we don't do anything, we just return from that function.
A
Then we install the emojivoto application. Emojivoto is a dummy application that Linkerd ships, and we use it because we needed to simulate a microservices architecture. After that, the run-benchmark function deploys the actual wrk2 Helm chart; this is where the job takes all these parameters and starts firing all the requests.
A
And finally we run the merge job, so that all the metrics are sent to the Pushgateway. And when both of those jobs are done, we delete the emojivoto application again and clean up the mesh.
A
We clean it up every time because, let's say, when you deploy Istio and you are deploying this application, the proxy is injected by the mutating webhook that is always running. So we need different proxies for different service meshes, and we need no proxy when there is no service mesh. That's why we install every time and clean up every time at the end.
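Condensed, the loop just described looks roughly like this. It is only a sketch: the chart paths, release names, and job names are illustrative, not the exact ones from the service-mesh-benchmark repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

for rps in 500 1000 1500; do                            # committed requests per second
  for run in 1 2 3 4 5; do                              # repeat for statistical spread
    for mesh in linkerd istio none; do                  # "none" == bare metal
      if [ "$mesh" != "none" ]; then
        helm install "$mesh" "./charts/$mesh"           # install the mesh under test
      fi
      helm install emojivoto ./charts/emojivoto         # demo microservices application

      helm install benchmark ./charts/wrk2 \
        --set rps="$rps" --set run="$run" --set mesh="$mesh"
      kubectl wait --for=condition=complete --timeout=30m job/wrk2-benchmark

      helm install merge ./charts/metrics-merger        # sends merged results to Pushgateway
      kubectl wait --for=condition=complete --timeout=10m job/metrics-merger

      helm uninstall merge benchmark emojivoto          # clean up the application ...
      if [ "$mesh" != "none" ]; then
        helm uninstall "$mesh"                          # ... and the mesh, so no proxy leaks over
      fi
    done
  done
done
```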
A
So that is the core of what happens there; over to the next slide, yeah. At the node level, this is how a leaf cluster looks. There are multiple workload nodes where the emojivoto applications are running, and then there is one benchmark node, which is running your wrk2 application. So all the requests from the benchmark node, from wrk2, are fired at these emojivoto applications, and you can see that all of the nodes are of the same machine type.
A
This machine type is available on the Packet cloud, and the controller is just a single node, because there is not that much load on the Kubernetes control plane, so one is enough. Over to the next slide. So, like I said before, wrk2 is used to generate the load and measure latency, and emojivoto serves as the demo app, and it is deployed multiple times, as multiple applications.
A
You can always tweak how many applications you want, depending on how much you want to stress the whole thing. Over to the next one. And so, Lokomotive: I think I didn't mention it before, but when I say components, we have this notion of components, where all the Helm charts are sort of packaged, and the configs that these components provide are supported by Kinvolk as a part of Lokomotive.
A
Linkerd and Istio we have added as experimental right now, and if you want to check them out, you can just download the lokoctl binary and deploy the cluster and these components.
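For reference, that flow with lokoctl looks roughly like this. It is a sketch, and the component names are illustrative for the experimental integrations mentioned here, so check the Lokomotive docs for the exact names.

```bash
# Provision a Lokomotive cluster from an existing cluster configuration,
# then add the experimental service mesh components (names illustrative).
lokoctl cluster apply
lokoctl component apply experimental-istio-operator
lokoctl component apply experimental-linkerd
```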
A
Once you get the slides, you'll have the links as well. Sure, over to the next slide. So the earlier image that we saw was the node-level flow of data; this is what happens at the pod level. You can see that the wrk2 job is, first of all, sending all these HTTP requests to the various applications, and once all the metrics are collected by wrk2, they're pushed to the Prometheus Pushgateway. Now, why the Pushgateway, you might ask?
A
Prometheus is very good, but it has a pull mechanism for metrics. For something that is very short-lived, like a Kubernetes job, that is not very efficient: by the time Prometheus discovers the job and starts scraping metrics from it, the job might have died, and you might lose out on metrics.
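As an illustration of the push model (the metric names, values, and gateway address below are made up; the real metrics come out of wrk2's reporting), a short-lived job can hand its results to the Pushgateway with a plain HTTP request before it exits:

```bash
# Push results from a short-lived job to the Prometheus Pushgateway so they
# survive after the job's pod is gone.
PUSHGATEWAY="http://pushgateway.monitoring.svc.cluster.local:9091"

cat <<'EOF' | curl --data-binary @- "${PUSHGATEWAY}/metrics/job/wrk2-benchmark/instance/run-1"
# TYPE benchmark_latency_ms gauge
benchmark_latency_ms{percentile="p50"} 4.2
benchmark_latency_ms{percentile="p99"} 12.7
# TYPE benchmark_requests_total counter
benchmark_requests_total 6000
EOF
```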
A
So the Pushgateway acts as a stopgap solution here: you push all your metrics to the Pushgateway, and then Prometheus scrapes them from the Pushgateway. And as you can see here, every leaf cluster has a Prometheus and, like I explained earlier, the root cluster then scrapes from these Prometheus instances. We haven't shown it here, but that's what happens: the root scrapes from these Prometheus instances, so we have the metrics from all the leaf clusters in one root cluster.
A
You can see metrics from the various clusters there, which is how we increase the spread. So yeah, on to the next slide, and now we can see the demo of how it all happens.
B
Just have a look at the repo later and see which parts you can use. What I'm going to demo now, if I quickly show this, is basically what we're going to be looking at: a single leaf cluster. It'll be pretty low level, and I'll start a single benchmark for you. Before we can start a benchmark, obviously we need to provision; the technology behind provisioning a cluster is pretty amazing, but watching a cluster actually being provisioned is kind of like watching paint dry.
B
Here are the pods, and in the single namespace we see the three microservices the emojivoto application consists of. All of those microservices will have endpoints, and we will cover all the namespaces and all the microservices in them. All right. Now, the results of those benchmarks will be displayed in a Grafana dashboard that we call the benchmark cockpit. It is set up to give you an overview of individual benchmarks; you'll be able to introspect benchmark data and you'll be able to see the benchmark running. Now let's start the benchmark.
B
Now, there's my helm command; it's reasonably complex, so it's a good idea to have it in your shell history. This basically deploys wrk2 and tells wrk2 that there are five applications, that's the five emojivoto instances, to benchmark. It sets a committed RPS, which is 50 RPS because we're going to take it low here, it sets the duration to two minutes and 24 connections, that's 24 wrk2 threads, and it gives an initialization delay of 10 seconds.
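The invocation looks something like this. This is a sketch: the chart path and value keys are illustrative, with the actual names living in the service-mesh-benchmark repository.

```bash
# Deploy the wrk2 benchmark job with the parameters used in the demo:
# five emojivoto target applications, 50 committed requests per second,
# a two-minute run, 24 wrk2 connections and a 10 second init delay.
helm install benchmark ./wrk2 \
  --set appCount=5 \
  --set rps=50 \
  --set duration=2m \
  --set connections=24 \
  --set initDelay=10s
```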
B
And wrk2 is running, and you should be seeing a new entry in the benchmark list here; that's basically the currently active benchmark. We see that it's eight percent done and, after a refresh, as the benchmark is running you can see the average and current RPS being emitted. It's not stellar, it's just 50 RPS, and that's for demo purposes; obviously you folks do your own gauging and determine your own RPS rates.
B
In the middle section of the dashboard you can introspect service mesh resource usage, which we currently don't see because we're not benchmarking a service mesh in this run. If we scroll down a bit, we see our control data, so we have the memory usage and the load generated by the benchmark tool. It's pretty low: the load is 0.0-something and it takes 1.4 gigabytes of RAM, and we have plenty left.
B
You see the same here for all of the emojivoto applications, that's the microservices, so we have a kind of control mechanism to validate that we're really not overloading anything, and we also have a bird's-eye view of the cluster, which is the load on all of the cluster nodes; that is also quite useful to know.
B
We have latency percentiles here, and we have a very detailed breakdown of latency percentiles down here to dive deeper. Now, single benchmark runs are very nicely introspectable using this dashboard, but to really get a summary of things, we created a summary dashboard. This will show you comparative latencies of service meshes compared to bare-metal runs, and since Grafana and Prometheus weren't really built to display non-time-series data, this is all a little manual.
B
So what we need to do now in order to refresh this, in order to feed our benchmark run into it, is to run a separate job, the metrics merger.
B
It should have completed by now; it's done, and we can refresh this dashboard and we'll see a new entry having popped up here. What we're doing here is almost a little abusive to Grafana's charts; it's just to basically display all of the different percentiles that we have in the data.
B
Since the Prometheus Pushgateway will continuously feed the merged data into Prometheus, just waiting a little will give us the display that we need, and then we have an overview of all of the runs. The first section of the dashboard has a comparative overview, and we can scroll down and introspect the bare-metal, Linkerd and Istio percentiles individually. I did a few runs before this presentation just to warm up a little data for you, so there's some data in this dashboard already. Now, something else this dashboard is very good for is spotting outliers, and there is a bare-metal outlier right here.
B
If you look at the higher percentiles, we see a spike in latency, and it's a little more than 80 times the latency that we've seen in every single other benchmark. So chances are that we had a noisy neighbor or something else going on here. The dashboard offers you the option to actually exclude runs, which we're going to do now: this is the offending run, it's now excluded, and the data can be looked at without this run. So, as Suraj mentioned, the controller cluster will have the merged latencies of every single one of the leaf clusters.
B
So there's a special version of this dashboard for the controller cluster that gives you information on the various benchmarks run on the specific leaf clusters as well. Now, if you're interested in diving into a specific benchmark here, for instance, for some reason we want to introspect this specific run, we can click the link, and that will take us to the benchmark cockpit again, and it is frozen in time at exactly the time span where this benchmark happened.
B
So we can, for instance, introspect all of the running benchmark data, we can look at the results, we can see if there are any transport errors that have happened, or we can get a detailed breakdown of the latencies; basically everything that the cockpit has to offer. As a closing note, I said that we're benchmarking whole applications and that we consider the cluster to be the application.
B
So one of the statistics we have in the results section is, for every single endpoint, the actual number of calls, of requests, that this specific endpoint received. This is something that you can configure in the benchmark Helm chart.
B
So if you want a different distribution of endpoint calls, you can just edit a simple text file and it will get you there. And that concludes our presentation.
C
So, thank you very much. This was very interesting to me, and I've read the article, I think it's already one year old, the article you wrote about benchmarking Linkerd and Istio. Actually, when we built servicemesh.es, we were thinking about doing a benchmark, and then we were like: no, this is crazy.
C
You have to know many things to do benchmarking correctly, and reading your article and also the comments, people were like, no, you need to do this and that; so I think this was a lot of work.
C
So thank you very much for doing it. And my question is: the benchmark you just did, did you do it with the latest versions?
B
Those are the in-development versions that we have in Lokomotive. Suraj has more details on that; he started migrating Istio and Linkerd into Lokomotive as components. This is ongoing work and, as Suraj mentioned, it's experimental, so we would by no means call this a comparative benchmark right now. What we're presenting today is just the automation environment around it, and it's by accident that it's Istio and Linkerd, because that's what we integrated with Lokomotive; it can be done with any mesh.
A
Yep, about the Istio version, I think it's the one before the latest. I mean, when we did the second avatar of the service mesh benchmark, we wanted to, you know, make it a framework so that others can use it as well.
A
We integrated Linkerd and Istio, so we use the Istio operator to deploy Istio, and for Linkerd it is the latest stable, not the edge release. Okay.
C
So you're already using the Istio operator component, I mean, yeah, okay. I would expect that the performance has improved since 1.5, because I've only seen blog posts and benchmarks for old versions, which is not very interesting if you look at the current versions. So I saw in the benchmark that they are pretty much the same, right, if you look at the current Istio and Linkerd versions?
B
Yeah, but I mean, this is really not quantifiable data; it's just something I fired up yesterday on Packet where I ran a few things, so there's no optimization going on here and we haven't looked into any of that. The thing we didn't really quite finish one year ago, when we did the initial benchmark, is that we were quite dissatisfied with the level of automation that we had in the benchmark suite, you know, and that has changed a lot in the recent months; there have been significant improvements in, I think, all parts of it. So we are set up now to do another round of benchmarks, we just didn't do it yet, and I wouldn't start with 50 requests per second when comparing the two; I guess they can deliver slightly more.
B
Yeah, I mean, we're saying that we're testing clusters in regular operating conditions, but I think you can push it a little further than 50. That was just to get a quick demo done and have no, you know, live-demo badness happening to us, so I was driving it the safe way.
B
So we were looking at other test applications a little; they're not hard to integrate as long as they're automated, and if they can be deployed in an automated way, then it's pretty straightforward. The thing with emojivoto is that it turns out to be quite efficient. So the first thing we did, because we have emojivoto pretty well integrated in the test suite, is that we just tried to push it as hard as we can on bare metal.
B
It turns out this thing can do a lot of requests per second without introducing much additional latency and with no errors at all, so it's perfect for use as a test application. I don't want to talk badly about other demo applications, but we also tested the bookinfo, or books, one that comes with Linkerd, and we couldn't really push it harder than 50 or 100 requests per second without massive errors and delays.
B
That is probably a good thing if you want to ship a demo application for a service mesh, because then you have something to debug, right; it's not a good thing if you want to run a benchmark. If there is any other target application that we should be looking at, we're taking patches and PRs; the whole service-mesh-benchmark project is an open source project on GitHub. If you want to automate things, yes, please.
B
Yeah, absolutely. So, as mentioned, the service mesh benchmark suite, that version 2.0 that we've worked on and that's now pretty much done, has several layers. You don't even need to go as far as deploying your own Lokomotive cluster: if you have a cluster up and running, you can use the benchmark pod straight away, as long as you have a Grafana where you can put the dashboards, which are also in the repository.
B
So there are JSON exports of the Grafana dashboards, for the cockpit and the summary one. You just use Helm to deploy the wrk2 pod, you configure your own applications and endpoints in the Helm chart before deploying, and then this thing just runs.
B
Of course, if you want to do a comparative benchmark, like "which service mesh is best for me", you would need to do manually what is automated in the service mesh benchmark suite: you basically have to remove whatever service mesh you're running to get a bare-metal benchmark, add it back, then remove it again and put on the other service mesh that you want to benchmark against. And that is all good, right, because it's still your cluster and your own application; but if you want a comparison between multiple service meshes, then having that second layer of abstraction, where Lokomotive basically does all of the legwork for you, is probably a better idea.
C
If I want to add, let's say, Traefik Mesh, or Consul, or something else, do I just have a file where I place the install commands, and then I can integrate it as a service mesh?
B
For service meshes, what we're currently doing, and this should by no means limit the way you would do things, is implementing them as components in Lokomotive, yeah. That's good.
B
Lokomotive is also an open source project, right, so to answer your question: you would go to the Lokomotive project and either fork it or create a PR there that adds your mesh as a component, which is easy and straightforward if the service mesh is just a Helm chart, or if there's a straightforward operator for it, like for Istio. Then you basically use Lokomotive for the lowest level of automation, you put the service mesh benchmark framework on top of it, and it will just run. If that scares you, you're by no means limited to that: you can always have your own custom scripting around running the benchmarks and only base things on the service-mesh-benchmark repo, installing and removing your service mesh using other scripting. That works too.
B
There's a metrics and benchmarking working group at the CNCF, and that's probably the right level at which to establish this kind of thing, because I feel benchmarking is something that will always need an independent body to carry it out; it's slightly political, and, well, we've noticed that with our first benchmark post, but that's how it is.
B
And I get it, I mean, I get why people get kind of, you know, agitated when they read things about their favorite service mesh.