Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Yingxin Cheng, Intel Software Engineer
Hi everyone, my name is Yingxin. I'm a software engineer from Intel, constantly working on the performance analysis of Ceph. So today's topic is about identifying performance bottlenecks in a Ceph cluster. The motivation is, of course, that the bottleneck is a very important part of performance analysis, and it is a very difficult task to identify bottlenecks in a very complicated system like Ceph. I want to share a way to do it intuitively and very fast.
I also want to share the important use cases of distributed tracing. So, let's start. You can also reach me personally through this email, and we also have a demo outside at the Intel booth where you can meet us. I want to reason about the entire idea with three fundamental questions: why does performance matter, what is performance, and how to improve it? They will help us understand the ideas about how to collect better performance data, how to represent performance from the data collected, and how to do the following analysis and improvements.
So first, why does performance matter? Of course, we don't want users to feel the system is very laggy and unresponsive. It means that we need to monitor all the activities inside Ceph, and not only track all the latencies in the user requests: we should not overlook any of the costs that are directly related to the request-response process, because any of them could be the bottleneck and the cause of a bad user experience.
So, using existing tools, I think it is very tedious and time-consuming to track all the latencies from every corner of the Ceph components. I think the better way is not to collect the latencies manually, but to use distributed tracing to reveal the whole history of a request, from the request being sent to the response, then to find the critical path from the request being sent to the response, and to collect the costs on that critical path. We built up a prototype to do this. This is an example of a RADOS write request across one client and three OSDs in the cluster.
It is built based on the pure happens-before relationships between events, and the costs between these events are either in-thread execution, cross-thread, or even cross-host relationships: the costs of in-thread executions and the costs between threads and processes. We can find that, from the responding point, there is one way back to the request being sent, because by definition the critical path is the longest execution path. And because the events on it are consecutive, their costs do not overlap, which brings us a very big advantage in showing the performance of concurrent requests.
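A minimal sketch of how such a critical path could be extracted, assuming the trace has already been reduced to a directed acyclic graph of events with happens-before edges and costs (the event names, edges, and costs here are made-up illustrations, not the prototype's actual data):

```python
from collections import defaultdict

def critical_path(edges, source, sink):
    """Longest-cost path from the request being sent (source)
    to the responding point (sink).
    edges: list of (from_event, to_event, cost)."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, c in edges:
        graph[u].append((v, c))
        indeg[v] += 1
        nodes.update((u, v))
    # Topological order via Kahn's algorithm.
    order = []
    queue = [n for n in nodes if indeg[n] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v, _ in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Longest-path relaxation in topological order.
    best = {n: float("-inf") for n in nodes}
    best[source] = 0
    prev = {}
    for u in order:
        for v, c in graph[u]:
            if best[u] + c > best[v]:
                best[v] = best[u] + c
                prev[v] = u
    # Walk back from the responding point to the request being sent.
    path, n = [sink], sink
    while n != source:
        n = prev[n]
        path.append(n)
    return list(reversed(path)), best[sink]

# Hypothetical happens-before edges of one traced write request.
edges = [("sent", "queued", 2), ("queued", "dispatched", 5),
         ("sent", "replicated", 4), ("replicated", "dispatched", 1),
         ("dispatched", "responded", 3)]
path, cost = critical_path(edges, "sent", "responded")
```

Because the relaxation follows a topological order, the returned path is the longest execution path, whose consecutive costs fully account for the request's latency.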
In order to explain this, I need to first explain what performance actually is. Performance is basically a two-dimensional concept: latency and throughput. It is very easy to understand if we want to move a group of people: if we use planes, it is very fast, so the latency is good, but a plane can only take a limited number of people, so its throughput is not very good.
On the contrary, if we use trains, a train can take a huge amount of people, so the throughput is good, but the latency is not, because a train is relatively slow. We are usually accustomed to latency-only analysis, like Ceph perf counters or collecting metrics of latencies, and we are usually accustomed to measuring costs individually, using profilers and tracers, or using Ceph's blkin, which provides distributed tracing.
With distributed tracing, we can draw two lines to represent the throughput that goes into a function and the throughput that goes out of the function. The steeper the line is, the better the throughput, and the distance between these two lines represents the latency of that function. So if we improve this function to have better throughput, we will see that the line of the output becomes steeper, and if we improve the same function to have better latency, you can see the result.
B
The
distance
between
two
lines
become
reduced
and
in
the
ideal
situation,
if
it
has
better
latency
and
baddest
report,
we
can
see
two
lines
close
to
each
other
and
we
come
with
one
line
and
so
how
to
represent
the
performance
of
concurrent
requests.
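The two-line view described above can be sketched with made-up timestamps; the slope of each cumulative line is the throughput, and the horizontal gap between the lines is the per-request latency (the numbers below are purely illustrative):

```python
# Cumulative view of requests entering and leaving a function.
# The i-th request enters at enter[i] and leaves at leave[i].
enter = [0, 1, 2, 3, 4]   # times requests go into the function
leave = [2, 3, 4, 5, 6]   # times the same requests come out

# Steeper line = more requests per unit time = better throughput.
throughput = len(leave) / (leave[-1] - enter[0])

# Horizontal distance between the two lines = per-request latency.
latencies = [l - e for e, l in zip(enter, leave)]
```

Improving throughput steepens the output line; improving latency shrinks the gap; in the ideal case the two lines coincide.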
Now we have one critical path per request, with consecutive costs. Then we can stack the critical paths together and aggregate the costs by their logical steps, so we can represent the latency and the throughput of individual steps. And because the steps are consecutive, the output of the previous step equals the input of the following step, and so we get this representation.
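A hedged sketch of this stacking and aggregation, using pandas (the talk's own front-end stack); the step names, request IDs, and cost values are hypothetical stand-ins for real trace data:

```python
import pandas as pd

# Each row is one cost on one request's critical path.
paths = pd.DataFrame({
    "request": [1, 1, 2, 2, 3, 3],
    "step":    ["queue", "write", "queue", "write", "queue", "write"],
    "cost":    [0.5, 2.0, 0.7, 1.8, 0.6, 2.2],
})

# Stack the critical paths and aggregate costs by logical step.
per_step = paths.groupby("step")["cost"].agg(["median", "sum"])
```

Per-step medians expose the latency of each step, while the summed costs across stacked requests reflect where the concurrent requests spend their time.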
For example, if we use fio to generate write inputs and set the iodepth equal to 32, we can see clearly from the second graph that fio will wait until the previous requests are finished, and only then send the next group of 32 concurrent requests. Next, how to understand the bottleneck in that graph, and what a bottleneck actually is? I think it is also a two-dimensional concept: looking at the latencies of different steps alone is not enough.
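The workload described above could be expressed as an fio job file along these lines; the client, pool, and image names are hypothetical, and only the write workload and `iodepth=32` come from the talk:

```ini
; Hypothetical fio job: 32 concurrent writes against an RBD image.
[rbd-write]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
rw=write
bs=4m
iodepth=32
```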
The worst case is that the lowest-performance part will cause the other requests to wait: everything before the slowest-throughput point has to wait for it, which causes wait latencies in those requests, and in most cases it becomes the bottleneck of the entire system. We can very clearly see this happening in that representation of performance.
Okay, now the bottleneck can be identified relatively easily, and the next question is how to improve the performance. The very short answer is to identify the root causes of the bottleneck, then resolve them and improve the performance. But the reality is much longer, because there are many kinds of factors that can impact the performance, in three categories. First, physically: the cluster configuration, how we deploy the cluster, and which hardware we use in the cluster will all affect the performance.
Secondly, logically: the parameters we choose in functions, the different algorithms we choose, and the whole architecture of the entire software can have an impact on the performance. And thirdly, the workload, whether it comes from inside the system or from outside Ceph, will also impact the performance. So there are almost infinite combinations of these factors, and I think it is bad to do optimization blindly. Instead, once we have identified the bottleneck and also identified the related costs of that bottleneck, we can relate each cost to these factors.
That means using control variables, or similar methods, to see the impact of different factors, to find what to do to improve the performance, and then to verify these solutions to see if what we did is actually better. And incremental analysis means there needs to be an interactive front-end to do the data-driven analysis.
Okay, so that's the entire idea about how to identify bottlenecks in the Ceph system and to do the following optimizations: we leverage distributed tracing to collect the critical paths of the user requests, then we developed a visualization technique to represent the performance straightforwardly, and we developed an interactive front-end to do the incremental analysis. And here is the example: we have a prototype to track the RBD image writes inside the cluster, and there are three types of requests across the layers.
There are image write requests, which internally represent the writes of logical chunks in that image; these requests trigger object requests to write the data into the objects of that image; and then, at the RADOS level, the object requests trigger object write operations to persist the data across the entire cluster. We deployed a three-VM environment, used fio to generate the write inputs, collected the traces during the experiment, and then moved into the interactive analysis front-end.
We load the tracing results into a variable called data, and then parse that data into the three types of requests. Of course, there are many other requests in the existing code, but we think these three are the most important. This interactive front-end is built based on pandas and Jupyter; Jupyter is a very famous open-source web application for doing data-driven analysis.
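A minimal sketch of that first step in pandas; the tracing file is replaced here by an inline frame, and the request-type labels and costs are hypothetical stand-ins for the three request types described in the talk:

```python
import pandas as pd

# Stand-in for loading the collected tracing results,
# e.g. data = pd.read_json("traces.json") in a real session.
data = pd.DataFrame({
    "request_type": ["image_write", "object_request", "object_write",
                     "image_write", "object_request"],
    "cost": [3.1, 1.2, 0.8, 2.9, 1.0],
})

# Parse the data into the three types of requests.
by_type = {t: g for t, g in data.groupby("request_type")}
image_writes = by_type["image_write"]
```

From here each request type can be analyzed separately in the notebook.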
We can see very clearly that the bottleneck is at the step from when the image context issues the write operation until the operation is completed, and we can choose to highlight this step in the graph with the purple color.
So we can see all the costs related to this step. This step, the write operation, in fact triggers the object requests in the next layer, and we can also directly find that the bottleneck there is at the object map operations and the AIO operations; these two operations actually trigger the object write operations at the RADOS level. There we also identified the bottleneck as being in three steps, which represent the enqueue operations in the OSD queues and the actual disk writes in the object store.
So actually the whole bottleneck in this environment is related to the queue operations and the writes in the object store, and then we drill down with different analyses.
First, we filter out, for example, the OSD queue operations by their step name, and then we check whether these costs are related to the physical location. So we draw the distribution of costs by the host where they happen, and we found that the median numbers are similar, so the cost is not much impacted by the host.
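A hedged sketch of this filter-and-compare step in pandas; the step names, host names, and costs are made up for illustration:

```python
import pandas as pd

# Stand-in for the aggregated per-cost trace records.
costs = pd.DataFrame({
    "step": ["osd_enqueue", "disk_write", "osd_enqueue", "osd_enqueue"],
    "host": ["osd-1", "osd-1", "osd-2", "osd-3"],
    "cost": [1.1, 0.4, 1.0, 1.2],
})

# Filter the OSD enqueue operations by their step name.
enqueue = costs[costs["step"] == "osd_enqueue"]

# Compare the cost distribution by the host where each cost happens.
# Similar medians across hosts suggest the cost is not tied to a
# particular physical location.
median_by_host = enqueue.groupby("host")["cost"].median()
```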
And then we tried to find out if it is related to the logical location, that is, where the costs happen in the ordering of the logical workflow, and we found that the first occurrence of the enqueue operation in the OSD has a much larger median number. It means that the enqueue operation in the primary OSD does not have good performance. By using this interactive analysis, we then started to find out the root causes of why this enqueue operation becomes so slow, and we found three related configurations; the first is the number of placement groups of the pool.
And the interactive analysis can actually do more than that: for example, to represent all kinds of distributions, to find out the longest or the most complex requests and visualize them, or to represent the message heat map between the hosts of the entire cluster. And at a higher level, because we attach the runtime contexts to these requests, we can do more.
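A minimal sketch of building such a host-to-host message heat map with pandas; the host names and messages are made up:

```python
import pandas as pd

# Stand-in for traced messages, one row per message sent between hosts.
msgs = pd.DataFrame({
    "src": ["client", "client", "osd-1", "osd-1"],
    "dst": ["osd-1", "osd-2", "osd-2", "osd-3"],
})

# Count messages per (sender, receiver) pair and pivot into a matrix;
# each cell is one square of the heat map.
heat = msgs.groupby(["src", "dst"]).size().unstack(fill_value=0)
```

In a notebook this matrix can be rendered directly as a heat map to spot imbalanced communication between hosts.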
We can run analyses to see if the writes are balanced across the entire cluster, or, by comparing the behavior of the image write requests in RBD with the object requests in RBD, find out if the RBD cache works well. And if we combine it with resource monitoring tools, we can find out the specific logic that consumes excessive resources.