Summit 2022: Benchmarking the performance of CPU pinning using different virtual CPU topologies
B
Okay, so Marcelo will cover both the introduction and the motivation part. He will talk about the CPU pinning benefits and drawbacks, and he will also list the different scenarios of virtual CPU topology, the configuration, and the goal of each scenario. After that, I will talk about some interesting characteristics of hyper-threading.
C
Okay, so just a very short introduction. Everyone knows what CPU pinning is: we are dedicating physical CPUs, doing a one-to-one mapping of the virtual CPUs of the VM to the physical CPUs on the physical host. It's done for several reasons.
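As a rough illustration of that one-to-one mapping, assuming a libvirt-managed domain (the name vm1 and the CPU numbers here are hypothetical), virsh can pin each vCPU to exactly one physical CPU:

    virsh vcpupin vm1 0 4   # vCPU 0 -> physical CPU 4
    virsh vcpupin vm1 1 5   # vCPU 1 -> physical CPU 5
    virsh vcpupin vm1 2 6   # vCPU 2 -> physical CPU 6
    virsh vcpupin vm1 3 7   # vCPU 3 -> physical CPU 7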
C
One of the reasons is performance. We have some applications that are CPU-intensive and performance- and latency-sensitive, and CPU pinning will improve the performance of the VM, especially because it will reduce the competition for resources between the different processes, or different VMs, that are running. If all the VMs running on the same host are pinned, it can prevent some OS noise and prevent some context switching.
C
At least it will reduce the context switching a little bit for the set of physical CPUs that the VM has access to. The other motivation for CPU pinning is to isolate VMs. Public clouds are doing that: they create VMs and isolate them, not only for performance but especially for security. So CPU pinning is something that many people are using.
C
Think of a system, for example KubeVirt, that is creating a lot of VMs and has to define what the best CPU pinning is. We will talk about that, especially for the new release of KubeVirt: as Fabian introduced before, Roman actually created the new CPU pinning code in the new release. Okay, so regarding CPU pinning, something important that comes with it is the VM topology.
C
The topology is especially important when doing CPU pinning, because it will affect the performance of the VM. When you are creating the virtual topology of the VM, the VM can have, for example, virtual hyper-threads or not; without hyper-threads you only have cores in the VM. This virtual topology impacts the performance, and this is one of the motivations of our experiments and of this presentation. Okay, Lee, can you go to the next slide?
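As a rough illustration of such a virtual topology, in libvirt domain XML terms (a sketch, not the presenters' exact configuration), the <topology> element decides whether the guest sees hyper-threads or only plain cores for the same four vCPUs:

    <!-- guest sees 2 cores with 2 hyper-threads each -->
    <cpu mode='host-passthrough'>
      <topology sockets='1' cores='2' threads='2'/>
    </cpu>

    <!-- or: guest sees 4 plain cores, no hyper-threads -->
    <cpu mode='host-passthrough'>
      <topology sockets='1' cores='4' threads='1'/>
    </cpu>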
C
Okay, so given that we can have virtual hyper-threads enabled or disabled, have different numbers of virtual cores, and that even on the host we can disable hyper-threading and have only cores: what is the best topology, the best configuration for performance, that we can get out of that? So in this presentation we're going to drive through the different topologies and talk about performance.
C
Yeah, we can go to the next one. Okay, so this is the baseline, what we call perfect topology matching: the virtual topology of the VM is the same as the physical topology of the host. For example, here we have a host with one socket (it's a theoretical host), so only two cores, and each core has two hyper-threads enabled, and the virtual topology has the same configuration. This will be used as our baseline configuration to compare against the other scenarios.
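A sketch of what this perfect matching could look like in domain XML, assuming the host's sibling hyper-threads are CPUs (0,2) and (1,3); the real sibling pairs vary per machine and can be read from lscpu:

    <vcpu placement='static'>4</vcpu>
    <cpu mode='host-passthrough'>
      <topology sockets='1' cores='2' threads='2'/>
    </cpu>
    <cputune>
      <!-- guest siblings vCPU 0,1 pinned onto host siblings 0,2 -->
      <vcpupin vcpu='0' cpuset='0'/>
      <vcpupin vcpu='1' cpuset='2'/>
      <!-- guest siblings vCPU 2,3 pinned onto host siblings 1,3 -->
      <vcpupin vcpu='2' cpuset='1'/>
      <vcpupin vcpu='3' cpuset='3'/>
    </cputune>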
C
We have another scenario where on the host we disable hyper-threading, but the VM has hyper-threading enabled. We want to measure the impact when the guest OS thinks it has hyper-threads but there is no hyper-threading on the host. You can go next.
C
The next one is the opposite: we have hyper-threading on the host, but the VM topology is not aware of the hyper-threads; there are only cores in the VM. It's like a mismatch in the topology, and we want to show what the performance will be. The next one is like a bonus: since we are doing CPU pinning, it's also possible to pin CPUs from different NUMA nodes, from different sockets.
C
Everyone is probably very aware of that: each CPU will have access to more memory bandwidth, because each node has its own memory region, and also to its own last-level cache. Okay, next.
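As a hedged sketch of what such cross-node pinning looks like (the domain name vm1 and the CPU numbers are hypothetical; the real layout should be read from the host first):

    numactl --hardware      # shows which physical CPUs belong to each NUMA node
    virsh vcpupin vm1 0 0   # vCPU 0 -> CPU 0, assumed to be on NUMA node 0
    virsh vcpupin vm1 1 8   # vCPU 1 -> CPU 8, assumed to be on NUMA node 1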
Okay, the next scenario that we want to show here is what we call mismatching hyper-thread location.
C
So this is the problem that a previous presentation at KVM Forum showed, before the change that we had in CPU pinning in KubeVirt. It was more or less random (not truly random, but kind of) where it allocated the CPUs when it was doing the pinning, and then it was not matching the virtual hyper-thread topology. We will show the performance when we have that scenario. Okay, next. The next one is there to illustrate something.
C
Because we are talking a lot here about hyper-threading, and about the host enabling or disabling it, this one is to show the benefit of hyper-threading. Even though a hyper-thread, as expected, has lower performance than a full core, it's important. Consider the scenario where an application can access two cores when we disable hyper-threading on the host; if we enable hyper-threading, the application can now access (I would not say "virtual" here, because that would mix with the concept of virtual machines) what look like four cores on the host. Hyper-threading increases the performance for an application that can run more threads, and it not only allows an application to run more threads, but also allows running more VMs on a node. Just keep that in mind. Okay, the next one.
C
We want to compare the performance of the perfect matching scenario, with pinning, between plain KVM and KubeVirt. Both are using libvirt to create the VMs, so it's the same thing, same versions; however, KubeVirt is running in a Kubernetes cluster, inside a container, and we want to highlight what the performance difference is there.
B
Okay, thank you, Marcelo. I guess this is my part now. I will talk about some background on hyper-threading. I guess the first natural question to ask is: why do we need to use hyper-threading? I think this might be obvious to you. There are certainly a lot of issues related to hyper-threading, like cache thrashing, where threads are competing for those low-level caches, and some previous studies have actually shown that hyper-threading has higher latencies compared to a dedicated physical core. But it also comes with some benefits.
B
It only increases the die size by less than five percent, but with a potential gain of more than 30 percent. That means you add a small number of transistors and get more throughput. This is quite important because, traditionally, if you want to increase CPU performance by, let's say, 30 percent, you might need more than 30 percent more transistors.
B
That's actually not very power efficient, which means you might get a bigger electricity bill. Another obvious benefit is that you can run more VMs per node, as Marcelo said. For the experiments we ran the NAS Parallel Benchmarks micro-benchmarks, with their computational kernels and some pseudo-applications; they're basically doing some sort of matrix computation, tasks that use the CPU intensively.
B
I wrote a simple bash script to automate the whole task: it modifies the XML file on the fly, launches the VMs, and runs the benchmarks inside them multiple times. For each of the scenarios we ran two parallel tasks, except for one experiment where we wanted to see how much throughput is gained from hyper-threading; there we allocated two cores with hyper-threading on, compared with two cores with hyper-threading off.
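The script itself wasn't shown, but roughly this kind of loop (all file, domain, and host names are hypothetical) captures the automation described:

    #!/bin/bash
    # for each run: patch the virtual topology in the domain XML, boot the VM,
    # run one NAS Parallel Benchmarks kernel inside it, and collect the output
    for run in 1 2 3; do
      sed "s|<topology[^/]*/>|<topology sockets='1' cores='2' threads='2'/>|" \
          base.xml > vm.xml
      virsh define vm.xml
      virsh start benchvm
      sleep 60                        # crude wait for the guest to finish booting
      ssh benchvm './NPB/bin/ep.C.x' >> ep-run${run}.log
      virsh shutdown benchvm
      sleep 30                        # let the shutdown complete before redefining
    done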
B
So for the first case we ran four parallel tasks compared with two tasks, and we wanted to see how much gain we get. Here is our test bed. For the host we have 32 CPUs with 32 GB of RAM; we have two NUMA nodes on this host, with two sockets and eight cores per socket, and we are able to disable hyper-threading for the VM.
B
In most of the cases we were allocating four vCPUs and RAM, with a pre-allocated disk image for the OS. The hosts were using Ubuntu, but for the guests we were using openSUSE.
B
The reason for this is that for a previous talk at KVM Forum I collaborated with a SUSE engineer, so I was too lazy to change to a different OS and just stuck with the SUSE distribution. I hope it doesn't really make much difference, but it is actually important for us to make sure that the QEMU and libvirt versions are consistent with the ones shipped by KubeVirt, so we can have an apples-to-apples comparison.
B
And here is the result of the first comparison. We compared the baseline scenario with scenario two, where you disable hyper-threading on the host side but enable hyper-threading inside the guest. I was expecting the impact to be minimal, which is actually true for most of the test cases.
B
Similarly for this case, where we have hyper-threading disabled on the host as well as inside the guest, so we have a matching topology again: the impact is very minimal, but MG showed some interesting performance differences there.
B
Things got really interesting where, let's say, you have hyper-threading turned on in the host, but you turn off hyper-threading inside the VM guest. This is where you have a mismatch in the topology, and it is actually a real issue, because the guest scheduler is not hyper-thread aware. So there is a 50 percent chance that you have sibling contention.
B
So, as you can see, the performance drop is quite significant, up to 35 percent. Another scenario is where we pinned the CPUs to different sockets. The benefit of that is that the tasks get access to more of those lower-level caches, along with higher memory bandwidth. Since our application is quite small, we expected the memory bandwidth not to make much difference, but then you can see it for both IS (integer sort) and CG (conjugate gradient).
B
They are showing quite a big throughput difference. The reason is that they require inter-process communication when we're running those two tasks, so you need to access data from remote memory, memory from a different NUMA node, and that gives you quite a bit of performance penalty. For scenario six, we're comparing against the case of the KubeVirt issue they had in the past, where you have a totally mismatching hyper-thread allocation; we're basically forcing the siblings to compete with each other for the resources.
B
So for all the benchmarks you can see there is a significant performance drop, up to around thirty percent. Scenario seven is the case where we want to check how much throughput gain we get from hyper-threading, so we are comparing four threads on two cores with hyper-threading enabled against two threads on two dedicated cores with hyper-threading disabled. So this is the throughput gain that you get.
B
As you can see, for the EP benchmark, which is called "embarrassingly parallel", the reason you're getting a 60 percent performance gain is that it's also called perfectly parallel: the tasks require little, almost no, communication.
B
This is quite important for some tasks like image processing, because you can just process those individual frames independently, without any dependency.
B
Lastly, something most of you might be very interested in knowing: what is the difference between running a KVM VM and a KubeVirt VM? As you can see, the performance difference is really small, but we ran multiple executions and the performance difference always exists. We suspect the reason is that in KubeVirt, the Kubernetes components are running a lot of background processes, like the Kubernetes agent or containerd, which might be competing for resources with the VM CPUs.
B
So, for the final considerations: if you really want to take CPU pinning to the next level, what we suggest is that you can use either isolcpus or cpuset. With the isolcpus kernel boot parameter you can isolate those CPUs from the host scheduler, as sketched below.
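A hedged example of the isolcpus route (the CPU range is hypothetical, and grub file locations vary by distribution):

    # reserve host CPUs 4-7 for pinned vCPUs by hiding them from the host
    # scheduler; add this to the kernel command line in /etc/default/grub
    GRUB_CMDLINE_LINUX="isolcpus=4-7"
    # then regenerate the grub config and reboot, e.g.
    #   update-grub                               (Debian/Ubuntu)
    #   grub2-mkconfig -o /boot/grub2/grub.cfg    (SUSE/Fedora)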
In that way you minimize CPU preemption. Also, I found out that when you have dedicated CPU placement enabled, it also enables this kvm-hint-dedicated thing, which is some sort of paravirtualization.
B
It lets the guest be kind of aware that it is running on top of KVM. But we don't know how much performance impact this has, and I talked to one of the maintainers, and they said this thing didn't really go very well. Another thing you could do is use the isolate-emulator-thread option, which reduces the lock contention. And you can also increase the huge page size, which allows you to do faster page walks as well as reducing the TLB pressure.
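In libvirt domain XML terms, those last two tunings look roughly like this (the cpuset value is hypothetical):

    <cputune>
      <!-- keep the QEMU emulator thread off the pinned vCPUs -->
      <emulatorpin cpuset='8'/>
    </cputune>
    <memoryBacking>
      <!-- back guest RAM with huge pages to reduce TLB pressure -->
      <hugepages/>
    </memoryBacking>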
B
So there is essentially a trade-off, and you want to choose whether you want to do things more automatically or do things manually. As usual, there is no one solution that solves all the problems, so you need to make the trade-off. And lastly, I think the pinning issue is indeed fixed, which is quite good news.
A
Yeah, that would be really cool, also for real-time performance, I think.
B
I think the most meaningful benchmark is the one called EP, which is "embarrassingly parallel"; basically it is the benchmark that requires no dependencies between tasks. So this is actually quite a good example of what I said in the presentation about image processing tasks; yeah, that's one of them. And another thing is that some of them actually do require some sort of dependencies, like inter-process communication, which can be a good representative for general applications, I think.
A
Okay, it seems like, yeah, it seems to be answered. Okay, great. There is no other question coming up, so Lee and Marcelo, thank you again for the great talk, and we will be back ten minutes past four UTC time; that's in roughly 13 minutes. Okay! Thank you. Thank you. Thank you, Marcelo. Thank you, Roman.