From YouTube: 2 - Introduction to HPCToolkit
Description
Part of the Using HPCToolkit to Measure and Analyze the Performance of GPU-accelerated Applications Tutorial, Mar-Apr 2021. Slides available at https://www.nersc.gov/users/training/events/hpctoolkit-for-gpu-tutorial-mar-apr-2021/
A
So that, I think, is a coming attraction. So here, it just says, like, "This is inline code that came from this file," but we don't know what function it came from, because we just don't have that information. We just know that it came from somewhere, whereas the new information that they're producing...
A
We need to know what's happening inside, or we experimentally determine what's happening inside and say, "Oh, CUDA or CUPTI is creating lots of threads just for measuring," and so we shouldn't measure the threads that they're creating for measuring. And so that leads to some exploratory development.
A
But we can compute utilization by just saying: we're using samples, and this is how many samples we expected, based on the clock frequency, and this is how many samples we got. And so, if we get fewer samples than we expected, then we can infer that the SMs were idle, because they weren't collecting samples.
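The arithmetic behind that inference can be sketched roughly as follows. This is only an illustration of the idea described in the talk, not HPCToolkit's actual implementation: the function name, the parameters, and the assumption that each SM emits one sample every fixed number of clock cycles are all hypothetical.

```python
def estimate_sm_utilization(observed_samples, clock_hz, kernel_seconds,
                            sampling_period_cycles, num_sms):
    """Estimate GPU SM utilization from sample counts (hypothetical sketch).

    Assumes every busy SM produces one sample per `sampling_period_cycles`
    clock cycles, so an idle SM contributes no samples.
    """
    # Total clock cycles that elapsed while the kernel was running.
    elapsed_cycles = clock_hz * kernel_seconds

    # Samples we would expect if every SM were busy for the whole kernel.
    expected_samples = elapsed_cycles / sampling_period_cycles * num_sms
    if expected_samples <= 0:
        raise ValueError("expected sample count must be positive")

    # Fewer samples than expected suggests idle SMs; cap the ratio at 1.0.
    utilization = min(observed_samples / expected_samples, 1.0)
    return expected_samples, utilization


if __name__ == "__main__":
    # Made-up numbers chosen only to illustrate the calculation.
    expected, util = estimate_sm_utilization(
        observed_samples=6000,
        clock_hz=1_000_000_000,
        kernel_seconds=2.0,
        sampling_period_cycles=1_000_000,
        num_sms=4,
    )
    print(f"expected {expected:.0f} samples, utilization {util:.0%}")
```

The ratio of observed to expected samples is the utilization estimate, so the gap between the two counts is attributed to SMs that were idle and therefore not sampled.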