From YouTube: 07 - Debugging Tools
Description
Part of the NERSC New User Training on September 28, 2022.
Please see https://www.nersc.gov/users/training/events/new-user-training-sept2022/ for the training day agenda and presentation slides.
Yeah, go ahead. Okay, so I'm going to be talking about debugging. On Perlmutter it's one of the more difficult topics when you're trying to port your code, so I'm going to go over some of the tools and how you can use them with the different programming models and the different types of hardware that we have.
The most common ones we have are DDT, which is used by the majority of our users, and TotalView.
These are full-fledged GPU/CPU debuggers that support a bunch of different programming models and have a graphical user interface with a lot of different ways of doing things. There are some specific tools from NVIDIA: cuda-gdb, which is just GDB with a CUDA extension attached, and compute-sanitizer, which does a couple of different things for finding memory-related bugs. There's also gdb4hpc, which is a GDB
that's meant for doing GDB-like things but against parallel programming models, and there's valgrind4hpc, which is again similar in that it takes Valgrind and applies it to finding memory-related bugs and things like that,
but against parallel programs. There's also a special pair of related tools called STAT and ATP that are good for finding crashes and deadlocks: they look at where your program is according to its backtrace, then merge the backtraces and show you where you're going.
But before you start debugging, there are a few things you're going to want to do. You're going to want to set up a remote connection; everybody's talked about this: use NoMachine. It gives better performance than traditional X11 forwarding, although both DDT and TotalView have their own options for this.
So I put the options in here for how you want to do that with C and Fortran, and then with CUDA there are the host options, which are -g and -O0, and then the capital -G is what turns on the device debugging information for CUDA.
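As a sketch, a CUDA debug build with those flags might look like this (the file and binary names are hypothetical placeholders):

```shell
# -g/-O0 give host-side debug info with no optimization;
# -G adds device-side (CUDA kernel) debug info.
# "myapp.cu" and "myapp" are hypothetical names.
nvcc -g -O0 -G -o myapp myapp.cu
```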
You need to set up your environment so that you can create core files. You need to tell your shell that you want to be able to create core files of unlimited size, otherwise it's not going to be able to create them, and you want to tell your programming models that if they find an error, or they're going to abort, they should go ahead and dump a core file as well.
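A minimal sketch of that setup; the shell part is standard, while the programming-model variables depend on your stack (the two shown here are Cray MPICH and CUDA knobs, listed as examples, not an exhaustive or authoritative set):

```shell
# Tell the shell to allow core files of unlimited size.
ulimit -c unlimited

# Example programming-model knobs that dump state on error
# (names vary by stack; these are Cray MPICH / CUDA examples).
export MPICH_ABORT_ON_ERROR=1
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1
```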
A special note on the Cray tools especially is that they use something called Cray CTI, the Common Tool Interface. This gives them common code for working with job launchers such as Slurm, and it's tied into a lot of these tools. So you need to have the module loaded, and for our particular use you need to set the environment variable CTI_WLM_IMPL, the CTI workload-manager implementation; in this case we're using slurm.
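Setting that up looks something like this (the module name shown is the one used on Cray systems; check `module avail` on your system):

```shell
# Load the Common Tool Interface and point it at Slurm.
module load cray-cti
export CTI_WLM_IMPL=slurm
```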
Here's how you allocate your nodes for debugging; everybody's talked about this as well. If you want to use a CPU node, make sure to set the constraint for CPU, same with GPU, and then you're going to want to use the interactive or debug QOS depending on how long you need the node for. And here's a link to the limits and charges that you can use for setting up the QOS on the allocations.
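A sketch of an interactive debug allocation with those constraints (the account name is a placeholder; check the linked limits page for current QOS time limits):

```shell
# One GPU node for 30 minutes in the interactive QOS.
# Replace "myaccount" with your own project account.
salloc -N 1 -C gpu -q interactive -t 30:00 -A myaccount

# For a CPU node, use the cpu constraint instead:
salloc -N 1 -C cpu -q interactive -t 30:00 -A myaccount
```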
A
First,
one
we're
going
to
talk
about
is
called
DDT.
It's
a
distributed.
Debugging
tool
supports
a
bunch
of
different
parallel
programming
models
like
MPI
openmp,
open,
ACC
Cuda.
It
supports
python,
c4trans,
C,
plus
plus.
Originally
it
was
developed
by
a
company
called
the
linets
now
owned
by
a
company
called
arm
that
develops
processors,
schematics
and
licenses
out
processor
information,
but
they
do
develop
software
as
well,
and
that's
one
of
the
reasons
that
they
picked
up
DDT.
There's some extra documentation here, but I'm just going to show you some screenshots of what DDT looks like.
A
So
once
you
open
it
up,
it
kind
of
gives
you
the
option
to
either
run
them
or
attach
to
some
kind
of
program
or
service.
You
can
open
up
core
file.
You
have,
if
you
like
as
well
or
you
can
manually
launch
the
back
end.
That's
using
some
of
the
remote
launch
stuff.
Is
there
as
well?
You can see off to the left that it breaks the file, or the source code, up into different functions. It looks like your run-of-the-mill IDE: it has line numbers, lets you check the stack on the right side, and gives you the source in the middle. On the left it's showing you the current stacks, and it gives you tabs for input, breakpoints, watchpoints,
tracepoints, and logging. It shows the processes at the top, and it gives you a bunch of buttons at the top for navigation, like stopping your program, starting your program, stepping through, things like that.
Here's one where it's also running a CUDA kernel, and you can see at the bottom that the stack has a few additional entries that allow for kernel-space, or CUDA-specific, functionality.
Here's some more specific information on how and where everything works. You've got your process and thread control up top, and navigation as we talked about before. You can right-click on a variable within the list in the middle, and that'll give you its information, plus the sparklines at the bottom. You can evaluate expressions based on whatever the current data is, and on the left you can see the stack frame.
Here are some more of the CUDA-specific features: you can see the GPU devices in the image on the right, the kernel progress on the left, like what's in progress on the device, and the CUDA stack in relation to the C stack as well.
An alternative to this is TotalView. This is a similar system; it just has a few different features and supports a lot of the same stuff. It was developed by a different company, but now it's owned and developed by Perforce. It has two different options: a remote client that you can download, as well as a remote connection that you can use; you just module load totalview and run that. You can also get more information from both their docs and the man page.
We also have an upcoming training session, I believe that is tomorrow, yep, for TotalView; if you're interested in more training, click on the link there at the bottom and you can sign up. It has two different interfaces, because apparently people don't like new things. The first one here is a view of their newer interface and what everything looks like; again, similar features to what you would expect from an IDE or a debugger, very similar to what's in DDT.
A
You
see
your
processes
and
kind
of
where
they
are
at
in
the
stack
on
the
left.
You
have
your
Source
listings
in
the
middle.
You
have
some
action
points
and
bookmarks
down
at
the
bottom
left.
You
have
your
loggers,
your
command
line
and
your
data
on
the
bottom,
the
right
side,
you
have
variables
and
their
value-
and
you
have
the
call
stack
in
the
upper
right
here-
is
the
classic
interface
as
they
call
it.
A
It's
a
very
older,
X11
interface,
but
a
lot
of
people
are
very
used
to
this,
and
so
they
like
to
use
it
here.
You
have
again
some
pointers
to
different
features
in
here
where
it
is
based
on
the
GPU
and
the
CPU.
You
have
some
different
Focus
areas.
You have the threading, with assigned positive thread IDs, at the bottom. You have ways to select MPI tasks, to set your breakpoints and your threads, at the bottom. You have the source-code listing again; you can see a value, or dive into it, when you mouse over it in the middle there on the source code. You have a window off on the left that shows the state of the MPI tasks and where everything is, and you have a stack frame and stack trace in the upper middle.
So you can use cuda-gdb just like you would use GDB, except that it also has some CUDA options: you can just type help cuda, and it will get you some more information on what to do there.
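A minimal sketch of what such a session looks like (the application and kernel names are hypothetical; `help cuda` and `info cuda threads` are cuda-gdb's own commands):

```shell
# Start the application under cuda-gdb, break in a kernel, and run.
cuda-gdb ./myapp
(cuda-gdb) break myKernel        # "myKernel" is a hypothetical kernel name
(cuda-gdb) run
(cuda-gdb) help cuda             # list the CUDA-specific command groups
(cuda-gdb) info cuda threads     # inspect device threads at the stop point
```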
There are docs here from NVIDIA that you can look at. It doesn't do other types of programming models; it just handles CUDA right now. It doesn't do anything like MPI or other big models; it may also do OpenMP, I don't remember. Here's another one called compute-sanitizer, which was originally called cuda-memcheck.
This is a drop-in replacement; I believe they're using the type of sanitizer machinery that you would find in either LLVM or Valgrind. Again it's developed by NVIDIA and uses dynamic instrumentation: you basically srun compute-sanitizer, pop in the tool that you want to use, and then your program.
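That pattern looks roughly like this (the application name is a placeholder):

```shell
# Run the memory checker on one task under Slurm;
# swap --tool for racecheck, initcheck, or synccheck as needed.
srun -n 1 compute-sanitizer --tool memcheck ./myapp
```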
A
So
they
have
mem
checker
for
race,
checker,
nationalization
Checker
and
a
sync
checker
I'm,
sure
they're,
going
to
add
more
Checkers
as
they
go
along
based
on
their
tooling
and
based
on
what
llvm
probably
produces
and
there's
some
more
documentation
on
how
to
use
this
tool
here
at
the
bottom
GB
for
HPC.
This
is
another
great
tool.
Here I'm launching a process set named $p of eight tasks for an application called pcm. It starts up a network in the background and connects all of the debug servers to it, and then it sets an initial breakpoint at main of the app. You can see p{0..7} means process set, ranks 0 through 7; because I named it $p, it's using the p there.
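That launch step looks roughly like this (the app name is the one from the demo; syntax as in the gdb4hpc man page, so treat this as a sketch):

```shell
module load gdb4hpc
gdb4hpc
# Inside gdb4hpc: launch 8 ranks of the app as process set $p.
dbg all> launch $p{8} ./pcm
```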
So if I do a listing of where I'm at based on that, you can see I get the first line of function main there, and if you do a viewset of $p, it shows you all of the processes. This lets you work with different kinds of process sets, run multiple apps, and see their different communication.
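Those two commands, as a sketch (command names as spoken in the demo; check gdb4hpc's built-in help for exact syntax):

```shell
dbg all> list          # show source around the current stop point
dbg all> viewset $p    # show which ranks belong to the $p process set
```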
I set a breakpoint here on line 31 of main. Notice that if I try to print out rank, which is a data point in the app, it's currently set at zero, because we haven't quite reached that part of the code yet.
Similarly, there's valgrind4hpc. Again, this uses a bunch of different tools to do things like memory checks, and it does dynamic instrumentation. It doesn't support GPUs at the moment, but it supports other types of programming models like MPI. What it does is run Valgrind against each of your MPI processes and aggregate the data into a more readable report, rather than having, you know, N-tasks number of reports.
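A typical invocation looks roughly like this (flags as in the valgrind4hpc man page; the application name is a placeholder):

```shell
module load valgrind4hpc
# Run memcheck across 8 ranks and aggregate the per-rank reports.
valgrind4hpc -n8 --valgrind-args="--track-origins=yes" ./myapp
```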
So you can try out some of that stuff. Sanitizers4hpc: this is a tool directly from Cray, again using LLVM-type sanitizers, and it's using the same idea: run a sanitizer against each one of the processes
and aggregate the reports, except that these sanitizers use static instrumentation at compile time, rather than what the NVIDIA compute-sanitizer and the Cray Valgrind tools are doing. If you have a very CPU-intensive application, this static instrumentation at compile time can save you some time, because it lowers the overhead due to the way the instrumentation is inserted into the program.
Like I said, they're based on LLVM, and they support GPUs with cuda-memcheck, and they support CCE and GCC. Again, you're just going to want to module swap to the Cray programming environment at this point; they do support GCC, but I prefer to use Cray for this.
You use the option here, -fsanitize. You need to add that to your compile line and make sure that you're sanitizing with the right sanitizer; the sanitizers listed here are for address, leak, and thread. I put in some documentation, both for the Sanitizers4hpc page and for the original sanitizers as they're documented by Google.
So, we have STAT, the Stack Trace Analysis Tool.
This attaches to your processes as they are running and tries to look for deadlocks. What it does is analyze each one of the processes, get a stack trace, or backtrace, from each, and then merge them together so you can see the different places where your application might be in the code.
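Attaching to a running job looks roughly like this (the PID is a placeholder, and the module name may differ on your system; stat-cl is STAT's command-line front end):

```shell
module load stat
# Attach to the running job launcher and gather a merged backtrace;
# replace <pid-of-srun> with the actual PID of your srun process.
stat-cl <pid-of-srun>
# The merged backtrace tree is written as a .dot file (view with stat-view).
```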
ATP lets you do this in a more automated way. Rather than having to start STAT on your own, you can just module load atp, set some variables, and then once the application dies, or you send a termination signal to the application, it will automatically dump some STAT information for you, as well as core files that it selectively chooses. It won't write out all the core files; you can control which ones are written, but it will write a selection based on the backtraces.
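The automated flow is roughly this (variable and output names as in the atp man page; treat it as a sketch):

```shell
module load atp
export ATP_ENABLED=1
# Run normally; on a crash or termination signal, ATP dumps a merged
# backtrace (atpMergedBT.dot) plus a selection of core files.
srun -n 8 ./myapp
```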
There are some options and other things that should be noted. You can set the GDB binary to be whatever GDB version you want in ATP. That can be useful because internally it normally uses STAT to identify the backtraces, but in this case it would use GDB, which sometimes can be a little more useful. There's also a note for Fortran and GNU:
you need to make either a compiler or an environment-variable change to use ATP, because they both use their own backtrace information. And again, you pretty much just srun your program until it terminates or gets a signal, and then you stat-view the .dot files that come out. The files that are output are all in DOT format, so you can also look at them in Graphviz or anything else that supports the DOT format.
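Viewing the output looks like this (the file name shown is ATP's default merged-backtrace output; the module name may differ on your system):

```shell
module load stat
# Open ATP's merged backtrace tree in STAT's viewer.
stat-view atpMergedBT.dot
```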
This gives you an idea of what it looks like. You can see right away that something took a fault, merged into a summary, and you can see from the side here that it's ranks three through seven of the eight.