From YouTube: Intro to GPU: 06 Debugging on GPU
A
Hi everyone, welcome back to the afternoon session. My name is Woo-Sun Yang, and I am working in the NERSC User Engagement Group, and I'm going to talk about debugging on GPU. Maybe I'm severely underestimating the number of threads here, but I can count only up to a thousand.
A
So the problem here is that we are running a lot of threads, so it is very difficult to know who is doing what, and when an error occurs, where does it occur? It's very difficult, and the good old rule of using print statements is not going to work here. Okay, I used print statements a lot for my thesis project, but it is not going to work here. So you have to use debugging tools.
So these are the tools that I'm going to cover today. cuda-gdb is a kind of gdb; it's an extended version for CUDA. And cuda-memcheck: there is a memcheck [tool in it] reminiscent of Valgrind's memcheck. And we have a pretty good GUI parallel debugger called TotalView. We have another popular tool called DDT, but unfortunately we don't have a license for GPU. So for the time being, we have to use these three.
A
So what is cuda-gdb? This is the extension of GNU gdb for debugging CUDA codes. You can use this for debugging both CPU and GPU code within the same application. This is a command-line tool, and it is basically for debugging non-MPI code, but if you want, you can try this kind of trick to use it with a small number of MPI ranks. One thing that you should notice is that when you refer to some CUDA entity, you just add cuda.
A
Just like cuda thread 170: you are switching from whatever thread you are on to thread 170. So that's kind of the main difference from GNU gdb. A very good material that you can use for learning cuda-gdb is just the user's manual provided by NVIDIA, so module load cuda and then go there.
A
Get this PDF file; it's very good. So what you should do with gdb is set breakpoints, and by the way, watchpoints are not supported. Do you know the difference between watchpoints and breakpoints? Breakpoints are where the code should stop. Before you run it, you preset where the code should stop, so that you can check the variable values. With a watchpoint, the program will stop when a certain variable's value changes.
A
You set watchpoints for certain variables. It's a pretty useful tool for debugging memory corruption issues, but in CUDA, probably because it takes too many resources, they don't support watchpoints here.
So anyway, you can set breakpoints, and you can run the code or continue, and when the code stops, you can just check the values of the variables or the status of the program. So these are the three main steps of the workflow with debugging: set the breakpoints, run it, and when the code stops, examine. Another thing worth noting is that you can run cuda-memcheck. This is the second debugging tool that I'm going to talk about this afternoon, but you can run that second debugger under cuda-gdb. Okay, and another nice thing that I find is autostep in cuda-gdb.
A
What it does is that, because we are dealing with so many threads, right, we can specify a certain suspicious area in the code, like line three, or three lines, or something like that. So in those particular three lines, gdb will examine all the steps very, very closely. So it is single stepping there, but the rest of the code you run fast, right. So when the code stops there, we can see where the code fails at that fine resolution level. So this is really powerful.
A
I think that this is a very interesting aspect here. Another thing is that you may want to generate a core dump, because you want to examine where your code fails. I think that the most important thing with debugging, when you have a code bug, is to know where the code fails.
A
Once you find that out, you have solved half the problem here, right, and then you can do a lot of things, [put] print statements there, but with a core dump you can quickly identify where the code fails, and then you can check the variable values. So this is one way to debug your code.
So how do you run cuda-gdb? You need to build with the -g -G flags to get the debug information on the CPU side as well as the GPU side.
A
If you use PGI, for PGI Fortran or CUDA Fortran, you use it that way. And to start here, you load the module and you run the srun command. Please don't forget to add --pty, because this is really necessary to run all the CUDA commands interactively.
Okay, another concept here is the CUDA focus. As I said, we are running so many threads here, but I can only examine one thread at a time, so I'm focusing on whatever thread [I am on].
A
So what do you need to do to go to a different thread? You can use either hardware coordinates or software coordinates.
Hardware coordinates are like the device. We have eight GPUs on a single node, right? So if we are using one device, one GPU, then you have just one device. On Volta we have eighty SMs, streaming multiprocessors, and we have warps in an SM, and a warp is a group of 32 threads, right, so a lane means each of these 32 threads, one of these. And software coordinates are... I'm sorry, I think that I made a mistake here.
Software coordinates are, you know, which kernel I'm running, and the grid, block, and thread. You know these kinds of basic entities that we use for programming. So to know where I am, I can just query that: cuda device sm warp lane. These are the hardware coordinates here, so I'm on device 0, SM 0, warp 0, lane 0. Or cuda kernel block thread; then it will print the software coordinates corresponding to that one. And if I switch to a different thread, I can say I want to go to device 0, but SM 1, warp 2, lane 3, then boom, I'm in that particular thread here.
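As a rough illustration of how the software coordinates relate to warps and lanes, the Python sketch below flattens a 3D thread index into a linear index and splits it into a warp number within the block and a lane. The function name locate_thread and the 256-thread block are assumptions made for this example only. The warp number computed here is the logical warp inside the block; the warp shown in the debugger's hardware coordinates is a slot on the SM and need not match it.

```python
WARP_SIZE = 32  # threads per warp, as stated in the talk


def locate_thread(thread_idx, block_dim):
    """Return (linear index, warp within the block, lane) for one thread.

    thread_idx and block_dim are (x, y, z) tuples, like threadIdx and blockDim.
    """
    x, y, z = thread_idx
    dim_x, dim_y, dim_z = block_dim
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("thread index lies outside the block")
    # x varies fastest, then y, then z.
    linear = x + y * dim_x + z * dim_x * dim_y
    return linear, linear // WARP_SIZE, linear % WARP_SIZE


# Thread 170 of a hypothetical 256-thread, one-dimensional block.
print(locate_thread((170, 0, 0), (256, 1, 1)))
```

For thread 170 this gives logical warp 5 and lane 10, which is the kind of sanity check that helps when moving between the two coordinate systems.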
So you can go to chapter 11.1. There are some examples here. It's a pretty good example. The bitreverse example is kind of, you know, for bytes: you reverse the bit order in each byte, right.
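To have known-good values to compare against what the debugger prints, a small host-side reference can help. The Python sketch below reverses the bit order within each byte using three swap steps; it is a minimal sketch of the idea and is not claimed to be the code in the NVIDIA example. The names reverse_byte and reverse_all are hypothetical.

```python
def reverse_byte(value):
    """Reverse the bit order of one 8-bit value (0..255)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected an 8-bit value")
    value = ((value & 0xF0) >> 4) | ((value & 0x0F) << 4)  # swap nibbles
    value = ((value & 0xCC) >> 2) | ((value & 0x33) << 2)  # swap bit pairs
    value = ((value & 0xAA) >> 1) | ((value & 0x55) << 1)  # swap neighbours
    return value


def reverse_all(data):
    """Apply reverse_byte to every element, one element per 'thread'."""
    return [reverse_byte(b) for b in data]


print([f"{b:08b}" for b in reverse_all([0b00000001, 0b00001101])])
```

Reversing twice should give back the original data, which is an easy property to check while stepping through a kernel.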
A
So it's pretty simple, but you can just follow these steps: module load cuda, build it, run it, and set a breakpoint in the main function, and set a breakpoint in the kernel, whose name is bitreverse. Here you can set a breakpoint at line 21 and run it, and it will stop at the first breakpoint here, and then you can examine certain things and you can continue. And sometimes you can forget where you are.
A
It shows the backtrace from the GPU side here, because I'm in the kernel, and you can query about the kernel itself like that, and you can print blockIdx. You can print a lot of program-related entities, block index, grid dimension, and then, if you're going to the next line, just type next, etc. You can print array values. Check and make sure that these are reasonable values.
A
If you see something is wrong, then you need to go back from that moment on to see why you are getting the wrong values. This is a parameter for the kernel. So if you do that, it will show these parameters, basically pointers, as the values here. So you can dereference it to see the value. Again, you can switch to thread 170 for whatever reason, and do something, and then you can quit. So this is a pretty typical [session].
A
This code does not have an error, but you can test it today. Another example is autostep. I said that autostep is a pretty useful tool to me, but the example code there doesn't seem to work for whatever reason. But anyway, it clearly demonstrates that it will be very useful, because you don't know where the code error is, but you set these ranges of code where the code will run slowly, then you'll find where it stops.
A
[cuda-memcheck:] Just like Valgrind, it is made of several tools, so probably the only one in common is memcheck. This is to detect any memory issues, memory errors. Racecheck: this can be pretty useful if your code has some race condition between the threads. You can detect it. And initcheck: this is pretty minor stuff, because this only detects uninitialized variables. Synccheck: you detect synchronization errors.
Again, the manual is pretty useful. To build, you follow these steps. So the first tool is memcheck. This is to detect all the things, you know, memory access errors, just like malloc/free errors: double free, invalid pointer to free, heap corruption, etc. Besides that, it'll also detect some other kinds of errors: hardware exceptions, CUDA API error checks. But another important thing is memory leaks, right. You allocate the memory, but you forget to deallocate it when you get out of the kernel, for instance.
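To make the error categories concrete (double free, invalid pointer to free, and leaks), here is a toy allocation tracker in Python. It is only a conceptual model with hypothetical names (AllocationTracker, malloc, free, leaks) and made-up addresses; it is not how the memcheck tool is implemented.

```python
class AllocationTracker:
    """Toy bookkeeping of allocations to show what a leak checker looks for."""

    def __init__(self):
        self._live = {}        # address -> size of allocations not yet freed
        self._freed = set()    # addresses that were freed at least once
        self._next = 0x1000    # made-up starting address
        self.errors = []

    def malloc(self, size):
        addr = self._next
        self._next += max(size, 1)
        self._live[addr] = size
        return addr

    def free(self, addr):
        if addr in self._live:
            del self._live[addr]
            self._freed.add(addr)
        elif addr in self._freed:
            self.errors.append(f"double free of {addr:#x}")
        else:
            self.errors.append(f"invalid pointer {addr:#x} passed to free")

    def leaks(self):
        """Allocations still live at exit, i.e. never freed."""
        return dict(self._live)


tracker = AllocationTracker()
a = tracker.malloc(1024)
b = tracker.malloc(256)   # never freed: reported as a leak
tracker.free(a)
tracker.free(a)           # second free of the same address
tracker.free(0xDEAD)      # address that was never allocated
print(tracker.errors)
print(tracker.leaks())
```

The point of the model is that a leak only shows up at the end, when allocations are still live, which is why leak detection is usually a separate, opt-in check.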
A
For instance, if you want to detect memory leaks, add this flag here. And as I said, this memcheck tool can be run under cuda-gdb, inside gdb, but there's one caveat here: if we run memcheck under cuda-gdb, kernel launches will become synchronous. You know that when the host does the kernel launch, the kernel [launch] should be non-blocking, right.
A
So, racecheck. As I said, it is to detect race conditions, but currently it is supported only for shared memory. I mean, shared memory meaning the on-chip fast memory, right. So, as I said, this detects race conditions among the variables inside the shared memory, and not anything else.
So to run it, you run this command, and it gives two types of reports. One is to report individual race conditions, and the second, the analysis report, is based on whatever race conditions are detected. It kind of summarizes the race conditions for this code.
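To illustrate what counts as a race on shared memory, the Python sketch below scans a log of accesses made between two barriers and flags any address touched by more than one thread where at least one access is a write. This is a simplified conceptual model with a hypothetical function name (find_hazards) and a made-up access log; it does not reproduce the racecheck tool's algorithm or its report format.

```python
from collections import defaultdict


def find_hazards(accesses):
    """Flag shared-memory addresses with conflicting accesses.

    accesses: iterable of (thread_id, op, address), op being "R" or "W",
    all assumed to happen between the same pair of barriers.
    Returns {address: "write-write" | "read-write"}.
    """
    readers = defaultdict(set)
    writers = defaultdict(set)
    for thread_id, op, address in accesses:
        if op == "W":
            writers[address].add(thread_id)
        elif op == "R":
            readers[address].add(thread_id)
        else:
            raise ValueError(f"unknown operation {op!r}")

    hazards = {}
    for address, writing in writers.items():
        reading = readers.get(address, set())
        if len(writing) > 1:
            hazards[address] = "write-write"
        elif reading - writing:
            # One writer, plus at least one other thread reading.
            hazards[address] = "read-write"
    return hazards


log = [
    (0, "W", 0), (1, "R", 0),   # thread 1 reads what thread 0 writes
    (2, "W", 4), (3, "W", 4),   # two threads write the same word
    (5, "W", 8), (5, "R", 8),   # same thread only: no hazard
]
print(find_hazards(log))
```

In this model, adding a barrier between the write and the read would split them into separate intervals, and the first hazard would disappear.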
A
[Initcheck works on] global memory only, as I said here, so, for instance, this will not work for a shared memory variable or a local variable, right. So to run it, you just do it like that. Synccheck: this detects [errors in] the synchronization between the threads or whatever.
So [TotalView] is graphical. This is really, truly, you know, a fully featured graphical parallel debugger. The manual says that it only supports the Cray compiler, so I contacted the vendor.
A
So it definitely supports CUDA and it definitely supports MPI. To use that tool, you run it like that. And this is quite complicated, but in essence it contains a lot of information: the call backtrace; this is a stack frame. You can see the variables in that stack frame right in here, and then you can set a breakpoint here by just clicking on the line number.
A
This is the kernel and we are inside the kernel, and this is TotalView's convention: they represent each thread or process using these two numbers, 1 point something. The first one roughly corresponds to the MPI task, but not necessarily the MPI rank, and the second number roughly corresponds to the thread ID, but not exactly. But anyway, by looking at it, if you see a negative number in the second part, that means it is a CUDA kernel, a CUDA thread.
The positive ones are the CPU side here. So you can just click the line number on the source pane to set a breakpoint. Before the CUDA code is loaded, at that point the breakpoint will be set temporarily, but once the CUDA code is loaded, then it will definitely relocate the breakpoint to the correct location. And, as I said, the host thread has a positive second number; a CUDA thread has a negative number.
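The naming convention described here can be captured in a few lines. The Python sketch below splits an identifier such as 1.-1 into its two numbers and labels it as a host or CUDA thread based on the sign of the second number. It relies only on the convention as stated in the talk; the function name classify_thread_id is hypothetical and this is not part of any TotalView API.

```python
def classify_thread_id(identifier):
    """Split a 'process.thread' identifier and label the thread.

    A negative second number is taken to mean a CUDA thread and a
    non-negative one a host (CPU) thread, following the talk.
    """
    process_part, thread_part = identifier.split(".", 1)
    process = int(process_part)
    thread = int(thread_part)
    kind = "cuda" if thread < 0 else "host"
    return process, thread, kind


for ident in ("1.1", "1.-1"):
    print(ident, classify_thread_id(ident))
```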
A
So this is a tricky thing, because everything is done by warp-level instructions. Okay, and here, if you look at... I'm sorry, let's go back here. We are talking about here. So this is to show the thread coordinates here. So block and thread are three-dimensional entities, right, and that is the thread ID, minus one.
So you can click this button to specify the other coordinates in terms of logical, logical meaning the software coordinates in the cuda-gdb term, or physical coordinates here, which correspond to the hardware coordinates: the device, the SM, etc. To check the values, you can just right-click on the variable in whatever window.
This is called a dive: dive on the variable, and you can check the value and you can plot array elements. You can get statistics about array elements here. So I think that's all I have today. But if you play around with this, these are kind of intuitive tools.