From YouTube: 05 Codee identification of defects in parallel code
A
We also plan a 30-minute break, again at around 7:30 our time, so within an hour to an hour and 15 minutes. For the first part, what we will be seeing is how you can use Codee to detect defects in your OpenMP or OpenACC codes. What this means is that, once you have taken your sequential code, decided what parts of the code you want to offload, and added pragmas to it, Codee can understand the code, understand your pragmas, and check whether the pragmas are correct for the code that you are offloading.
A
So, before the break, we will first see very briefly which defects are currently supported, and we will present you with an exercise, which we can probably cover in the next 30 to 35 minutes, for you to use Codee to detect one defect in GPU code. Right after that, we will see another short slide deck to understand the added complexity of real codes: why things are similar, but significantly different, compared to having isolated kernels like PI or MATMUL.
A
What are the challenges behind real codes, or the additional things we need to consider? For that, we will just review very briefly, or enumerate, some of the difficulties you have with real codes, and we will present you with the LULESHmk example, which is a simplification of the LULESH CORAL benchmark but still contains functions from the core of LULESH that are real, literally the same code as in the LULESH benchmark, and we will show you how.
A
Also, before the break, we plan to do a demonstration of Codee using Fortran codes. This Fortran support is still experimental. We finished development internally one month ago, and during this term, until June, we plan to be testing it internally or with early adopters who want to try early releases of the Fortran support.
A
We encourage you to bring your own codes and try to follow the initial steps to get started with Codee, so that we can help you during this session. If something comes up, then of course we can always continue the conversation after the course through appointment sessions that NERSC is going to facilitate in the upcoming weeks or months. Okay, so this is the plan for today.
A
So, let's start with the next lab, the third lab that we propose. We saw a set of GPU challenges that we need to address, and today we will be seeing how we can use Codee to identify defects in GPU code, in particular defects in data transfers: data transfers that are coded using OpenMP or OpenACC pragmas and that seem correct, but are in fact incorrect for some reason.
A
So, look at the open catalog that we have. We encourage you to use it, review it and learn from it, and of course please always feel free to reach out to us or to NERSC so we can identify new actions or elements that we can add to the catalog. We are always learning and working collaboratively with the community on this. From the open catalog that you can see on the website, we will focus on the section on defects, where as of today we have implemented 11 defects in the software.
A
You can also navigate the defects in a different way. Remember the six stages of the performance optimization roadmap: three sequential stages (optimizing sequential instructions, simplifying the control flow, and optimizing the memory usage) and three stages related to parallelism (vectorization, multithreading and offloading).
A
How can this impact the way we need to manage data transfers? Today, what we will be seeing is how these data transfers, due to the data structure that we have selected, can be incorrect although they look correct. This is related to the well-known problem of deep copy: copying complex structures that are built with pointers, and navigating the pointers to move all the data correctly from the CPU memory to the GPU memory.
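To make the deep copy problem concrete, here is a minimal host-only sketch in C. It is not taken from the session materials: the type `vec_t` and both functions are hypothetical names, and an ordinary memory copy stands in for the CPU-to-GPU transfer. It only shows why copying a structure that contains a pointer is not the same as copying the data the pointer refers to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: the payload lives behind a pointer. */
typedef struct {
    size_t n;
    double *data;
} vec_t;

/* Shallow copy: copies only the struct itself (size and pointer value),
   so dst->data ends up pointing at the same elements as src->data. */
void vec_shallow_copy(vec_t *dst, const vec_t *src) {
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Deep copy: follows the pointer and duplicates the pointed-to elements.
   Returns 0 on success, -1 if the allocation fails. */
int vec_deep_copy(vec_t *dst, const vec_t *src) {
    assert(dst != NULL && src != NULL && src->n > 0);
    dst->n = src->n;
    dst->data = malloc(src->n * sizeof *src->data);
    if (dst->data == NULL) {
        return -1;
    }
    memcpy(dst->data, src->data, src->n * sizeof *src->data);
    return 0;
}
```

After a shallow copy, a later change to the source elements is visible through the copy, because both share one payload. After a deep copy the two are independent. On a GPU the shallow case is worse, since the copied pointer would still hold a CPU address.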
A
This is what is typically called deep copy. As usual, and we want to stress this many times, it is up to us developers: we are responsible for making correct use of the programming language, of the compiler that implements and supports the language specification and, of course, of the parallel programming API that we use, OpenMP, OpenACC or any other one. We need to learn the rules, and we need to learn to use them properly.
A
So it is up to us to use it consistently, so that the compiler can do the rest of the hard work. For this case we are going to use part of the materials we introduced yesterday. If you remember, we have this multi-dimensional matrix, let's call it a 2D matrix. This is the logical, structured layout of the data in our minds, but it is not necessarily how the data is actually laid out in the physical memory of the computer.
A
It is up to the programmer to decide which data structure to use to represent logical matrices. Depending on the data type and on how we declare the matrices in our C/C++ or Fortran program, we can have the data consecutive in memory. This is highly desirable for performance, because when all the data is consecutive in memory we can traverse the whole data set in a natural manner, and this enables efficient computations.
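As an illustration of the consecutive case, the following C sketch checks that a matrix declared as a plain 2D array is laid out row after row in one block, and copies it with a single block copy. The names `ROWS`, `COLS`, `is_row_major_contiguous` and `copy_in_one_block` are assumptions made for this example, not part of the course materials.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROWS 3
#define COLS 3

/* Returns 1 if every element m[i][j] sits exactly (i*COLS + j) elements
   after the start of the array, i.e. the rows follow each other in memory. */
int is_row_major_contiguous(double m[ROWS][COLS]) {
    const unsigned char *base = (const unsigned char *)m;
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            const unsigned char *elem = (const unsigned char *)&m[i][j];
            if (elem != base + (i * COLS + j) * sizeof(double)) {
                return 0;
            }
        }
    }
    return 1;
}

/* Because the whole matrix is one block, a single copy moves all of it. */
void copy_in_one_block(double dst[ROWS * COLS], double src[ROWS][COLS]) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(double) * ROWS * COLS);
}
```

For an array declared this way the C language guarantees the layout, so the check always succeeds. Its purpose here is to spell out what "consecutive in memory" means.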
A
This also enables efficient message passing, or efficient communication of data, because the data can be packed in one single hardware instruction. But when we need to use a large or extra-large amount of data, statically allocated memory is typically not enough; it has its limitations. So we need to use the heap, that is, memory allocated dynamically, and then we enter the world of pointers.
A
So here, when we have a double-pointer implementation of a logical 2D array, a logical matrix, in C/C++, we have no guarantee that all the data for the rows is stored consecutively in memory. This is important to remark, because when we want to transfer three rows in one single operation, we cannot do it: the nine data elements in this example are not all consecutive in memory.
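The double-pointer layout can be sketched in C as follows. This is an illustrative example with hypothetical function names, not code from the lab. It builds the same logical matrix in two ways, with one allocation per row and with all rows inside one block, and provides a check that tells whether each row starts right where the previous one ends.

```c
#include <assert.h>
#include <stdlib.h>

/* Layout 1: one malloc per row. Nothing forces the rows to be adjacent. */
double **alloc_rows_separately(size_t rows, size_t cols) {
    assert(rows > 0 && cols > 0);
    double **m = malloc(rows * sizeof *m);
    if (m == NULL) return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = malloc(cols * sizeof **m);
        if (m[i] == NULL) {          /* undo the rows allocated so far */
            while (i > 0) free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

void free_rows_separately(double **m, size_t rows) {
    if (m == NULL) return;
    for (size_t i = 0; i < rows; i++) free(m[i]);
    free(m);
}

/* Layout 2: a single block of rows*cols elements; row pointers index into it. */
double **alloc_rows_in_one_block(size_t rows, size_t cols) {
    assert(rows > 0 && cols > 0);
    double **m = malloc(rows * sizeof *m);
    double *block = malloc(rows * cols * sizeof *block);
    if (m == NULL || block == NULL) {
        free(m);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++) m[i] = block + i * cols;
    return m;
}

void free_rows_in_one_block(double **m) {
    if (m == NULL) return;
    free(m[0]);   /* m[0] is the start of the single data block */
    free(m);
}

/* Returns 1 only if every row begins exactly where the previous row ends. */
int rows_are_consecutive(double **m, size_t rows, size_t cols) {
    for (size_t i = 0; i + 1 < rows; i++) {
        if (m[i + 1] != m[i] + cols) return 0;
    }
    return 1;
}
```

With the single-block layout the check is guaranteed to return 1. With one allocation per row the result depends on the allocator, which is exactly the missing guarantee: both matrices are indexed as `m[i][j]`, so the difference is invisible in the code that uses them.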
A
We need to do it by segments: first send row number one, next send row number two, next send row number three. If we try to send all nine elements starting at the position of the first element of the logical matrix, this will fail. This is what is remarked here. So this relates, if you remember, and now I am jumping to the lab.
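The transfer by segments described above can be sketched in C like this. It is a host-only illustration under stated assumptions: `send_by_rows` is a hypothetical name, and a flat destination buffer filled with `memcpy` stands in for the GPU memory and the actual transfer mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies a rows x cols matrix stored as an array of row pointers into a
   flat destination buffer, one row segment at a time.
   dst must have room for rows * cols elements. */
void send_by_rows(double *dst, double *const *src, size_t rows, size_t cols) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < rows; i++) {
        /* Each row is consecutive on its own, so one copy per row is safe. */
        memcpy(dst + i * cols, src[i], cols * sizeof *dst);
    }
    /* The tempting single copy
         memcpy(dst, src[0], rows * cols * sizeof *dst);
       would read past the end of row 0 whenever the rows are separate
       allocations, which is the defect discussed in the talk. */
}
```

A possible use is to build the matrix from three rows that live in unrelated places, call `send_by_rows(dst, m, 3, 3)`, and observe that `dst` holds the nine elements in logical order.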