From YouTube: Migrating from Cori to Perlmutter: CPU Codes
Description
Migrating from Cori to Perlmutter: CPU Codes
Presenter: Erik Palmer, User Engagement Group
Training: Migrating from Cori to Perlmutter, March 10, 2023
Right, so this is my outline. You can broadly break this talk up into five sections, but there's quite a bit of overlap between the different parts of it.
So here we go. I begin by talking about modules, which are essentially how you access pre-installed software on the system, and the big difference between Cori and Perlmutter is that we have a new module system on Perlmutter.
So when you first log on to Perlmutter, there will be a set of modules loaded by default. This is the environment configuration, the software that's already loaded on your system when you start up, if you haven't made any other modifications. It includes things like optimizations for the CPU architecture; that's what you see here in yellow. It also includes the GNU programming environment, which includes the GCC compiler.
Those are the ones highlighted in red, and because Perlmutter is a GPU system, we have by default a lot of modules geared for GPUs. So if you're running a CPU code, one of the first things you're going to want to do is type module load cpu, which will reconfigure this environment for CPU work, and you can see basically what it does.
It unloads a lot of the GPU-specific modules from your environment, and it will also turn off CUDA-aware MPI, which may cause problems if you're trying to compile a CPU-only code later and don't want it. Largely, however, the module system works similarly to what you're used to on Cori: module list is the same as before, and module load and module unload are still how you load packages into your environment and unload them from it.
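As a rough sketch, that first-login workflow for CPU work might look like this (the cray-fftw package here is just an illustrative module name):

```shell
# Reconfigure the default, GPU-oriented environment for CPU work
module load cpu

# The day-to-day commands work the same as on Cori
module list              # show what is currently loaded
module load cray-fftw    # load a package into the environment
module unload cray-fftw  # remove it again
```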
The big difference here is going to be module spider, and I want to point this out because this is also going to be a common one that you use: it's how you find software, specific packages and so on, that you want to load into your environment. Previously on Cori you would have used module avail, and module avail will still work on Perlmutter.
It just won't show you everything, and I'm going to show you an example that drives that point home in the next slides. For now, the commonly used module commands that you're going to be using on Perlmutter are these ones here. module swap still works and is still useful. module show I think is really useful, and I'll do another slide on that later; it will tell you what is going on when you load a module into your environment. And I have two cool tricks.
I'll just briefly describe this one, and I'll let you try it on your own next time you're on the system. Essentially, what it's going to do is take the module spider command and pipe it to grep. To make that clean, you can use this redirect flag, and that's the --redirect, and then there's one more flag for the spider command.
You can tell spider to search by regular expression with this -r flag, and that's why I just use a dot, which basically tells it to search for every single thing in the module system and pipe the output to grep; then you can use grep to search for the string you want. So this is handy if you're more familiar or more comfortable using your own text-searching tools.
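Put together, the trick looks roughly like this; searching for "netcdf" is just an example pattern:

```shell
# --redirect sends Lmod's output to stdout so it can be piped;
# spider -r . matches every module name as a regular expression.
module --redirect spider -r . | grep -i netcdf
```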
You can do it with that cool trick, and for more information on this, you can look at the docs for the Lmod environment. So what is the difference between module spider and module avail on Perlmutter? If you use module spider on Perlmutter, it will search without regard for hierarchy.
On Perlmutter, the module files are arranged in such a way that if you're looking for a module with module avail, it will not be shown to you unless you have all the dependencies already loaded. So if you're typing module avail, then unless all the dependencies of a module are already loaded on your system, module avail will not show you that module as being available.
So that's why you're going to see a difference in the output between module spider and module avail.
So in this example, what I'm going to do is just search for cray-netcdf; that's the name of the module I want, and this is going to show you the difference as I go through this process. So bear with me as it types what I just said. These are the modules I currently have loaded, and just to point out that it's not there, I'm going to try to just load it. It's just not available, is what it says.
Well, if I use module spider cray-netcdf, all of a sudden I've discovered it. Now, with module spider, if I type out the module name and the version, I get even more detailed information, and that's where it tells me that I need to load cray-hdf5 first if I want to load cray-netcdf.
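The sequence from the slides is roughly the following (the version string is left as a placeholder; use one that module spider actually reports):

```shell
module load cray-netcdf              # fails: not visible in the current hierarchy
module spider cray-netcdf            # finds it and lists the available versions
module spider cray-netcdf/<version>  # reports that cray-hdf5 must be loaded first
module load cray-hdf5 cray-netcdf    # now the load succeeds
```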
We see that the module is now loaded here. So this example, which we'll leave with the slides so you can view it as many times as you like, shows you that module spider, our Spider-Man, defeats our Superman hero, module avail, in this case of searching for cray-netcdf. So we're really going to recommend that users change that default habit and move toward module spider.
Another useful module command is module show. This slide is a lot of text, but what I wanted you to see is that module show gives you a lot of information about what the module is doing when you type module load for a particular package. It's essentially doing two large categories of things: changing your paths and setting some environment variables.
The things in yellow are what I've highlighted as changes to paths; you can see it's adding the place where these cray-hdf5 libraries exist down here. And with the HDF5 directory and HDF5 root, it's setting those environment variables so that when your applications are looking for HDF5, they might search for that environment variable to try to find where it's located, and that's where this is being done. The other important reason to show you this is that sometimes people, as part of building
their code, have these library paths hard-coded, and it might be helpful just to be able to look at the location of the library; that might help you troubleshoot, or take a shortcut to solving some of your compile issues, if you need a specific library and you're trying to find it.
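A minimal way to do that inspection, using the same cray-hdf5 example:

```shell
# module show prints what loading would do: prepend_path() calls for
# PATH / LD_LIBRARY_PATH and setenv() calls your build may rely on,
# including where the library is installed.
module show cray-hdf5
```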
Okay, so in the next section I'll talk a little bit about programming environments, and we'll get a little more into compilers and the things that go along with them.
I'm going to focus on three main programming environments on Perlmutter. The default one is PrgEnv-gnu, the GNU programming environment. The big new one is PrgEnv-nvidia; for us, when we're talking about CPU-only codes, I expect that we may not use that one as much.
However, if you're starting from CPUs and moving to GPUs, you might focus on that one more, and PrgEnv-nvidia is also totally capable of compiling CPU-only code, so it's definitely worth a try if you're trying to get stuff to work. PrgEnv-cray, which uses the Cray compilers and is listed here, is another viable option. We've set PrgEnv-gnu as the default, and we usually suggest that people start with it.
If it's your first time trying to compile your code on Perlmutter, we suggest you start with PrgEnv-gnu and then branch out to the other ones, to see whether you get better performance or whether you're able to compile in places where you couldn't compile before. That's our general advice.
One thing I'm going to be emphasizing in the next few slides is wrappers. The wrapper works in tandem with the programming environment. As you can see here, to use the g++ compiler for C++, I'm going to use this capital CC wrapper: if I just type CC in PrgEnv-gnu, it will call the g++ compiler.
If I was in PrgEnv-nvidia and I used the CC wrapper, it would call the nvc++ compiler. That's what this table is showing you: how the compiler behind each wrapper changes as you change programming environments. And for MPI, each one of these uses the Cray MPICH MPI, the default that we recommend.
So how do you load a programming environment? Just like the other modules: module load PrgEnv-gnu, for example.
If you want to switch from one to another, for example if I'm going from gnu to cray, I can type module load PrgEnv-cray; I don't necessarily have to swap or unload like maybe previously, so that is slightly more convenient. But like I said, the programming environments work in tandem with the compiler wrappers, and I want to continue to encourage you to use the compiler wrappers; that's what this slide is talking about.
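As a sketch of that table, the same wrapper names resolve to different compilers depending on which PrgEnv module is loaded:

```shell
module load PrgEnv-gnu     # cc -> gcc,  CC -> g++,    ftn -> gfortran
module load PrgEnv-nvidia  # cc -> nvc,  CC -> nvc++,  ftn -> nvfortran
module load PrgEnv-cray    # cc/CC/ftn -> the Cray (CCE) compilers
CC --version               # reports whichever C++ compiler is behind the wrapper
```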
This slide is showing you that not only does the compiler wrapper automatically set the compiler based on your programming environment, but it also includes many other things that you don't necessarily see. So, starting with this dark blue line here:
I have a typical compile line that I would run with the GCC compiler; I'm compiling a hello-world example, with a few flags to give it OpenMP and to tell it how to output my code. In the second line down here, in the light blue area, I'm using the compiler wrapper, and I've also added this flag, -craype-verbose, which is basically going to tell me all the extra things that are happening behind the scenes when I type this compile line.
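The two compile lines from the slide look roughly like this (hello.cpp is a stand-in for the actual example file):

```shell
# Plain GCC compile line
g++ -fopenmp -o hello hello.cpp

# Same build through the Cray wrapper; -craype-verbose echoes the full
# underlying command, including every flag the wrapper adds for you
CC -craype-verbose -fopenmp -o hello hello.cpp
```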
So the only difference between these two is that I've added this flag to say: hey, tell me all the extra flags that are being added by the CrayPE compiler wrappers, and tell me a little bit about all the amazing things that are happening. That's what you see down here. Once you show all that stuff: in this example I'm in the gnu programming environment, so this Cray compiler wrapper is calling GCC, and it adds a flag to optimize for the CPU architecture.
It also has additional flags to further optimize for the architecture; it is including our defaults, and it's also including the science libraries and several other things that we will find helpful. This list goes on quite a bit, but it's kind of too big for one slide, so let me cut it off here.
Furthermore, if you're using the wrappers, several things will automatically link: MPI, like I showed you, and the science libraries, LAPACK, BLAS, ScaLAPACK, and more. If you have Cray modules loaded, those get automatically linked by the wrappers. A quick side note about the scientific libraries: if you're looking for LAPACK, ScaLAPACK, BLAS, and the like, those are included in the cray-libsci module, and for more information about exactly what's in there and how to use it,
I recommend looking at the manual with the man command; it'll tell you more about that. Modules are linked dynamically by default, so when you load these modules into your environment, a lot of the time the paths will be prepended to the LD_LIBRARY_PATH or to the Cray LD_LIBRARY_PATH, and the shared libraries will be dynamically linked. If you're combining with your own shared libraries, consider adding the options for the rpath in general.
You should know that the Cray wrappers build dynamically linked executables by default. One thing to be careful about is if you were using the -static flag, or this Cray environment variable down here, CRAYPE_LINK_TYPE=static.
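As a sketch, the two ways of requesting static linking mentioned here are:

```shell
# The wrappers link dynamically by default; either of these flips that:
CC -static -o hello hello.cpp    # per-invocation flag
export CRAYPE_LINK_TYPE=static   # affects all subsequent wrapper builds
```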
So, just to sum up compilers and flags: there are a lot of them, and it's a little bit daunting to just keep flogging you with more and more flags, but I've only got a few more. This table sort of puts those things together. I've separated them out: this is the gnu programming environment, this is the cray programming environment, this is the Nvidia programming environment, and I've just pulled out a few common compiler flags that you might want to use with your codes.
Typically, you can see that cray and gnu behave almost identically. However, if you're compiling codes in the Nvidia programming environment, you may need to make some small changes to achieve the same things.
One thing to point out here, a big difference between Cori and Perlmutter, is that OpenMP is not enabled by default on Perlmutter.
So if you want your codes to incorporate OpenMP, you need to add the flags. The flag for gnu is just -fopenmp, but on Nvidia that flag is different: it's -mp. And again, I will tell you, from my personal preference, that to get into the nitty-gritty details of the flags on the compilers, the manual pages are really helpful, and searching through those can be a quick way to get definitive information about what you need.
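For example, the same OpenMP build in the two environments (hello.cpp is a placeholder source file):

```shell
CC -fopenmp -o hello hello.cpp   # PrgEnv-gnu: GCC-style flag
CC -mp -o hello hello.cpp        # PrgEnv-nvidia: NVIDIA-style flag
```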
Another thing to point out: in my experience, a lot of people coming from Cori to Perlmutter are usually bringing codes that may be a few years old by now, and one of the big differences between Cori and Perlmutter is that we don't have the Intel programming environment and the Intel compilers that go with it, at least at this moment. So when you were compiling a code on Cori,
if you were compiling with the Intel compilers, you might find it doesn't automatically compile with the GNU compiler on Perlmutter, and so we have some tips here to kind of help you with things like that; that's what's on this slide. So for Fortran, especially older Fortran, one of the flags we recommend you try, if you're having trouble compiling when you move to Perlmutter, is this:
-fallow-argument-mismatch. This is a fairly specific, targeted thing in terms of what it's telling the compiler to ignore. But to take that farther, in the "ignore everything you can" direction, for older code practices that may no longer be allowed,
you can use the -std=legacy flag, which will again reduce the sort of strictness of the compiler and allow it to bend the rules a bit more, like maybe the older compilers were famous for allowing. If you're talking about C or C++, there's again the same idea: try to find some flags that reduce strictness just to get your code compiling. We have this -fpermissive flag, and there's -Wpedantic; pedantic can warn you about lines that break code standards.
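Collected in one place, those strictness-relaxing flags look like this (app.f90 and app.cpp are placeholder files):

```shell
ftn -fallow-argument-mismatch -o app app.f90  # tolerate Fortran argument-type mismatches (gfortran)
ftn -std=legacy -o app app.f90                # accept older, legacy Fortran constructs
CC -fpermissive -o app app.cpp                # downgrade some nonconforming C++ errors to warnings
```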
Okay, so in the next section I have just a few quick tips about CMake and makefiles, and when I'm talking about makefiles here, I mean makefiles in the autotools sense, not the makefiles that CMake makes; I just wanted to make that distinction.
The other thing I want to point out here is that these tips are sort of high level, because when we start getting into build systems like CMake and makefiles, they're usually there for when you're compiling fairly complicated code, which usually means your makefiles and your CMake build systems are fairly complicated. So these really are just some tips; they might be kind of hit or miss for each person, but hopefully a few hits make it worthwhile.
In particular, the reason why this comes up here is that when we're using the Cray wrappers, a lot of times, deep down in that CMake build system or in the makefiles for other tools, the compiler has been hard-coded and won't accept the Cray compiler wrappers the way we think it should. The first thing to try, if something like that is happening, is one of these two techniques.
If you're doing the typical autotools method, where you configure and it makes a makefile, and then you make, and then you install, the way you do it is with a line like this, where you tell it up front: for CC you want the cc wrapper, for CXX that's the CC wrapper, and for FC you want the ftn wrapper. That will point it at the right Cray compiler wrapper for each of the compilers.
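The configure line being described is along these lines:

```shell
# Tell autotools up front to use the Cray wrappers for C, C++, and Fortran
CC=cc CXX=CC FC=ftn ./configure
make
make install
```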
That is, the C compiler, the C++ compiler, and the Fortran compiler. For CMake, if you're having that similar type of issue, typing this line on the same line as your cmake command, or before you call it, can help remedy the same problem. It's doing the same thing: it's telling your code ahead of time that you want to use the Cray wrapper for the C compiler, for C++, for Fortran, and so on.
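For CMake, the equivalent sketch is:

```shell
# Same idea for CMake: set the compilers when invoking it
CC=cc CXX=CC FC=ftn cmake ..

# or pass them explicitly as cache variables
cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \
      -DCMAKE_Fortran_COMPILER=ftn ..
```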
So, as I mentioned before, makefiles and the CMake build system can be incredibly complex. What I have here is a really basic example that points out where I would start to look for things if I was having problems with my makefile, and some sort of easy adjustments I can make to incorporate the Cray wrappers, which would potentially allow me to solve some problems compiling my code. So in particular, this is sort of my example makefile.
This is really for an existing makefile, where you already have one that's set up and you just type make to compile your code. You're looking for a makefile that looks something like this. At the top of it you'll generally see several variables defined, and the ones you're going to be looking for, when we're talking about compiler wrappers specifically, are CC or CXX or those types of things. In this case, you see in this example, which I took directly off the web, they've
hard-coded, as I'm going to call it, the compiler to be the GCC compiler. So that means if I switch to the other programming environments, I'm not going to be switching away from the GCC compiler: whether I'm in the cray programming environment or the Nvidia programming environment, I'm always going to get sent back to the GCC compiler.
In the case of C that may not be a huge deal, but it illustrates the point well. So if I'm looking at a makefile and I see an issue like this, what I'm going to want to do is change it so that it points to the cc wrapper instead, because it wants a C compiler and I'm giving it a C compiler; that's the lowercase cc. And once I do that,
that's all I need to incorporate the Cray compiler wrappers into this build system. Now, when I type make, it's going to use the Cray compiler wrappers, and it's going to do all those optimizations and other stuff I told you about before, all hidden in the background as part of everything you want. So when you're looking through your makefiles, if you notice something like we talked about before and you can make this easy edit, I suggest you give it a try and see if that helps you out.
That was my tip for makefiles. My next tip is on CMake, and it really can be boiled down to one thing, and that is this application called ccmake. What I have on this slide is basically the walkthrough, in case you're not totally familiar with the CMake build process.
I have a typo here, because I need to make a directory; but I make my directory and I move into my directory, so I'm building this code from the directory above it, or the directory outside of it, and then I invoke cmake with a dot dot. This is kind of analogous to the configure step, but this is how you invoke CMake, and it will make its CMake files as part of the build process.
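The walkthrough on the slide is roughly:

```shell
mkdir build   # out-of-source build directory
cd build
cmake ..      # configure step: generates the build files from the parent directory
ccmake ..     # curses interface for browsing and editing the cached build parameters
```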
ccmake gives you a graphical user interface to kind of investigate the different parameters in your build. In particular, in this example I'm using it on that simple example, with the one CMakeLists and the one hello-world OpenMP .cc file I'm trying to compile, and you can see that CMake sort of automatically filled in a lot of these values.
So if I were to do something like that, or if maybe I was looking at a project that picked those up correctly, you would see something more like this. If I was looking here at the CXX compiler, you can see that this one has craype/2.7.19/bin/CC. That is the Cray wrapper for the C++ compiler, the Cray wrapper that we want.
So if I saw this, I'd say: okay, CMake is picking up the compiler the way I expect it to, and it's doing what I want it to.
The other thing to point out here is that this is page one of nine; there's a lot of information here that you can kind of look through for issues, depending on whether you're having trouble with your build system. So I like this tool and I find it helpful for finding issues. It doesn't necessarily solve them for you, but knowing where the problem is, is usually pretty helpful.
Okay, all right, so summarizing back to where we are: those things that I pointed out are what we're trying to address by giving you these tips, to try this line here, or to try this in your configure step. And for more details about this, as always, we have docs on these things that can be quite helpful,
docs that I still personally find helpful, so I feel fine sharing them with you. Okay, in the next section I'm going to show you a few quick examples; I think I'm going to just do one, based on how things are going, for compiling code on Perlmutter.
The main takeaway here, the thing I cannot say loudly enough, is: if you go from Cori to Perlmutter, you should probably recompile your code. The architectures are different enough that there are optimizations that will speed up your code.
Your code will run faster if you recompile it on Perlmutter, and it's very possible that if you take a binary directly from Cori to Perlmutter, it won't run at all. So: recompile your code on Perlmutter. The example I'm going to show you is just a simple MPI and OpenMP hello-world example; it's going to say hello from different threads and processes, out to the screen. This is pared down as much as possible, I think, for this example.
I'm doing this from the gnu programming environment, and like I said, this is a fairly straightforward example, but maybe it just gives us a taste of how to do this. You can see I have just the default module list; again, I'm going to compile with the compiler wrapper.
In this case the lowercase cc, and the thing to point out here is that on the compile line I told it to use OpenMP. I had to include that, because that's not included by default on Perlmutter; if you want OpenMP, you have to include it there. Then I'm setting the variables to do the threading, and then I run the code, and, yay, it works.
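Condensed, the compile-and-run session looks something like this (the file names and the task/thread counts are illustrative, not the exact values from the slides):

```shell
cc -fopenmp -o hello_hybrid hello_hybrid.c   # wrapper adds MPI and architecture flags
export OMP_NUM_THREADS=4                     # OpenMP threads per MPI task
srun -n 2 -c 8 --cpu-bind=cores ./hello_hybrid
```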
So the big takeaway from this simple example is that if you're compiling with the compiler wrappers, and you've been compiling with the compiler wrappers on Cori, you should find that these are very similar; it should be a very similar experience. The only main difference is that if you want OpenMP, you now have to include the flag; that was not the case before.
So that's the moral of the story here. I have another compiled example here; I'm just going to leave it, and you can watch it later.
It downloads a library, uses it, links it in, and shows you how to do that, and it will exist for eternity for you to look at when it's convenient for you. In the next section, I'm going to talk a little bit about understanding job parameters, and I do this because your job parameters are going to change when you come from Cori to Perlmutter, because the architecture is different, and if you understand what each of those parameters means, that's going to help you make intelligent choices about them.
So that's why I take this perspective. When you want to run a job, you have a job script; that's what we're looking at here. In particular, the things I'm going to be focusing on are the parts that are highlighted very lightly in different colors: this part here, this part here, these parts here, and these parts down here. It's going to be about how you understand these parts and how you make choices about them.
The key terms I'm going to focus on are node, MPI task, logical CPU, thread, physical core, and processor, and the advanced term is going to be the NUMA domain. The last time we did the Cori-to-Perlmutter training, I spent more time on NUMA domains; I'm not going to spend as much time on them this time. If you're interested in hearing me talk about them a little bit more, that video is still available online, so today I'm going to give you the shortened version.
If you look at our page for the Perlmutter system architecture, it's going to say two AMD EPYC Milan CPUs; I'm going to call that two AMD EPYC Milan processors, so when I say processors, I mean the same thing as this. And it's going to say 64 cores per CPU; I'm going to use the words 64 physical cores per processor. So I'm changing "CPU" to "processor" here, and the cores I'm going to call physical cores.
When we start talking about two hyperthreads per core, I'm going to call those two logical CPUs per physical core; so we've got the cores being physical cores, and the hyperthreads being logical CPUs. And for NUMA domains per socket, I'm going to say NUMA domains per processor. All right, so here we have the diagram of a CPU node: I've got one processor here, and I've got another processor here. This whole thing together, with the two processors, is one node, and inside each of these processors...
Well, we've got a nice picture here. The wider blue square is my node; inside my node I have two processors, which in this picture are the yellow parts. You can think of each one of the processors as looking kind of like this, and inside each processor you have physical cores; those are the little tiny things, here and here and here, a whole bunch of physical cores.
Each physical core is capable of processing two instruction threads, and because of that, I'm going to say that inside each physical core there are two logical CPUs. So: one physical core, two logical CPUs. And that's how the words I'm using translate to the architecture of the Perlmutter compute node, and I'm going to try to keep using those exact terms for the rest of the talk. All right, so.
Now, to understand a little better how the architecture works, I'm going to give you kind of an analogy, and the analogy is an office-building analogy, which is probably wholly unoriginal, but that's okay. So here I have my office building; you can think of it as full of nodes. Maybe each floor is a node, and on each floor there are maybe two office layouts like this that we can think of as processors, because this is sort of how the breakdown of our node works inside it.
We've got two office layouts, which are marked as two processors. Our office floor is made up of little tiny cubicles, where people do work, and each of those cubicles is one of the physical cores on our system. And, you know, I gave it away because I had the same picture last time, but I'm going to hold it to the end: this cubicle could represent only one specific piece of hardware we have here at NERSC, and I'll
let people try to guess in the Google Doc and figure out what it is, or maybe in the chat, which would probably be better, so as not to muck the doc up, I don't know. But I'll give away the answer by the end, or somebody else will. And then we could also think of our cubicle as set up like this,
and that would correspond to our physical cores inside of our processor, each one of these little boxes. But inside each one of these little boxes, whether it's shaped like this or shaped like this, you've got a little worker doing your instruction thread, and that is the logical CPU; that is the hardware thread. So these are your workers inside your cubicles, which are your physical cores; your physical cores go inside your office plan, which is your processor; and your processors live on your office floor, which would be your node. Okay.
So the reason I go through all of that is so that when you see this -N 2, something immediately pops into your mind; it should hopefully be an office floor full of cubicles. If I start talking about logical CPUs, those little workers inside each cubicle, that corresponds to this -c 16 parameter here. Those little people inside each cubicle should pop into your mind, and you should be thinking: oh, the hardware threads for the physical cores on my processor. And if I start...
Okay, now, that was the architecture; this is sort of how we break up our problem. When we do that, we're talking about the number of MPI tasks and the number of OpenMP threads. For this one... I'm sorry to have so many analogies at once, but let's try to keep them separate: that one was for hardware.
This one is really for you choosing how to break up the work of your simulation code. The way I'm going to ask you to think about that is: you can think of this truck as carrying a whole bunch of pallets, and all those pallets together correspond to your simulation code, the work that your simulation code needs to do. And you can break up all these big pieces of work, all these pallets, into MPI tasks.
If you do it that way, your MPI task corresponds to your pallet of different boxes. Now, if you want to further break that up by using OpenMP, you can think of each MPI task being further broken up into the individual pieces, with the threads. So that's where OpenMP comes in and that's where the threads come in: we've got the whole simulation code broken up into a number of MPI tasks, and each MPI task gets broken up into individual pieces of work. So that's the way to think about what's going on there.
Now, with that in your mind, when I look at this line here, export OMP_NUM_THREADS, that says that for each one of those sets of boxes, I'm going to break it up into eight pieces, so each pallet has eight boxes on it. So this is how I'm further dividing down the work for the job that I run.
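Put into a job-script sketch (all counts here are example values, not recommendations, and ./my_app is a placeholder):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --constraint=cpu
#SBATCH --time=00:10:00

export OMP_NUM_THREADS=8                    # boxes per pallet: threads per MPI task
srun -n 32 -c 16 --cpu-bind=cores ./my_app  # 32 pallets: the MPI tasks
```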
And I pointed out the NUMA domain, but I didn't say much about it. Rather than spend a lot of time introducing NUMA domains, I'm just going to give you the advice to follow to make sure the way you arrange your work respects the NUMA domains, so that it will run efficiently. And the reason I can do that is because we have a sort of general set of guidelines
A
We
give
you
that,
for
most
cases
is
pretty
successful
at
helping
your
code
run
efficiently,
and
so
this
is
what
those
guidelines
are
right.
So,
if
you're
looking
at
your
number
of
MPI
tasks,
that's
more
than
the
physical
course
right,
so
if
your
palettes,
the
boxes,
sorry
is
less
than
or
equal
to.
So
if
your
palette
of
boxes,
right
of
your
simulation
code
is
broken
up
to
is
less
than
the
number
of
physical
course
that
are
available
so
then
you're
going
to
want
to
use
the
CPU
underscore
bind
equals
course
option.
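As a one-line sketch of that option in use (the task count and program name are placeholders, not from the talk):

```shell
# Bind each MPI task to physical cores when MPI tasks <= physical cores per node.
srun -n 64 --cpu-bind=cores ./my_app
```

The spoken "CPU underscore bind" is the same option; Slurm has historically accepted both the --cpu_bind and --cpu-bind spellings.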
A
To
avoid
penalties
is
such
a
harsh
word,
but
it's
just
it's
not
as
optimal
as
if
you,
if
you
use
at
least
eight
you'll,
get
a
much
more
optimal
experience
if
you're
using
MP,
openmp
threads,
and
one
thing
to
also
check
is
that
Dash
C
should
be
greater
than
or
equal
to
the
valid
the
value
of
the
number
of
threads.
So
you
know,
if
you
have
two
threads,
you
want
Dash
C
to
be
two
or
more
right
and
then
to
make
sure
that
those
threads
kind
of
execute
close
together.
A
So
if
they're
working
on
the
same
thing,
because
threads
tend
to
work
on
similar
things,
it's
nice,
if
they
work
closely
and
not
like
one
person,
works
on
this
end
of
the
office
and
that
person
works
on
that
in
the
office
and
every
time
you
want
something.
You
have
to
walk
back
and
forth.
That
can
be
kind
of
annoying
and
slow
things
down.
A
But
if
you
use
these
settings
it'll
make
sure
that
when
you've
got
those
two
pieces
of
work
together,
it's
the
same
two
people
in
the
cubicle
working
together
and
that'll
make
sure
that
they
can
communicate
quickly.
So
if
you
follow
these
guidelines
to
set
those
parameters,
you
you
know,
we've
found
that
most
cases
you
will
get
a
good
experience.
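The settings themselves aren't read out loud here, so as an assumption: the standard OpenMP affinity variables below are NERSC's usual recommendation for keeping threads close to their task (a sketch, not a quote from the slide):

```shell
# Assumed slide settings: standard OpenMP thread-affinity controls.
export OMP_PROC_BIND=spread   # distribute threads over the task's places
export OMP_PLACES=threads     # one place per hardware thread
echo "$OMP_PROC_BIND $OMP_PLACES"   # prints: spread threads
```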
A
So
with
all
those
things
said,
we
now
know
this
part
of
the
job
script
right
that
came
from
those
guidelines.
They
gave
you-
and
you
know
this
part
of
the
job
script
that
also
came
from
those
guidelines.
I
just
gave
you
so
we've
kind
of
sort
of
covered,
the
things
that
relate
to
the
new
and
domain
and
things
that
are
associated
with
how
threads
are
processed
so
now.
A
Well,
I,
guess
in
the
next
later
on
we're
going
to
go
into
like
how
we
do
some
job
scripts.
But
this
slide
here
sort
of
incorporates
the
details
from
each
of
the
different
architectures
on
the
nodes
that
we
have
right.
So
you
know
Haswell
you're,
familiar
with
Corey
K
L
you're,
also
familiar
with
our
Focus
today
is
parameter.
Cpus
right
you've
got
128
physical
cores.
A
We've
got
two
logical,
CPUs
per
physical
core
right.
So
again,
these
are
the
cubicles
right.
This
is
how
many
people
are
in
each
cubicle.
This
is
how
many
physical,
how
many
logical
CPUs
per
node,
so
how
many
people
are
in
each
office
floor
right
so
for
our
office.
Our
building
floor
has
two
office
plans.
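Putting numbers on the analogy for a Perlmutter CPU node:

```shell
# 128 physical cores x 2 hardware threads each = logical CPUs per node.
physical_cores=128
threads_per_core=2
echo $(( physical_cores * threads_per_core ))   # prints 256
```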
A
How
many
people
work
in
those
that
office
floor
right,
and
this
is
how
many
Newman
domains
just
sort
of
which
areas
communicate
more
quickly
together,
and
this
is
the
formula
that
you
can
use
for
each
one
of
these
different
architectures
to
calculate
the
value
of
C
right
and
so
we'll
use
that
in
the
next
couple
slides.
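The formula itself isn't read out loud, but it has to reproduce the values used later in the talk (2 for 128 tasks per node, 8 for 32 tasks per node), so a consistent sketch, assuming the task count divides the core count evenly, is:

```shell
# -c value = 2 * (physical cores per node / MPI tasks per node);
# the factor 2 is the two hardware threads per physical core.
cpus_per_task() {
  echo $(( 2 * ($1 / $2) ))   # $1 = physical cores/node, $2 = tasks/node
}
cpus_per_task 128 128   # Perlmutter CPU, 128 tasks per node: prints 2
cpus_per_task 128 32    # Perlmutter CPU, 32 tasks per node: prints 8
cpus_per_task 32 32     # Cori Haswell, 32 tasks per node: prints 2
```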
A
So
what
I'm
going
to
do
here
in
these
examples
is
I'm
going
to
look
at
a
job
script
from
Corey,
Haswell
and
I'm,
going
to
talk
about
how
to
change
it
to
a
job
script
or
the
program
matter.
A
Cpu
node,
so
in
particular,
in
this
example,
I've
decided
that
I
don't
want
to
use
openmp,
threading
and
so
I've
set
this
variable
at
one,
which
is
just
kind
of
a
best
practice
and
I've
decided
that
I
want
to
split
up
all
the
work
of
my
simulation
to
1280
MPI
tats
now
I
mean
that's.
For
me.
How
this
number
is
chosen
is
kind
of
depends,
a
lot
really
on
your
application
tuning
these
parameters
to
find
the
optimal
result.
A
Sorry, for each MPI task I want two of my workers to work on it, two of those workers inside a cubicle working on it, and that's where that number comes from. So what I'm doing in this example is keeping that -c constant: I also want two workers to work on each MPI task over here. And the thinking that I've gone through to come up with these numbers goes like this.
A
I've
taken
my
total
number
of
MPI
processes
and
I've
divided
it
by
the
number
of
nodes
right,
so
I
had
32,
MPI,
sorry,
MBI
tasks,
32
MBI
tasks
on
each
node
right,
then
I
use
that
formula
that
you
saw
in
the
previous
slide
to
determine
the
value
of
C
to
be
2.
right.
So
the
same
thinking
kind
of
happens
over
here
for
the
parameter
CPU
right
so
now,
I've
done
I
have
1280
MBI
tasks,
I
divide
that
by
the
10
nodes
here,
right
and
well.
I
should
say
that
this
is.
A
This
is
how
I
came
to
the
number
10
here
is
I
thought:
okay,
if
I
take
1280
and
divide
that
by
10
I
get
128.
and
I
know
that
if
I
put
in
128
into
that
formula
from
the
last
page,
I'll
get
two
right.
So
I
know
that
that's
the
right
number
to
put
in
this
formula,
which
means
that
this
number
should
be
10,
which
means
that
the
number
of
nodes
that
I
want
should
be
10..
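Assembled as a job script, the Perlmutter side of this example would look roughly like this (a sketch; the walltime and program name are placeholders, not from the talk):

```shell
#!/bin/bash
#SBATCH --constraint=cpu
#SBATCH --nodes=10             # 1280 MPI tasks / 128 tasks per node
#SBATCH --ntasks-per-node=128
#SBATCH --time=00:30:00        # placeholder walltime

export OMP_NUM_THREADS=1       # no OpenMP threading in this example

# -c 2 comes from 2 * (128 physical cores / 128 tasks per node)
srun -n 1280 -c 2 --cpu-bind=cores ./my_app   # ./my_app is a placeholder
```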
A
In
this
example,
now
I'm
keeping
the
number
of
nodes
constant
and
I'm
changing
the
number
of
logical
CPUs
for
each
MPI
task.
So
it's
a
similar
thing
right
same
con
same
computation
here
over
here,
I'm
splitting
up
my
MPI
task
across
the
40
nodes.
So
each
node
gets
32
processes,
but
because
there
are
more
physical
cores
and
logical
cores
available
on
a
promoter,
CPU
node
I
can
afford
to
associate
eight
logical
CPUs
to
each
MPI
task
instead
of
just
two.
A
So
this
is
sort
of
an
example
where
we
get
to
play
this
game.
You've
got
32
nodes.
Sorry,
we've
got
our
our
work.
We've
decided
to
split
it
up
into
512
MPI
tasks,
we're
going
to
split
it
across
32
nodes.
What
do
we
want
the
value
of
C
to
be
if,
in
this
configuration?
A
Well
here
is
my
hint
right
I'm
splitting
my
512
MPI
tasks
across
32
nodes,
so
that
gives
me
16.
I
put
that
in
that
formula
as
before,
right
and
I,
because
I'm
doing
openmp
I
have
to
do
that
check
for
my
rules
and
I
find
that
check,
and
that
told
me
it's
good.
This
also
sort
of
gives
away
the
answer
right,
because
I'm
making
that
check
here
and
told
me
that
the
answer
to
this
question
is
16.
and
I'm.
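Checking that arithmetic explicitly (the OpenMP thread count in the final check is an assumed illustration; the talk doesn't state it):

```shell
ntasks=512
nodes=32
tasks_per_node=$(( ntasks / nodes ))    # 512 / 32 = 16
c=$(( 2 * (128 / tasks_per_node) ))     # 2 * 8 = 16
echo "$c"                               # prints 16

# The OpenMP rule: -c must be >= OMP_NUM_THREADS (8 assumed here).
omp_threads=8
[ "$c" -ge "$omp_threads" ] && echo "check passed"
```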
A
I'm basing this on the assumption that I want to make full use of all the computational power on the node. If you want, you know, the job script generator is a sort of automatic tool.
A
It's a good way to learn, and it's also a good way to get a job script out and get you something to try that seems reasonable. You can find it in two locations now; both of these work, so I'm giving you both here. And with that, I'm going to sort of summarize. The key suggestions from my talk are: use module spider rather than module avail, it will show you more things.
A
Recompile
your
query,
codes
on
pearlmeter
we've
got
the
new
programming
environment.
You
have
the
create
program,
environment,
the
Nvidia
program,
ad
they're
worth
trying.
You
know
obviously
start
with
the
default,
then
move
to
the
others,
use
the
compiler
wrappers
because
they
do
so
much.
A
You
know
kind
of
it's
kind
of
unseen,
but
there's
a
lot
of
optimizations
and
and
other
things
being
pulled
in
and
it
allows
you
to
more
easily
try
the
different
programming
environments
and
you
know,
look
back
at
obviously
look
back
at
your
job
scripts
and
and
try
to
you
know,
recalculate
your
JavaScript
parameters
for
for
Optimal
Performance
on
Pro
matter,
and
with
that
there's
only
one
more
note
here.
A
Over to you, Helen; is this usually the slide that you talk on?
B
Okay, yeah. So during the hands-on session later, we have prepared this part of the exercises for the CPU. This is the GitHub repo there, and we have a README.first that basically encourages you to work in this order. Then you have a README file for each example: hello world, serial and MPI code; then matrix multiplication or Jacobi, which is a C example or a Fortran example, to do hybrid MPI and OpenMP; and we also have an xthi affinity example. You can compare Cori
B
Compare
on
the
CPU
side
on
parameter,
CPU
find
out
all
these
flags
that
Eric
talked
about
and
understand
more
with
the
the
the
the
what
what
chorus
with
High
hyper
threads
are
that
your
opening,
pnm
MPI
openp
threads
are
binding
on
there's,
also
a
GSL
test.
A
Yeah
and
with
that
I
will
thank
you
for
listening,
and
you
know
you
can
put
questions
in
the
Google
Doc
or
you
know
later.
If
you
ever
need
help,
you
know
submit
a
ticket
be
happy
to
help.
You.