From YouTube: 03 Migrating from Cori to Perlmutter CPU Codes
Description
Part of the Migrating from Cori to Perlmutter Training, December 1, 2022.
Please see https://www.nersc.gov/users/training/events/migrating-from-cori-to-perlmutter-training-dec2022/ for the training day agenda and presentation slides.
So my name is Eric Palmer and I'm a software integration engineer here at NERSC, and I'm very happy to be here with you today to talk about migrating from Cori to Perlmutter. Today, the focus of my talk is CPU-only codes.
I aim in this talk to give you lots of information, or a general understanding, so you can do what you need to do with your CPU codes, but I'm not going to get super technical about maximizing ultimate performance. So hopefully you'll find this useful, but if you want to eke out every inch of performance in your application on Perlmutter, you're going to have to come back for more.
So these are the topics I'm looking to cover today, and I picked these four mostly because I see them as the major differences between Cori and Perlmutter. The module system is slightly different. The programming environments, meaning which compilers are available and what flags you need to get what you want, are slightly different. The way you compile codes is pretty similar, with one small point about a flag I'll mention. And the job scripts, because the architecture of the nodes has changed. Those are the things I'm highlighting in this talk.
So the first thing I'm going to talk about is modules, and your experience with Cori and modules is still valid on Perlmutter; it works largely the same. When you log on to a Perlmutter login node, you're going to have these modules loaded by default, and there are a few things to point out. One of these modules represents the CPU architecture that the Cray compiling programming environment, which I'll mention later, is going to use to optimize your code. Another one is the default programming environment, which again we'll get to more later: the GNU programming environment. But the third one is really important here if you're doing CPU-only code. The default right now is that the default modules load this gpu module, which enables the CUDA-aware MPI by default and also loads several modules that are targeted towards GPU codes. We recommend you essentially disable those by doing module load cpu if you're doing CPU-only codes. So the first step is: you come onto Perlmutter, you log in, you know you're doing CPU-only code, and you should look at doing that.
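For reference, a first CPU-only session might start like the minimal sketch below; the gpu and cpu module names come from the talk, and the rest is illustrative rather than an exact Perlmutter session.

    # On a Perlmutter login node, see what is loaded by default
    module list

    # For CPU-only codes, replace the GPU-oriented defaults with the CPU stack
    module load cpu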
The rest of the module stuff is still fairly similar to what you've seen on Cori. The commands here, such as module list, module load, module unload, and module swap, should be exactly the same. Of the two following commands, module show is one that I don't know that everyone knows; I feel like the others people use all the time, but maybe this one not so much. It will give you a lot of information about what the module is doing, and I have an example later to show you that.
Finally, the last one is going to be the focus of my next few slides, because I really want to drive this point home: if you're using module avail to find a module or find software on Perlmutter, you might not see everything right away, whereas if you use module spider, you're going to have access to more potential modules and more information for how to get to what you want right away.
So that's what's going on with module spider, and we'll talk more about that in a second. Finally, on this slide I put some tricks that I found useful and that I think may be helpful for other people.
If you prefer to just grep for a string through all the modules, you can use this line, where you redirect the module output to your favorite Linux utility. And instead of using module list, you can use the shortcut ml -t, which will print a nice vertical list of your modules.
So, as promised, this is all about module spider versus module avail. Now, they both still exist on Perlmutter, but they function slightly differently, and the reason is that the module system on Perlmutter is slightly different from the one on Cori. The module system on Perlmutter is called Lmod, where the one on Cori, I believe, was Tcl-based. The difference is that Lmod has a hierarchical structure. So if your module depends on another module being loaded before it can be loaded, module avail may not show you that you can load that module. That's why we have module spider, which will search regardless of that structure and basically give you more hits on any search. To illustrate that, I have an example where I'm trying to load, I believe it is, cray-netcdf.
We try to load it and we get an error. If you use the module show command instead, it's still unhappy.
If you want to load cray-netcdf, you have to load cray-hdf5 first, and if I do that, we find that the module loads, we get the software we want, and everyone's happy. So, in conclusion: module spider for the win. module avail still works and is still useful, but if you're looking for something and you're not finding it, please try module spider.
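Put together, the workflow from this example looks roughly like the following; the error behavior is paraphrased, not verbatim Lmod output.

    module load cray-netcdf    # fails: the module is hidden by the hierarchy
    module spider cray-netcdf  # explains which modules must be loaded first

    module load cray-hdf5      # satisfy the dependency...
    module load cray-netcdf    # ...and now the load succeeds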
I include this slide because I think it's helpful when you're trying to figure things out, especially with libraries: you want to make sure your library is linking, or you want to link a library to your application. It's really helpful to be able to use module show and see what loading a module does to your user environment. In particular, I would highlight the yellows and greens.
The green is where the environment variables are being set. So when you're making or building your program, if it's looking for the HDF5 directory, it's going to set that variable to that directory, and that tells you where it's looking for it. The yellows are changing your path, where it's going to be looking for a library. So if you're wondering, say I want to explicitly link and I'm looking for the location of that library, I can find it this way.
I get a lot of information from module show. Okay, so those were my major points on modules. The next thing I'm going to talk about is programming environments.
The three big programming environments on Perlmutter are the GNU programming environment, the NVIDIA programming environment, and the Cray programming environment. We no longer have a programming environment for Intel; I know that's been a pain point for a lot of people, so hopefully the information here today will make it less painful. The GNU programming environment, with the GCC compilers, the gfortran compiler and whatnot, is the one we typically recommend you try first for CPU-only codes.
If the GNU programming environment isn't working for you, it's really easy to switch to a different programming environment, such as the Cray one, and give Cray a shot. Sometimes if your code is not compiling, just switching from GNU to Cray makes it compile and work, and then you're good to go; both of them are equally valid.
So, for example, suppose I want to compile a C++ code and I have my compile line, CC plus the commands to compile my code; I'll just make it up for now, because we're going to cover it later. Well, if I'm in the GNU programming environment, Cray is going to automatically change that CC to the g++ compiler, the appropriate command for the g++ compiler.
So maybe I should use this one as an example, because I did this one yesterday and I know it: the wrapper changes this to the command we want, adds a bunch of stuff that we're going to see, and then it compiles your code. Now, rather than that, take the exact same compile line: if I switch to the NVIDIA programming environment and I use that exact same line,
it's going to use the NVIDIA nvc compiler to compile my code, as long as I'm using the wrapper, and it's going to make other necessary adjustments under the hood. So programming environments work really well in conjunction with these wrappers, and we recommend that you give them a shot and try them in this way.
As I mentioned, switching between programming environments can be useful for testing things and solving problems, and it's pretty straightforward. You don't have to use module swap or unload. For example, if I'm in the GNU programming environment and I want to go to the Cray programming environment, all I need to do is type module load PrgEnv-cray, and I'm there.
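A minimal sketch of that switch, using the standard PrgEnv module names; the --version check is only there to show that the same wrapper maps to a different underlying compiler.

    CC --version             # under PrgEnv-gnu this reports g++

    module load PrgEnv-cray  # no swap or unload needed
    CC --version             # the same wrapper now drives the Cray C++ compiler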
Okay, so this slide has a lot of stuff on it, but it's going to bring home the point I was sort of making about some of the benefits of using the Cray compiler wrappers.
What this is doing is comparing two different ways of compiling the same hello-world OpenMP code. So suppose I use the gcc command and I compile my code: what you're seeing here is everything that's going into the compile line. If I use the cc wrapper instead,
what I've done on this line is enable the flag -craype-verbose, which will show me all the stuff being put into the compile line when I use the wrapper to compile; normally that's hidden behind the cc. So what you will see on the command line, if you're using the wrappers, is just cc hello_world_openmp.c and so on to compile your code, and behind the scenes all these optimizations for the CPU architecture are added, the MPI libraries are included, and the Cray science libraries are linked in.
A
So
the
rappers,
like
I,
said
they
provide
a
lot
of
stuff
automatically
under
the
hood.
They
link
MPI
your
science,
libraries,
le
pack,
blast
scholar,
pack
and
more
just
automatically.
This note, I think, is important, because sometimes people have particular questions about the science libraries. The LibSci man page is a good way to get detailed information about how the science libraries work in a Cray programming environment.
If you have a build system such as CMake, you may need to explicitly tell it to use the wrappers with a line like this: you include this line to tell CMake these are the compilers I want to use, and it will take care of the rest. The same goes if you have the traditional configure, make, make install type of build.
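Common ways of pointing both kinds of build system at the wrappers; these exact invocations are an assumption on my part rather than lines read off the slide.

    # CMake: name the wrappers explicitly
    cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn ..

    # Autoconf-style configure: the equivalent via environment variables
    ./configure CC=cc CXX=CC FC=ftn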
On Perlmutter, the default is for libraries to link dynamically. What that means is, like I said, when you load that module into your environment, it prepends the path, so the system knows where to find it. So when I want to compile code with something like GSL and I'm using the wrapper, all I need to do is specify the package I'm linking; I don't have to give it the locations or the includes. That's all taken care of automatically, which is a convenient thing.
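A sketch of what that convenience looks like with GSL; the gsl module name and the single -lgsl flag are illustrative, assuming the loaded module has placed its paths in the environment as described.

    module load gsl
    cc my_code.c -lgsl -o my_code   # no -I or -L paths needed; the module provides them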
If you're compiling your own shared libraries, you can use the command shown here to essentially achieve the same result with these dynamically linked libraries. And by default, Cray will build these executables to be dynamically linked.
This slide summarizes some useful compilation flags. In particular, the highlighted blue line is something I want to mention: to enable OpenMP for your codes, you have to include the flag. My understanding is that on Cori that happened by default, but now you must explicitly include that flag with your compilation to get that capability. And finally, just some quick tips.
This is stuff we've encountered, especially if you're trying to compile older codes coming from Cori to Perlmutter; some quick tips for you. If you're working with a Fortran code and you find that it doesn't just compile like it did before, especially if you're coming from the Intel compiler to, say, the gfortran compiler, you can look to some compiler flags to basically alleviate some of those errors. In particular, I recommend the -std=legacy flag.
Another one that you hear a lot is -fallow-argument-mismatch. Its effect is included in the -std=legacy flag, but you can also use it separately to achieve the same result. For C++ you can take a similar path and look for things like the -fpermissive flag, which will make the compiler less strict.
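For reference, the flags from these slides on one line each; the source file names are placeholders.

    ftn -std=legacy               old_code.f   # accept legacy Fortran constructs
    ftn -fallow-argument-mismatch old_code.f   # narrower fix, also covered by -std=legacy
    CC  -fpermissive              old_code.cpp # make g++ less strict about old C++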
Finally, just to mention: I said the big three, but there are some other programming environments and compilers that may be a little bit harder to find, so I highlight them here. For example, we have the Clang compiler available under the programming environment llvm. It's not as full-featured; it doesn't use the compiler wrappers. The way you access it is, first you have to make these module files visible by using module use.
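The access pattern is module use followed by a normal load; the path below is a hypothetical placeholder, since the talk does not spell out the real one.

    module use /path/to/llvm/modulefiles  # hypothetical location of the extra module files
    module avail llvm                     # the llvm modules are now visible
    module load llvm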
Okay, just to make this totally crystal clear: if you have a code on Cori and you want to run it on Perlmutter, you probably should recompile it. I mean, I imagine maybe it could run as-is, but I'd be surprised. Take your source code, move it over when you're on Perlmutter, recompile it, and then start doing your runs.
That's the way forward. So what I'm going to do now is show you some examples of just how to compile codes on Perlmutter. My example code is just a hello world that has both MPI and OpenMP built into it. The details of this aren't really important; just know that it's a simple code that does these things.
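The source itself is not shown in the talk, so here is a minimal stand-in with the same shape, MPI plus OpenMP, written from the shell so the sketch stays copy-pasteable.

    cat > hello_world_hybrid.c <<'EOF'
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        #pragma omp parallel
        printf("Hello from rank %d, thread %d\n", rank, omp_get_thread_num());
        MPI_Finalize();
        return 0;
    }
    EOF

    # The wrapper links MPI automatically; OpenMP still needs its flag
    cc -fopenmp hello_world_hybrid.c -o hello_world_hybrid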
All right, so that's my example. These are the modules I have loaded; like we said before, you can see I'm in the GNU programming environment. So when I run this command, it's going to use the GNU compiler to compile my code. Because I want to enable OpenMP, I have to include that flag; it will not be included by default, so you must include it. Now I'm specifying the environment variables for OpenMP, and OMP_PROC_BIND,
which used to be true, should now be set to spread; we'll talk about that later, but I just have to point it out here. I'm also on an interactive node, not running on a login node, in case you're wondering. So the takeaway from this short example is: if you were using the wrappers before and you're using them now, compiling on Cori and compiling on Perlmutter isn't that different. If you're using the compiler wrappers, it should be mostly the same, and it should, like we say, just work.
What I've done here is: I have a software package that I manually installed in my user space, and I want to link against the libraries that it provides. This is kind of a more manual approach, but I think it's worth looking at, because from my experience there's more than one or two users who want to manually link to their own libraries.
So this is what this example is showing you. I'm trying to compile my example, this hypre_exe, which requires a HYPRE_utilities.h file that's included in the HYPRE package I've already downloaded and installed in a different location. Because the path to that location is kind of long, I save it as an environment variable like this, HYPRE_DIR, and then I'm going to use that to access the files, but also in my compile line.
So this is just showing you how I can use that environment variable to add to commands and use it in other ways.
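A sketch of the pattern, assuming HYPRE was installed under a prefix in user space; the install path and the -lHYPRE spelling are illustrative assumptions.

    export HYPRE_DIR=$HOME/software/hypre    # hypothetical install prefix

    ls $HYPRE_DIR/include/HYPRE_utilities.h  # the header the example needs
    cc -I$HYPRE_DIR/include -L$HYPRE_DIR/lib hypre_test.c -lHYPRE -o hypre_exe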
So, the next section I spent quite a bit of time on, and it's kind of about developing, you know, not 100% certainty, but some feeling or instinct about whether what you're doing seems reasonable. To do that, I really have to talk a lot about the architecture of the Perlmutter CPU node and the way the memory is set up. That's what the next section is going to discuss. I'm starting here from the job script, because this is how I want you to approach
it: I want to translate the commands and the parameters that I select here over to these ideas. All right, so I listed these key terms here: node, MPI task, logical CPU, thread, physical core, processor, and NUMA domain. All of these are going to relate back to the parameters and things you do here. So let's go for it.
The first thing you're going to encounter is that the terms used for things are not always the same; they're sometimes the same and sometimes different. So if you pull up the Perlmutter system architecture page and what it says about a Perlmutter CPU node, and you compare it against other places in the NERSC documentation, some places will call it CPUs.
Where it says CPU on this one, I'm going to be talking about the processor. When I talk about a processor, I'm talking about the chip that you see on the motherboard. When I talk about physical cores, I'm talking about how that processor, that chip you see on the motherboard, is split up into smaller computational units. And inside each one of those physical cores, we have what I'm going to call logical CPUs, which is when we start talking about things like hyperthreads.
Hardware threads is another way to think about this, and there are other terms that have been used to describe it too. On Perlmutter CPUs, there are two logical CPUs per physical core. And again, just to point out, you'll also see the word socket, but here I'm going to be using the word processor to refer to the chip, essentially the chip that goes in the socket. So by adopting these terms and keeping them consistent through the next couple of slides, I hope that helps keep these concepts clear.
So if you were walking down the street and you ran into a Perlmutter CPU compute node, would you know what it was? Would you know what it looks like? Well, here's a nice picture. Okay, so if you were walking down the street and you ran into this thing: this is the Perlmutter CPU compute node, this diagram here on the right.
What I want to do now is relate these terms to parts of this diagram. So the first thing is the node. The node, I'm going to say, is the big outer square that includes both this yellow box and this yellow box, which represent the processors. So in each Perlmutter CPU node you have two AMD Milan processors, and we're going to count from zero: zero and one, so zero is here and one is here. Inside, each one of these processors has 64 physical cores.
That's the lines you see here and whatnot. Next to each group of 16 of them, they have their own memory; that will come into play later and we'll discuss it more. But within each one of these physical cores you have two logical CPUs for doing what they call hyperthreading. So you get two logical CPUs.
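Adding that up gives the numbers used for the job-script math later in the talk.

    # Perlmutter CPU node totals:
    #   2 AMD Milan processors x 64 physical cores   = 128 physical cores per node
    #   128 physical cores     x 2 logical CPUs each = 256 logical CPUs per node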
That's the diagram for the Perlmutter CPU node, and those are the terms. To give you a sense of how those relate, I'm going to give you this office building analogy. Hopefully this helps you, maybe not immediately, but later, when we start thinking about things, it's going to be useful to be thinking about which parts of the architecture we're talking about and how they affect things. So bear with me.
You can think of one floor of the office building as having two office floor plans, one representing each processor. Inside, your office floor plan is made up of little cubicles; each square is a little cubicle here. And in case you're following along, the mystery question of the day is: which cubicle represents which system?
There's one NERSC system that would have a four-person cubicle, and only one; so which NERSC system node has a four-person cubicle? The cubicles on the Perlmutter CPU are two-person cubicles. Inside a two-person cubicle, any little box like this, you have two people working at their stations, and those are the logical CPUs, or the hardware threads. So the cubicles are the physical cores, and these are the logical CPUs that are doing the work within them.
All those physical cores come together to be the processor, and we'll bring this point home more later. But physical cores that are closer together usually find it easier to communicate: if I'm a physical core working here and I have to work with the office people over there, that might take longer or might not be as efficient. That's when we start getting into the NUMA domains, which we'll talk more about. Okay.
So this is to highlight where we are now. If I say -N 2, I'm talking about nodes, and you now have a sense of what I'm talking about when I say a node. Now, if I say -c 16, those are the logical CPUs: the workers inside your cubicle, inside your office plan, inside your office building. So you have a clear sense of what this is talking about, what these numbers mean, and how they correspond to the hardware on that node.
That's why I highlight these. And then again here, when we talk about the --cpu-bind setting, when I say cores here, you know that this is relating to the physical cores, the cubicles that we talked about. So now you have a sense of what this word core means in association with the hardware.
So I'm asking a lot, but bear with me again for my cargo analogy. MPI tasks and threads are about how you split up your work. The first step is taking your simulation code and, if you use MPI in your code, you're breaking up the work,
all this stuff in the back of the truck, into smaller blocks. In particular, I'm thinking of this picture as representing, counting them off, fifteen MPI tasks. I really should have put a sixteenth here;
it would make me feel a lot better, but that's okay. Each one of these pallets of boxes you can think of as one MPI task in this analogy, and each MPI task, this pallet of little boxes, you can further break up into OpenMP threads. So using MPI tasks and OpenMP threads is a way to break down your work into smaller pieces, from one stage to a lower stage.
Those are the pallets of boxes: you've taken that truck full of work, and that's the actual number of pallets in the back of the truck; that's what the 32 corresponds to. The OpenMP number of threads is how many boxes are on each pallet. So when we talk about a thread now, this is the piece we're talking about, and now you have an intuitive sense of what piece of the work of your simulation that's relating to.
To get the rest of the terms, we have to understand NUMA domains, and if you're like me, this term may not have come as easily as the other ones. So what is a NUMA domain?
NUMA stands for non-uniform memory access, and essentially it goes back to this idea: if I have my physical cores computing work on my data, I have some memory which keeps that data really close, so I can do really fast work. But if I have to get the data from over here to work on, I have to do this communication step, where the communication comes through here, and then I can get it and then I can do the work.
So if I've got this person working on this one and talking to that one to work on that one, you can see that bouncing back and forth would make it a lot slower than if they could work right next to each other and didn't have to exchange the data from one memory bank to another. The takeaway here is: it matters where on the processor you're doing the work if you want to achieve maximum performance. If you're closer, you get better performance; that's the whole point of NUMA to me.
So now let's go back to our diagram of the Perlmutter CPU, I should say the Perlmutter CPU node. If we look inside each yellow box, each processor, they're split up into four NUMA domains. That means each Perlmutter CPU node has a total of eight NUMA domains on it. So when we set some of these commands to assign where the work is going to go on the hardware, you're going to want to be aware of these eight different NUMA domains.
There is a way to get that information in a detailed way. If you are on one of those Perlmutter compute nodes, you can run the command numactl -H, and you'll see that there are, it says, eight nodes, but these are the eight NUMA domains, labeled from zero to seven. It will tell you the physical cores; if you're counting only the red numbers, you'll get up to 128, starting from zero to 127: 128 physical cores.
If you also include the logical CPUs, that's where the black numbers come from. So in this one NUMA domain you have 16 physical cores, which include all of these different logical CPUs.
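For orientation, numactl -H output on such a node looks roughly like the sketch below; the CPU lists are condensed, not captured output.

    numactl -H
    # available: 8 nodes (0-7)       <- these "nodes" are the 8 NUMA domains
    # node 0 cpus: 0 1 ... 15 128 129 ... 143  <- 16 physical cores + their hyperthreads
    # node 0 size: ...
    # node distances: ...            <- the relative-cost table discussed next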
It reports only 12 units of distance between these two, and you can consider that as, like, a distance in time: 12 units of time. Whereas if I'm in one domain and I'm talking all the way over to NUMA domains that live on the other processor, that time can increase to almost three times as much. So the NUMA domains on my own processor can be very, very quick.
Because, you know, we're talking about a three-times difference in performance, we provide multiple tools so you can verify that the affinity is working the way you want. We have these pre-compiled binaries; you can run them with this command, and they will print out the information of where your ranks are and what the affinity settings are for each one.
The output gives you exactly the information, in the formats you want, so that you can read these things and make sure you're getting the thread placement you need. With that said, I think most people are going to rely on the NERSC defaults, and you're going to get pretty good performance with them as long as you use them correctly.
So now I'm going to start telling you what the suggested way is to run, to make sure you don't incur NUMA performance penalties and that your code runs well. Here are the general rules of thumb we suggest at the center: if the number of MPI tasks on a node is less than the number of physical cores in that node, then you should be including this flag, --cpu-bind=cores.
In my experience, this is almost all the time; if I had to guess right now, I want to say 90% of the time your number of MPI tasks is less than the number of physical cores. If you're in the much less common situation where your number of MPI tasks is greater than the number of physical cores, then you're going to want to set this to --cpu-bind=threads. I'm going to leave it at that, and we can talk in more detail if you want to follow up and understand these things deeply; I'm happy to chat more later.
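Condensed into one rule of thumb for a Perlmutter CPU node with its 128 physical cores:

    # MPI tasks per node <= 128  ->  srun --cpu-bind=cores   ...   (the ~90% case)
    # MPI tasks per node  > 128  ->  srun --cpu-bind=threads ...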
One of the consequences of these NUMA domains is: if you're running a hybrid MPI/OpenMP code, you want to use at least eight MPI tasks per node. That way, when your work gets split up across those NUMA domains, your OpenMP threads are close enough to each other that they can work quickly.
Okay, you can imagine that if you had only one MPI task for the entire node, it might put one OpenMP thread way over on this side of one processor and another way over on the other side of the other processor, so that the communication between them would be really slow; whereas if you add more MPI tasks, you would avoid that situation. Another rule of thumb is that the value of -c, the number of logical CPUs per task, should be greater than the number of OpenMP threads. And again, for placing things correctly, to tell the job scheduler where to put the stuff in the right places, we recommend you always set OMP_PROC_BIND to spread and OMP_PLACES to threads. There are some smaller edge cases where you might choose something different, but for most people, most of the time, this is probably going to give them most of
the performance they're looking for. The only other thing to point out here is that previously we recommended OMP_PROC_BIND=true, but we've found that spread is probably a better option in general on Perlmutter CPUs.
So this corresponds to these last parts of the job script: the OMP_PLACES, the --cpu-bind=cores. And I can say now, when you look at a job script like this, you should have some sense or feeling about where these terms are coming from and how you're setting them. This chart gives you the differences between some of the nodes that you know versus the ones on Perlmutter, how these numbers break down, and how you can use them when you're making those decisions.
This is just to highlight that this information is here. And again, this formula will always work as well; just for myself personally, I find it helps to have that intuitive understanding.
So I've got an example of an MPI-only job script here, one from Cori Haswell, and I'm including the best practice for MPI-only: we're not doing any OpenMP threads, but I include this line as the best practice to make sure that's always clear. What I have here is a script where I'm using 40 nodes with 1280 MPI tasks on Cori Haswell, but I want to write a job script that runs this efficiently on Perlmutter CPU, and one way I can do
that is: I divide by the number of MPI tasks, and that leaves me with two logical CPUs for each MPI task, and that's why I get the -c 2. The number here, especially, is going to be different: I take my 1280 MPI tasks and divide them across 10 nodes, so now I have 128 MPI tasks for each Perlmutter CPU node, and each Perlmutter CPU node has 256 logical CPUs.
So when I do that division, I get two logical CPUs for each MPI task, and that's what I put here. Again, because 90% of the time we're still in the earlier scenario, that's what's included here.
Another option is that I could have kept the nodes at 40 and changed things; now I'm basically summoning more hardware to solve this problem, and the way that affects my job script is that the -c changes accordingly. I'm doing the same math: I have my 1280 MPI tasks, which I'm dividing by the number of nodes; that leaves 32 MPI tasks per node now. But because I have 256 logical CPUs, I can give eight logical CPUs to each MPI task.
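The arithmetic for the 40-node variant:

    # 1280 MPI tasks / 40 nodes            = 32 tasks per node
    # 256 logical CPUs / 32 tasks per node = 8 logical CPUs per task
    srun -n 1280 -c 8 --cpu-bind=cores ./my_app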
So this is the test. It may look simple at first, but the curveball is that I'm also adding OpenMP; and the second curveball, since we're running late and I'm going to wrap up real fast, is that it doesn't make a lot of difference. Here's my hint: this is the math that I explained in the previous slides to calculate this number. It's doing the same thing
I just described: I'm doing tasks divided by nodes, then logical CPUs divided by the MPI tasks on each node. And then step three is something I do because I'm also using OpenMP threads: I want to make sure that the number of logical CPUs is greater than, I should say greater than or equal to, the number of threads that I'm assigning to each MPI task here. So 32 is bigger than eight, so I am good, and this script should run pretty well.
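The three steps written out, with the check using the numbers from this example:

    # step 1: tasks per node = total MPI tasks / nodes
    # step 2: -c             = 256 logical CPUs / tasks per node
    # step 3 (hybrid only):  require  -c >= OMP_NUM_THREADS
    #         here -c = 32 and OMP_NUM_THREADS = 8, and 32 >= 8, so it is fine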
There you go. This slide is to highlight the difference between an MPI-only run and a hybrid MPI/OpenMP run, and you'll notice the settings for -c and --cpu-bind=cores haven't changed; the only changes here are the OpenMP environment variables I'm setting. The last thing to point out is that we have a job script generator, which uses sort of a drop-down approach, where you can answer these questions, select answers, and it will automatically generate a job script.
This is a good place to learn and a good place to start. It may not cover every single edge case, but now you understand the hardware behind it and why these numbers are what they are, so if something comes out not perfect, you can fix it. And just to point out here, we're going to update this very shortly; as soon as we do, you will see that anywhere OMP_PROC_BIND says true, it will now be equal to spread.
So, just to note that here. With that, I am going to stop. These are my key suggestions: if you are going from Cori to Perlmutter, use module spider for a comprehensive module search; recompile your Cori codes on Perlmutter; start with the GNU programming environment, then move on to try the Cray one or some of the others.
[Helen] Eric, do you think you could just go over the hands-on slides? Because the GPU talk has its hands-on routine, so let's do the CPU one here as well.
[Eric] One slide, okay, yeah, right. So later on we're going to have a hands-on section. It's going to include these CPU-only codes that you can play with, that you can work with, and you can try out some of these concepts on, and NERSC staff will be around to guide you through that and help you with those examples. Those examples are nice because it's good to start with something that isn't the most complicated and build your way up. So, Helen, do you want to say anything else about this slide before I move on?
[Helen] So yeah, we have a few slides, and there's a readme.first that suggests you do the exercises in this order. We have an MPI hello world example, a hybrid MPI/OpenMP one with C and Fortran, and also an affinity example. The affinity example is as Eric has presented: you will see all the results, all these affinity values, and you can compare them against the diagram, the physical core and logical CPU numbers, to see where your processes spread or bind to. And there's another example using a package available from the E4S software stack.