From YouTube: Using the Intel Compiler on Edison
Description
Tips for using the Intel Compiler on NERSC's Cray XC30 Edison system.
A: Okay, thanks. Next up.

B: Anything in there? Well, yes, that's actually probably the prime one. Despite the various titles here, which we came up with before we got Edison, when we were still wondering what to call this, this is basically about the Edison programming environment and its differences from Hopper. As a matter of fact, it ends up being mostly about Intel, because there really aren't many differences in the other two compiling environments.
B: Before I give you the introduction, I'll give you the caveats: the Edison programming environment is a moving target. We have a lot of requests, as you've probably heard, to make it easier to use, and we have a lot of requests for changes and such.
B: Mostly I'll cover the differences (who cares about the similarities?) between Edison's compiling environment and Hopper's, as they impact a programmer and a code runner, since that's what my experience has been. Then I'll go into a little more detail about the Edison Intel programming environment, which is quite different from the way it is on Hopper and from the other two; talk a little bit about porting from PGI on Hopper to Intel on Edison, since PGI is going to be gone; and talk a bit about performance.
B: Edison supports three compilers, three programming environments: Intel, which is the default (differently from Hopper and Franklin), Cray, and GNU. The PGI and PathScale compilers will not be installed on the system.
B: I didn't think anybody was using PathScale, but I got a request just today from somebody who wanted us to port the PathScale 5.0 beta compiler to Hopper, so there are people out there who still use it. GNU and Cray, significantly (I think in the long run this may be one of the most significant differences), use LibSci by default for the math library routines. As they said in the previous talk, you can use MKL with either of them.
B: I haven't done that yet, and ultimately, if Cray doesn't do it for us, we'll probably set up a module to allow you to link with MKL, as we do on Carver.
B: On our cluster, Intel uses MKL by default, and LibSci isn't available for Intel, at least at this time on this system. We recommend people use -mkl=cluster, at least right now, as a load flag. Though, did you say that it should be a compiler flag too, or do we just need it at load time?
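Under the assumption that the flag works the way it does in the standalone Intel toolchain, the link step might look like this through the Cray compiler wrappers (the source and program names are hypothetical):

```shell
# Hypothetical sketch: building an MPI Fortran code under PrgEnv-intel
# on Edison, passing -mkl=cluster at link (load) time to pull in the
# MKL cluster libraries (ScaLAPACK, BLACS).
ftn -c solver.f90                      # compile step
ftn -mkl=cluster -o solver solver.o    # -mkl=cluster on the link line
```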
B: The old names are still supported as legacy on Hopper; eventually they'll go away and you'll only be able to use the cray- prefixed ones, but it makes sense not to start with the old-fashioned names on Edison. And, as was discussed quite a bit in the previous talk, the Intel OpenMP and hybrid MPI/OpenMP runtime environments do not work, or rather, they do not work efficiently by default.
B: That's at this time; we're sort of noting it here, and maybe Cray will fix it for us. We'll give the workarounds provided by Junjie and Helen later in this talk.
B: How come... am I supposed to move? Oh, there. Okay: Edison math libraries. The GNU and Cray math library is the same as on Hopper: it's the old, tried-and-true Cray LibSci, which has been around forever. As on Hopper, and as it was on Franklin, no special flags are needed; everything links automatically. Intel uses the MKL math library, and again, as I just mentioned on the previous slide, you add -mkl=cluster as a flag at link time to load it. LibSci is currently not available for the Intel compiler.
B: Okay, here are the details on the Cray module name changes. The two significant things are down there at bullets five and six, or sub-bullets five and six: there are two Cray modules that are not yet available for the Intel compilers, PETSc and Trilinos.
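As a sketch of the renamed modules (the specific names here are illustrative of the cray- prefix scheme, not a complete list):

```shell
# The legacy module names are replaced by cray- prefixed ones on
# Edison; for example, the scientific library module:
module avail cray-        # list modules under the new cray- prefix
module load cray-libsci   # formerly xt-libsci on older Cray systems
# Note: cray-petsc and cray-trilinos are not yet usable with PrgEnv-intel.
```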
B: So, converting from PGI to Intel: these are the flags and their equivalents, the things people use pretty frequently. The thing to really talk about, and I'm going to talk about it on the next slide, is what the recommended flag is to produce well-optimized code in general. In PGI it's -fast; in Intel it's -fast with one other flag.
B
This
is
something
I've
just
discovered
myself,
just
in
the
past
couple
days
that
this
produces
the
best
well
optimized
code
at
runtime,
as
well
as
minimizing
the
compile
time
minus
fast,
but
with
minus
no
dash
ipo
with
it.
If
you
just
do
minus
fast,
there
are
problems
with
it
that
I
will
talk
about.
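The recommended invocation can be sketched as follows; `mycode.f90` is a hypothetical source file, and `ftn` is the Cray Fortran wrapper that invokes `ifort` under PrgEnv-intel:

```shell
# -fast turns on Intel's aggressive option set; -no-ipo switches off
# the interprocedural optimization piece of it, which is what blows up
# compile time (and sometimes breaks the build).
ftn -fast -no-ipo -o mycode mycode.f90
cc  -fast -no-ipo -o mycode mycode.c    # same idea for C codes
```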
B: Okay, the -fast option. It's very different between the two compilers. PGI's -fast is, to quote the man page, "a generally optimal set of options chosen for targets that support SSE capability." It's fast, but it doesn't do a lot of very deep analysis; it basically used to be called -fastsse. At one time they were thinking about keeping those two separate, but for years now they've been the same thing.
B
It's
includes
a
lot
of
optimizations,
but
most
significantly
it
includes
as
we're
just
talking
about
inter-procedural
optimization,
which
can
increase
compile
time
by
an
order
of
magnitude
or
cause
it
to
fail
in
order
of
magnitude
gtc.
B: If you compile GTC with -fast, it will literally take an order of magnitude longer than it would if you compiled it with the default optimization, or with -fast -no-ipo. And you do see it fail, because Intel seems to have a hard time keeping track of where all of the routines are, particularly if you're in complicated makefiles using different directories. I have seen it just say, oh, we can't find this .ipo file, or whatever the stuff is that it sticks in there. So, yeah.
B: I've always run my benchmarks against -fast, and before now I just sort of ignored it, because on Hopper it doesn't do significantly better than the default; in fact, it often does worse. But I found, to my shock, when I was running benchmarks on Edison, that -fast, for many benchmarks, particularly the larger ones, was producing faster-running code than the Hopper recommendation, which is the default. I had very cold feet about just recommending -fast, for the reasons I was just talking about, but I thought, well...
B: So that's my recommendation for users: if you want a fast, high level of optimization, I would recommend -fast -no-ipo over the Hopper recommendation, the default, on Edison. And as I was just mentioning, there's no significant improvement from -fast, or -fast -no-ipo, over the default on Hopper; I ran my benchmarks again on Hopper just to double-check.

A: Does an executable carry evidence of...
B: The IBM compiler is very elegantly laid out, a very modular set of compilers, so it would not surprise me. In fact, when you used it, it could call a separate program if you used certain optimizations; rather than being one big chunk of a compiler, it was extremely modular.
B: Okay, this is something we spent a lot of time talking about: the Intel hybrid MPI/OpenMP runtime environment. I'm not going to be talking about hyperthreading in this talk; it was covered very well in the previous talk, and I haven't really experimented with it yet, so I don't have anything intelligent to say about it. As we know from the previous talk, Cray's thread-affinity settings, which make a lot of sense and are a very good idea performance-wise, and Intel's OpenMP runtime environment conflict, because of that awful extra thread.
B: So you have, in essence, two threads scheduled on the same core, which means the job takes twice as long as it should. Here's the current workaround, and I know this works because I've run a bunch of OpenMP benchmarks with it. There might be other things that work too; Junjie and Helen came up with this, and I do get the appropriate speedups for the benchmarks when I use these settings.
B
You
have
two
two
conditions
here:
you
have
omp
num
threads
less
than
or
equal
eight.
In
that
case
you
set
the
kmp
infinity
affinity
to
compact
and
you
run
with
a
pneuma
node
cc
newman
node
flag.
B
If
you
have
greater
than
8
and
less
than
or
equal
16
k
infinity
goes
to
scatter
and
you
use
the
cc,
none
affinity
flag,
you
break
all
affinity
rather
and
again
these
do
work.
There
may
be
other
ways
of
doing
it,
but
this
this
will
work.
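The two cases described above might look like this in a batch script; the aprun process and depth counts are illustrative, and a.out stands in for your executable:

```shell
#!/bin/bash
# Workaround (per Junjie and Helen) for the Intel OpenMP runtime
# conflicting with Cray CPU binding on Edison.

# Case 1: OMP_NUM_THREADS <= 8 -- compact thread placement, bind the
# process to a NUMA node rather than to individual cores.
export OMP_NUM_THREADS=8
export KMP_AFFINITY=compact
aprun -n 2 -d 8 -cc numa_node ./a.out

# Case 2: 8 < OMP_NUM_THREADS <= 16 -- scatter placement, and turn
# off aprun's CPU binding entirely.
export OMP_NUM_THREADS=16
export KMP_AFFINITY=scatter
aprun -n 1 -d 16 -cc none ./a.out
```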
B: Compiler performance on Edison. Yeah, this is the last slide, but I'll probably talk a little bit more than I have written up here.
B: I have a bunch of NERSC-6 and NAS Parallel Benchmarks 3.1.1 codes that I use to just sort of look at compilers and performance and such. All compilers produce significantly faster code on Edison compared to Hopper.
B: When I say significantly: on average you get a two-and-a-half-times speedup on Edison over Hopper, and if that holds up for regular codes, that will probably be the biggest jump in per-processor, per-core performance at NERSC since the acquisition of the C90 way back in the early '90s.
B: Cray and Intel, at least on my benchmarks, only a couple of which even use the math libraries, have quite comparable runtime performance. GNU code runs, on average, about ten percent slower, but again, you can find a benchmark where GNU will beat the other two.
B: And to close up (this is going to be a very short talk, to compensate for the previous ones), this is what I found using a lot of different optimization arguments on these benchmarks.
B: I find that the only difference from Hopper is Intel, for which I recommend -fast -no-ipo, whereas on Hopper I recommend not using any optimization arguments, just the default. Cray: the same as on Hopper, the default, no explicit arguments. GNU: -O3 -ffast-math, again the same as on Hopper.
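Put together, the per-compiler recommendations might be sketched like this (mycode.f90 is hypothetical, and each line assumes the corresponding PrgEnv module is loaded):

```shell
# Recommended optimization arguments on Edison, per the summary above.
ftn -fast -no-ipo mycode.f90    # Intel (PrgEnv-intel, the default)
ftn mycode.f90                  # Cray (PrgEnv-cray): default options
ftn -O3 -ffast-math mycode.f90  # GNU (PrgEnv-gnu), same as on Hopper
```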
B: Though... oh no, I'm not going to mention that; that's not important. So, that's the end.