From YouTube: Lattice QCD (LQCD) Project
Description
Steven Gottlieb (Indiana University)
Lattice QCD (LQCD) Project
Thank you. I hope you can see my slides. I'd like to say thanks for the invitation to speak, particularly to Neil and Rahul. I'm going to talk about lattice QCD, which involves more than one group. First I'll talk a little bit about lattice QCD, some of our accomplishments on Perlmutter, and then a few benchmarks.
Ah — sorry, that is not good. Let's see, we can just re-share; I will try to do that. Oh, I think I know what I did wrong. Yeah, we can see them now, thanks. Right, I forgot to click on share. All right.
Sorry. So, quantum chromodynamics is a 50-year-old quantum field theory of the strong interaction. I happen to know it's 50 years old because I'm involved in preparing a volume coming out called "50 Years of QCD". It describes quarks, which are particles of matter, and gluons, which are the force carriers. These are the analogs in QED of the electrons being matter and photons being the force carrier.
Quarks have this quantum number that we call color, sometimes called red, green, and blue, which has nothing to do with regular colors. It's responsible for making bound states of quarks and antiquarks, which are called mesons, and baryons, which are three quarks bound together, and these have no net color. The nuclear force is due to a residual color force between protons and neutrons, which, as I said, do not have color themselves. And people often talk about the Higgs field as being the origin of mass.
Well, our mass actually comes from QCD, not the Higgs field; you should be aware of that when they make those claims. So Ken Wilson developed lattice QCD to go beyond perturbation theory, which doesn't work so well at low energies in QCD, but at high energies it works well, and my thesis advisor got the Nobel Prize for that.
What we do for lattice QCD is that the continuum of space-time is replaced by a four-dimensional grid of discrete points, and then the quarks are described by complex fields which have either three color components — which is the kind I use, called staggered — or three color components and four spin components, which is the original Wilson formulation. There are also some other formulations.
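As a rough illustration of what such a field looks like in memory — a minimal sketch with made-up names, not the MILC or QUDA data layout — a staggered quark field stores one three-component complex color vector per site of the four-dimensional grid:

    #include <array>
    #include <complex>
    #include <vector>

    // Minimal sketch of a staggered quark field: one 3-component complex
    // color vector per site of an Nx*Ny*Nz*Nt grid (names are illustrative).
    using ColorVector = std::array<std::complex<double>, 3>;

    struct StaggeredField {
        int nx, ny, nz, nt;
        std::vector<ColorVector> site;   // one entry per lattice site

        StaggeredField(int x, int y, int z, int t)
            : nx(x), ny(y), nz(z), nt(t),
              site(static_cast<size_t>(x) * y * z * t) {}

        // Lexicographic index of the site (x, y, z, t).
        ColorVector& at(int x, int y, int z, int t) {
            return site[((static_cast<size_t>(t) * nz + z) * ny + y) * nx + x];
        }
    };

A Wilson-type field would carry four such color vectors (one per spin component) at each site instead of one.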
So the basic calculation we do is like a Feynman path integral, but we have to change the theory to imaginary time, and that makes it a lot like a statistical-mechanical partition function. The numerical methods include Monte Carlo — lots of random numbers. Most of the time goes into sparse matrix solvers, and we actually have something like molecular dynamics in our simulation-time evolution.
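Written out, the expectation values we compute have the form of a Euclidean (imaginary-time) path integral, i.e. a statistical-mechanical average over gauge fields U:

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \, O[U] \, e^{-S[U]}, \qquad Z = \int \mathcal{D}U \, e^{-S[U]},

and the Monte Carlo sampling generates gauge fields with probability weight proportional to e^{-S[U]}.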
So we're constantly updating things according to this small step size. There are two things we need to do. The first is to calculate ensembles of these gauge fields, and these are basically pictures of the QCD vacuum; they have to be properly weighted paths in the path integral — that's basically what they are. Then, to do a physics calculation, we have to take averages over these gauge fields in an ensemble, and naturally larger ensembles give better statistical averages and help us average over the quantum fluctuations.
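Concretely, given N properly weighted configurations U_1, ..., U_N, a measurement is a sample average whose statistical error falls like one over the square root of N (up to autocorrelations between successive configurations):

    \langle O \rangle \approx \frac{1}{N} \sum_{i=1}^{N} O[U_i], \qquad \delta O \sim \frac{\sigma_O}{\sqrt{N}}.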
So to carry out a physics measurement you have to control systematic errors, and there are several of them. First, to generate an ensemble you have to make some choices. You have to decide on a lattice spacing — the smaller the better — and actually you don't set the lattice spacing directly: you set the strength of the gauge coupling, and you determine the lattice spacing later. You have to put the system in a finite-size box.
So I call that N_spatial cubed by N_time, and then we generally use periodic boundary conditions in space and anti-periodic for the quarks in time. Then there are the quarks, and the ones we put in the calculation are the up and down quarks, which are the lightest ones, the strange quark, and the charm quark.
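Spelled out in the standard conventions, those boundary conditions for a quark field psi on an N_s^3 x N_t box read:

    \psi(x + N_s a\,\hat{\imath}) = +\,\psi(x) \quad (i = x, y, z, \text{ periodic in space}), \qquad \psi(x + N_t a\,\hat{t}\,) = -\,\psi(x) \quad (\text{anti-periodic in time}).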
We don't generally put in the bottom quark or the top quark, because they're so heavy compared to the QCD scale, and we usually set the mass of the up and down quarks to be the same. You have to tune those properly or you don't get the proper masses in the theory. And — well, actually, I've just gotten into the next bullet — to control the errors you have to make the lattice spacing smaller and smaller, a going to zero.
You either take an infinite-volume limit or just use a big enough box where you don't think the effects are significant, and I already said you have to tune the quark masses to get them right. So why do we use so much computer time? Well, it's because controlling each of these systematic errors involves investing more time. If I halve the lattice spacing, it's going to increase the time of the calculation by about a factor of 2 to the sixth — that's at a fixed physical volume.
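A rough accounting of that factor — an estimate, not an exact formula: at fixed physical volume the number of sites grows as (L/a)^4 as the spacing shrinks, and the finer molecular-dynamics step size and more ill-conditioned solver contribute roughly two more powers of 1/a, so

    \text{cost} \sim \left(\frac{L}{a}\right)^{4} \left(\frac{1}{a}\right)^{\sim 2} \quad\Rightarrow\quad a \to a/2 \ \text{costs roughly } 2^{6} = 64 \ \text{times as much}.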
If I want to increase the physical volume — say, double the linear size — then you get a factor of 2 to the fourth, because it's x, y, z, and t, or only 2 cubed if you're only making space bigger. And then there's tuning the up and down quark masses to their physical values.
That's not a direct factor, but for many years it was too expensive, and now we can do it, so we have ensembles with very closely tuned physical quark masses. And then, when we create these costly ensembles through the stochastic evolution to get these snapshots of the vacuum, the iterative solver takes much of the time, and you'd like to do this in a few stochastic evolutions, so you'd like this to run quickly — it's more of a strong-scaling problem.
At any rate, once the ensemble is generated, you store the configurations on disk and tape, and you can run several measurement jobs, as we call them, in parallel. So creating the gauge fields, you want high speed — it's more like a capability problem; doing the physics analysis is more like a capacity problem, but it's capacity at still a high rate of speed. So I'd like to talk a little bit about some of what we've been able to accomplish on Perlmutter. I was a little bit late in asking the people who are involved in this NESAP project
to give me some results, and a lot of this has to do with what my colleague Carleton DeTar and I have done — we're in the Fermilab Lattice and MILC collaborations. Chris Kelly did send me some information, and I have some of that here, and I got another slide today which I should have time to show. So DeTar and I are interested in the decay of mesons that contain a bottom quark. It's a challenging calculation, and we need to have a fairly small lattice spacing for that. And the reason we're interested in this:
It helps determine a fundamental parameter of the standard model — the elements of the CKM mixing matrix, C-K-M for Cabibbo, Kobayashi, and Maskawa. Kobayashi and Maskawa won the Nobel Prize for realizing that a three-by-three matrix could explain CP violation in the universe, but they didn't say what the values are. We have to do our calculations, combined with experiment, to actually figure out the values. And a key issue is whether there's evidence for new physics, because the matrix has to be unitary in the standard model.
But if it's not — and we could discover that if we tightly constrain the elements — that would be evidence for new physics. So I created a bunch of new gauge configurations on basically the toughest lattice we have, or the toughest ensemble for which we've generated configurations. This actually started in 2014, generating configurations, and what I have here is a folded timeline of what happened. You can see in 2014 we had about 400 time units, and our goal was 6,000 time units of running.
We save a configuration every six time units, so we wanted a thousand configurations. You can see we went along and along for about four years, until mid-2018 — this was done on several computers that I'll talk about later — and then there was a gap of about three years until Perlmutter came along. You can see the red plot for Perlmutter — wow, what a change in the slope. This is the power of Perlmutter for us, but I have another comment about the power of Perlmutter.
That graph was prepared in December 2021. Well, it turns out Perlmutter became much busier once it wasn't so much early science and everyone was allowed on Perlmutter. So you can see that our rate of progress slowed down considerably: you need both a fast computer to do lattice QCD and a sufficient allocation to use it. We completed our goal of creating 500 new configurations in the first quarter of the year, and DeTar has been able to analyze about 50 of them. So, a few comments.
The Jefferson Lab group uses QDP-JIT, QUDA, and Chroma for their work. They did a new multigrid solver and had some significant algorithmic improvements, and a job that took 1,192 seconds on 256 Edison nodes they were able to run in just 80 seconds on 32 Perlmutter nodes. It's a combination of the hardware being much faster and the algorithm being improved.
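In node-seconds, the two runs just quoted differ by roughly a factor of 120:

    \frac{256 \times 1192\ \mathrm{s}}{32 \times 80\ \mathrm{s}} = \frac{305152}{2560} \approx 119.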
The RBC/UKQCD group has been doing two projects on Perlmutter. One involves the muon anomalous magnetic moment, which is an experiment that was done at Brookhaven initially about 20 years ago, and a little bit over a year ago a new result was announced at Fermilab. This is one of the best — or had been one of the most intriguing — pieces of evidence for physics beyond the standard model, so it's still a very important calculation.
At any rate, their first project was using 256 nodes to analyze a 96 cubed by 192 grid, and for the second project they're running on 32 nodes, and they've looked at two different size grids with domain wall fermions, which involve a fifth dimension. One of the interesting things I found out: a 5.9-hour job on Slingshot 10 was reduced to 4.7 hours after the upgrade to Slingshot 11.
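That is roughly a 20 percent reduction in wall-clock time from the interconnect upgrade alone:

    \frac{5.9 - 4.7}{5.9} \approx 0.20.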
So here I have a cross-platform comparison for the lattice generation that I was talking about before. Those four years of running were a combination of Edison, Cori, and Blue Waters, and the red line, of course, is Perlmutter. Here you can see that what took from five and a half to eight, almost nine, hours is reduced to 1.53 hours on Perlmutter, which is very nice. And, you know, this doesn't take into account how many nodes there are, so you could multiply out node-hours.
The question mark on Blue Waters is because I can't remember what we did about hyper-threading there — so sorry about that; you'll get at least within a factor of two on node-hours.
So I wanted to say something about our performance on Perlmutter. For the production running I used, as I mentioned, 128 nodes. I'm pretty sure I could have run it on 64 and had higher efficiency; I think 128 was dictated by how much I wanted to get done within the maximum wall time we were allowed.
It's a four-dimensional grid, and I cut the grid up in only three dimensions, which helps reduce communications. So the 144 — you know, the X direction — was not cut, but Y, Z, and T were all cut. I was getting 285 gigaflops per GPU in single precision — that's actually in a mixed-precision solver, so part of it is half precision. For link smearing I was getting 150 gigaflops per GPU, and for the gauge force I was getting 1.5 to 1.7 teraflops.
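To illustrate why leaving one direction uncut reduces communication — a toy sketch with made-up sizes and a made-up processor grid, not the actual MILC decomposition — each GPU's halo traffic is proportional to the surface of its local sub-volume, and an uncut direction contributes no faces:

    #include <cstdio>

    // Toy model of a 4D domain decomposition: global dims split across a
    // processor grid (px, py, pz, pt). Directions with p = 1 need no halo
    // exchange. All numbers here are illustrative, not the real geometry.
    int main() {
        const int G[4] = {144, 144, 144, 288};   // hypothetical global lattice
        const int P[4] = {1, 4, 8, 16};          // hypothetical 512-GPU grid, X uncut
        long volume = 1, surface = 0;
        int L[4];
        for (int d = 0; d < 4; ++d) {
            L[d] = G[d] / P[d];                  // local extent in direction d
            volume *= L[d];
        }
        for (int d = 0; d < 4; ++d) {
            if (P[d] == 1) continue;             // uncut direction: nothing to send
            long face = volume / L[d];           // sites on one face
            surface += 2 * face;                 // forward and backward faces
        }
        std::printf("local volume %ld sites, halo surface %ld sites (ratio %.3f)\n",
                    volume, surface, (double)surface / volume);
        return 0;
    }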
That gauge force code avoids communication — I noticed about a decade ago that it was really slower than it needed to be — and the link smearing and fermion force, which I don't have any results on here, probably could benefit from that. So here is a finite-volume study of just the conjugate gradient solver. I have different traces for different numbers of GPUs — the labels are all numbers of GPUs — and L means the local volume on each GPU was L to the fourth grid points.
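For readers who have not seen it, the conjugate gradient iteration being timed here is quite short; this is a generic sketch driven by a user-supplied matrix-vector product standing in for the Dirac operator, not the actual MILC or QUDA solver:

    #include <algorithm>
    #include <cmath>
    #include <functional>
    #include <vector>

    // Generic conjugate gradient for A x = b with A symmetric positive definite.
    // 'apply_A' stands in for the sparse Dirac-operator product that dominates
    // lattice QCD run time; illustrative sketch only, not production code.
    using Vec = std::vector<double>;

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    void cg(const std::function<void(const Vec&, Vec&)>& apply_A,
            const Vec& b, Vec& x, double tol, int max_iter) {
        Vec r = b, p = b, Ap(b.size());
        std::fill(x.begin(), x.end(), 0.0);       // start from x = 0, so r = b
        double rr = dot(r, r);
        for (int k = 0; k < max_iter && std::sqrt(rr) > tol; ++k) {
            apply_A(p, Ap);                       // matrix-vector product (the hot spot)
            double alpha = rr / dot(p, Ap);
            for (size_t i = 0; i < x.size(); ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            for (size_t i = 0; i < p.size(); ++i) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
    }

Almost all of the time goes into apply_A, which is why the solver's scaling tracks the local volume per GPU so closely.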
This was an early study — I could do a better job now, but I noticed when I tried to redo this that Perlmutter was very busy and I couldn't get my new benchmarks done. In this case, when you go to 16 GPUs and beyond, we're cutting in all four dimensions, as I mentioned; on bigger problems we don't really have to do that.
So you see, within a node — one, two, and four GPUs — the performance is quite high, and it mainly depends upon L. When L is, say, 32 or larger, you'd be proud of the performance. You probably don't want to do production running below 20 percent, if someone asks how efficiently you're using the computer. But you can also see that things scale pretty well beyond 16 GPUs, when we're no longer increasing the amount of communication. This is a similar plot for the gauge force, and in view of time,
I think I'll not say so much about that. So I was asked to say something about software development. The lattice QCD community has been creating community software and sharing it for a long time.
The QUDA project — one of the codes that we use very heavily to make use of NVIDIA GPUs — began in 2008 at Boston University. Two of the main developers from BU, Kate Clark and Ron Babich, now work for NVIDIA; one of my former postdocs, Mathias Wagner, and another former postdoc at BU, Evan Weinberg, also work for NVIDIA. Kate, Mathias, and Evan still spend a lot of their time on QUDA, but not all of it.
My work to support staggered quarks, which weren't part of the original QUDA project, was done with Guochun Shi when I was on sabbatical at NCSA for the Blue Waters project. It turns out that project changed a lot, and starting the QUDA work was the best thing that I got out of it. So QUDA originally only supported NVIDIA, but it's been generalized with back ends to support HIP, SYCL, and OpenMP. I've run the HIP version on Crusher at Oak Ridge. So our community really benefits greatly from QUDA, and I'm not sure other areas of science have a similar thing going.
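To give a flavor of what a "back end" abstracts — purely illustrative, not QUDA's actual interface — the same kernel source can often compile unchanged for NVIDIA with nvcc or for AMD with hipcc (after including <hip/hip_runtime.h>), because HIP mirrors the CUDA kernel syntax:

    // Illustrative only (not QUDA code): a single axpy kernel, y[i] += a * x[i],
    // written once in CUDA/HIP-compatible syntax. A multi-backend library
    // expresses operations like this once and maps them onto CUDA, HIP, SYCL,
    // or OpenMP offload.
    __global__ void axpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one array site per thread
        if (i < n) y[i] += a * x[i];
    }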
Speaking of NVIDIA, I got this slide this morning from Jiqun Tu, who, I think, went straight from graduate school to NVIDIA, and he works mainly on domain wall fermions, which have a fifth dimension. So this is Möbius domain wall fermions, 64 cubed by 96, with a fifth dimension of 12.
Blue is before the upgrade from Slingshot 10 to Slingshot 11, and red is after. He told me in his email, and said here, that he was able to get a 64-GPU run done which was 30 percent faster than what's shown here, but he didn't have a chance to get the other runs done — I think he just couldn't get onto Perlmutter because it's in maintenance. So that's that. I'm getting a little bit short on time. So, performance portability: we've heard a bunch of talks today about different approaches.
To me, it hasn't been clear for quite a while what the best performance-portability approach is. I mention some here: Kokkos — we heard about this — and we heard about OpenMP, DPC++, and HIP. Last week I learned that HIP is going to be supported on NVIDIA, and probably, you know, on Perlmutter, and also on Aurora, which is very interesting. There's been a series of meetings from the DOE on performance portability.
I encourage you to use Google and find some of those. On the right is the cover of a special issue of Computing in Science & Engineering on performance portability for advanced architectures that I co-edited; you might find that interesting to look at. And now I've gotten to my conclusions. Perlmutter is a very powerful platform for scientific computing.
If you've been using Cori GPU, the transition should be easy. If you're just starting out with GPUs, I would suggest spending some time studying the different approaches to GPUs before you commit to a porting strategy. And finally, have fun — but please leave some time for us, i.e., the lattice QCD physicists. Thank you very much for your attention.