From YouTube: 08 - Workflows at NERSC
Description
Part of the NERSC New User Training on September 28, 2022.
Please see https://www.nersc.gov/users/training/events/new-user-training-sept2022/ for the training day agenda and presentation slides.
Bill Arndt, data science engagement group. What I'm going to cover: what are workflows, what resources are available for people to use at NERSC, whether workflows will help them, and some broad best practices. There's a lot of detail in a lot of categories, so my goal here is not so much to show you concrete examples of particular things, but to make sure that you would recognize if you're in a situation where a workflow would help, and then know where to go to get more information.

What parts of the documentation? What words do you want to search for? What considerations do you need to think about, so you'll be prepared to use workflows and do the research to actually get to that concrete part? A lot of this is going to be niche and very particular to particular solutions, so if I went too specific on one thing, that would only help a small fraction of the people here. All right.
What are workflows? If you find yourself rotely repeating the same command line a hundred times, or some other "I just do this automatically until all of it's done," then that's probably something that could happen better in a workflow. Workflow management tools are the software systems that help perform pieces of automation like that.

So what categories of things can be helped by workflow management tools? If you're performing tasks that are repetitive, something that's urgent, or if you want automatic recovery: say you're running a job that has a small chance of crashing, and you want it to restart itself when that happens.
A
Okay,
it
can
streamline
the
use
of
many
of
the
interfaces
that
access
nurse
resources,
so
things
like
slurm
and
S
batch
commands
and
SQ
and,
like
maybe
you
have
something
that's
watching
the
queue
where
the
progress
of
certain
jobs
for
some
reason
or
there's
also
tools
that
help
with
transferring
data.
Some
tools
have
an
understanding
of
Globus
or
sort
of
multiple
facility
or
data
sources,
or
things
like
that,
and
they
will
you'll
be
able
to
give
them
instructions.
A
That
is
something
like
fetch,
this
piece
of
data
from
somewhere
else
and
bring
it
here
when
it's
available,
there's
also
sort
of
on
the
data
side
of
things
say:
you
have
a
large
number
of
files,
a
large
amount
of
data,
a
big
pile
of
something
that
is
too
large
to
manually,
organize
yourself
to
put
into
all
the
different
folders
and
give
them
proper
names
and
make
sure
that
they're
registered
in
the
database
or
spreadsheet
or
whatever
else
you're
doing.
A
Some
of
these
tools
are
set
up
to
organize
data,
for
you,
I
will
take
all
of
the
inputs
that
match
this
pattern
or
in
this
directory.
I
will
do
this
task
to
them
and
then
I
will
put
all
of
the
outputs
from
those
with
a
particular
name
pattern
in
a
particular
place
and
though
everything
will
stay
organized
in
the
way
that
you
describe
without
you
manually
doing
it
yourself,
there's
also
sort
of
monitoring
and
understanding
performance
kind
of
things
that
these
can
do.
For
you.
Some
of
these
tools
have
a
progress
dashboards.
A
You
can
tell
them
to
send
you
an
email
if
something
goes
wrong
or
it
might
even
have
like
a
an
HTTP
little
web
server
thing
that
you
know
shows
you
charts,
so
how
fast
your
jobs
are
going
or
how
much
is
left
or
what
data
has
been
produced
or
log
files
or
all
sorts
of
stuff.
Some example problems. My data needs several stages of processing, with different applications at each stage, but there's an ordering: application A feeds into application B, which feeds into application C. Instead of running all the A's myself and then running all the B's myself, it'd be nice if something could just do that whole chain with one instruction on my part. Or: all of my applications have a small chance of crashing, and I want them to rerun if they do. Then there are timing and scheduling things; say I have some sort of utility.
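A manual version of that A-to-B-to-C chaining can be sketched with Slurm job dependencies (the script names are placeholders, and this obviously needs a system running Slurm; a workflow tool automates this kind of bookkeeping for you):

```shell
# Submit stage A, then make each later stage wait for the previous
# stage to finish successfully via --dependency=afterok.
a=$(sbatch --parsable stage_a.sh)
b=$(sbatch --parsable --dependency=afterok:"$a" stage_b.sh)
sbatch --dependency=afterok:"$b" stage_c.sh
```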
A
So
with
problems
like
those
in
mind
what
kind
of
resources
are
available
with
nurse
to
help,
so
we've
got
some
specialized
infrastructure.
We
provide
some
software
and
the
support
from
people
like
me.
A
First
is
the
workflows
working
group,
there's
myself,
there's
Bjorn
Anders
and
Lori
Steffi,
who
are
sort
of
the
the
core
members
of
that.
So
if
you
submit
a
help
ticket
asking
about
a
workload
tool
or
workflow
issues
in
general,
you
might
get
routed
to
one
of
us
to
help
you
so
we've
been
at
it
for
about
two
or
three
years,
we're
mainly
responsible
for
the
nurse
documentation
on
workflows,
which
I
will
point
out
shortly
foreign.
A
So
we
do
a
lot
of
broad
examination
of
many
different
kinds
of
tools,
both
things
that
work
well
at
nurse
and
things
that
don't
and
try
to
get
the
understanding
of
what
the
requirements
are.
What
the
pitfalls
are,
so
that
when
people
come
out
come
in
with
questions
or
problems
suitable
for
these
things,
then
we
can.
A
You
know
point
you
in
the
right
direction,
documentation
and
and
when
we
also
do
an
amount
of
Outreach
to
users
that
are
doing
things
that
look
like
workflows
might
help
to
people
that
develop
new
workflow
management
tools
and
also
to
various
other
HPC
centers
that
have
to
sort
of
deal
with
the
same
challenges:
running
workflows
in
an
HPC
environment.
A
All
right,
so
I
will
Veer
off
to
the
side.
Hopefully
we
see
the
nurse
documentation
site
now.
A
A
thumbs
up,
maybe
anyone
yes
I-
can
see
it
all
right
cool
cool.
So
this
is
docs.nurse.gov.
If
you
look
on
the
left
about
part
way
down,
there's
the
running
jobs.
Category
expand
that
out
and
go
down
to
the
bottom,
and
there
is
workflow
tools
and
then
there
is
sort
of
this
head
page
and
this
list
of
tools
that
we
have
more
information
available
about.
So
most
of
the
things
that
I'm
going
to
mention
have
a
section
over
here
with
concrete
examples
that
should
work
on
our
machines.
A
Although
not
everything
might
be
upgraded
for
the
new
Pro
promoter
Universe,
yet
we're
still
working
on
going
back
through
things
but
yeah
we've,
we've
got
the
information
ready
to
find
right
there.
A
Okay,
oh
well,
I,
could
have
done
that
after
yeah
and
there's
a
link
to
it,
which
I
don't
like
putting
links
in
slideshows,
because
you
can't
click
on
my
share
screen.
What
good
is
that
yeah
said
that
stuff?
A
We
are
continuing
to
look
at
new
tools
to
improve
our
understanding
with
existing
ones
and
refine
the
stuff.
That's
on
there,
like
some
of
those
tools
are
not
actually
good
choices
for
nurse,
but
it's
important
to
understand
what
characteristics
of
the
tool
might
make.
It
not
work
well
in
HPC
and
I
want
to
encourage
anyone
who
is
considering
using
workflow
management
tool
or
coming
in
with
an
existing
one
from
like
some
previous
Computing
or
whatever
send
a
nurse
help
ticket
we
want
to
interact
with
you.
A
Interacting
with
tickets
is
one
of
the
main
ways
that
we
learn
about
new
ways
or
sort
of
what's
gaining
popularity
or
that
kind
of
thing.
So
we
can
prioritize
our
attention
and
make
sure
it
goes
to
the
most
beneficial
spot.
A
Okay,
so
a
resource
available,
slurm
contact,
or
sometimes
you
say,
sprun
tap
but
saying
it.
Berkeley
is
kind
of
weird.
So
crontev
is
the
standard
Linux
solution
for
running
something
on
a
schedule
when
you
wanted
to
run
something
monthly
or
weekly
or
hourly
or
whatever,
on
Quarry.
A
You
could
just
do
that
on
the
login
node,
but
you
were
sort
of
locked
to
a
specific
login
node
when
you
set
it
up
like
you
have
to
remember
which
one
it's
on
and
SSH
to
that
one
directly
and
if
something
happens
to
that
login
node,
then
your
schedule
doesn't
run
so
slurmcon
tab
replaces
crontab
on
Chrome
Under.
The
interface
is
the
same
like
the
command
s,
con
tab,
Dash
L,
to
see
what
might
be
in
there
already
or
Dash
e
to
edit
it
which
and
it's
like
a
text
file.
A
You
want
to
look
at
the
documentation
for
the
specifics
there,
but
the
thing
that's
special
about
s-contact
is
that
slurm
is
organizing
it
for
the
whole
system,
not
just
one
node.
So
if
you
open
up
your
s-pront
tab
list
of
tasks
anywhere
on
the
system,
you're
going
to
see
the
same
one
just
for
you,
you
don't
have
to
worry
about
what
node
you're
on
and
because
it's
running
is
not
tied
to
a
particular
node.
A
If
something
happens
to
a
particular
login
node
on
chromeotter,
it's
not
going
to
stop
your
job,
so
you
get
sort
of
an
increased
reliability
and
single
point
of
interface
instead
and
even
worry
about
what
note
you're
on
all
right
and
there's.
Also,
these
pound
sign
s-cron
directives
that
go
into
a
script.
A
The
same
way
the
S
batch
commands
go
in
the
top
of
the
slurm
script
and
that's
allows
you
to
communicate
things
like
qual
time
or
what
account
you
want
to
charge
to
and
that
sort
of
stuff
the
limits
for
using
this
right
now
are
two
cores
and
24
hours
of
all
time
for
an
individual
process.
You
can
restart
things,
there's
no
limit
to
how
like
frequently
you
can
schedule
it,
but
you
know
be
rational.
A
Don't
do
this
every
second
or
something
like
that
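As a sketch, a scrontab entry pairs #SCRON directives with a standard cron schedule line; the account name, time limit, and script path below are placeholders, so check the NERSC scrontab documentation for the exact options supported:

```shell
# Edit with `scrontab -e`; #SCRON lines take sbatch-style options.
#SCRON --account=m0000       # placeholder project account
#SCRON --time=00:10:00       # stays within the 2-core / 24-hour limits
# Run a (hypothetical) housekeeping script at the top of every hour:
0 * * * * /global/homes/u/user/housekeeping.sh
```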
Combining scrontab and workflows: currently on Cori there are workflow management nodes, which have a special form to fill out to get access, and that's where you leave a system process running if somebody has a workflow that needs that. So a lot of those workflow management tools are currently running on Cori, on the workflow management nodes.
A
We
don't
have
workflow
management
nodes
on
chromeotter
and
we're
not
going
to
get
them.
So
a
substitution
is
to
use
S
cront
tab
with
a
different
workflow
qos,
and
the
difference
in
the
workflow
Qs
is
that
it
has
a
much
longer
permitted
maximum
of
all
time.
It's
going
to
be
at
least
a
month.
It
might
be
more
so
that's
a
way
to
get
long-running
processes.
Like
you
know,
maybe
it's
a
database,
that's
holding
State
information
for
your
workflow
or
it's
your
your
manager,
that's
accepting
commands
and
watching
the
queues
for
you
and
running
new
tasks.
A
You
use
esccon
tab
to
set
up
those
scripts
to
run
and
they
would
run
for
a
month
or
longer
on
wherever
something
happens
to
schedule
this
okay.
It
also
has
more
resources
available,
so
you
can
do
something.
That's
at
least
a
little
bit
more
powerful,
but
you
don't
want
to
be
doing
actual
like
production
compute
there.
So
one
quarter
of
a
pearl
under
node
32
cores
are
available
and
there
will
be
a
form
if
there
isn't
already
at
help.nets.gov
for
requesting
access
to
use
this
qos.
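Putting that together, a long-running manager process might be scheduled like this; treat the QOS name, the time limit, and the script as assumptions to illustrate the shape, with the real settings coming from the documentation and the access form:

```shell
#SCRON --qos=workflow        # assumed name of the long-wall-time QOS
#SCRON --time=30-00:00:00    # e.g. a 30-day wall time (assumed)
# Once a day, (re)launch the hypothetical manager; the script itself
# would exit immediately if an instance is already running:
0 0 * * * /global/homes/u/user/run_workflow_manager.sh
```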
A
So
that
sort
of
mirrors
the
the
current
process
for
getting
access
to
a
workflow
node
on
Corey
all
right.
Another
resource,
that's
available
for
hosting
persistent
Services
related
to
workflows
or
whatever
else
you
might
need
a
persistent
resource
for
is
spin
spin
is
for
the
services
that
are
related
to
nurse
resources,
but
doesn't
make
sense
to
run
actually
inside
the
HPC
web.
Hosts
workflow
managers,
databases
that
category
of
stuff.
A
So
when
you
have
access
to
spin,
you
can
deploy
your
own
Gateway
Services,
API
endpoints,
all
the
all
the
stuff
that
goes
in
that
ecosystem,
and
there
are
training
sessions
for
getting
yourself
prepared
to.
You
know,
set
up
your
container
with
your
service
in
it
and
start
running
it
on
spin,
and
there
is
also
documentation
about
spin
on
the
websites,
documentation,
site
and
yeah
dogs
at
nurse.gov,
Services,
spin,
it's
on
services
and
the
next
training
session
is
October
5th.
A
So
if
you
believe
that
spin
might
be
a
good
solution
for
a
problem
that
you
have
then
go
sign
up
for
a
training
and
also
the
the
all
the
future
training
schedules
are
available
on
that
site
as
well.
A
Okay,
another
resource
available
is
the
super
facility
API.
So,
given
the
state
of
sort
of
security
and
the
challenges
of
connecting
into
an
HPC
environment,
launching
HPC
tasks
remotely
is
being
provided
by
a
nurse
through
this
API.
So
you
set
up
your
your
tokens
and
your
authentication
and
then
an
arbitrary
service.
A
Maybe
it's
running
in
spin,
maybe
it's
running
completely
elsewhere
in
the
universe
can
use
this
API
to
connect
into
a
nurse
can
do
things
like
look
at
file
system,
contents,
transfer,
data
run,
commands,
run
jobs,
look
at
the
state
of
the
queue
or
the
state
of
your
jobs,
all
sort
of
stuff
like
that.
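As a minimal sketch, a system status query can be made with curl; the base URL follows NERSC's published API documentation, but treat the exact path and response fields here as assumptions rather than a reference. Authenticated operations, such as submitting jobs, additionally require an OAuth2 access token in an Authorization header.

```shell
# Query the (public) system-status endpoint of the Superfacility API;
# authenticated calls would add an "Authorization: Bearer <token>" header.
SFAPI="https://api.nersc.gov/api/v1.2"
curl -s "$SFAPI/status/perlmutter"
```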
So, some more concrete examples of simple things. GNU parallel is available on Perlmutter with just `module load parallel`. This is great for running a large number of small tasks that have some kind of simple rule that differentiates the inputs for each one, and you can use a pipe with it: a simple seq command, which is going to print one, two, three, four, five, pipes that input to parallel, parallel runs an echo command on each line, and then you see all the echoes come out of the parallel command.
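That pipeline, as a one-liner (nothing site-specific here beyond `module load parallel`; `{}` is GNU parallel's input-substitution placeholder):

```shell
# seq prints 1..5, one per line; parallel runs one echo per input line,
# substituting each line for {}. Output order may vary, since the
# echoes run concurrently.
seq 1 5 | parallel echo {}
```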
A
This
is
great
for
putting
inside
a
slurm
job
and
running
lots
of
very
small
single
core
tasks
like
you
can
just
request.
One
node
launch
your
parallel
inside
and
it
takes
care
of
all
those
individual
short
jobs.
It's
also
just
a
great
convenience
in
general
for
doing
things
like
say:
I
want
a
hundred
directories
with
a
particular
naming
convention.
Instead
of
making
those
myself
or
writing
a
weird
bash
loop
I
can
use
a
one-liner
with
you
and
your
parallel
to
to
make
those
directories
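For instance, the hundred-directories case; the `run_###` naming convention here is just an illustration:

```shell
# seq -w zero-pads to equal width (001..100); parallel substitutes each
# value into mkdir via {}, creating run_001 through run_100.
seq -w 1 100 | parallel mkdir -p run_{}
```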
Packing your work in that manner will save you queue wait time, because a single large job with many nodes in it will wait less time than a very large number of small jobs of one node each. There are recipes in the documentation for packing multiple-node jobs with GNU parallel.
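One common shape for such a recipe is a batch script that requests a single allocation and lets parallel farm the tasks out inside it. This is a sketch: the QOS, constraint, task script, and input list are placeholders, and the documentation has the tested versions, including multi-node variants that launch each task through srun.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --qos=regular        # placeholder QOS
#SBATCH --constraint=cpu     # placeholder node type

module load parallel
# Keep up to 32 single-core tasks running at once inside this one
# allocation; task.sh and inputs.txt are hypothetical.
parallel -j 32 ./task.sh {} < inputs.txt
```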
A
There
is
some
risk
using
some
other
methods
of
sending
too
many
slurring
commands
per
unit
of
time
and
overloading
the
storm
controller.
This
is
one
way
to
defend
against
that.
You
can
also
run
things
in
parallel
and
sequence.
It's
not
just
all.
At
the
same
time,
when
batch,
you
can
sort
of
Stack
a
whole
bunch
of
jobs
in
a
big
queue.
The
input
substitution
is
straightforward
to
understand,
and
it's
really
powerful
and
you
can
use
regular
expressions
and
some
things
like
that.
A
It's
better
than
task
arrays
also
yeah
go
to
the
documentation,
all
right,
so
I
am
going
to
yield
at
this
moment.
I
am
out
of
time.
A
There
are
other
tools
available
say
for
data
Centric
workflows
for
handling
dependencies
for
doing
all
sorts
of
stuff
like
that
head
for
the
documentation.
If
you
have
a
problem
that
you
believe
automation
of
this
sort
will
help
you
with
and
send
us
a
ticket
after
reading
that,
if
you
want
to
any
extra
help.