From YouTube: Workflows Pegasus Workflow Manager
Description
Part of Data Day 2022, October 26-27, 2022
Please see https://www.nersc.gov/users/training/data-day/data-day-2022/ for the training agenda and presentation slides.
So this word "workflow" is used a lot nowadays. It can mean a lot of different things and it comes up in a lot of different places. Some people might refer to, you know, their bash script that runs a few different programs as their workflow, but a lot of things now are moving toward even more custom infrastructure.
You might have databases set up, like we just saw in things like Spin, and those might feed web interfaces, and all of that is interconnected in a workflow that this data goes through. Some of the things that NERSC helps our users with are these workflow tools and workflow engines.
A couple of examples are things like Snakemake; you might also have heard of GNU Parallel or FireWorks.
We also support many more, and the one I'm going to be talking about today is Pegasus. Pegasus is a different type of workflow manager, but the general goals of why we want workflow managers are things like automation: data comes in, and we want the workflow manager to be able to pick it up and start computing on it. Especially important are reproducibility and sharing the workflow with other people.
There are also going to be a lot of moving parts, so you want to be able to track data through those pipelines, as well as to use our resources more efficiently. Maybe you have a lot of really tiny jobs that you can pack into a larger job using one of these workflow managers.
So I'll talk a little bit about the Pegasus workflow manager. This is a workflow manager that defines its workflows in a few different YAML files: the replicas, sites, transformations, and workflow. I'll go into that a little bit and show that there are a couple of different APIs you can use to build up these YAML files. You can either write them yourself or, probably the easiest way, use one of these APIs, and I'm going to be showing off the Python API.
We also have an example of this on the Data Day GitHub page that I have working for Perlmutter. There's a little bit of information here, and then some more information on the actual GitHub page, which has install instructions and instructions on how to run it.
The Pegasus workflow manager has different APIs for planning out a DAG. That DAG, a directed acyclic graph, is a graph that represents the work to be done. So we have our nodes here.
In this case we have something where we're going to split a file and then do some word counts on those split files. Each one of those nodes is going to be some kind of executable that we're going to run, and each one of these edges shows both the data flow and the dependencies. So we can see that we need to run the split operation first in order to go on to these word counts.
All of this is run using the HTCondor scheduler. Pegasus will take your representation and turn it into a DAG that the HTCondor scheduler understands, and HTCondor will use that to actually execute it; that scheduler is the one executing and managing the whole workflow.
Before we start making our workflow, there are a couple of things we want to consider when we're building this. One of them is: what executables, what parts, are we going to want to run? Are these parts in a container as well? If so, should we say that they're in a container, and do we need to define that somewhere? A lot of it, too, is going to be: what data do we have?
What are our inputs going to be, what are our outputs going to be, and on top of that, what are the dependencies in there? What outputs need to go to the next task, and how are all those parts connected? One of the pieces of this, the transformations catalog, is a way of defining what our executables are.
Here's part of the Python API showing a function that's going to create this transformation catalog. We make our transformation catalog object and then we can add transformations to it. A transformation is how Pegasus defines its executables, the way that data is going to come into something and then be transformed into something else. We can also tell it what site we want to execute on.
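As a rough sketch of what that looks like with the Pegasus Python API (the executable names, paths, and the "local" site name here are placeholders for illustration, not necessarily the exact ones used in the demo):

```python
from Pegasus.api import Transformation, TransformationCatalog

def create_transformation_catalog():
    # The transformation catalog lists the executables the workflow can run.
    tc = TransformationCatalog()

    # "split" and "wc" stand in for the demo's two steps; the paths and the
    # "local" execution site are assumptions for this sketch.
    split = Transformation(
        "split", site="local", pfn="/usr/bin/split", is_stageable=False
    )
    wc = Transformation(
        "wc", site="local", pfn="/usr/bin/wc", is_stageable=False
    )

    tc.add_transformations(split, wc)
    tc.write()  # writes transformations.yml
    return tc
```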
Next is what we call the replicas. The replica here is going to be some test CSV that we're going to split up, and we want to say where that file is. This is a local file; it's on the file system somewhere, and we say that it's in the inputs directory at the moment. We don't have to list our output data here; when we go on to the next steps, Pegasus will register the outputs as replicas independently.
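A minimal sketch of that replica catalog step, assuming the input CSV sits in an inputs/ directory next to the workflow script (the file names are illustrative):

```python
from pathlib import Path

from Pegasus.api import ReplicaCatalog

def create_replica_catalog():
    rc = ReplicaCatalog()

    # Register the existing input: logical file name -> physical location.
    # Only inputs need to be listed here; outputs are registered by Pegasus
    # as the workflow runs.
    rc.add_replica(
        "local",                                     # site where the file lives
        "test.csv",                                  # logical file name
        Path(".").resolve() / "inputs" / "test.csv"  # physical path
    )
    rc.write()  # writes replicas.yml
    return rc
```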
Here's where we actually start building up the workflow. In this workflow we're going to have that file, so we define that we have a file that we need as input. We can define that we want to split this into four different parts and then give the arguments to our command. So we have this job, we've called it by the same name that we called our executable, we can add things like arguments, add things like inputs, and then we can add that job into the workflow. We can also go and add different commands to this too.
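Putting that together for the split step might look roughly like this; the exact split arguments are a guess at something equivalent to the demo, not the command actually used:

```python
from Pegasus.api import File, Job, Workflow

wf = Workflow("split-wordcount")

# The input file we registered in the replica catalog.
input_csv = File("test.csv")

# The four pieces the split step will produce.
parts = [File(f"part.{i}") for i in range(4)]

# The job is named after the transformation it runs ("split"), and we attach
# arguments, inputs, and outputs to it before adding it to the workflow.
split_job = (
    Job("split")
    .add_args("-n", "4", "-d", "-a", "1", input_csv, "part.")
    .add_inputs(input_csv)
    .add_outputs(*parts, stage_out=False)
)

wf.add_jobs(split_job)
```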
Now we have our top layer, which is our split, and all of those parts are going to come into these different word counts. We can see that each word count job now has to take as input one of the parts that we created in the step before, so the outputs from the split are coming in as the inputs of our word counts. Again, we can also capture things like standard out: this is actually going to catch the standard out and save it as a new replica.
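Continuing that sketch, the fan-out into word-count jobs could look like this, with each job's standard out captured as a new file (names are again illustrative):

```python
# One word-count job per split part. Pegasus infers the dependency on the
# split job because each part is an output of split and an input here.
for i, part in enumerate(parts):
    counts = File(f"counts.{i}.txt")
    wc_job = (
        Job("wc")
        .add_args("-w", part)
        .add_inputs(part)
        .set_stdout(counts)  # capture stdout and register it as a new replica
    )
    wf.add_jobs(wc_job)
```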
This is also a really good way to show that, while this is a very simple workflow, you can see how you could extend it as well, especially with it being a Python API. Maybe you have a directory with files that are constantly updating; this workflow could look in the directory, know that it needs to do the same task on those files depending on which directory they're in, and you could have it build the workflow based on files and directories or other things.
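For example, a hypothetical variation on the same sketch could glob a watched directory and add one job per file it finds:

```python
from pathlib import Path

# Hypothetical: build one word-count job per CSV currently sitting in an
# "incoming" directory, registering each file as a replica as we go.
for path in sorted(Path("incoming").glob("*.csv")):
    f = File(path.name)
    rc.add_replica("local", path.name, path.resolve())
    job = Job("wc").add_args("-w", f).add_inputs(f).set_stdout(File(path.name + ".count"))
    wf.add_jobs(job)
```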
So again, we have this Python file now that we can generate our workflow with, and that will create all of those YAML files. Again, the replicas file is going to have our storage and our data defined in it, the transformations file will have all of our executables and parameters, and then we'll have the workflow file, which actually defines the workflow. I kind of skipped over the sites file; the site catalog is very specific to the site that we're going to be running on.
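In the Python API, generating those YAML files is roughly a matter of attaching the catalogs and writing everything out (a sketch; the exact output file names may differ):

```python
# Attach the catalogs to the workflow (or write them separately with
# tc.write() / rc.write()), then write the YAML that pegasus-plan consumes.
wf.add_transformation_catalog(tc)
wf.add_replica_catalog(rc)
wf.write()  # produces workflow.yml
```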
I've talked a little bit about Condor. HTCondor is a different scheduler than Slurm, so we're actually going to have to set that up in order to use this system, and the way we're going to set it up is by using scrontab to run a long-running workflow job. This is a new feature.
It's part of Perlmutter, to enable long-running workflow jobs, and we'll actually go and set up the scheduler before we start any of our jobs. The HTCondor scheduler is built to handle a lot of high-throughput workloads. The idea is that you have lots of really small jobs that can go through pretty quickly, jobs that might bog down a system like Slurm.
A lot of the time these are going to be jobs that need less than a node's worth of compute power, so you're able to pack a lot more jobs onto a single node, and this scheduler understands a little bit better how to pack those jobs efficiently onto those nodes. So we're going to use this setup to run Pegasus, and Pegasus will use it to run the workflows.
Down at the bottom here is just an example of starting up and running a scrontab entry on Perlmutter. It says it will run every 10 minutes, but really what happens is that it will just try to start every 10 minutes; scrontab is smart enough that once one of these jobs is running, it won't launch another until that job has finished, because it's using Slurm on the back end to manage that. So this will just start up within 10 minutes, and then I have it running for 30 days in the workflow QOS.
Once the scrontab entry goes through and your scheduler starts running, you can use the condor_status command to check and make sure that everything is running. Once we see that all the pieces are up, the most important one here is going to be the scheduler, and we have our scheduler running.
We've started up that scheduler and it's ready to accept the work it's going to get, and then we can do a pegasus-plan with submit. pegasus-plan will take all of those YAML files that it sees, plan through them and compute the DAG that needs to happen, and write out some files; you can see here it's writing out files for HTCondor to use to go and execute all of this work.
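The same plan-and-submit step can also be driven from the Python API; a sketch, assuming the HTCondor pool started by the scrontab job is defined as a "condorpool" site in the site catalog:

```python
# Plan the abstract workflow into an HTCondor DAG and hand it to the running
# scheduler. The site names are assumptions; they must match your site catalog.
wf.plan(
    sites=["condorpool"],    # where the jobs execute
    output_sites=["local"],  # where outputs get staged back to
    submit=True,             # submit to HTCondor right after planning
)
```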
Once your workflow starts running, you can go and look at how things have progressed by looking at pegasus-analyzer, or pegasus-status is another command too. Both of them will show you the progress of the jobs as they go through, so you have the total number of jobs and things like the number succeeded or failed. You can see that here too.
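If you are driving things from Python rather than the command line, the same information should be reachable through the workflow object; a sketch, assuming these wrapper methods behave like their CLI counterparts:

```python
# Rough Python-API equivalents of the CLI tools (assumed wrappers):
wf.status()      # like pegasus-status: per-state job counts as the run progresses
wf.analyze()     # like pegasus-analyzer: digs into failed or held jobs
wf.statistics()  # like pegasus-statistics: summary after the run finishes
```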
This is the same workflow that I had been using, right, so we should probably have five jobs, but you actually see that we have nine jobs here. Something that Pegasus is also doing is stage-in and cleanup for a lot of your jobs. This is helpful as well: it does a lot of checking for you to make sure that a file is in the right place before it starts, before maybe executing a larger sbatch job that might use up some of your allocation.
We can also check both condor_q, to see that things are going through its queue, and the Slurm queue as well. Like I said, this one is waiting for the stage-in to happen, so it's staging in some files, making sure that things are ready before it actually starts up the jobs.
Something I didn't say as well: Pegasus can actually go and submit jobs to the Slurm scheduler for you. There are a few things in the site configuration that tell Pegasus how it should build an sbatch command for one of the jobs, and you can actually specify, for each one of your transformations, what you want to happen for that transformation.
That includes being able to add special parameters for how much memory, how much CPU, even how many nodes you might want if it's an MPI job.
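A hedged sketch of what that per-transformation specification can look like in the Python API; the profile keys, values, and the "simulate" job are assumptions for illustration, not settings from the demo:

```python
# Small serial job: ask for one core and a little memory so many of these
# can be packed onto a single node (memory assumed to be in MB).
wc_job.add_pegasus_profile(cores=1, memory=1024)

# Larger MPI-style job: request several nodes and a runtime in seconds;
# the site configuration translates this into the sbatch request.
mpi_job = Job("simulate").add_pegasus_profile(nodes=4, runtime=3600)
wf.add_jobs(mpi_job)
```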
So again, this is just an example of one of the workflow tools that we can use to help make sure that your work is getting done and that everything's going through properly.
So please take advantage of those docs to figure out what workflow tool might work for you and what might not. There are always going to be advantages and disadvantages to each of these workflow tools, so it might be good to read up on which one might fit your needs the best. I think that's it. Any questions?
So I think for Pegasus, one of the advantages is that a lot of people might have a Pegasus workflow ready already, and this is a way those people can just start using it here. But I think another advantage as well is that, because you can specify the types of jobs that you want, you can have a mix between smaller jobs that can fit onto one node, plus then maybe an MPI job that comes after that.
On the back end, what it's going to do is just write an sbatch script for you for the parts that need MPI. So really, Pegasus is just going to figure out what steps need to go to what places, and also figure out what jobs need to go where in those places, and then it will just submit those jobs via sbatch. So it should be the same performance, the same as it would be with your regular job submission.
Right, so here you kind of just give it arguments, and these are similar arguments to what you'd see in sbatch. I was just doing a really small job with really low runtimes, but you would just give it a number of nodes here.
But that applies to all jobs, doesn't it? Not like this step in the graph needs 10 nodes and this other step needs one core?