From YouTube: 2021-02-23 - The Superfacility Team - The CS Area Superfacility Project: Year in Review 2020
Description
NERSC Data Seminars Series: https://github.com/NERSC/data-seminars
Title: The CS Area Superfacility Project: Year in Review 2020
Speakers: The Superfacility Team
Abstract: The Superfacility project has been busy in 2020! In this talk, we will present lightning updates from each area of technical work, highlighting the progress and achievements we've made in the past year.
Bio:
The Superfacility project includes staff from NERSC, ESnet and CRD.
A: Welcome to our data seminar today. Today we have a slightly unusual format for the data seminar: the work leads for the Computing Sciences area Superfacility project are going to give a series of lightning highlights of the work we've been doing in the last year, highlights of 2020.
A: This is part of our 2020 year-in-review set of talks. The first one, covering the science highlights, was given at the NERSC all-to-all yesterday afternoon, so what you're going to hear is a series of updates from each of the technical work leads in the project, talking about their work.
A: For those of you who are unfamiliar with the Superfacility project, I'm just going to give you a brief overview of what we're doing. Computing Sciences supports many users and many projects from experimental and observational facilities, both within the Department of Energy and from external experiments, and this is something we've been doing for a long time.
A: Computing Sciences has a long and very illustrious record of working with experimental science teams: helping them get their data where it needs to be, helping them analyze their data, and helping them use supercomputers.
A
So
this
is
something
that
we've
been
doing
very
well
for
years,
but
we're
seeing
that
this
is
an
increasing
need
from
the
experimental
science
teams
to
be
able
to
really
handle
large
amounts
of
data
and
large
quantity
in
computing,
and
the
needs
of
this
community
is
starting
to
really
challenge
us
and
challenge
the
computing
capabilities
of
these
science
teams.
So
the
needs
go
simply
beyond
providing
a
nice
network,
some
strong
compute
resources
and
a
lot
of
storage.
A
They
have
rather
complex
needs
that
we've
seen
over
the
years
through
various
requirements.
Reviews
and
the
supervisility
concept
within
computing
sciences
aims
to
address
these
needs.
So
we
want
to
link
up
experimental,
experimental
detectors
to
high
performance,
networking
and
high
performance
computing
and
also
give
them
the
tools
and
capabilities
and
expertise.
They
need
to
do
the
full
workflow
end-to-end.
So
this
involves
yes,
high-performance
networking,
computing
storage,
but
also
the
ability
to
move
and
manage
data
between
sites
and
the
ability
to
have
real-time
or
short
turnarounds
and
interactive
access
for
their
computing.
A
They
need
to
have
resilient
workflows
that
run
across
multiple
locations
and
a
whole
ecosystem
of
edge
services
that
persists
over
long
periods
of
time.
You
know
months,
maybe
even
years
including
workflow
managers,
visualization
tools,
databases,
web
services
and
a
lot
more,
and
this
is
something
that
kathy
ellick
recognized
many
years
ago
and
has
been
part
of
the
cs
area
strategic
plan
for
the
last
five
years
and
around
about
a
year
ago,
we
instituted
the
super
facility
project.
A: Now this is a project with a small "p", in inverted commas, because it's not a DOE project; it's a project internal to Computing Sciences. It was designed to coordinate, where it makes sense, and to track all the work being done across Computing Sciences to support experimental science, because one thing we've noticed very much is that a lot of different science areas and a lot of different experiment teams are facing similar kinds of challenges. For us to be able to scale up our support of all these science teams, we need to take a coordinated approach, so that the solutions, tools, and technologies we're developing are applicable across multiple science communities.
A: This was always designed to be a three-year project, not because the science will go away in three years, but because we wanted to have a defined end goal, and a concerted push and effort to develop the technologies we need for sustainable support of this workload.
A: The goal is that by the end of 2021, so more or less by the end of this year, we will have at least three of our science application engagements able to demonstrate automated pipelines that analyze data at scale without routine human intervention. Automation is a really key part of this.
A: We've identified specific capabilities that we want to be able to support, including real-time computing, high-performance networking, data movement and management, automation driven by APIs, using Jupyter as one of our compute methods, starting to use federated identity for authentication, and hosting this whole ecosystem of edge services with our Spin service. On the right of this slide I have the eight science engagements that we're particularly working with. This is a selection of science teams who have been chosen because their needs span a whole spectrum of different scales: they each have slightly different needs in computing, workflow, storage, and networking. By working closely with these teams and gathering requirements from all of them, we're able to design tools and technologies that will meet the needs of a large part of our user base, and that's something that's very important.
A: Okay, so we've now been running for two years, and this last year has been really very impressive on the part of the superfacility team. The Superfacility project has achieved a lot this year, under difficult circumstances.
A: The project has four work areas in our org chart: applications, which gathers requirements and handles deployment of applications for the users; the scheduling and middleware work area; an area really focusing on automation, automated systems, and automation in the network; and then a work area around data management. With that, I'd like to start going through our updates. One thing I'll ask is that everyone hold questions till the end.

A: We have a lot of highlights to get through, and I want to make sure everyone has a chance to present; then we can go through questions at the end. All right, so first up we have Laurie talking about our NESAP for Data work.
B: Hi everyone. If you're familiar with NESAP, it's a program where we work with science teams to try to optimize their applications for our systems. One part of that is NESAP for Data, which has been focusing on a lot of the projects Debbie just described.
B: We have a lot of postdocs who are wrapping up, and also a new class coming in. Just to give you a quick overview: we had Yongsan, who was working with ATLAS and CMS; he's now off to NVIDIA. In our middle class of postdocs we have Sheen and Daniel, who are working with LZ and DESI respectively, and upcoming we have Felix, who's not pictured; he'll be working with ExaFEL.
B: We also have Nestor, Nick, and Lipi coming in: Nestor will be working with TOAST, the CMB experiment; Nick with JGI resilient workflows; and Lipi Gupta will be working with the ALS. So we're looking forward to all the contributions that the NESAP for Data program can make in the superfacility space.
A: Okay, thanks Laurie. Next we have Kelly, updating us on outreach.
C: Thanks. Outreach this past year has taken sort of a two-pronged approach. This first slide describes one of the prongs, which is the broader outreach to the larger community in the HPC and user space.
C: Last year, Bjoern Enders gave a great talk on best practices for running at NERSC as the kickoff meeting for our NERSC User Group special interest group for experimental facility users.
C: Following that introductory talk, the community posted a series of talks, one from each experimental facility, sharing the opportunities and challenges they faced at NERSC, and really came together to learn from each other.
C: Following that, NERSC presented a demonstration series of superfacility tools, delivered virtually, in which a number of the technologies that will be discussed later in this presentation were demonstrated to the community. And then finally, we had a number of great talks and workshops at the 2020 Supercomputing conference, including a presentation at the XLOOP workshop and a State of the Practice talk. The next slide describes the other outreach prong that we've been working on this past year, which is outreach to the science partners.
C: How can NERSC help the science partners make better use of the superfacility tools? What existing documentation do they already have? How can we help them, help their users, or help their staff use NERSC more effectively?
C: Out of this, we've collected their existing documentation and done a comprehensive review of it, and as a follow-up we're doing targeted meetings with interested research group areas at experimental facilities like the ALS, the Advanced Light Source.
A: Great, thanks Kelly. Next, Bill is going to tell us about work in the area of policies.
D: At the beginning of the year, the state of this area was that I had just accepted the role as the lead of the policies area.
D: Currently, we have designed a consensus-vetted setup for what I want to call campaign users. This is an implementation of an idea that has been floating around for a few years now called data users, which would be a way to bring in additional user-facility-associated researchers into NERSC in a streamlined fashion.
D: This has been a strategic talking point for a number of years: how does NERSC grow from being an institution with thousands of users to tens of thousands of users or more? These campaign users are one of the ways that I'd like to see that made real.
D: The goal would be to leverage some of the other superfacility work, such as federated ID, together with shared user accounts that would belong to a research project at a partner user facility rather than to individual humans. We can make use of a lot of the existing infrastructure that manages quotas, identity and access management, automation, and APIs, to end up with a system that can be easily connected to existing user onboarding systems at other user facilities.
A: Great, thanks Bill. All right, next we're going to hear about Jupyter and some of the work being done there.
F: Okay, so this is Rollin. I'm going to talk briefly about Jupyter in the Superfacility project. We think the Jupyter notebook is a rich user interface that has really great potential to make interacting with supercomputers easier and more productive, to help attract new kinds of users, and to expand the application of supercomputing to new science domains, especially experimental and observational science facilities like those that are the focus of the superfacility initiative.
F: Of course, this takes work. Jupyter didn't come out of Berkeley ready to be directly installed on supercomputers and just ready to go; it takes work from people like us in the Superfacility project, and from other people at NERSC, to make this possible. This past year we've been able to support about 2,000 unique users using Jupyter on Cori, which means about 25 percent of all interaction with Cori goes through Jupyter.
F: Many of the users on Jupyter are from the EOD space: LSST, DESI, and LCLS are all projects that are using Jupyter.
F: In the past year, we've made changes to improve the stability of the system, basically by changing the deployment: moving to Rancher 2 in Spin and leveraging CI/CD practices for supporting the service. We've also introduced a number of extensions that are useful at NERSC, but also at other HPC centers, like interacting with Slurm through JupyterLab.
F: That's the picture on the right there. We've also continued our robust community engagement, working with other HPC centers and other facilities to adapt Jupyter to HPC and large-scale science facilities. Treyas is going to take the next slide.
G: This has been really useful for them in figuring out how to hone in on some of their parameters for the tomography workflows. We're also building various tools in the ecosystem for them, like a slice viewer that lets them dig through a set of 3D images and flip through a bunch of them with different parameters. This is part of the work we're doing between the Superfacility project, NERSC, and CRD, and it's been really useful and really helpful; we presented this work at the XLOOP workshop at Supercomputing this year as well.
A: Okay, so next up, Bjoern is going to tell us about workflow planning for NERSC-9.
H: Yeah, hi, thanks. This is a new area. Basically, we want to make sure that NERSC-9 is ready for automated workflows and can be easily integrated into our partner facility pipelines. To this end, we collect requirements from our superfacility partners and make sure we have adequate milestones in the integration area.
H: We also took some steps to reach out to workflow tool developers and made ourselves visible at workflow-related events, in contact with ExaWorks, for example by taking part in the Workflows Community Summit. This is really a work in progress, so if you have an automated workflow that you want to make sure runs on NERSC-9, or any kind of offer you want considered, please feel free to reach out, so we can make sure your requirements are reflected in our planning.
A: Thanks. Next I'm going to talk a bit about workflow resiliency, building on from that. In 2020 we started really actively working on enabling our science partners to run reliably across multiple facilities, and I emphasize we've started working on this; it's also a work in progress, but one that I'm quite excited about. There are three main areas that we've been focusing on.
A: One is that we have an ALCC award of time at all the ASCR compute facilities, at ALCF and OLCF as well as at NERSC. This is work that Katie Antypas has been leading; it's a group that's exploring container technologies and data management tools, and seeing how those technologies can run across the different facilities.
A: We've already learned some very useful pain points, and that gives us good pointers to where we might want to focus future efforts and future work in trying to run workflows across facilities. We also had a nice demonstration this year: Jupyter notebooks are actually one way you could consider having a workflow that could run at multiple locations, and we've been able to demonstrate that, with notebooks running at NERSC and also running at SLAC for the LCLS experiments.
A: So again, this is very much active work in the next year as well.
I: A year ago, we had developed a proposed architecture and we had some isolated proof-of-concept implementations, really just to kick the tires and make sure that we were familiar enough with some of the components, and that there was a strong likelihood they would be useful for our application. A year later, here's our status.
I: We created a set of proposals and worked closely with our security team to refine them and adjust our risk tolerance appropriately for NERSC. We then went through a fairly comprehensive security review of those policies and have made adjustments based on the recommendations we received.
I: Pretty much all of the features are ready, except for the final multi-factor step-up: we want to make sure that if the home institution doesn't implement multi-factor authentication, we have an opportunity to require it of the user. The code for that is being finalized, and then we expect to have everything put together.
J: Yes, hi, it's Gabor, and I work on the API. The idea of the API is that anything you can do by logging into the machines, you should be able to do from a script via the API. A year ago, most of the API calls just returned fake information, so it was sort of a proof of concept that didn't do anything, and the authentication was a homegrown scheme that we wrote ourselves.
J: Today, we have standards-based authentication that uses the Connect2id server, which implements the OIDC standard for authentication. Many of the APIs are now functional: you can check on system health (or center health, rather), you can run jobs and retrieve their results, and you can move data around using the storage APIs.
J: We had a security review of the API and the authentication around it, and the changes that were recommended out of that review were implemented. We also had a UX review that looked at how people use the API and how to make it friendlier for users, and we've implemented a lot of those suggestions as well.
J: We've started to reach out to other centers and facilities that have similar APIs, like CSCS and TACC, and we're hoping that in the future there will be collaboration between the centers, and maybe a set of APIs that are common to all of them.
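The "anything you can do by logging in, you can do from a script" idea boils down to authenticated HTTP calls. The sketch below is a minimal illustration only: the host name, endpoint paths, and payload fields are made up for this example and are not the documented Superfacility API; only the bearer-token pattern reflects the OIDC-based authentication described above.

```python
"""Sketch of driving a center via a REST API with an OIDC bearer token.
All endpoint names here are hypothetical."""
import json
import urllib.request


def make_request(base_url, path, token, payload=None):
    """Build an authenticated request: POST if a payload is given, else GET."""
    data = None
    headers = {"Authorization": "Bearer " + token}
    if payload is not None:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers)


# Check center health (GET) and submit a batch job (POST) from a script,
# instead of logging in interactively.
health_req = make_request("https://api.example.gov", "/status", token="abc123")
job_req = make_request("https://api.example.gov", "/compute/jobs", token="abc123",
                       payload={"script": "#!/bin/bash\nsrun hostname"})
```

In a real client, the token would come from an OIDC token endpoint rather than being hard-coded, and the response of each call would be read with `urllib.request.urlopen`.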
K: We actually launched Spin in May of 2018 and had a long pilot phase. About a year ago we were still concluding that pilot phase, and we were just on the brink of introducing a major new version of the system that it runs on. We had about 135 users at that stage.
K: Now, a year later, we're in full production on a new instance of the underlying system, Rancher 2. This is essentially a complete redeployment, and the system is based on Kubernetes, so it has a lot of modern features and is very robust. It introduces a new web user interface that makes it easier for users to get started, and this also dovetailed nicely with the CARES Act-funded large memory nodes.
K: We built our production cluster this summer for the Rancher 2 instance based on large memory nodes, so we could support a wider variety of workloads in this new Rancher 2 instance in Spin. Meanwhile, we kept the support and training going and added some new features there: we did five workshops over the last year, and we added office hours every other Friday.
K: The workshop materials and the documentation were redone, and we've added support from other groups at NERSC as the popularity of the service has continued to grow, so APG, DAS, DC, and UEG all have folks who are helping out with support and training, and we're up to over 200 users.
K: As of the last workshop, just in February, over 40 NERSC projects have expressed interest in Spin or already started working in it, and a few of the highlights are shown there on the right.
L: Sure. As these systems get larger and we have more complex workflows, one of the efforts we realized we need is to make them more automated and easier to operate. So this is a monthly meeting group where we discuss best practices and what's going on across the whole CS area.
L: Topics include things like monitoring and tuning system power; we've had conversations with HPE presenting their work, collaborations with ALCF, and presentations on Splunk capabilities via ESnet. So it's a really good set of discussions, and I think the most exciting thing is that this will springboard how we do things on Perlmutter. We're really excited to get to work on that.
M: Hello everyone. Network resource reservation and advanced networking capabilities are covered in the SDN technology area. As of January 2020, NERSC, ESnet, and SLAC have deployed a dynamic multi-point OSCARS circuit infrastructure: projects like ExaFEL at SLAC can automatically initiate an OSCARS circuit to NERSC and tear it down after the data transfer is completed.
M: We have extended our data center to provide an aggregate 400 gigabits of bandwidth to the NCEM user facility at the Molecular Foundry, and last year we deployed a Slurm plugin to allocate bandwidth-balanced compute nodes on Cori. We've also started a major project to overhaul the NERSC core networks.
M: This will provide more bandwidth and programmable resource reservation capabilities for superfacility scientific workflows. As part of that effort, we have started deployment of 400 gigabit Ethernet at NERSC.
A: Okay, all right, thanks Ashwin. Next, Tom is going to tell us about work being done in the SENSE project.
N: Yes, hello. I'm talking about the next-generation work we're doing in the network infrastructure, which is focused on multi-domain orchestration and automation for network services. It expands on some of the OSCARS services that we heard about just now. The objective is that, through the ESnet-developed SENSE project, there's the ability to orchestrate multi-domain network services, where those services include not only the network infrastructure but also the networking stack inside end systems like DTNs.
N: We made a lot of progress this year in understanding how that type of multi-domain, automated, orchestrated service could be used in the context of superfacility and distributed infrastructure. One of the prototype research projects we've worked on this year is with ExaFEL and the LCLS, in terms of moving data from SLAC to NERSC and integrating that with workflows. There are two main focuses for this effort: one is the distributed infrastructure itself, and the other is the integration with the application workflow.
N: We've done some initial work there and we're making more progress, and it's a little bit broader than just one type of workflow or one type of infrastructure. If we go to the next slide, I'll talk about that a little.
N: Actually, one more; yeah, that's the one, exactly, great. In terms of the idea of distributed network infrastructure and orchestration, we are very focused on how domain science workflows can integrate with this. We now have an API-driven, multi-domain network services capability, and, as we heard about the NERSC Superfacility API, there's now also an API available into the computational resources. So a lot of the work we're trying to do is to figure that out:
N
How
can
the
main
science
workflows
utilize
these
different
apis
and
how
can
these
apis
work
together
to
enhance
workflows
so
we're
looking
at
multiple
workflows,
but
one
of
the
one
communities
that
we've
been
working
closely
with
is
the
lac
community
and
they
have
various
systems
like
russia
and
fds
and
things
above
that
that
are
trying
to
develop
the
intelligence
of
how
to
use
these.
So
that's
a
large
focus
for
this
effort
is
to
integrate
these
types
of
capabilities.
What
the
demand,
the
main
science
workforce.
A: Great, thanks Tom. Next we're moving into the data management work area of the project. Continuing the theme of data movement, Lisa is going to tell us about the work being done on data movement in this work area.
O: Hi. We've had a pretty productive year in 2020 in the data movement space, moving data across all the centers quite a bit. Going into 2020, we only had the standard Globus endpoints available, where users could read and write as themselves to and from our various file systems, and we had a very early prototype of the GPFS-HPSS interface, called GHI, deployed; it's a new way of interfacing with HPSS.
O: As of the end of 2020, we've deployed a new service in Globus that lets groups that use collaboration users read and write via Globus as the collaboration user on the file system. This is very useful for the superfacility partners, who make pretty extensive use of collaboration users, and so having this is much more convenient: they no longer have to transfer data in and then open tickets asking NERSC staff to do chowns and permission changes.
O
For
all
of
our
all
of
our
data
throughout
nurse
with
these
tools,
and
so
the
early
testing
of
ghi
is
done,
the
gps
hpss
interface
and
then
during
2020,
we
worked
to
dramatically
improve
the
usability
and
security
of
the
interface
and
we've
done
successful
testing
with
several
facility
groups
and
other
teams
and
there's
some
set
of
published
documents
on
how
to
use
this
system,
and
you
can
go
and
check
it
out
yourself
if
you
want
to
kick
the
tires.
O: We've also worked with Slurm's developers on some capability for batch-system data movement, so that you can migrate data automatically from CFS or HPSS to scratch before your job starts, integrated into the batch system so that it will hold your job until the data is ready. The target for that is fall of 2021 and Perlmutter.
A: Thanks Lisa. Next, Mariam will tell us about work being done on networking tools.
P: Hi everyone. This year we were focused more on network analytics tools, and there are two main tools we're currently working on that I'm highlighting today. The NetPredict tool, which was launched last year, is essentially real-time machine learning in a Google Maps style: it predicts what the network traffic on ESnet is going to look like in the future, so that we can plan big data transfers accordingly.
P: What we've done is update that tool with a graph neural network behind the scenes, which produces the predictions of the network traffic, and we've published a paper on this. Initially the tool was deployed on Google Cloud Platform, but now we're working with the Spin team to move it onto Spin, so that we have much more control over what's going on behind the scenes.
P: The second tool is a new one, the net preflight tool, which we've been developing specifically for the superfacility team, in collaboration with Bjoern and my postdoc Bashir. Bjoern identified this problem: when we do a DTN-to-DTN transfer, there's no way of knowing what the network performance is before you do the transfer, and tools like iperf and perfSONAR usually require a client to be running on the other end to get the correct information.
P: What we've done is develop a tool where you don't need any client deployed: you can use socket programming and a set of file transfers to calculate the current bandwidth and the traceroute before you do the transfer. Our current tests have shown results comparable to iperf, and we are writing this up as a paper. We'll be presenting the tool to ESnet first to get feedback, and then we'll hopefully release it to the rest of the community.
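Stripped to its essence, the socket-programming approach means opening a TCP connection, pushing bytes, and timing the transfer. The sketch below is a toy loopback illustration of that idea only, not the net preflight tool itself (which also derives the traceroute and works against a remote endpoint without special instrumentation).

```python
"""Minimal throughput measurement over a plain TCP socket (loopback demo)."""
import socket
import threading
import time


def _sink(server_sock, nbytes):
    """Accept one connection and drain nbytes from it."""
    conn, _ = server_sock.accept()
    with conn:
        remaining = nbytes
        while remaining > 0:
            chunk = conn.recv(min(65536, remaining))
            if not chunk:
                break
            remaining -= len(chunk)


def measure_throughput(nbytes=8 * 1024 * 1024):
    """Send nbytes over a loopback TCP connection; return MB/s achieved."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    t = threading.Thread(target=_sink, args=(server, nbytes))
    t.start()
    payload = b"x" * 65536
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as c:
        sent = 0
        while sent < nbytes:
            c.sendall(payload[: nbytes - sent])
            sent += min(len(payload), nbytes - sent)
    t.join()
    server.close()
    return nbytes / (time.perf_counter() - start) / 1e6


rate = measure_throughput()
```

On loopback this mostly measures the local stack, of course; the interesting numbers come from running the sending side against a real remote DTN.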
A: Thank you, thanks Mariam. Okay, next Annette will tell us about the data dashboard work.
Q: Hi, yeah. The data dashboard is integrated into the My NERSC user interface. At the beginning of 2020, what we had was one tab called the data dashboard, which offers features for people to go in and see how they're doing against their quotas, in terms of their data storage and the number of inodes they're using.
Q: We also gave them a tool for browsing through their directories and seeing the metadata for all the files in each directory, to get a sense of what's where, and then we introduced a tool that lets them identify their largest files and directories, and those with the most inodes, in order to find what they might want to archive off to free up space in their quota. That was the state at the end of 2019.
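The largest-files feature amounts to a scan like the sketch below. This is illustrative only: a production dashboard would be driven by pre-collected file system metadata rather than a live `os.walk` over a parallel file system.

```python
"""Sketch of a 'largest files' scan for a storage dashboard."""
import os


def largest_files(root, top_n=5):
    """Return the top_n (size, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]
```

The same walk can accumulate per-directory byte and inode counts, which is essentially what the directory-browsing view presents.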
Q: What we've implemented in 2020 is a tool called the PI toolbox, and this is a way for PIs to go in and actually fix things that may be blocking them: things they might otherwise have to file a ticket in ServiceNow and go to a consultant for. What we're enabling them to do initially is to go in and set the permissions on files, or change the group that owns them, and they can basically go in and make changes regardless of who actually owns a file.
Q: They can give everything the same group and all-group-readable permissions super quick, with just one click, and then it runs in the background. We're also moving toward setting it up so that they can do a change of ownership on a file; that will just be another button that we add into that interface.
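That one-click "make everything group-readable" action comes down to a walk-and-chmod like the sketch below. This is an illustration of the operation only, not the PI toolbox itself: the toolbox runs such changes server-side with delegated authority, which is what lets a PI fix files they don't own.

```python
"""Sketch of a bulk 'make group-readable' fix-up over a directory tree."""
import os
import stat


def make_group_readable(root):
    """Add group-read permission to everything under root; return count changed."""
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if not mode & stat.S_IRGRP:
                os.chmod(path, mode | stat.S_IRGRP)
                changed += 1
    return changed
```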
Q: We were able to present all of this at the SC20 State of the Practice track, and now we're working on developing what will probably be yet another tab in My NERSC that will allow people to do things like file transfers; we're calling this the petabyte data portal.
A: Thanks Annette. Now Quincey and Suren are going to tell us about work being done around HDF5.
R: Thanks, great. Over the last year, we've spent a fair amount of effort improving the support for experimental and observational data in HDF5.
R: On the other hand, HDF5's ecosystem is quite large, but it can't access XTC2 data. So, in order to meld those two together, we devised a prototype VOL connector, a plugin for HDF5 that allows HDF5 applications to directly read XTC2 files. It supports the most common XTC objects in those files, as well as the most common HDF5 API routines, so we can enable MATLAB, and the other tools built around HDF5, to read XTC2 directly through the HDF5 interface. There are next steps over the remaining course of the project.
R: Next, we focused on enhancing HDF5's support for streaming data. It's a common experimental use case, but it's just not very well supported in HDF5, and likewise for variable-sized data: many records come out of cameras or other sorts of instruments that are not nice multi-dimensional arrays, and that's been a weakness for HDF5.
R: So over the last year, we've enhanced HDF5 to support streamed data very directly, through improvements to the basic parts of HDF5, the infrastructure that everyone uses; up-to-10x performance improvements are already baked into the library for the next release. We've also written up and done some prototyping around a more optimized API interface for HDF5.
R: We want to improve and productize the streaming API, as well as the variable-length data storage, so that we can really raise all the boats by improving some of the storage infrastructure that HDF5 provides for everyone. Lastly, we've spent some time working on querying the metadata in HDF5 files. There's a tremendous amount of metadata already built in and baked into self-describing formats like HDF5, and we wanted to bring that out with the metadata indexing and querying (MIQS) project, in order to allow applications to really extract the science data and enhance the ability to deliver science discoveries.
R: Teams are looking at data they've already produced; they just want to explore it to get more knowledge out of it. The next steps there are to explore the semantic relationships: not just standalone pieces within the file, but the relationships between them, so that applications can query and build more science discoveries out of the relationships between objects, not just across individual objects.
A: Great, thanks Quincey. All right, our last technical work area update is from Chris, who will tell us about work being done in advanced scheduling.
E: G'day. So where were we with advanced scheduling in January 2020? We had NRE, non-recurring engineering, with SchedMD, who maintain Slurm, because we have an issue where we want to be able to accommodate experimental workloads without causing disruption to the existing workloads.
E: Now, we're going to use reservations for this, and there's an issue in scheduling where reservations in place can cause what's called a shadow on the workload. So the idea was to have something that would allow a reservation to say: I will allow other people to use these nodes, as long as they agree that they will be preempted within a certain amount of time when my experimental workload arrives.
E: That's now in Slurm 20.02. We have done testing with it to check that it works; the issue we have is integrating it with how we charge on these two systems. There's also a test configuration on Gerty, which is the test system for Cori, for NERSC staff to experiment with. Slurm 20.11 was released at the end of November and is currently on the test systems for Perlmutter.
E: Coming in 20.11, we have scrontab. The idea here is that crontab workflows often tie people to particular login nodes; that means that, should that login node suffer a hardware failure, those tasks won't run, and we need things that are resilient in the face of that. So the SchedMD folks have implemented scrontab, a command that looks very much like the crontab command and allows you to specify jobs to run at certain times. It has another nice effect, too.
E: You no longer have the case where, if you have something running every hour and your task for some reason takes an hour and ten minutes, you end up with two running at the same time; with scrontab, the next one only starts when the previous one has finished. Thanks also go to the people who did this work.
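For illustration, a scrontab file looks much like a regular crontab, with Slurm batch options supplied on `#SCRON` comment lines ahead of each entry. This is a hedged sketch: the QOS name and script path below are made up for the example.

```shell
# Edit with: scrontab -e
# Slurm stores these entries centrally, so they survive the loss of any
# single login node, and a new run only starts after the previous one ends.
#SCRON --time=00:10:00
#SCRON --qos=cron
0 * * * * /path/to/hourly-task.sh
```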
A: Great, thanks Chris. All right, so we've really done a lot of work in the last year, and we're already thinking very carefully about what we're planning to do in the next 12 months.
A: In the last year of the project, our main focuses are really around getting the science teams up and running on Perlmutter, and being able to demonstrate that they can run an automated pipeline, whether that's on Cori or Perlmutter, running their analysis without human intervention. We're also thinking very carefully about a sustainable support plan: how do we transition the tools to long-term support and ensure that they're production-hardened?
A: That's another area we're working on this year. All the work we're doing here within the Superfacility project is pretty groundbreaking, and by doing this work we're ensuring that we're well placed to take advantage of the future directions for ASCR infrastructure; these are discussions that are going on between the program managers and people in ASCR.
A: We are very well placed to take advantage of, and to lead in, the area of developing components for the framework for geographically distributed workflows, and I think that's something that's going to be very exciting for the future. So I'll just finish by saying many thanks to the superfacility team: this has been a very challenging year, and they've been doing fantastic work. I think you'll all agree they've done a lot of really impressive work this past year. So thanks to everyone.