From YouTube: 2019-10-25 - David Skinner - Opportunities and Challenges in Linking DAQ and HPC Systems
Description
NERSC Data Seminars: https://github.com/NERSC/data-seminars
Abstract: This talk looks at HPC from the perspective of data acquisition systems (DAQs) at experimental and observational facilities. A systems view of DOE-scale instruments producing data increasingly includes opportunities and/or needs to modulate computational intensity upwards. Experimental facilities that "plug into HPC" for this purpose often include DAQ hardware and systems for which HPC can be a challenge. This seminar reviews related issues from recent data science engagements at NERSC.
Yeah, well, thank you for attending. I'm not an expert on much at all, really, but with something like superfacilities there is so much to do that there are a lot of different angles to approach things from. So this is very much a generalist sort of overview.
Looking at data acquisition systems and computing, I encourage you to think of the term "data acquisition system," which I'll define a little bit here, in a pretty generalized way.
If you need any convincing of that, look away from maybe the project that you're deeply enmeshed in to the 15 or so other projects that NERSC is involved in. There are many, many great projects, but their commonalities can be difficult to find. So this is a talk about the intersectionality rather than the specificity of a given DAQ. I'd start off with one that you could find at a garage sale.
We have some control over how those bits are formed, from the music and the camera to the microphone down here, but you can of course think of better examples than that. So let's start with another DAQ that we all know: the eyeball. This is an acquisition system formed, sort of, over a very long period of time.
The data, the signal, is being manipulated, not just carried, right? So, some of the things: what can you do once you have parallax between two detectors? You can compute angles. This is a biologically inspired example of an end-to-end system that takes in photons and gets something useful out of them. And, remarkably, it happens pretty quickly. So add that to the pantheon of rather remarkable data acquisition systems.
But it's worth thinking about how it got to be. It got to be that way through a large number of people, over a long span of time, making a lot of conscientious decisions about what should or shouldn't happen inside it. And so that begs the question, in a way: if this is one team that is paving a road for data to come to our center, what do the other roads look like? That's a question to ask yourself.
How many of us expect to pull this off, right? To do something like this, where they have done enough of the front-loading sort of work that, when the data arrives here, we're not just catching up. So this is a compliment to the high energy physics community. It's also a recognition that this took a lot of money, a lot of time, a lot of people; this is not a small project. And if you have questions, let me know, we can stop along the way. So, we have some pretty amazing data acquisition systems.
There's a diverse set of physical mechanisms for the data to get to us, and again some questions to ask about how the data is going to get here and in what form, and things like that. And this is a question I don't know the answer to, but many people in the audience might: have any of these devices and facilities really planned for HPC? I don't mean to say that HPC planning means people need to target us directly, but there are the basic issues, you know: bandwidth.
The time to process the data. If you crank up the machine to the next level, are you going to produce a million files? Those types of things. These are really common issues that can go unresolved without planning, but to the degree that we can think about the mechanisms and devices that are going to be bringing data to us in the future, the earlier we get involved the better. Okay, so looking beyond the standard matrix of existing, established partners...
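Those basic planning questions, bandwidth, processing time, file counts, can be put in back-of-envelope terms. A minimal sketch; the frame size, rate, and link speed below are made-up illustrative numbers, not any particular instrument's figures.

```python
# Back-of-envelope DAQ-to-HPC planning check.
# All numbers are hypothetical examples, not a specific facility's.

def daq_plan(frame_bytes, frames_per_sec, link_gbps, files_per_frame=1):
    """Return (data rate in Gbps, fraction of link used, files per day)."""
    rate_gbps = frame_bytes * 8 * frames_per_sec / 1e9
    utilization = rate_gbps / link_gbps
    files_per_day = files_per_frame * frames_per_sec * 86400
    return rate_gbps, utilization, files_per_day

# Example: a 4 MB frame at 1 kHz over a 100 Gbps link,
# writing one file per frame.
rate, util, nfiles = daq_plan(4e6, 1000, 100)
print(f"{rate:.1f} Gbps, {util:.0%} of link, {nfiles:.2e} files/day")
```

Even when the bandwidth fits, the file count (here tens of millions per day) is exactly the kind of issue that goes unresolved without planning.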
There are likely to be a lot more, and we might want to think about those as far ahead of time as we can. One motivating example along those lines of planning, which I won't talk about a lot in this talk, is something pretty much any staff member here could do: if you're working on a proposal with somebody who's saying, hey, I want to build this amazing new thing...
If you have some sensibilities about data planning, one thing you could say is, hey, can we work on a data management plan together? Or, hey, go talk to David, or Debbie, or somebody in one of these groups, to build a data management plan for those devices. So what are these things?
Why bring these up in this talk? Because, like I said, if you're natural selection and you've got a million years, you can make a pretty good DAQ. If you're the military and you want to fly a satellite, drop film out of the atmosphere, and catch it, you can do that too. You can build amazing data acquisition systems with enough resources. But realistically...
That was a very hard-to-use type of capability not too many years ago. Now, if you go to the ALS users meeting, the vendors have a table there where they want to sell you these things, right? And that's all well and good, but again, how much data planning is lying underneath that? And if somebody at the ALS, or anywhere, says, oh yeah, I'm going to buy a hundred of these things and we're going to set them up and do this amazing...
There are ones that are increasingly commodity and off-the-shelf. So how do these things talk to HPC, if needed? And do they even need to talk to HPC?
So, a couple of things to rule out. I bring this up pretty frequently in meetings and in other contexts: simplicity can be a terrible way to get started with things, right?
A
So,
if
you
want
to
say,
let's,
let's
make
a
solution
that
works
for
everything,
and
then
you
know
this
is
going
to
fly
into
all
these
stacks
I
I
would
I
would
challenge
that.
That's
that's
hopeless
right
and
that,
instead
you
know
the
motivations
connect
to
HPC
need
to
be
there
in
the
first
place
and
the
the
capabilities
and
resources
you
know
to
make
it
a
connection
need
to
be
there
too.
So the title of this talk is opportunities and challenges, and this is kind of the skeleton of that. It's not meant to be exhaustive, but these are two opportunities that I think are recyclable, and I know they're recyclable because I've seen parts of them. The first one, and this is in the context of us, right...
If you can identify the piece where adding HPC allows uniquely new science, or some sort of breakthrough, that's HPC. Now, "breakthrough" is relative; it means different things to different people, but if you can convince a DOE program manager that it's a breakthrough, then it's a breakthrough. So what does that look like? In a couple of cases it could be a strong argument toward: we couldn't do this work unless we had this bigger computer; we would be really slow at doing this work.
The second is really about technology development, which says that the externalized cost of messy data devices is in part due to the inability of the people making the devices to provision for and realize those costs. So if the development happens next to a supercomputer, then they can be sloppy for a while and then figure out how to take that slop out of the device, right? And so firsts and co-design are these two reasons for doing this that I bring up. So, any questions?
A
Okay,
if
you
have
other
opportunities,
have
you
interested
in
hearing
about
them?
Many
challenges.
Many
challenges
ignore
all
the
challenges
where
there
is
no
opportunity.
Okay,
like
you,
can
imagine
a
million
things
that
could
be
hard
right,
but
if
there's
no
guiding
opportunity
behind
it-
and
you
know-
that's
not
there,
but
in
the
cases
where
people
are
eating
down
our
door
and
saying
hey,
we
really
need
to
use
nurse.
What
are
some
of
the
issues
that
come
up
team
size,
one
that
I
already
mentioned?
Lack
of
vendor
HPC
integration,
lack
of
forethought.
There's, in a way, a terrible richness of file formats, each with their own kind of performance and proprietary-ness issues, in some cases for some instruments as well. And then there's the overall cadence: it's remarkable that a physics project can look as far out as they do, but chemistry and a lot of other projects are, both in their planning and in their execution, necessarily going to be much shorter, right? I'd be open to arguments to the contrary.
But to me that implies a greater need for people at computing centers to fill in the parts that can be common, rather than watch a large number of small teams try to reproduce solutions without the benefit of that prior work. So I guess this is a way of saying that computing centers should do more, not less.
So that's a somewhat local example from the past, but the issues with big science and big teams versus small teams are numerous, and they show up here. We're obliged to work with everybody, is what I'm trying to say. But realistically, this is something where people sit down and operate this whole orchestrated thing: they know how to make the DAQ work, they make the network work, and they know where their HPC fits in.
But the reason we're driving it faster is that the faster you can process those images, the faster you get that structure. If you can do it within an hour, within a shift, that's kind of a breakthrough, in that you can then get answers while you're operating, rather than take data, stop, understand; take data, stop, understand.
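That "within a shift" test is simple arithmetic. A minimal sketch, assuming ideal scaling across workers; the image counts, per-image times, and worker counts are hypothetical examples, not measurements from any real beamline.

```python
# Can reconstruction finish inside one operating shift?
# All numbers below are illustrative, not measured.

def fits_in_shift(n_images, sec_per_image, n_workers, shift_hours=8.0):
    """True if n_images can be processed within one shift,
    assuming perfect (ideal) scaling across n_workers."""
    wall_sec = n_images * sec_per_image / n_workers
    return wall_sec <= shift_hours * 3600

# 100k images at 2 s each:
print(fits_in_shift(100_000, 2.0, n_workers=4))   # 4 workers: too slow
print(fits_in_shift(100_000, 2.0, n_workers=64))  # 64 workers: fits
```

The point is the threshold effect: below the line you're catching up after the run ends; above it you get answers while you're still operating.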
But compressing and reducing, this was anathema to the project some years ago; they thought that their community would not stand for it, that they would want to have all the bits. And what we've seen is a transition to the realization that there's going to have to be some sort of compression, or some sort of data sensibility, closer to the instrument.
The engagement model for HPC is that if we regularly go after the projects that are about to break the network, or their computer, or whatever, those are the ones to chase. And there are many here that are probably not the ones to chase, right. But as more and more DAQs and more methodologies advance, any one of these methods could burst out, come to NERSC, and say: oh my gosh, our data is on fire, we have a big problem.
Ideally, people at NERSC would be well prepared to say: hey, cool down, this part of the problem is in the basket, because we can just reuse it from this other team; and maybe this other part, that's totally on you, and you're going to have to make that work. And here are some of those: BES has some good plans and things like that.
And you just wash them between glass plates and freeze them in a way that the ice does not harm the thing; you can freeze water in ways that it stays kind of non-crystalline. And then you come in with an electron microscope and look at all those little macromolecular machines that you have in there, and you get a blob for each one, like that image that came up earlier with all those little shapes.
A
So
these
people
are
putting
some
sort
of
macro
molecular
fluid.
You
know
art
into
this
device
and
then
collecting.
Let's
say
tens
to
hundreds
of
thousands
of
images
of
what
they're
hoping
is
the
same
thing
you
know,
but
but
different
angles.
So
one
one
common
thing
that
you'll
see
through
a
lot
of
these.
These
BES
things
is
that
a
lot
of
really
bad
pictures
are
the
same
thing
or
useful.
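The reason many bad pictures of the same thing are useful is basic statistics: averaging N independent noisy views shrinks the noise by roughly sqrt(N). A toy sketch below; it deliberately skips the alignment and classification steps that real cryo-EM reconstruction requires, and the "particle" is just a random array.

```python
import numpy as np

# Synthetic stand-in for one particle view (not real micrograph data).
rng = np.random.default_rng(42)
truth = rng.random((64, 64))

# 2000 "really bad pictures": the same truth buried in heavy noise.
noisy = truth + rng.normal(0.0, 1.0, size=(2000, 64, 64))

single_err = np.abs(noisy[0] - truth).mean()          # one bad picture
stacked_err = np.abs(noisy.mean(axis=0) - truth).mean()  # average of 2000
print(f"one image error {single_err:.3f}, "
      f"2000-image average error {stacked_err:.3f}")
```

Any single image here is useless on its own, but the stack recovers the particle, which is why the instruments collect tens to hundreds of thousands of views.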
The images are packed very, very tightly together, and so the point at which a cryo-EM person, a cryo-EM expert, often needs to use the supercomputer is when they click on an image: click on this image, now on this image. And that is a very difficult thing to schedule. So some of the discussions there are about blending: we take the human out of the loop a little bit, or put humans into the loop in a more conservative way, and that makes it less interactive.
When I say externalized cost: I had a really interesting conversation with the Dectris guy up there, because he wants to sell devices; the downstream consequences of the data are not in there. So I think there are different solutions to this, but planning, getting ahead of the game, I think, is really the best one, whether or not we can influence a vendor.
A
Okay,
so
so
that
you
know
my
plug
earlier
about
get
involved
in
the
DMP
right,
you
know
that
writing
a
DMP
as
a
data
management
plan,
and
it's
it's
it's
a
fairly
loosey-goosey
commitment
to
what
you're
going
to
do
with
your
data.
That
is
a
new
requirement
and
do
we
work
but
better
to
have
some
kind
of
plan
than
nothing
right,
and
so
one
way
to
get
ahead
of
the
game
is
to
say
we're
going
to
state
that
planning
in
that
in
the
DMP
and
in
not
in
any
case
it's
really.
Okay, so my last thing on cryo-EM is: if you know how to change the workflow, let me know. Otherwise, in my imagination that community is kind of crashing up against data issues, maybe.
Maybe 100 gigabytes. And then I suppose the worst thing they could do is produce an instrument that did none of the data analysis, as if that's just somebody else's problem, right. Really what we want to encourage them to do, and what they want to do too, is figure out which parts of that can be moved back into the DAQ, the parts that make the data higher quality and make the instrument more usable too.
A
A
A
These
are
becoming
more
like
one
to
one
is
nearly
ready,
and
so
this
is
going
to
look
more
like
a
file.
You
know
a
host,
a
house
as
it
comes
out,
so
that's
solution,
number
one
and
solution.
Number
two
is:
is
that
if
we're
all
using
Ethernet,
then
maybe
problem
goes
away
chain.
Laughs,
II
think.
Is
that
unrealistic.
You know, and all of that. And so for LCLS that's about two or three people now, and that's not automated, but it's way better than it was. And of course these DAQs need your engagement too. So there are a lot of ways that we can get involved before people get surprised.