From YouTube: NUG Meeting 2014: Skinner
I think we all agree that there's no doubt that the acquisition and analysis of huge data sets is really transforming the way we do science, and the way that centers like NERSC are operating. Our first speaker, David Skinner, who is getting wired up here, really has been a pioneer in integrating data-producing facilities with NERSC.
Thank you, Richard. Welcome, NUG members. For any of you who were at Supercomputing this last year, you more than likely heard a talk by Kathy Yelick called "More Data, More Science and Moore's Law." This was a very interesting, thought-provoking talk that Kathy gave at Supercomputing, one that unveiled some of the first real, what I would call substantive, directions about what we are going to do in data. And let me apologize first by saying that "data" is not a very descriptive term. I understand it means numbers, and for different people here in the audience
that probably means different things. So we concatenate a lot of meaning into this one word, "data," but it can be how you are gathering the data, how you are logistically managing the data, how you're analyzing the data, how you're curating the data, how you're making the data do great science after you're done with it. Those are all things to keep in mind as you look at these slides, to try to take a broad view of data.
So NERSC's strategy is a real simple one, and sometimes, you know, simple guiding principles are what really make an organization excel. Internally, staffing and other decisions that are made within NERSC really come back to this real simple sort of criterion for where we should go and what we should do. And we're at a time now, in people's generation of data, storage of data, and data policies coming from different areas, that we really are asking those sorts of questions again.
The second area is simulation and data analysis at extreme scale. Data analysis is the second of two major thrusts in the current NERSC strategic plan, and I'll be talking about those right now. I wanted to give kind of a macro view first. This isn't a NERSC slide, so don't attribute this to NERSC design; this is from a community report called Scientific Collaboration for Extreme-Scale Science. And you see in here, let's see where the laser is, some familiar things: computing facilities.
Down here, storage. These are sort of bread-and-butter topics for NERSC overall, but in the overall ecosystem of collaborative science and research, these are parts of a lot of different resources that coexist at different strata: the physical layer of facilities, the kind of middleware layer of things that join different science projects together, and then these higher-level knowledge-seeking and collaborative environments that are really tremendously important to scientists.
So in taking in this whole ecosystem, I think it's really important for resource providers, for centers, for all of you, to take a broad view of this and figure out: how can we make these components work together really well? The idea of a computing facility as a standalone entity that you would, you know, take your punch cards to, show up at the computing facility and run your computation, that activity is less and less common.
So the last slide was sort of a community view on this; this one is more from the NERSC view. These are some speeds and feeds from science topics and projects that are currently active within NERSC, broken out. You know, the only term worse than "data" for describing things is probably "big data," but we'll use it, and this slide is meant to sketch out kind of what we mean.
By that we mean data that comes in volume, data that moves very fast, data that comes in a lot of different types, and data that needs to be checked, or that has holes or gaps or errors in it. These permeate the scientific and programmatic concerns that NERSC has, and there are some drivers here pushing this in directions that make it a concern that all of us need to really begin to ponder in terms of strategy.
Okay, so in some sense people at NERSC are able to think of all of these things as being in a project directory someplace, and that's what I mean: all of these have project directories, but they come from very, very different places. That diversity has been a real strength of NERSC going back a long time, in that we're kind of the place where a lot of the nation's science data, across widely different science disciplines, meets, and best practices can be shared, in the most optimistic case.
We can actually do some really cool data fusion, bringing data together from different areas and coming up with new discoveries. So these are some of the current big drivers, and they come from all over the place, and if you work at NERSC nowadays, one of the things you spend some time thinking about is how this all fits together.
The other motivating slide, and I have a couple more in terms of where we are now, has to do with reckoning the growth rates that we're seeing with detectors and sequencers; this slide compares those two to where CPUs and memory are going. So if we want to keep the end-to-end process of big team science moving, we need to avoid pileups. We need to avoid impedance mismatches where the detector can't talk to the processor, and so on.
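A quick way to see why the slide is worrying is to compare doubling times directly. The doubling times below are illustrative assumptions, not figures from the slide; this is just a minimal sketch of how an exponential gap opens up between detectors and processors:

```python
# Illustrative comparison of detector data growth vs. processor growth.
# Doubling times are assumed placeholder values, not figures from the slide.

def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years` for a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)

cpu_doubling = 2.0       # assumed Moore's-law-like doubling time (years)
detector_doubling = 1.0  # assumed faster, "super Moore's Law" doubling

cpu_decade = growth_factor(10, cpu_doubling)            # 32x in a decade
detector_decade = growth_factor(10, detector_doubling)  # 1024x in a decade

# The mismatch itself grows exponentially: after a decade the detectors
# have pulled ahead of the processors by this factor.
gap = detector_decade / cpu_decade
```

Under these assumed rates, a pipeline that was balanced ten years ago is off by a factor of 32 today, which is the "impedance mismatch" in concrete terms.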
So very specifically, there are some technology trends we can get into that are sort of behind that last slide overall. But technology trends don't mean a whole lot if they're something that came and went and, I'll tell you, I never dealt with it. I myself barely ever dealt with CUDA; I've sort of dodged the CUDA bullet in some ways. But there are some bullets
you can't dodge, and this is one, which is the amount of data that comes across our border every day. Every blue dot is a day; there are high-bandwidth days and low-bandwidth days, but over the course of decades the trend of exponential increase in data movement is quite clear. And I like that, because it tells us, you know, maybe where we'll be in 2016, if things continue the way they are right now. The traffic is driven really by automated data pipelines and large-scale processing: genomics, the Large Hadron Collider, and increasingly image processing.
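That kind of extrapolation can be sketched in a few lines: fit the daily totals in log space, where exponential growth is a straight line, and read the fit off at a future date. The data below is synthetic (an assumed 10x-every-two-years rate plus noise), standing in for the real border-traffic record:

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(3650)                  # ten years of daily transfer totals
slope = 0.5 / 365.0                     # assumed 10x growth every two years
log_tb = 1.0 + slope * days + rng.normal(0.0, 0.2, days.size)  # log10(TB/day)

# Exponential growth is linear in log space, so an ordinary least-squares
# line fit recovers the growth rate despite the day-to-day scatter.
fit = np.polyfit(days, log_tb, 1)

# Extrapolate two years past the end of the record ("maybe where we'll
# be in 2016, if things continue the way they are right now").
future = np.polyval(fit, days[-1] + 2 * 365)
projected_tb_per_day = 10.0 ** future
```

The high- and low-bandwidth days wash out in the fit; it's the slope over decades that carries the planning signal.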
So why image processing? Well, telescopes, microscopes, light sources: all these things in essence are cameras of one sort or another, and they generate large amounts of image data. Data comes here, I like to think (at least, I'm open to feedback from other people), because NERSC gives a secure, reliable, fast, open and flexible place for scientific data. We've done a really good job keeping out of the crypto-card business thus far, and a whole bunch of other little things like that.
If you go over to the ALS, there are some people with new cameras and very ambitious data strategies. Overall, there are some people who will abide perfectly well for a long time without having to reboot their thinking around data, and that's fine, and that's also something that NERSC is really accustomed to. We have such a broad user base overall that we're not trying to corral everybody in the same direction, to get on the bleeding edge or things like that.
We, as you know, support Fortran and lots of things that have been around for a long time, so a portfolio approach is good. For some people this is tomorrow; for some people this is "why didn't we start planning this five years ago?" And so this cycle really starts with these detectors, and detectors, that is, CCDs, are on a super-Moore's-Law data ascent.
There are stages here of managing and sharing data, which could be within a team, then to larger teams, and ultimately making these facilities a secure, integrated, real-time and sort of programmable resource. It used to be sort of far-flung to manage somebody using a computer and a light source and ESnet all together at the same time; that was a sort of heroic act of scheduling and getting your right allocation at the right time.
So in some sense this is real simple: if you want to complete that whole loop there, you just put data on the web, right? And this is an example from Kathy of how my kids actually do research for their homework and things like that: if you want to learn about something, you just go to Google and you put in the name, and sure enough, knowledge.
So let's look at some ways that we can do a little bit more than just Google for things. Some of these new data methods are certainly in discussion, coming out of requirements reviews and a lot of other discussions; in fact, I mentioned that part of the reason not everybody's here is because they're off talking about these things in DC and other places. So you can read as much or as little of this as you want.
I hope you find a few things in there that are interesting or relevant to you. It's not exhaustive, but my main intent in having such a verbose set of words there is just to get across that this is not a question of how many disks to buy, right? This is not a storage capacity issue that says we just need a certain amount of storage in the mass store.
All the data, people want to do really interesting things with. They want to do things like deep search, where you could ask genuinely interesting scientific questions about data, sort of in their native format, and get answers from them. There's a whole bunch more here as well that we'll sort of get into. But this massive pile of requests will surely get kind of winnowed down over time, as we recognize which technologies really work well and which ones
maybe we can forgo. And it sort of seems to me the right time for the community to be asking how they see these new data methods being delivered. Is this a collection of tools, or is this a collection of APIs, or is it both? Certainly some of the R&D efforts that I've seen before in high performance computing have had kind of a tools, middleware type of approach.
And you know, it's not clear to me that a single big monolithic tool, or a collection of tools, is going to be able to address this wide collection of things. One of the reasons is that a lot of the computing that people want to get done, and the data analysis that people want to get done, they don't necessarily even want to see; they want that to happen automatically, in the background, as part of a workflow. They want the answer.
They want to be able to make good scientific decisions, or achieve scientific insight, through knowledge that is delivered to them from data. So where we are right now, at least, is in a space where there are, I think, over 220 scientific data and computing application programming interfaces that are written in a sort of modern RESTful format.
You can go to some of these URLs to find them, and so whether it's access to data or access to services, a lot of these sorts of things are becoming available through programmatic means, which feeds exactly into what I was talking about earlier: build your own science capability using the resources that you want from different facilities. APIs are a great way to do that.
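The programmatic-access pattern being described can be sketched with nothing but the standard library. The endpoint, path, and field names below are hypothetical placeholders, not a real entry from those API catalogs:

```python
# Minimal sketch of the RESTful-API access pattern: encode a query as a
# GET request and expect JSON back. The service URL and field names are
# hypothetical, invented for illustration only.
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://api.example.org/v1/materials"  # hypothetical service

def build_query(formula, fields):
    """Build (but do not send) a GET request for one material's properties."""
    params = urlencode({"formula": formula, "fields": ",".join(fields)})
    return Request(f"{BASE}?{params}", headers={"Accept": "application/json"})

req = build_query("Fe2O3", ["band_gap", "density"])
# A real client would then decode JSON from urlopen(req) and feed the
# result straight into the next step of a workflow: no shell, no files.
```

The point of the RESTful shape is exactly that last comment: the response plugs directly into the next stage of an automated workflow.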
So I want to tie this into simulation; I'll kind of bounce between simulation and data, as if they were separate topics, a little bit. This is an imagined, modern scientific workflow from the Scientific Collaboration for Extreme-Scale Science report that came out, and this is the S3D one.
It's written a little small up there, but if you know S3D and GTC, this is sort of their workflow, and there are a lot of technology components called out here as to where storage and distributed computing sit, where data analytics happens, and things like that. But it's broken into these sort of two streams of post-processing and in situ processing.
The most important thing, I think, about in situ processing is that it's no different than post-processing; it's just earlier, right? And so part of what we're working out here is to move these tools and processes that used to happen after the fact, and build them in. There are a variety of reasons you might want to do that; probably the most compelling one is just time to solution.
So bringing in situ processing and post-processing together requires a lot of the tools and data methods that I described on the last slide, and if you're a software design thinker, you might be motivated to kind of look at this overall workflow and figure out: how do these parts fit in, who are the stakeholders in the different components, and where do they come together?
So this is an animated rendition of a similar workflow, but here motivated by beamline science, where there's a data pipeline that moves data to storage and computing, with a prompt-analysis component that allows people very quickly to run simulations, and to bring simulations and measurement together to compare them.
So this is, you know, not far in the future. If you're doing small-angle x-ray scattering at the ALS, you know that you need simulations to overlay on your curve, sort of before you're done doing the experiment, and so the speed with which we can drive this cycle is crucially important to that sort of beamline science.
Being able to reuse and analyze previously collected data, to simulate with new models, to discover relationships across data sets: this has been going on for a long time, but I think the interesting example here is quasicrystals, for which a Nobel Prize was given some time ago. The person who discovered aperiodic tilings in quasicrystals was staring at an electron micrograph and saw something that they weren't able to fit together into their point-group
symmetry knowledge. That's something that you could detect across large amounts of data, and so being able to discover relationships across data sets with mathematical analyses has tremendous upside potential overall, given the amount of things that people have discovered simply by happening to stare at the right micrograph at the right time. Then there's being able to fuse data together from other disciplines. This is not a new phenomenon, certainly, but big data in the commercial sense has really brought
statistics and computing together to be able to do this in a way which is much, much faster than had happened a lot of times before. Let's mention here, with that, machine learning. So one of the goals of this model of discovery with machine learning is to take what would be a postdoc's, or a substantial
human-invested, research activity, that is, "go do principal component analysis, or support vector machines, or random forests on this data; come back and report what you found." Both in the private sector and in scientific research, the idea of building such models in a scalable, automated way through machine learning is not going to replace the postdoc or the graduate student or anything like that, but it may allow everybody to move further, faster, by getting those models generated without a manual process.
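One of the analyses named above, principal component analysis, is easy to script so it runs unattended inside a pipeline rather than as a hand-driven task. A minimal NumPy sketch on synthetic data (the random samples stand in for a real science data set):

```python
# Scripted PCA: the kind of analysis that can run automatically in the
# background of a workflow. Pure NumPy; data is synthetic for illustration.
import numpy as np

def pca(data, n_components):
    """Return the top principal directions and per-component variance."""
    centered = data - data.mean(axis=0)
    # SVD of the centered data gives principal directions in vt.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = (s ** 2) / (data.shape[0] - 1)
    return vt[:n_components], variance[:n_components]

rng = np.random.default_rng(42)
# One strong direction of variation plus small isotropic noise.
samples = rng.normal(size=(500, 1)) @ np.array([[3.0, 1.0, 0.0]])
samples += rng.normal(scale=0.1, size=(500, 3))

components, variance = pca(samples, n_components=2)
# The dominant direction is recovered automatically, with no human
# staring at scatter plots to find it.
```

The "come back and report what you found" step then reduces to inspecting `variance` and `components`, which a workflow can do on every new data set as it arrives.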
So this is a good example of that. If your job is to count cyclones or storms in a large climate data set, you might use visual inspection: go through and say, hey, I know what a storm looks like, I'll go through and find how many storms there were, and we can count those up and graph them over time. Well, it turns out there are pretty good mathematical descriptions of what
the vorticity in a storm looks like, and things like that. So instead of retrieving this data and having some sort of manual or partially manual process to count those things up, that can be replaced by analysis that's moved to the data, so that the counting of storms or other features in the data is automated.
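The automated version of "I know what a storm looks like" can be sketched as a threshold plus a connected-component count. The field and threshold below are synthetic stand-ins; a real detector would apply physically motivated criteria (vorticity, pressure minima, and so on) to climate model output:

```python
# Sketch of moving the analysis to the data: threshold a field and count
# connected blobs as storm candidates, instead of eyeballing plots.
import numpy as np

def count_features(field, threshold):
    """Count 4-connected regions where `field` exceeds `threshold`."""
    mask = field > threshold
    seen = np.zeros_like(mask, dtype=bool)
    count = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and not seen[i, j]:
                count += 1                 # found a new feature
                stack = [(i, j)]           # flood-fill its full extent
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                            and mask[y, x] and not seen[y, x]):
                        seen[y, x] = True
                        stack += [(y+1, x), (y-1, x), (y, x+1), (y, x-1)]
    return count

field = np.zeros((50, 50))
field[5:10, 5:10] = 1.0      # two synthetic "storms"
field[30:40, 20:25] = 2.0
```

Run over every timestep of a simulation, `count_features` produces exactly the count-over-time graph the manual process was after, with no human in the loop.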
Beyond the storms, there's also finding kind of the inherent structure in data. That is, if energy, or in this case mass, is being convected along particular routes within a system, sometimes it's easier to think about those routes than it is about every single mesh point in the entire sample.
So this is really about machine learning here. Another unifying abstraction that comes across a lot of these projects at NERSC that are working with data in new ways is what I might call radical scaling. By that I mean doing data fusion and reconciling data that comes from very, very different spatial, temporal or domain areas. These aren't ordered chronologically, but Genomes to Life certainly had this viewpoint a long time ago: let's take sequences and make them useful to biology.
KBase is very much involved in that sort of activity right now, building models on genomics data. Microbes to Biomes has a few different goals, but I think the most concretely stated one that I've heard is to study the microbial biome around plants in the same way that the human microbiome has recently been mapped, looking at all the different critters that live on and in people: to be able to do that for plants.
The frontiers approach, the energy, intensity and cosmic frontiers in HEP, is a bridging-across-very-disparate-scales problem: being able to automatically analyze the image data, to move from pixels to models, which then feed back into these sorts of activities. And this is a very generalized description of what's happening at a lot of light sources and other places, which is: I have a machine that can take a lot of high-resolution images of a system that's difficult to image.
Nowadays, you know, they drop macromolecular machines through a very powerful laser that lights them up and destroys them at the same time, and all of those samples are at some arbitrary rotation, so their rotation in space is not known, and you get a little bit of diffraction out of each one. But this sort of process here is taking lots and lots of such images to build a single unified model. I'll talk a little bit about this beamline-to-browser scale, which is really looking at:
how do we connect very, very fast data instruments to commodity data instruments, laptops, things like that? How can I work with terabytes of data that come from a light source, effectively, in a remote way? Then there's being able to couple data from some of the world's best climate models down to solve regional problems, and lastly, the materials genome.
Both of these topics we'll hear about later today. The materials genome has this sort of initial goal of replacing materials design with search: being able to computationally survey vast expanses of possible materials and then answer questions that are design-focused by searching through them. All this is moving from the supercell level up to slightly larger levels, where things like batteries are being considered, and engineered materials for sunlight-to-fuels; you know, moving from materials to machines.
This means going up from defects, to functional electronic materials, to nano and mesoscale phenomena. So there's a lot of upward possibility in these things, and one of the unifying aspects of how we see scientists working with data is being able to accommodate really radical shifts in scaling. Here are some of the tools that we're talking about, and that we're hearing about from user requirements in the data space. Overall, this one's sort of a no-brainer: big, fast file systems.
And focus on end-to-end scientific workflows. So on the variety and veracity part of the data slide that I showed before, I thought this is an interesting example: one of the science gateways that runs at NERSC is AmeriFlux. They have these towers collecting a wide variety of environmental, ecological and climate data, towers that have a wide variety of sensors on them, and these sensors marshal their data to NERSC and other locations, where they're organized and analyzed. And, you know, I'm not a climate scientist.
There are very specific prefigured questions built into the AmeriFlux agenda, about things that they want to measure, carbon and things like that. But we have lots of examples of really amazing scientific discovery that comes from the organization and analysis of data where it's not necessarily obvious where it's going. So in this case, you know, people trying to look at the noise signal in one of their antennas
took an approach where they began a real systematic analysis of that noise, to figure out where it's coming from and what it's about, and they ended up telling us something about the overall structure of the universe. So I'm tremendously optimistic that sensor data, combined with simulation data, combined with the capability to organize and analyze that data, has big upsides. Being able to reuse and reanalyze previously collected data is another cross-cutting theme in these projects. You can take all the data you want and file it away on a big...
So the Materials Genome Initiative, and its kind of spear-point project, which is called the Materials Project, has been in the news quite a bit recently, and I've already described it a little bit. But I think some of the key things that are not necessarily obvious, unless you come at it from a computing center perspective, are about durability of data and curation of data; that is, taking an active view, within a project, about how this data is going to be used later.
How can we maximize its impact and people's ability to leverage it? So, who here has heard of a data management plan? A few people, yeah. So think about reuse and reanalysis and things like that when it comes to data management plans. There are new requirements on scientists about what they should plan to do with their data, and for a lot of you that means you'll be writing data management plans in your grant proposals and things like that. And I just want to point out that keeping the data in case
somebody wants it (and gosh, they're going to have to email me, and then we'll figure out what to do with it) is better than throwing the data away, and it's probably better for your funding agency too. But building in a plan to reuse and reanalyze the data is tremendously more forward-looking in terms of what can be done with all this data.
Multimodality is another area that merits concern, and NERSC is a great place to look at multimodality, because we have such a breadth of science users and science topics. This example that Kathy gave is from the brain: people looking at the structure of the brain, and there's quite a bit of activity in this through the BRAIN Initiative that President Obama announced. But the brain is a big, complicated thing, and it'll take people some time to figure it out,
I'm sure. One of the routes toward that is looking at data that comes from different modes of interrogation, and an example that I'm a little bit more familiar with, compared to work in brain and neurology, is in bioimaging. If you take one sample and you're able to assay it in two or three different ways, there's the process of taking those different assays, which may not even be registerable or directly comparable initially, and being able to overlay them, so you get multiple modes of data at every point or object.
So, to give you a kind of big-picture view of this, zooming out from NERSC's concerns overall: there are people within the Office of Science looking at advancing scientific knowledge discovery, and we hear about knowledge systems a little bit through KBase and things like that. But this really is a broad area.
We'll probably be talking about data-to-knowledge for quite a long time, but I want to connect this sort of upper level to what people are doing with that data. And so this is a collection of activities that I'm hoping the NERSC users in the audience can kind of take in and look at how they might leverage them. If you see areas that you think are sorely lacking, there's still time to influence some of these discussions and agendas, too.
So if you're familiar with the Materials Project and KBase and some of these other things, you'll see real attention to end-to-end scientific processes, doing what we can with human-computer interaction. That's a tough bullet point, in particular for scientists, who are not all that well known for making sleek, intuitive interfaces overall, but there's a lot of work to be done there
if we want to move the data that we're collecting into this sort of picture of being able to inform decisions. And so, from a resourcing perspective, we have computing resources, simulation data instruments, and data analytic appliances (maybe these two are the same thing, actually), but in this whole process, being able to discover, analyze, present and then deliver knowledge back to scientists is sort of the big-picture goal here. So how do we get there?
I'll tell you first: I don't know. But it's really fun being in discussions about this topic with people who are trying to figure it out, and I think just in the last year in particular we've made a tremendous amount of progress. So these are some of the puzzle pieces as they come together here for extreme data science.
So this is, I should have underlined "concept," but this is a concept that Kathy presented at Supercomputing last year. There are other concepts out there too that don't have this XDSF name, but we're all sort of imagining what sort of resources and capabilities would drive this. There was an interesting set of slides that came back from DC yesterday, where there are some discussions going on; the technical term that was used in some of the slides
there was "the data thingy." So put on your algebra hat here and just consider this to be a variable that stands for something. But it's something that we already know a little bit about: a facility that can handle the data that comes from the Large Hadron Collider, that can handle the data that comes from the Joint Genome Institute, that can handle the data from the ALS, from these different sources, is something that there's quite a track record on already. And so the question is: if we look at where detectors and data trajectories are going, is this sort of the right model for the future? What can we learn from the models that exist right now, in order to inform where we go from there?
If you apply for time at NERSC, you can get computing time, you can get an allocation of disk, you can sign up for various other smaller things, but it's really about computing time, mostly, overall. Part of XDSF is broadening that to include data services, storage and analytics, and network services, which I'll describe a little bit now. The outer context for this is really about data science, computer science, mathematics and machine learning, and there's lots of software engineering that's part of this as well.
Right now, other than "do you need to do data transfer over the WAN," there isn't much that we really ask about networking. But in this sort of future space that we're imagining, asking the network to do a lot more than just be there is kind of what this comes down to. And so adaptation, increasing flexibility, and really pushing the limits of what networking technology can do is imagined to happen in this case by making some of those choices available in software. Right now, you have very little choice at all about how the network works.
It's either there or it's not. What the changing space of applications really brings out here is that, in order to get there, we need to expand our thinking to include both these seven giants of data and the seven dwarfs of simulation. The giants here are a reference to the National Academies report on frontiers in massive data analysis, and these are not fully unknown things to us.
The data structures that we use, I mean, the kind of bread and butter of NERSC in some ways, are, you know, MPI, OpenMP, Fortran, C, Python, those sorts of things. But the level at which we interact with the data, in many cases, doesn't need to be at that kind of bare-metal sort of level, and so software and workflows that allow us to kind of move up a level can be really useful. And so FastBit and FastQuery, from John Wu, Arie Shoshani and other folks in the Scientific Data Management group, have allowed scientists to kind of stop messing with their data as much; that may be one way to look at it. They can ask queries against the data without having to get down into the data at a shell level, file level, that sort of thing.
So this is a tremendously powerful tool, to be able to interact with data using data structures that are not the flat files written out of a simulation or that come from an experiment. Tigres, from Deb's group, is moving in the direction of scientific workflows that abstract concurrency and parallelism and other things, sort of behind the scenes.
These allow people to work with larger and larger data sets. There are going to be a lot of technology choices ahead as well, and this is a slide from Kathy that compares a compute-intensive architecture and a data-intensive architecture, which I will not go through fully. But look at these kind of core concepts here: maximizing bandwidth density near the computing versus, you know, bringing more storage capacity near the compute, or embedding the compute into the storage; these are different tasks that call for different technology.
First of all, you know, those login nodes: having more conduits and more pipes to a machine like Hopper is something that we can treat through, for instance, the network connectivity of the batch nodes, and other ideas like that. Designing our own CPUs is considerably harder, but there's tremendous flexibility in a lot of other areas about how we adjust NERSC resources to really drive data-intensive architectures.
These machines are, you know, fully utilized all the time, so we're exploring and examining these trade-offs about how interactivity can work. One of the secrets to NERSC's success, I think, overall, has been thus far pretty good isolation between all these different users: when you get a node for your batch job, it is, as much as we can make it, isolated from others, and so your performance expectations, the kind of error situations that you can reach, all of those things are less wild and wooly.
So the last thing I want to touch on is how this is all programmed, and this is really Kathy's passion overall: which programming models are going to excel at delivering on these sorts of challenges? And again, rather than declare a winner or give you my own advice about this, I'd say that the answer is obvious.
It's probably both, and NERSC is heavily involved with both at this point. If you are somebody who wants to make sure that these technologies are persistently and indefinitely available as they are, come to NUG and tell us about it. If you want to help NERSC position itself for more advanced computing technologies, come talk to us about that too.
And these are really important if the mission goals of some of the detectors being built out there are going to be satisfied: to be able to have that kind of end-to-end science happen very quickly, and to be very, very user-focused, and, in addition to that, innovating on the simulation side. We'll be hearing later today about the Materials Project in detail; you know, this example is really all about data flow.
These ideas aren't that complicated: look at collaborative HPC workflows that produce web-based, durable data assets. Whether you have an algorithm that can produce those, or you need a team of scientists collaborating through some social mechanism to do that, there are a lot of ways to get there. But we have great collaborative potential in data by bringing those resources together in a way that they're not just living in your home directory or your project directory; they become resources for the larger community.
So, if you throw away all the data points that don't match, you might be interested in the surface area of that kind of sub-selection overall, or its volume, or other things. And so those indices, the things that you might be interested in, like pressure, temperature, carbon monoxide concentration, those can be given to FastBit, and indices can be precomputed, so that the queries that are done after the fact become very, very fast.
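The core idea can be sketched in a few lines of NumPy: precompute one boolean bitmap per predicate of interest, so a later query is just cheap bitwise ANDs over the bitmaps rather than a scan of the raw files. This is only an illustration of the precompute-then-query pattern, not the real FastBit API or its binned bitmap encoding:

```python
# Sketch of the FastBit idea: index-build phase runs once, ahead of any
# query; the query phase is then a bitwise combination of the bitmaps.
# Synthetic data; variable names echo the examples in the talk.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
pressure = rng.uniform(900.0, 1100.0, n)      # hPa
temperature = rng.uniform(250.0, 320.0, n)    # K

# "Index build" phase: one boolean bitmap per predicate of interest.
bitmaps = {
    "pressure<1000": pressure < 1000.0,
    "temperature>300": temperature > 300.0,
}

# "Query" phase: combining precomputed bitmaps is a bitwise AND, and the
# matching record ids fall out of np.flatnonzero without touching the
# raw columns again.
hits = np.flatnonzero(bitmaps["pressure<1000"] & bitmaps["temperature>300"])
selectivity = hits.size / n
```

The payoff is that the expensive pass over the data happens once at index time; every subsequent question against those variables costs only bitmap arithmetic.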
Really, I'll be the first to admit that I was slow to pick up on it. I got into computing because I was interested in fast simulation, and for a long time I was glad that the backups were there and the file systems worked and stuff like that. But it is really a paradigmatic shift.