From YouTube: CI WG demo: NDS (National Data Service)
Description
Date: 01-20-17
Presenter: Kenton McHenry
Institution: National Center for Supercomputing Applications (NCSA)
Midwest Big Data Hub
A
So with that, I'm going to introduce Dr. Kenton McHenry. He is a senior research scientist at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, deputy director of NCSA's Scientific Software and Applications division, the principal investigator of DIBBs Brown Dog, and co-lead of NCSA's Innovative Software and Data Analysis group, which works with researchers to build novel tools and services in support of scientific data needs. And if there's a story behind the Brown Dog name, I'm not familiar with it. Yeah.
B
It would have been hard to even do this a few years back. The program officer from NIH basically talks about how many scientists don't want to give out their data, how they want to be buried with their data, and how it was a major accomplishment to even do this, and so on. I would say that what we're doing with the NDS is working towards enabling these kinds of things so that they're much more frequent, less-worthy-of-a-story kinds of scenarios. You know, this is kind of the way things will be, where pulling together
B
different data sources is just the way things are done. And so, with regards to publishing scientific data, there are a couple of aspects to it, the most common one being, you know, that you've got to have somewhere to put the data, somewhere to store the bytes. I think a lot of people tend to focus on that, but that's really only half of the story. Everything else that comes after that is crucial. In these presentations I just throw up, you know, a couple of ASCII bytes and ask people: what is that right there?
B
If anybody out there can answer me: well, what does that mean? Anybody? Mike? Basically, if you've watched The Hitchhiker's Guide to the Galaxy, it's the number 42, you know, the answer to everything. What I try to highlight here is that it's difficult to say what those bytes, those bits, mean without the information around them, without the knowledge that it was ASCII.
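To make the point about those bytes concrete, here is a minimal Python sketch. It assumes the slide showed the two bytes 0x34 and 0x32 (the ASCII codes for the characters "4" and "2"); the talk does not say exactly which bytes were on screen, and the function name `interpretations` is made up for illustration.

```python
import struct

# Assumed example bytes: 0x34 and 0x32 are the ASCII codes for "4" and "2".
raw = bytes([0x34, 0x32])


def interpretations(data: bytes) -> dict:
    """Read the same two bytes under several different assumptions."""
    return {
        # Treat the bytes as ASCII text.
        "ascii_text": data.decode("ascii"),
        # Treat them as one unsigned 16-bit integer, big-endian.
        "uint16_big_endian": struct.unpack(">H", data)[0],
        # Treat them as one unsigned 16-bit integer, little-endian.
        "uint16_little_endian": struct.unpack("<H", data)[0],
        # Treat them as two separate unsigned 8-bit integers.
        "two_uint8": list(data),
    }


for label, value in interpretations(raw).items():
    print(f"{label}: {value!r}")
```

Only the first reading gives "42"; the others give 13362, 12852 and the pair 52, 50. Nothing in the bytes themselves says which reading is right, which is why the format and metadata have to travel with the data.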
In more complex scenarios it's, you know, the format of the file, indices over collections of files
B
so that you can find what's where, metadata around the data so that you know what's in what, and all kinds of other things around access control, data transfer, data transformation, analysis and so forth. All of these are services that are above and beyond just the storage of those bytes, and they are crucial for data publication and, eventually, data reuse. So we focus on that. And once you get into that world of data services, there's a huge landscape of things that are currently being researched and developed to kind of deal
B
with these things. For a scientist getting into the world of putting together a data management plan, there are a lot of different tools out there. A lot of them are redundant. A lot of them are contained within specific domains and not really known about beyond that. That makes it difficult to use these things, and also to connect them together in cases where something more could be done by taking one aspect of one and combining it with an aspect of another to do something even bigger.
B
We, as we are doing here, are engaging with the Big Data Hubs, EarthCube, the RDA and so forth, basically trying to map the landscape of what's out there, pull together pieces, and address, you know, interfaces and protocols that we could potentially implement, or at least motivate the implementation of, towards making it easier to connect these components together. The last thing I would mention too is that we also engage with the publishers themselves.
B
Since we are talking about publishing data, we engage with the likes of Nature, Science, Elsevier and so forth, and representatives from them typically attend our meetings. In terms of the cyberinfrastructure components that we engage with, the participants in our activities, we try to highlight certain aspects. One is that they're each basically trying to address some big data challenge
B
that many people could potentially use. So some of our work with regards to the community activities has centered around mapping the landscape, as I mentioned: kind of breaking things down into the individual pieces of a data infrastructure, from authentication, transfer, storage, curation, analysis, exploration and so forth, and kind of mapping these onto some of the components that are out there, and what does what. Where are there gaps?
B
Where are there ten different things doing the same thing? Those kind of indicate where an API or interface really could be used, so that one could pick one and swap it out over time if they need to, try different things out and see what meets their needs best. And overall we really work towards, and this is a concept that I think first came out of the RDA, kind of doing for this data world what happened for the internet long ago, where, basically,
B
there were lots of components vying for each piece of the internet back then, and it was all open until, you know, it was eventually accepted that we should use TCP/IP, and then HTTP and HTML and all these other protocols came to be. Things then, for the user at least, became kind of cohesive, in the sense that they could pick any browser they wanted and it would still work with any other technology in the background. A web administrator could pick any web server they wanted, the one they felt was best or that they knew best.
B
Development-wise, with some seed funding from NCSA and some efforts out of SDSC and Argonne National Lab, we've been working towards developing some tools to kind of foster this movement towards interoperability. The first one is what we call NDS Labs. This basically has three components. The NDS Labs Workbench is the main software component of this, and what it largely is is sort of an app store for these data management tools and services:
B
a catalogue of these things that are being researched and developed, that are under active development and perhaps not finalized at the moment. So a user, say a new project that's looking for data management tools, could potentially go here, find tools that meet the needs they have for curation or sharing data and so forth, and actually deploy them. The tools are basically contained within this app store as Docker containers and managed through Kubernetes, and so from here
B
a person can basically find the tools they want and run them, run ten of them, and try them out. What that kind of looks like is that you add them to your workspace and you select launch. It pulls the dependencies of those tools along with them. So in this example here, for Clowder it pulls MongoDB, RabbitMQ and so forth; for Dataverse it pulls Postgres and Solr and so forth. These are all Docker components as well, and so you can launch them and try them out for yourself.
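The idea of a tool pulling its dependencies along with it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the NDS Labs Workbench is implemented: the `CATALOG` dictionary and the `launch_order` function are hypothetical, and the dependency lists contain only the services named in the talk (which says "and so forth", so the real lists are longer).

```python
# Hypothetical catalog: tool name -> services it depends on.
# Only the dependencies mentioned in the talk are listed here.
CATALOG = {
    "clowder": ["mongodb", "rabbitmq"],
    "dataverse": ["postgres", "solr"],
    "mongodb": [],
    "rabbitmq": [],
    "postgres": [],
    "solr": [],
}


def launch_order(tool, catalog=CATALOG):
    """Return the tool and everything it needs, dependencies first."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return  # already scheduled, do not start it twice
        seen.add(name)
        for dependency in catalog[name]:
            visit(dependency)
        order.append(name)  # a tool starts only after its dependencies

    visit(tool)
    return order


print(launch_order("clowder"))    # ['mongodb', 'rabbitmq', 'clowder']
print(launch_order("dataverse"))  # ['postgres', 'solr', 'dataverse']
```

In the real system each entry would be a Docker container started by the cluster manager; the sketch only shows the ordering logic.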
B
So that's one of the tools we've been developing. NDS Labs also provides resource allocations on cloud resources scattered across SDSC, NCSA and others, to kind of help more advanced users try out new technologies and work towards interoperability. I'll just mention that the NDS Labs Workbench is also meant for that interoperability development aspect. It supports various web-based IDEs, methods of accessing data on these tools, accessing terminals and so forth.
B
That's NCSA and other organizations as well. So that's one of the tools, and then I'll just finish up with the second tool here, which is kind of ramping up as a portion of this activity, which we call NDS Share. It is going more along the lines of what a national data service might look like: sort of a portal to all the data that's out there, regardless of what technology is behind it, what archive technology,
B
what storage it's actually on. You can think of it kind of like a Google for scientific data, and so what we've been building there is a resource that would kind of foster that. In terms of archive technology and data publication, there are a lot of tools out there. I won't go into too much detail on this, because I'm certain I'm running out of time here, but Globus Publish is one example. It's an extension of Globus transfer where you can basically publish data
B
sets. You basically upload a data set, add some metadata and, at the end of the day, get a DOI or some other handle with which you can then reference that data set. DataNet SEAD does a similar thing. It's got a Dropbox-like interface. You can put data there and, in the same kind of process, it'll find a repository for you and get a DOI. Dataverse has been around for a while.
B
You can do that with it as well. And so what we're building for this kind of world is something above these: a useful tool that can be leveraged by all these things, for something that's becoming more and more prominent these days, the need to run analyses next to data sets. So, trying to be agnostic to technologies but building something that each can leverage, we have begun working on this tool we refer to as a Data DNS. It's kind of analogous to a traditional DNS that maps IP addresses.
B
So if the data is too large to move, if it's a terabyte or more, you can literally, where it's at, launch a Jupyter notebook next to it, an RStudio notebook next to it, a Docker container next to it, and so forth. We did a demonstration of this at Supercomputing. This is kind of a web portal interface to look at the Data DNS entry: its citation, where it's located, and you can launch a notebook from here. In which case you press the button and it brings you to a Jupyter notebook.
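As a rough illustration of the Data DNS idea described above, the following Python sketch maps a data set identifier to an entry holding its citation, its location, and the tools that could be launched next to it. Everything here is an assumption made for the example: the class and function names, the placeholder DOI, the example URL, and the shape of the launch request are all hypothetical and are not taken from the actual Data DNS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDnsEntry:
    """Hypothetical record for one data set."""
    citation: str
    location: str             # where the data actually lives
    launchable_tools: tuple   # tools that can be started next to the data


# Hypothetical registry keyed by identifier (placeholder DOI and URL).
REGISTRY = {
    "doi:10.0000/example": DataDnsEntry(
        citation="Example Author (2017). Example data set.",
        location="https://data.example.org/sets/example",
        launchable_tools=("jupyter", "rstudio", "docker"),
    ),
}


def resolve(identifier):
    """Look up an identifier, much as DNS looks up a host name."""
    return REGISTRY[identifier]


def launch_request(identifier, tool):
    """Describe a request to run a tool where the data is located.

    This only builds a description; it does not start anything.
    """
    entry = resolve(identifier)
    if tool not in entry.launchable_tools:
        raise ValueError(f"{tool} is not available next to {identifier}")
    return {"tool": tool, "run_near": entry.location}


print(launch_request("doi:10.0000/example", "jupyter"))
```

The design point is that the analysis moves to the data rather than the other way around, so the lookup returns a place to run code instead of a copy of the bytes.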
B
You can run it and get some sort of visualization on some data set at some remote location. But the idea here is to leverage it in all these data management technologies that currently exist. So in the Globus Publish case, it's basically imposing on it these little buttons down here, a way to launch some sort of tool next to the data. With DataNet SEAD it's the same kind of thing: it basically imposes, you know, that tool capability right on it.
B
So that's kind of the angle we're taking on that, and it'll be one component of this Google for data that NDS Share will be at some point. So this is kind of a first step. I'm going to end there. Those are two of the technologies we're working on at the moment. There are YouTube links in these slides, which will be online, so you can actually see them running. And if there are any questions, I can take them now.
A
B
Yeah, so on our web page there's a little link for submitting pilot efforts. If you do that, there's a little form you fill out where you basically propose what you think will be beneficial in connecting some components and developing some technology around that. We look through those periodically and, with that, provide resources and work with you towards enabling that kind of thing. So that would be one way to do it.
B
A
B
So the idea would be that the NDS Labs could serve that purpose, but I don't believe we've done much follow-up in terms of making that happen yet. And yes, the NDS Labs Workbench is still in its alpha stage. The beta release of that will be in the next month, and so that would be when it's more stable. But I don't believe we've followed up too much with that at the moment. Very good.
A
This is Lea. I really liked the app store storefront that you showed. We're having a series of meetings for strategic planning for the South Hub with our PIs. My counterpart, Renata Rawlings-Goss, and our teams came up with a similar conclusion that that was needed. Is this something that you've been working with the Midwest Hub to provide? Is this something that could be expanded so that all the hubs work with you on it? And also, how does it relate to what XSEDE provides in their tools and resources?
B
In terms of need, yeah, we have seen that need, and specifically from EarthCube. We've been involved with the EarthCube architecture committee, and this kind of thing came up there after the fact too, so I showed it to them as well. As I mentioned to them, it's open source and we welcome contributions from anybody, and, you know, anyone is welcome to skin it as they wish for their specific endeavor.
B
What we're trying to do is basically catalog these tools in Docker containers, and however that's done, they can be shared across any of these instances at some point. So I think it's for the greater good: however it's branded, it's good to have that, to eventually help us build up this tool base of what all these different data management technologies are.