From YouTube: CI WG demo: DataNet Federation Consortium
Date: 1/20/17
Presenter: Mike Conway
Institution: Renaissance Computing Institute (RENCI)
South Big Data Hub
A: So I'll give a quick demonstration, but, as I told these guys, I'm not really here to demonstrate CyVerse per se; the Discovery Environment is developed by CyVerse. What I really want to do is put it in the context of what we're doing in DFC: what we think it means, and what's happening here and going forward that may be of interest conceptually.
A: We see the data hub as an emergent network of nodes. What I mean by that is we're less focused on developing a site than on developing software that can be used to form constellations of sites. In my mind this can happen around the centralized nature of data hubs as they exist today, and the idea is that we want to embed policy management and openness all the way from scratch space to published reference collections.
A: The idea is that data can be my data before it's our data, and we want to make sure we can handle that data in one context through all the transformations that can happen before it's finally visible in something like a publish endpoint. So again, just like the internet: we want to see whether data grids and cyberinfrastructure actually behave like the Internet as we know it.
A: So the idea is that local nodes freely associate and are able to network and federate together while keeping local control. I'm emphasizing that after the visit to the Northeast data hub, where they're really interested in party-to-party data sharing mediated by agreements. We want to take that idea and apply it to the data hub concept. So maybe the big data hub is the center of focus of this chart.
A: But then there is a network that emerges around it, and it can connect to, or be independent of, that central hub while using all the same infrastructure and concepts, so that one can join the larger collective at points, or at will. It has to service the entire data lifecycle, and it has to support discovery by appropriate audiences, which implies things like the DataONE catalog and things like that, borrowed shamelessly from Reagan.
A: So, to facilitate that, we're working on what is, in effect, the Apache web server of data. Some basic things that are easy to grok: it's open source and ubiquitous, and it's packaged for the hardware and software ecosystem that's out there in research land, where people are doing real work. Beyond that, what are the facilities that are needed? Policy management, again, to support that lifecycle, as Kenton was indicating.
A: Okay, so my other foot, as it were, is in the consortium, and we want to rely on the consortium model for the underlying data grid. From scratch space all the way out to these published reference collections, it has to rest on an integration with things like storage technology as well as high-performance computation. These are some of the people we're working with on the consortium side; note Intel, who just joined, and we're going to be integrating this technology with Lustre.
A: ...that data has been handled according to the prescribed policies and so forth. All this is really saying is that within that network it's not all open; we have to design something that allows these kinds of restrictions, so that little clusters of this network can work on little islands of policy.
A: So, if you will, we looked at the DE from CyVerse as a starting point for the data workbench part of this, which is, you know, tools for data access and sharing and basic metadata management, as well as a model for accessing computation, including their model of bring-your-own-compute, which sounds a lot like what Kenton is talking about. That means researchers can dockerize tools, throw them in there, and have their environments execute what researchers provide for analyses on the data that sits inside the environment. That can also include, as we've been talking about, shipping the computation to the data within this environment, again controlled by policy, so that if it's medical data or sensitive data, we know that only certain kinds of computations happen, and how to process the results of analysis. So we looked at the DE as providing a service layer to that.
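The policy-controlled computation described here can be sketched roughly as follows. This is a hypothetical illustration, not a DFC or iRODS API; the collection paths, policy fields, and app names are all invented for the sketch.

```python
# Hypothetical sketch of policy-gated computation: each collection
# carries a policy listing which app images may run against it, so
# sensitive (e.g. medical) data only ever sees approved computations.

POLICIES = {
    "/dfcZone/home/medical": {"allowed_apps": {"deid-pipeline:1.2"}},
    "/dfcZone/home/open": {"allowed_apps": None},  # None = unrestricted
}

def may_run(app_image, collection):
    """Return True if policy permits running app_image on collection."""
    policy = POLICIES.get(collection)
    if policy is None:
        return False  # no policy registered for this collection: deny
    allowed = policy["allowed_apps"]
    return allowed is None or app_image in allowed
```

In a real grid this check would live server-side in the policy engine, not in client code; the point is only that the decision is driven by per-collection policy, not by the app.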
A: We can exploit and adapt that for these use cases. Again, Kenton's presentation had a really good overview of some of the use cases that the data hubs are going to have to support, and the attractiveness of the CyVerse stack is that it's a functioning embodiment of a lot of those use cases, so it's kind of a central organizing component for us there. We'll also include other interfaces and access methods. So, quickly, on status.
A: The last thing we did was with Bakinam and John Goodall at the University of Virginia: we took her hydrology workflows, dockerized them, and we can run them in this architecture. But we also created a gateway so that things like HydroShare can launch apps using the same infrastructure, so it becomes a generalized service in the same way.
A: Yeah, and also, just recently, this was presented as a BRAIN Initiative project by the UNC Neuroscience Center. These are three environments we've created; the first environment is linked to GPU resources, and researchers are doing imaging of mouse brains using GPUs. So again, we can do this all through the same set of environments and services.
A: So we're working with Nirav and the CyVerse folks, who are putting together an official community DE as a software package, and then we're working with the consortium to extend the pluggability of those services toward the core architecture, over on the iRODS side, so the extension points are metadata curation, indexing, and discovery.
A: So, for example, we just met with DataONE to see about formalizing a sort of pluggable DataONE member node software stack, and the formation of collections across distributed grids, so that you can have collections in multiple places that all appear as one collection but still have all the policies in force at all the various nodes; plus standards and commodity approaches for data sharing, and the idea of à la carte installation of components. This is more about: I'm a shop and I want to do data sharing, so I need this piece.
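The "collections in multiple places that appear as one collection" idea can be sketched as a toy merge of per-zone listings. The zone names and listing format are invented for illustration; a real grid would resolve this in the catalog, not in client code.

```python
# Hypothetical sketch: merge per-zone listings so that objects
# replicated across federated zones appear as one logical collection,
# while remembering which zones hold each copy. Local control and
# policy stay with the zones; this is only the unified view.

def unified_listing(zone_listings):
    """Map each logical object name to the sorted zones that hold it."""
    merged = {}
    for zone in sorted(zone_listings):
        for name in zone_listings[zone]:
            merged.setdefault(name, []).append(zone)
    return merged
```

So a user browsing the federation sees one collection, while the view records which zone each replica lives in.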
A: I need this piece to run as services on my hardware, or out in the cloud, but on my own computational infrastructure, the way we run the Apache web server to serve web pages. I don't have a ton of time, and again I wasn't really going to demonstrate the DE so much, but for those of you who have not seen it, the visual model is a workbench where you have data. It does all the sorts of things you've seen over and over for sharing data: metadata management, setting ACLs and things like that, upload and download.
A
It
also
has
a
view
of
apps
where
people
can
bring
their
own
pieces
of
computation.
For
example,
here's
a
Baca
noms,
a
hydrology
workflow
app,
it's
a
docker
eyes,
damage
that
goes
in
the
environment
and
then
you
can
stage
data
to
it
run
analysis
run
the
app
and
then
the
results
of
the
app
shall
appear
in
the
analyses,
and
you
can
run
stuff
both
on
high
throughput
environments
like
condor
as
well
as
use
they've
integrated
this
with
the
agave
science
api
c.
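The stage-data, run-app, collect-results cycle just described might look roughly like this. The image name, mount points, and parameters are hypothetical, not the actual DE or HydroShare integration.

```python
# Hypothetical sketch of launching a dockerized analysis app against
# staged data: input is mounted read-only, and results go to a
# writable directory that the workbench then surfaces as "analyses".

def docker_command(image, data_dir, results_dir, params):
    """Build (but do not run) the docker invocation for one analysis."""
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data:ro",     # staged input data
        "-v", f"{results_dir}:/results",  # analysis output
        image,
    ] + list(params)
```

Handing a command like this to `subprocess.run`, or to a scheduler such as Condor, is what turns a registered image into a runnable app.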
A: You can also run apps on high-performance computing, like at TACC. So that's the idea here; I know that was really quick.
A
I
think
you
guys
have
already
seen
cybers,
but
the
idea
is
we've
taken
that
out
of
cybers
it's
now
an
independent
software
stack
it's
being
integrated
with
the
data
grid
and
we're
working
towards
a
future
where
there
is
this
interface,
as
well
as
a
generous
rest
service
API
that
can
be
installed
with
pieces
as
needed
at
an
institution
at
a
business
and
that
will
integrate
with
their
high
performance
storage
that
they're
buying
from
DD
N
or
EMC
or
okay,
so
I'm
going
to
stop
there
I
know.
B: Maybe you have examples of the type of applications that Jerrod is running in his environment?
A: Oh, I wish he were here to characterize that. I know that they do a lot of genome-processing pipelines, there's a lot of work with DNA analysis, and I think they do a lot of spatial analysis. It's really open-ended, though. That's another way of saying I don't know.
B: Now, iRODS itself is a federated data grid, so you can go to a data object which is in any of the underlying zones; it gives you the zone of zones and things like that. Can you do the same thing for the apps? Can apps come from multiple zones, so you can look at them and use them across zones?
A: Currently, that use case is not supported. That is something we were talking with CyVerse about: adding federation to other aspects of what they do, because they already have use cases for that. Yeah.
A: And I think the focus we have right now would be the definition of what we're calling a computational resource as a sort of packaged, installable piece, so that you could have a storage grid but then identify certain nodes as being a sandbox for the computation. We had a conversation with Kenton before about this, and found, pleasingly, that they had already thought of it: could we take Brown Dog services and, instead of moving the data to them, ship them down to where that data is at rest?
B: What's an NDS share? But in terms of finding where that data is at rest: is there a way for a site to say, here's the sandbox, so that when you ask for an app to be run, I could pick that up and use the same sort of infrastructure to push it down to one of those compute resources, for example? That would be really, really interesting, if we could be a provider, if you will, of that kind of functionality, and what that would entail.
B: If you're storing... maybe you are already doing it, I'm just being a devil's advocate here: if you store an app as a data set in iRODS, you automatically inherit all those facilities which iRODS provides, with access control, replication, versioning, and all those utility things: authentication, authorization, auditing. Yeah.
A: We're interested in that for preservation, as a preservation task, because, you know, the DE will already do some of that. I may be out of time, but if I'm looking at an analysis that has run, the idea is that it's repeatable; we keep data about what happened and what app was run, and I can see the parameters. I can tune them and relaunch it to achieve repeatability, which, as you know, is a big thing everybody's looking for. So the idea of being able, at a point, to preserve what happened and keep the computational docker image with the data is at least a step towards having a complete preserved image of whatever the task was.
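The repeatability idea in this closing answer can be sketched as a small provenance record kept next to the results; the field names here are invented for illustration, not the DE's actual schema.

```python
# Hypothetical sketch of a preserved run: keep the app, the exact
# docker image digest, and the parameters alongside the results, so
# the analysis can be inspected, tuned, and relaunched later.

from dataclasses import dataclass

@dataclass(frozen=True)
class RunRecord:
    app: str           # which app was run
    image_digest: str  # exact computational docker image used
    params: tuple      # (name, value) pairs, visible to the user

    def relaunch_with(self, **overrides):
        """Same app and image, tuned parameters: a repeatable rerun."""
        tuned = dict(self.params)
        tuned.update(overrides)
        return RunRecord(self.app, self.image_digest,
                         tuple(sorted(tuned.items())))
```

Pinning the image by digest rather than by tag is what makes the preserved computation reproducible even if the tag later moves.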