Description
Date: 11/11/2016
Presenter: Carol Song
Institution: Purdue University
Midwest Big Data Hub
A
I'd like to introduce Carol Song. Carol is a senior research scientist and director of the Scientific Solutions group at the Rosen Center for Advanced Computing at Purdue University. Her current research areas include distributed computing, advanced computing and data infrastructure, and science gateways. Carol has been leading Purdue's HPC program since 2007, served as PI of the TeraGrid resource partner project, and is currently the PI of Purdue's partnership in the XSEDE project.
A
She has a leading role in a number of NSF-funded interdisciplinary projects, including data interoperability [unclear], SDCI and, most recently, a CIF21 DIBBs project, Geospatial Data Analysis Building Blocks. Carol received her PhD degree in computer science from the University of Illinois at Urbana-Champaign, and I am personally excited about this because GIS is my area. So thank you very much. All right, Carol, take it away.
Okay, hi everybody! So, do you see my slides?
B
Okay, great, because I'm always confused by my two-screen setup. So I'm going to give a quick overview of the GABBS project. GABBS stands for Geospatial Data Analysis Building Blocks, and I probably have more slides than I can talk through. Some of them I just included for the benefit of people who want to have a copy.
B
I'll talk about the GABBS architecture, give some examples, and talk about where we're at. I guess, listening to Kenton and [unclear], we were probably all going through the same things. As we were working with a lot of researchers over the years, we did a lot of these types of applications, and most of them were custom development. Even just putting datasets online with an interactive GIS-based interface, in a way that everybody can access, takes months, if not years.
B
So it came to a point where, you know, this is the type of user that we ran into a lot: scientists who have models, who need to run simulations, and who update their input parameters and input data and run multiple times. These are pretty capable scientists who can do this computing, but they have a number of issues. For example, they said, well, it takes up a lot of my laptop, I can't do other things, and they lose track of
B
you know, how many runs they've done, where they put the results, and what they used for their parameters. And especially when they want to use maps and things like that, they need a different set of skills. Also, a lot of people came to us when they started sharing their tools with other people, and they said, well, how do I go about this? There are a lot of aspects involved. And finally, they have a paper published.
B
They said, you know, how can I reference the software and set it up so other people can run it? So we get a lot of those. With that, we conceived the GABBS project. The overarching goal is to lower the barrier, to make it easier for people to visualize geospatial data and utilize spatial data in their models, in whatever computation they're doing, and also to do it in an open-source, community-driven fashion.
B
So this is a different approach from what Kenton presented. We're doing this end to end, as a platform for people to work together. The results we're hoping to see are broader participation in GIS-type data analysis, quicker dissemination, and being able to support some of the needs in the classroom. Our user community is basically anybody who needs to deal with spatial data, especially sharing it and making it available to other people. Specific to this project, we have scientists that we're working with from hydrologic modeling, climate impact,
B
people who deal with large amounts of weather data and disaster-related data products, and we're working with social scientists as well. So our approach is really building on an existing platform. Many of you are probably familiar with HUBzero, which has been funded by NSF for many years. It initially started by providing a platform for sharing computational tools, connecting to HPC and, in recent years, high-throughput computing.
B
And on top of that, with this project in particular, we're adding geospatial data and computation building blocks to allow people to do these things on their own, without a lot of programming and without deep knowledge about GIS. All of that will be part of the HUBzero release eventually, to support a variety of applications and needs. So that's it at a high level. A few words about the award: it was awarded back in 2013.
A
B
Excuse me, there's something here. Okay, all right. So, developments and services to the community, and I have the website there for the project. I'm going to skip these slides because they have a lot of things. The main point I want to illustrate here is that all these orange blocks are the new things we're adding to HUBzero.
B
Of course, a lot of the HUBzero stuff isn't shown here: the CMS, the authentication, access control, all the basic stuff that you've come to expect. We're really focusing on building the rendering engine for the geospatial data, and you can see from the other CI projects that we're leveraging iRODS, [unclear] and so on. I'm not going to show that one. So the specific goals: toolkits for people to quickly put together GIS-based applications, and an integrated data management environment with built-in geospatial data support.
B
For people who have done these things, developers, they know that there are a lot of pieces involved, and for domain scientists without research groups to do these things, it's a tremendous barrier. So we want to build in support for geospatial data, and also data visualization builders that allow people to quickly visualize their spatial data without doing any programming, or with a minimum. The goal is also to provide a production system where people can come and use it, or if they want to set up their own,
B
we have various ways of providing that. We have a VM that's almost ready to download and start using right away. There are also the HUBzero open-source software packages people can install, and we are actively working on AWS instances, so people can just click one button and start a HUBzero Amazon instance right away. So I'm going to give some examples for the first three, I think, just so
B
you have an idea of what's there. The first one is really the toolkit. These are libraries or toolkits and back-end support that people can leverage. For example, this is one group that we worked with, and they have this [unclear] data downscaling application, where they have a screen for their model setup, they run the model, and the results are displayed. In this interface you're looking at, all the map rendering and the controls
A
B
are handled on our side, so they don't need to do any of that. Basically they drop in a widget through our library. This is a Python-based map library that we created based on QGIS. And there's another example with weather data, where it goes to a big database to access the data based on the time periods. People select the variables and the geographic features that they want; in this case it's something like Indiana [unclear].
B
And you can tweak all the layers and transparency and all that. So those are examples of the toolkits we're putting out there, already available on the production site. As for integrated geospatial data support, it's basically doing all the things on this list, search, metadata extraction and all that, automatically as much as possible. We're also providing data services to link the tools to datasets and vice versa. For the metadata extraction we're using
B
We created iRODS microservices for doing those types of things, and also to put it in a Solr index for search. Here's an example of a workflow. People can create a project area for themselves; basically a few fields, quickly fill out the form and start. Then they have their project space, and there are different kinds of storage providers they can plug in. Right now we have iRODS, which has the geospatial support, and you can pull up the [unclear]. It's just like a file explorer.
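Editorial sketch: the automatic metadata extraction and search indexing described above can be illustrated with a small, self-contained Python example. It is not the project's iRODS microservice or its Solr schema; the function names, the index field names (`id`, `feature_count`, `bbox`), and the choice of GeoJSON input with only Point and LineString geometries are all assumptions made for illustration.

```python
def bounding_box(geojson):
    """Return (min_lon, min_lat, max_lon, max_lat) for a FeatureCollection.

    Assumption: only Point and LineString geometries are present.
    """
    lons, lats = [], []
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        coords = geometry["coordinates"]
        # A Point holds one position; a LineString holds a list of positions.
        positions = [coords] if geometry["type"] == "Point" else coords
        for position in positions:
            lons.append(position[0])
            lats.append(position[1])
    return (min(lons), min(lats), max(lons), max(lats))


def index_document(file_name, geojson):
    """Build a flat dictionary that a search index could store (hypothetical fields)."""
    min_lon, min_lat, max_lon, max_lat = bounding_box(geojson)
    return {
        "id": file_name,
        "feature_count": len(geojson["features"]),
        "bbox": [min_lon, min_lat, max_lon, max_lat],
    }
```

A document like this is what makes a spatial search ("which files cover this area?") possible without opening each file again.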
B
A
B
Handling and processing hyperspectral, multispectral remote sensing data or other types of data. Without really leaving, you know, going to another place, they can just open the tool and the data appears in there. They can do manipulations there and save it back to the project area. There are other tools that allow people to combine different datasets into a map, with layers they can control, and a lot of times
B
And these guys, they took this device. They were on a [unclear] boat taking measurements of wind-related data: wind direction, wind speed and all those related variables. It came out as a spreadsheet of about twenty to thirty thousand rows. For that, we were able to just import the data, and right away they see where they went. Then they can select the points along the way and see different plots of the variables.
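Editorial sketch: the import-and-select workflow described above can be illustrated in a few lines of Python. This is not the tool shown in the talk; the column names (`lat`, `lon`, `wind_speed`, `wind_dir`) and the function names are assumptions, and a real spreadsheet export would likely need its own column mapping.

```python
import csv
import io


def read_track(csv_text):
    """Parse CSV text into a list of measurement points (assumed column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "wind_speed": float(row["wind_speed"]),
            "wind_dir": float(row["wind_dir"]),
        }
        for row in reader
    ]


def select_points(track, min_lat, max_lat, min_lon, max_lon):
    """Keep only the points that fall inside a latitude/longitude box."""
    return [
        p for p in track
        if min_lat <= p["lat"] <= max_lat and min_lon <= p["lon"] <= max_lon
    ]


def mean_wind_speed(points):
    """Average wind speed over the selected points, or None if nothing is selected."""
    if not points:
        return None
    return sum(p["wind_speed"] for p in points) / len(points)
```

Selecting points on a map amounts to the box filter here; the plots of the variables would then be drawn from the selected subset.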
B
A
B
We also use these tools with students. For example, this was a summer camp that we did. As part of the summer camp, we did some lessons using tools that students can use to study [unclear] from remote sensing images, to calculate the extent of the areas of flooding, and you can see the picture.
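Editorial sketch: calculating the extent of flooding from a remote sensing image reduces, in its simplest form, to counting flooded pixels and multiplying by the ground area of one pixel. The example below assumes the image has already been classified into a grid of 0 (dry) and 1 (flooded) and that pixels are square; the function name and the 30 m pixel size in the usage note are illustrative assumptions, not details from the talk.

```python
def flooded_area_km2(mask, pixel_size_m):
    """Return the flooded area in square kilometres.

    mask: list of rows, each a list of 0 (dry) or 1 (flooded) cells.
    pixel_size_m: side length of one square pixel, in metres.
    """
    flooded_pixels = sum(1 for row in mask for cell in row if cell == 1)
    area_m2 = flooded_pixels * pixel_size_m ** 2
    # 1 km^2 = 1,000,000 m^2
    return area_m2 / 1_000_000
```

For instance, four flooded pixels at an assumed 30 m resolution cover 4 × 900 m², which is 0.0036 km².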
B
Okay, and so this is not ready right now, but this is what we're working towards. A data publication looks like this right now: you have an area to describe your data, you can put in snapshots, and you can also see what files are there. You can actually check within the bundle. Here is what we really want to do, and it's in the works.
B
We're also working with HydroShare to be able to publish resources across the two cyberinfrastructures, and also on launching tools from one side to another. Right now we're using iRODS FUSE, and we're still working the issues out in terms of launching tools on each other's side. In GABBS we're using it in various tools. We're also looking into how we can make that more integrated with HUBzero. With Brown Dog, you know, we've talked about various things.
B
Let's see what else I have. Just in general, we have this website called mygeohub.org, and it is supporting a number of large projects right now, but individual users can also go there and just use it for free. We have been releasing various pieces on this website as we go throughout last year. Our first official release is coming out in December, and between December and next year we're going to have incremental releases. And also, in addition to hosted services,
B
we are also going to put out AWS instances and an open-source release of the software packages. Right now the site has [number unclear] users per year, and we have a lot more visitors, who come here and take a look. Users actually use tools or download datasets and so on. There are many things that we would like to work on through places like the Big Data Hub and with other groups.
B
A
B
Right, so yeah, that is one direction that we're going, "we" meaning the whole HUBzero kind of enterprise. Right now HUBzero uses OpenVZ containers, but I know that the HUBzero team is running Docker. I don't know whether it's in production yet, but I know they're testing that, because there are a lot of requests for running things remotely and, for example, for deploying some tools closer to where the data resides. So I think that is the direction we're going.
A
Mike, could you maybe comment on how this fits within the ecosystem we were talking about today?
Sure, and I'm probably behind on being able to address it, honestly, but I think there's a very nice interlocking of all the DIBBs projects. For example, we, well, HydroShare would equally be interested in the capabilities, I think, that are in GABBS. I don't know if there are any [unclear] people in the room, but we were just talking with Ray [surname unclear] about HydroShare, the
A
To this, and so, you know, we've really kind of taken the perspective that, in combination, all of these various DataNets and DIBBs provide a really good basis for the technical architecture of the hubs [unclear]. And since they are working on containerizing [unclear], at some point we want to get into how to essentially productize the linkages between these different components, so that they are deployable as a stack, you know, a cafeteria strategy. Everything that's been discussed sounds like it is very much pointing towards that. What I'm really interested in, as I think [unclear] hinted, and that's what we're talking about with Brown Dog, is like,
A
Where does the computation take place? It seems like it's a really big question that everybody is trying to solve, right, and through what sort of mechanism: whether something is high-throughput, or something has to run in parallel with, you know, a great deal of communication, for things like coastal hazards or [unclear]. Right, so now I'm rambling, but those are just the points I heard, and whether anybody can react to that or [unclear]. Does anybody have any thoughts they want to add?
B
Federate in some fashion and get the data over, for reasonably sized data, and to be able to launch tools from HydroShare. I think we're kind of waiting on something, I forgot exactly what, but probably on either FUSE or [unclear] in iRODS, something like that. So hopefully we get to something, at least before the container version, for sure.