From YouTube: CI WG demo: NDS Workbench
Description
Date: 6/1/2018
Presenter: Craig Willis
Institution: National Center for Supercomputing Applications
Midwest Big Data Hub
A
First, I'd like to introduce Craig Willis, who I've been working with on the National Data Service and other projects now for a bit. He's going to be talking about the NDS Labs Workbench and how it can be used for many different efforts that are going on in the data world around data research. So, Craig.
B
Thanks
Mel
so
Craig
well
as
some
see
research
program
or
the
National
Center
for
supercomputing
applications
at
University
of
Illinois,
and
also
the
technical
coordinator.
The
national
data
service
project
work
closely
with
Christine
Kirkpatrick.
Is
our
executive
director
I'm
sure
you
all
know
that
and
I
don't
have
time
to
talk
about
NDS.
If
there's
any
questions
feel
free.
If
you
don't
already
know
everything
I'll
just
talk
specifically
about
one
initiative,
which
is
this
the
last
workbench
platform
and
well,
let's
get
my
cursor
in
the
right
place
here
so
manias
labs.
B
It's something that can be installed. I'll give my architecture diagram toward the end of the presentation, but it is really based on a Kubernetes orchestration framework that we deploy on OpenStack environments, predominantly Jetstream, SDSC, and Nebula. We can host it, or you can install it for yourselves in those environments. Over the last couple of years there have been three primary use cases.
B
One is the third one on this list here. The one that garners the most interest is remote collaboration on complex data projects; I'll go into a use case here in a minute. We also see a lot of use for workshops, training environments, and hackathons, where people need quick, cloud-based access to pre-configured environments.
B
So this first use case, collaboration on complex data projects: you get a shared platform, you've got remote, interactive access to data, and all of the software dependencies can be pre-installed for folks. In these projects you're often dealing with various levels of expertise, so users may or may not be comfortable installing software, or comfortable with the data transfer process, particularly for large data sets.
B
So the idea is that this allows very rapid deployment of customized interfaces for them to do exploratory or collaborative work on these data sets. Our primary use case right now is the ARPA-E (a research arm of the DOE) TERRA-REF project, which is building a reference platform and data set for high-throughput phenotyping of sorghum grown at multiple sites.
B
We've got petascale data storage and a computing pipeline, multi-institutional collaboration, and Labs Workbench has become the primary interface for remote, interactive access to the data by collaborators and developers on the project: researchers, postdocs, and students, and then the actual professional research development team here at NCSA and at the partner institutions. The illustration here is that the workbench itself provides these containerized environments. These are all web-based applications: a Python IDE, RStudio, a Postgres interface, Jupyter environments, and you're accessing them through the browser.
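[Editor's note: to make the idea concrete, here is a minimal sketch, using public container images as stand-ins, of how a set of containerized, web-based environments like these could be declared. This is an illustration only, not the Workbench's actual configuration.]

```yaml
# Illustrative Compose file: public images standing in for the
# Workbench's containerized environments (not the actual NDS Labs manifests).
services:
  jupyter:
    image: jupyter/base-notebook   # web-based Jupyter environment
    ports:
      - "8888:8888"
  rstudio:
    image: rocker/rstudio          # web-based RStudio IDE
    ports:
      - "8787:8787"
    environment:
      - PASSWORD=changeme          # placeholder credential
  postgres:
    image: postgres:13             # database behind a web interface
    environment:
      - POSTGRES_PASSWORD=changeme # placeholder credential
```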
B
Second use case: workshops, training environments, and hackathons. Very early on, people realized this was a good application of the system. You've got pre-configured software environments in a highly scalable system, and the data and software are preloaded for participants, so they can be productive immediately without installing software. It's effective on tablets as well as PCs, and it reduces the load on hotel or conference Wi-Fi.
B
We've supported a number of different initiatives: the NSF PI4 bootcamp, which is an initiative here at the University of Illinois teaching data science to mathematics students; ThinkChicago, a civic tech challenge initiative, in the summer; we host an ongoing asynchronous tutorial environment for an NSF design tools group; and we've done a number of workshops related to the TERRA-REF project, including Phenome last year.
B
An example here, a recent case: the USDA NIFA data-driven agriculture workshop and hackathon. We had three terabytes of sample data we wanted to provide access to; this was in conjunction with the TERRA-REF project. Some of it was loaded via GeoServer, which is hosted in that same Kubernetes cluster. We had the interactive analysis environments with the data pre-loaded to allow folks to access it, and then they were using desktop tools like QGIS to access it ad hoc via the WMS and WCS services.
B
This is the high-level architecture. Most everything we do right now is on the OpenStack environments, either through Jetstream or the two other systems I described. We deploy a Kubernetes substrate on that for container orchestration, and then we've got the basic services that we use to support the system, but it scales out pretty wide to support, in particular, the hackathon and education environment scenarios.
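[Editor's note: as a sketch of that layering, one pre-configured environment on such a Kubernetes substrate might be declared roughly as follows. The image and names here are placeholders, not the project's actual manifests.]

```yaml
# Illustrative Kubernetes objects for one pre-configured environment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workshop-jupyter
spec:
  replicas: 1          # scaled out per participant for workshops/hackathons
  selector:
    matchLabels:
      app: workshop-jupyter
  template:
    metadata:
      labels:
        app: workshop-jupyter
    spec:
      containers:
      - name: notebook
        image: jupyter/base-notebook   # public stand-in image
        ports:
        - containerPort: 8888
---
apiVersion: v1
kind: Service
metadata:
  name: workshop-jupyter
spec:
  selector:
    app: workshop-jupyter
  ports:
  - port: 80
    targetPort: 8888   # exposed through the platform's web proxy
```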
B
What we've learned from this process is that once you've packaged software in this way, it enables rapid deployment on these kinds of scalable infrastructures. You can enable education and training on the same research environments where you're actually doing the research, with access to that data. That's been a very compelling story for the different groups that have used this system, and we obviously use it internally for our own development on some of these projects.
C
Thanks, thanks, Craig. That was great. Just curious about a couple of things. This is Jim Wilgenbusch from the University of Minnesota, Minnesota Supercomputing Institute. One of the bullet points was that it works on laptops, tablets, and hotel Wi-Fi. You don't mean that the platform is actually installed on tablets, right?
B
Right. But yeah: if you can imagine you're doing a workshop at a conference and requiring participants to download and install software, hotel Wi-Fi actually becomes an issue, as we've all experienced. Instead, the software is available in these kinds of cloud-based environments, and your footprint there is minimal. So yeah, that's a very important bullet point.
C
No, I get it; that's kind of what I was thinking, I just wanted to be sure I was getting that. Another bullet that you had on a subsequent slide was "web-based turnkey." So it's turnkey in the sense that you just open up your browser and point at a URL, is that the idea?
B
Yeah. Things like Jupyter and RStudio with JupyterHub are of course not particularly novel, but for some of these other tools it may be. One of the early applications was a project between Jon Crabtree at Odum and, I think, a group at RENCI to combine Dataverse with iRODS. Jon wanted to immediately run a workshop where he could give everybody their own Dataverse instance with an iRODS backend that they could boot up at a conference in a three-hour workshop.
C
And
and
one
other
question,
or
maybe
just
two
and
a
point,
so
the
Jupiter
Jupiter
now
is
supporting
a
lot
of
these
environments
desktops
as
well
as
things
like
our
studio.
Are
you
actually
letting
Jupiter
handle
the
credentials
and
then
the
graphics
delivery
as
well,
or
is
that
actually
happening
outside
of
Jupiter.
C
Sorry, yeah. You actually run a Jupyter notebook, but then RStudio sort of stands beside that, in terms of your schematic. What I'm saying is that if you open up a Jupyter notebook, you can then select, from within your Jupyter notebook, access to RStudio and pass through the various tokens for RStudio to run through that Jupyter session.
B
I'd
say
I've
seen
our
studio
as
a
recent
addition
through
the
Jupiter
hub
proxy
I
know
so,
given
these
were
dealt,
this
was
developed
independent
of
the
Jupiter
hub
framework,
so
the
judge'
we're
not
actually
working.
That's
something
that
actually
we
got
if
a
proposal
gets
funded,
we're
hoping
to
do
better
hook
up
to
their
fast
moving
ship.
Therefore,.
C
It just sort of simplifies things a bit, again from the standpoint of the portability element as well as authentication and what have you. And to spell out GEMS, G-E-M-S: the name came from the emphasis in the ag space, where the G is genomics, the E is environment, the M is management, and the S is socioeconomic. So it's a reference to these four data types that people often attempt to bring into some form of interoperability, to make larger inferences about the value chain.