►
From YouTube: 07 Data Ecosystem Overview
Description
Part of the NERSC New User Training on June 16, 2020.
Please see https://www.nersc.gov/users/training/events/new-user-training-june-16-2020/ for the training day agenda and presentation slides.
A
A
While
the
talk
is
going
on,
feel
free
to
post
questions
there
and
if
we
need
to,
if
we,
if
we
can
answer
them,
while
the
speaker
is
going
and
we'll
do
that,
otherwise,
the
speaker
will
pick
up
the
questions
of
n.
So
the
first
talk
that
we're
gonna
have
is
a
overview
of
the
data
ecosystem
and
nursed
by
what
he'd
been
gene
from
Deena
analytic
services
group.
So
taking
away
what.
B
B
A
B
But
it's
your
problem:
this
is
yeah,
okay,
I'll
keep
talking
and
it
goes
so
actually
I'm.
Just
as
oh.
This
is
actually
wrong
because
I'm
looking
for
10
minutes
and
then
bill
I
think
is
something
for
20
minutes
about
and
then
I'll
talk
a
bit
more
about
file
systems
and
in
particular,
the
burst
buffer,
which
is
a
sort
of
slightly
different
sort
of
files,
and
then
Lisa
will
talk
about
two
ways
that
we
can
use
the
file
systems
in
terms
of
transferring.
A
B
Data
and
then
Quincy
what
that
says,
and
then,
after
the
break,
we
have
more
sort
of
analytics
topics,
first
of
all,
about
Python
and
Jupiter,
and
nurse
2,
which
is
our
container
indeed
I.
Think
you
briefly
heard
about
earlier
and
then
finally,
the
interesting
topic
of
deepening
that
we'll
be
presenting.
B
Ok,
so
here's
just
an
overview
of
the
whole.
What
like
maybe
comes
within
the
idea
of
data
and
ask
if
these
technologies
we'll
be
talking
about
more
in
detail
in
later
talks,
some
we
won't
be,
but
I'll
just
go
through
kind
of
what
they
are
and
they've
been
putting
the
categories
of
accessing
data
either
transferring
in
so
I
recommend
you
both
doing
nice
globus.
These
then
interfacing
with
nurse
where
we
have,
as
we
probably
as
you
heard
in
the
morning,
a
bit
Jupiter
and
you'll
hear
more
about
that
later.
B
At
this
funny
symbol
here
is
an
X
I
believe,
which
is
there's
no
machine
way
of
accessing
mostly
very
button
morning.
Support
web
portals,
which
we'll
be
talking
about
here.
I'll
briefly
mention
an
easy
way
of
putting
things
on
the
web,
but
there's
much
more.
You
can
do
and
we
are
people
who
can
help
with
that
as
well,
and
many
people
use,
for
example,
the
jungle
framework,
and
this.
B
Is
an
API
we
have
so
you
can
actually
it'll
there's
a
so
refresh
of
this
coming
call
the
super
facility
API,
which
I
won't
be
talking
about
either,
but
there
are
ways
to
do
that.
If
you
need
and
then
workflows
bill
is
going
to
talk
about
this
in
lots
of
detail
the
some
tools
here,
like
tasks,
farmer
and
fireworks
or
supporting
some
time
and
then
some
new
that
are
coming
they're
coming
and
they
were
happy
to
help
you
with
then
in
terms
of
data
management.
B
These
things
on
the
Left
file
formats,
so
hdf
is
a
very
common
and
widely
used
across
different
groups.
File,
format
and
Quincy
is
super
active
and
we
will
be
talking
later.
Netcdf
and
root
are
like
were
heavily
used
by
the
communities
that
use
them.
So
we
do
also
provide
some
support
for
those,
but
if
you're
not
in
those
communities,
anybody
interested
in
that
and
then
in
terms
of
dip.
We
don't
have
a
talk
on
that,
but
you
can
look
up
our
documentation.
B
Thing
to
know
is
that
there
is
a
form
there
if
one,
a
database
that
we
do
host
database,
both
MongoDB
sort
of
larger
data
sets
and
MySQL
and
Postgres,
but
traditional
SQL
databases,
and
then
the
data
analytics
space
is
quite
broad
and
you'll
hear
a
little
bit
about
this
in
the
Python
talk
and
also
in
the
deep
learning
talks.
Are
these
things
on
the
right
and
deep
learning
frameworks,
but
we
also
support.
B
Those
as
well
and
spark
for
distributed
analytics
and
then
in
the
area
of
the
patient,
which
is
important
also
for
traditional
HPC
applications.
We
support
it
and
power
of
you,
okay,
software,
but
then
also
there
are.
There
are
features
if
you
like,
particularly
on
quarry,
and
so
these
are
a
kind
of
bunch
of
things,
some
of
them
work,
particularly
in
integrating
with
our
high-performance
computing
environment.
That's
really
about
running
big
jobs,
which
is
very
cunning,
pensive
jobs,
so.
A
A
B
B
B
That
are
maybe
suit
for
a
time
emitting
stores
or
experimental
data
and
ask
this
jobs
such
as
shared
no
queue.
If
you
don't
need,
I
know
application,
so
that
can
be
serial
jobs
or
any
actual
fraction
of
a
note
and
then
a
separate
to
you
for
transfers.
They
can,
you
know,
queued
up
as
well,
and
then
we
we
see
that
some
needs
expansiveness.
We
have
queues
and
we
can
take
some
time
for
a
job
to
get
through
those
depending
on
the
size
of
it.
But
some.
B
Have
needs
for
real-time
things,
so
things
that
actually
run
when
you
submit
it,
and
this
obviously
requires
dedicated
resource
on
our
part.
So
again,
it's
by
request
via
sort
of
app
for
that,
but
we
can
support
it
and
then
in
another
useful
features,
interactive
queue.
So
here
you
can
request
the
artistic
loads
project
and
up
to
four
hours,
and
you
can
usually
get
quite
a
lot
quick
response
on
that.
Then,
once
you
talk
to
running,
we
have
containers
which
later
I'll
talk
about
the
burst
buffer.
More
detail
later,
so
I
won't
go
through
that.
B
We've
worked
on
in
quarry
best
to
external
major
sets
directly
into
compute
pains,
so
this
just
shows
that
you
know
all
these
things.
We
have
we've
kind
of
learned
from
our
experience
on
quarry
and
we're
building
on
that
for
Perlmutter.
So,
for
example,
I.
Oh,
we
have
this
burst
buffer,
which
you'll
see
is
a
a
system
but
requires
management
of
your
data.
You
have
to
move
things
into
the
buffer
announced
again
so
on
poem,
so
this
will
be
easier.
They
to
management
then
analytics
this.
B
A
B
Planning,
and
particularly
enhanced
by
the
hardware
of
perlmutter
GPUs
and
then
in
terms
of
workflow
I,
mean
we
just
sort
of
moving
direction
less
into
the
system.
So
it's
easier
to
manage
your
workflows
and
in
terms
of
data
transfer.
This
thing
I
mentioned
at
the
end
that
we
actually
had
to
do
quite
a
bit
external
data
transfer
on
Corey.
This
will
hopefully
be
much
better
from
the
outset,
with
Perlmutter
that's
based
fabric,
but
if
I
performance
and
for
external
access,
okay,.
A
B
B
Generally,
we
have
quite
a
lot
of
staff
supporting
data
use
cases
at
nurses
and
we're
here
to
help.
As
you
know,
normally
this
session
is
quite
interactive
when
we
were
able
to
meet
in
person
with
people,
but
you
know
to
give
you
another
way
of
kind,
as
we
mentioned
earlier,
but
with
the
the
nug
is
not.
This
is
not
an
official
supported,
support,
Channel
and
asked,
but
it
is
you
can
contact
the.