From YouTube: 15. Jupyter Dataset Registry Discussion
Description
June 12, 2019 Jupyter Community Workshop talk by Brian Granger, Cal Poly State University
B
A
You're standing between me and lunch, so make it quick. So, Saul, I'm gonna present a slide just to introduce it, and then, if you want to hop on and do the demo, that way we can keep it really short here. I'm gonna share a slide. Yeah, sure. All right. Can everyone see that? So where this came from is, obviously, data sets are a first-class entity in scientific computing, data science, and AI.
A
However, until now, data sets have not been known to Jupyter broadly: they're known to the Python or R or Scala or C code running in a Jupyter kernel, but the Jupyter system itself knows nothing about data sets. This has been creating a lot of challenges for us as people build extensions, for JupyterLab in particular, that work with data sets. So, for example, Saul, Grant Nestor, and a master's student of ours have created tools that can do data visualization using Voyager or Plotly in JupyterLab, and those tools need to get tabular
A
data sets into them. Notebooks have tabular data sets in the form of data frames. There may be tabular data sets on the file system. There may be one in a SQL database. And we were having to start to write a lot of really brittle, basically N-squared type of code, where, you know, this data visualization tool knows how to pull tabular data sets out of notebooks, and this one knows how to pull it out of CSV. And so, to address this, we wrote a grant.
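The "N-squared" remark can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the tool and source lists are taken loosely from examples mentioned in the talk. It compares the number of adapters needed when every tool reads every source directly with the number needed when each side talks to one shared registry.

```python
def pairwise_adapters(num_tools: int, num_sources: int) -> int:
    """Every tool ships its own reader for every kind of source."""
    return num_tools * num_sources


def shared_registry_adapters(num_tools: int, num_sources: int) -> int:
    """Each source and each tool integrates once with a shared registry."""
    return num_tools + num_sources


# Illustrative lists only; they are not an inventory of real integrations.
tools = ["Voyager", "Plotly", "grid viewer"]
sources = ["notebook data frame", "CSV file", "SQL table"]

print(pairwise_adapters(len(tools), len(sources)))         # 9
print(shared_registry_adapters(len(tools), len(sources)))  # 6
```

The gap widens quickly: with ten tools and ten sources, the pairwise approach needs 100 adapters and the shared approach needs 20.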
A
That is also MIME type based, and on top of this there's a set of conversion APIs that basically know how to map between different MIME types in an efficient manner. The idea here is that someone may write an extension that knows how to work with tabular data sets, but there are dozens of ways that tabular data sets can be encoded. It can be a CSV file or URL.
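The idea of converters that map between MIME types can be sketched as a small graph search. This is a minimal illustration and not the JupyterLab data registry API: the class `ConverterRegistry`, its methods, and the MIME type `application/x-example-records` are all hypothetical, and breadth-first search (fewest conversion steps) is only one assumed reading of "efficient".

```python
import csv
import io
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

Converter = Callable[[object], object]

# Hypothetical MIME type for "list of row dictionaries", used for illustration.
RECORDS = "application/x-example-records"


class ConverterRegistry:
    """Hypothetical registry of converters keyed by (source, target) MIME type."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, Converter]]] = {}

    def register(self, source: str, target: str, fn: Converter) -> None:
        self._edges.setdefault(source, []).append((target, fn))

    def find_path(self, source: str, target: str) -> Optional[List[Converter]]:
        """Breadth-first search for the shortest chain of converters."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            mime, chain = queue.popleft()
            if mime == target:
                return chain
            for next_mime, fn in self._edges.get(mime, []):
                if next_mime not in seen:
                    seen.add(next_mime)
                    queue.append((next_mime, chain + [fn]))
        return None  # no chain of registered converters reaches the target

    def convert(self, data: object, source: str, target: str) -> object:
        chain = self.find_path(source, target)
        if chain is None:
            raise LookupError(f"no conversion from {source} to {target}")
        for fn in chain:
            data = fn(data)
        return data


def tsv_to_csv(text: str) -> str:
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def csv_to_records(text: str) -> list:
    return list(csv.DictReader(io.StringIO(text)))


registry = ConverterRegistry()
registry.register("text/tab-separated-values", "text/csv", tsv_to_csv)
registry.register("text/csv", RECORDS, csv_to_records)

# A consumer that only understands RECORDS can still accept TSV input,
# because the registry chains the two converters.
records = registry.convert(
    "name\tvalue\na\t1\nb\t2\n", "text/tab-separated-values", RECORDS
)
print(records)
```

In this sketch a tabular viewer would declare the one type it needs, and each new encoding only requires one new converter into any type already in the graph.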
A
Can we get this data set into that needed one type? If you're familiar with odo on the Python side, there are lots of similar ideas in this. To emphasize, our notion of data here is entirely abstract and would include any possible notion of data, ranging from files, remote endpoints, data APIs that expose larger-than-memory data sets, essentially anything that you could possibly imagine could be a data set in this context. A key point is that this is not a data catalog.
A
This is a system that existing data catalogs can use to get the data into Jupyter in a meaningful way. A data catalog is not required. There are other routes of getting data into the system, and so the goal here is to enable this deep integration across different components within JupyterLab as concerns data. So with that, I will stop sharing, and Saul can see.
B
B
B
So it also comes with the built-in UI, the data explorer, so that we can see what data sets we have registered, and one thing that we've added recently is the ability to have nesting. So here we're showing the local file system as a nested data set, and we can find a data set inside of it, like the CSV file, and view it in our built-in grid viewer.
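The nesting shown in the demo, where a file system is itself a data set containing other data sets, can be sketched as a simple tree. This is an assumption-laden illustration and not the extension's actual data model: the `Dataset` class, `from_path`, and `find_by_mime` are hypothetical names, and `inode/directory` is used here only as a conventional label for folders.

```python
import mimetypes
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

# Register the mapping explicitly so the example does not depend on
# platform-specific MIME tables.
mimetypes.add_type("text/csv", ".csv")


@dataclass
class Dataset:
    """Hypothetical data set: a URL, a MIME type, and optional nested children."""
    url: str
    mime_type: str
    children: List["Dataset"] = field(default_factory=list)


def from_path(path: Path) -> Dataset:
    """Represent a directory as a nested data set and a file as a leaf."""
    path = path.resolve()
    if path.is_dir():
        kids = [from_path(p) for p in sorted(path.iterdir())]
        return Dataset(path.as_uri(), "inode/directory", kids)
    guessed, _ = mimetypes.guess_type(path.name)
    return Dataset(path.as_uri(), guessed or "application/octet-stream")


def find_by_mime(root: Dataset, mime_type: str) -> List[Dataset]:
    """Collect every data set in the tree that has the given MIME type."""
    found = [root] if root.mime_type == mime_type else []
    for child in root.children:
        found.extend(find_by_mime(child, mime_type))
    return found


with tempfile.TemporaryDirectory() as tmp:
    root_dir = Path(tmp)
    (root_dir / "data").mkdir()
    (root_dir / "data" / "iris.csv").write_text("a,b\n1,2\n")
    (root_dir / "notes.txt").write_text("hello\n")

    tree = from_path(root_dir)
    csv_datasets = find_by_mime(tree, "text/csv")
    print([d.url.rsplit("/", 1)[-1] for d in csv_datasets])  # ['iris.csv']
```

A viewer such as a grid would then only need to ask the tree for data sets of the type it can display.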
B
C
C
B
A
We would love to work with others on this, because this is a type of thing that, if we can get a broad consensus in the community that this type of approach makes sense, I think it'll really unlock a lot of different groups to begin building things that will interoperate with very minimal friction.