Description
Join data experts Chris Blum and Michelle DiPalma, and data novice Chris Short, every other week for a hands-on Office Hour about Red Hat OpenShift Data Science. Come ready with your questions, and expect to learn a few things along the way.
A
Good morning, good afternoon, good evening, wherever you're hailing from. Welcome to another edition of the Data Services Office Hour. If you noticed, the name changed during the show, so we're going to talk a little bit about that today. I'm Chris Short, executive producer of OpenShift.tv, and I'm joined by Chris Blum. I'm very happy to have both of you here, and very happy to have learned that pronunciation this morning as well. So thank you very much for the lesson in French pronunciations this morning.
C
Yeah, so some of you, our true fans that always watch the office hour, are here, and they probably remember me. I was gone for a little bit because I live in Berlin, Germany, and a lot of people have decided to work from home, and that kind of overloaded the internet capabilities of Germany. So I was limited to 0.4 megabits upstream, and so I said:
C
Nope. So right now I've spent three months with my aunt, very remote. I don't have personal internet anymore, but we do have fiber internet here, 100 megabits, synchronous, and so I'm able to come back. And I hope you're not too sad not to see Michelle today.
C
Yes, she will come back. So that's not the only change, that I'm back. We also changed the name: we changed from the OCS Office Hour to the Data Services Office Hour, and that's not just a name change. It's also a change in our focus.
C
Previously we always talked about storage, storage, storage, and a lot of people just think about OCS as the solution to provide their persistent storage, their PVCs, and just for that single use case: just do the dumb stuff that a lot of other people can do. And it was sometimes difficult in these conversations to really position the true values of what our product does.
C
So with the name change to Data Services, we're going a step further. We're not just talking about storage, because that is a problem that has been solved over and over by multiple companies. We want to show you that we're thinking one step further; we're not just trying to provide some kind of storage capacity.
C
We have two parts of this. We have, let's call it, the old part: OCS is now renamed to ODF, OpenShift Data Foundation. And the other part, which is why we have Guillaume here now, is that we want to talk a little bit about data science, doing things on top of the storage, and that involves AI/ML workloads. We can do fun stuff with JupyterHub: just prototype something in Python, let it run. Guillaume will talk a little bit about that later.
D
Awesome, yeah. And that's the idea here, if I may: it's about changing the perspective on what we are proposing. It's not only about storage, which is the implementation, how you really do it, but much more about the business value. What can you do with the storage? Which, in fact, is: okay, I will work with my data for data science, or purely data for your applications, or things like that. But of course, at the end of the day, it's the same thing that is running.
D
You have to store your bits and pieces inside some storage, but now we want to lean a little bit more on this aspect: how can I use my data? How do I integrate my data inside my overall architecture, and not only leave storage at the end of the chain, as something that you don't even consider until you really need it? It has to be part of your architecture.
A
Right, like you can't just dump everything in one place, right? We need to think about this the way we used to think about how we partition disks: we put the boot volume at the very front. You have to think about all of your data, not in the same light, but you've got to think about it: what am I going to do with it? Is it just going to sit here? Do I have to keep it for regulatory reasons?
A
Can I do anything else with it while it's there? How do we manage all that? So the entire engineering effort behind Data Foundation, I think, takes the name and actually applies values to it. Chris, you mentioned values, and it's a very interesting proposition here. I like it a lot.
C
Yeah, so the Foundation is literally a foundation. What we had before is now the foundation for what we can put on top of it, and what we want to talk about is use cases. You tell us, more high level, what you want to achieve, and then we can talk about how ODF, or the data services, can support you in doing this. And one of the consequences is:
C
We just released a new version, ODF 4.7, and with that we also thought a little bit more about the pricing, so the pricing will be a lot easier to calculate with this new version. And we're starting on this data services approach, where we add things onto our foundation that you can then use. Beautiful.
C
Yeah, that's important. Someone needs to ask the hard questions. If you have any more hard questions, just write them in the chat.
C
We just released it; I think yesterday was the GA of 4.7. So in the last 12 hours it has worked... in the last 12 hours, we've killed it! Yes, awesome. But obviously, before the GA, we had a lot of internal discussions about this. We wanted to actually understand:
C
Are we doing the right thing? Do people want this? Do people understand this? There was a lot of conversation about how we should position this, how we should do it. And in those conversations, when people really understood what we want to do... obviously there were a lot of people that were sad that we were leaving that term "storage." A lot of people look at this and say, well, now we're not storage anymore.
C
What does that mean? It's a new term; it needs getting used to. But once it sinks in, and that's what we've seen internally here, people understand that we were limited before. We were limited to being a storage department that just cares a little bit about disks: how to partition those disks, how to make them available, how to be fast storage, or storage that only needs few resources. But now we can actually drive our conversations further. We can talk further about: hey, customer...
D
So it's also a change in the people we want to talk to: not only the sysadmins and the storage admins, but going a little bit broader, to the architects, the solutions architects, the CTO, the CIO. If you speak to a CIO about storage, they will say: oh no, that's the thing for my IT admins; I don't really care about storage.
D
Now, if you're talking about what you can do with the storage, what you can do with the data, then you have their attention. And anyway, I have experienced this shift over the past 10 years, with all the IT infrastructure components becoming more and more commodities, especially with the cloud. It's like this:
D
Let me take you back 10 years, back when I was working at Laval University. When we were out for an RFP for new servers or something like that, we, the architects team, would spend hours looking at the bus architecture, the processors, how it's handled, everything. Fast forward 10 years: oh, just bring me a server, an HP or whatever, I just don't care, because that's not relevant anymore.
D
What has become relevant is the containers that you are able to reschedule automatically, or your VMs, or things like that, but not the infra itself. I'm not saying it's not important, okay, but it has become so easy: oh, I can have a server from AWS, from Azure, or even on-prem; now I have all my pipelines to deliver VMs on my internal cloud, or things like that.
D
It's not really the subject anymore. Now the subject is: how can you deliver this to your devs, to your people, for them to be able to use it in one hour? Because that's what you are competing with, with AWS. Ten years ago, for servers: oh, you want a new server? Yeah, call us back in three weeks, because we have to order it, and then it will be delivered, and then we have to rack it and connect it and everything.
D
It's obvious for servers, but the same thing has been happening for storage. Storage is becoming a commodity, because: oh, you want object storage? Just go to AWS S3, and you have object storage. That means we have to deliver exactly the same experience, therefore leaning more on that aspect: what do you do with the storage? Of course, we will continue to speak and work really closely with the storage people, the pure storage people, because at some point you have to do this.
C
The biggest change is actually an area that Guillaume missed, right? Okay, so internally in Red Hat we merged two teams: what was, for me, the storage people, and then we got the data science people on board too. So the biggest change that you can see today is that the data science part has been added, and Guillaume can talk a lot more about this. The other thing is that now, with the 4.7 release, we're starting; it's now in ODF 4.7.
C
We started looking at the DR things, so this will gradually improve now, and you get a little preview of it in 4.7. 4.8 will already be a lot better, and then we're looking at the releases afterwards, where we can really release that. Yeah, so that is now available, but the biggest change is the data science part that has been added.
D
Nice, yeah. And it's not that we changed everything last week or something; the announcement was yesterday for the official new name. It's not about the changes that we put in the product, because this has been happening for a few months: adding more features towards this ease of consuming storage, or being able to deliver data services. So we've been doing that, and we will of course continue to do that.
D
That means integrating more things directly into the OpenShift console, into the OpenShift UI, so that it's easier for people to work with storage, especially as a dev. In fact, you don't even want to talk about storage; you just want to put your data somewhere.
D
I only want to talk about an API and an SDK. The rest is, in fact, not my skill set as a developer, and I don't want to learn more about it, because I have tons of other things to learn that are directly linked to what I do. Storage, again, is a commodity from this point of view. So we have already brought, and will continue bringing, more of those features, of those integrations, inside OpenShift, as much as we can.
D
Let me give you an example. In Ceph, since last year, we have this feature called bucket notification in object storage. That means whenever something is happening on your bucket (of course, you configure it), let's say you have uploaded a new image or something, this bucket has the ability to send a message, a notification, to an endpoint.
D
It's such a simple message, saying: hey, this file with this name has just been created inside this bucket. And we can send this message to different endpoints: an HTTP REST API, Kafka, MQ messaging. And then you are able to act upon this event. Okay, so that's the first illustration where you bring data in as an intelligent thing within your architecture, because now it's part of your event-driven architecture.
D
It's not just something where you dump your data and retrieve it when you need it; now it can totally be part of the architecture. And this feature, well, it's not that difficult to configure bucket notifications; it's pretty standard, and we reuse the same mechanisms and protocols as you have in AWS S3, so all the SDKs that are out there work, and everything. But still, it may be difficult for some people, so there is work there.
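As a rough sketch of what he describes, the S3-style notification configuration can be built like this; the bucket name, topic ARN, and endpoint are illustrative (not from the show), and the actual call, shown commented out, would go through an S3 SDK such as boto3 against the Ceph RGW endpoint.

```python
# Build an S3-style bucket notification configuration. Names here are
# illustrative; against Ceph RGW you would send this with an S3 SDK,
# e.g. boto3's put_bucket_notification_configuration.
def build_notification_config(topic_arn: str) -> dict:
    """Notification config that fires whenever a new object is created."""
    return {
        "TopicConfigurations": [
            {
                "Id": "new-image-uploaded",
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],  # any kind of object creation
            }
        ]
    }

config = build_notification_config("arn:aws:sns:default::images-topic")

# With boto3 (not executed here), the call would be roughly:
# s3 = boto3.client("s3", endpoint_url="https://rgw.example.com", ...)
# s3.put_bucket_notification_configuration(
#     Bucket="xray-images", NotificationConfiguration=config)
```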
D
There is work going on right now to bring this as a configuration part of the YAML definition, so a purely native Kubernetes way of programming things: oh, I want to have bucket storage and I want it to send events to this endpoint. Three lines of YAML, bam, you have your object bucket, you are able to work with it from your applications, and then you create this event-driven architecture.
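Requesting a bucket declaratively already works today through an ObjectBucketClaim; a minimal sketch might look like the following (the names are illustrative, and the notification wiring he mentions is the part still in progress, so it is not shown):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-images            # illustrative name
spec:
  generateBucketName: my-images
  storageClassName: openshift-storage.noobaa.io
```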
D
You don't even need to know how it's implemented behind the curtain. You don't even know if it runs on Ceph or whatever else, or how many nodes or replicas or things like that. Normally, your IT team is supposed to take care of that and provide you with the performant, scalable storage that you need to work with. That's the kind of thing that is happening.
C
So, in addition to what Guillaume said, one of our goals is to keep ODF in line with the goals that we already have with OCS, where you have an interface.
C
It's very simple to use and very integrated into the OpenShift experience. A lot of the other products that can give you storage on OpenShift platforms might not even be written for OpenShift; they're written for Kubernetes, and they sometimes work with OpenShift, and then there's a new version and there are compatibility issues.
C
ODF is developed primarily for OpenShift, and it works with OpenShift. It's deeply integrated; you get dashboards. And even though we add more and more features (we have a ton of features in Ceph, in the back end, that we can port to ODF), we still want to have that ease of use, so that you don't need dedicated storage people who have to understand it. It's just there; you can use it, and your regular people can use these products easily.
A
Regular people using the product easily: that's music to my ears. It really is, I mean, because we all know I'm the storage idiot on the show, right? And the data science idiot. I've helped the data science people embrace containers; now, can the data science people help me embrace some of what they're doing? That'd be awesome, yeah?
C
This was in the first couple of shows, if you want to go back to the archive, where Chris himself installed OCS, and it worked.
A
It's one of those things where I'm patiently just waiting for the next cool thing to come out about it. It's maintaining itself; it does what it does; it's an operator, it's going to handle things for me. I'm just waiting for cool features: is there anything else I can do with this? That is always the thing I think about: okay, the cluster's here, it's doing things for me.
A
Can it do more things for me? And that's what most people do with their infrastructure too: we'd like to take on a new project; can we do it with what we've got, or do we need to add something else? A lot of people sit there and think that, and it sounds like ODF is going to have some foundations in data science that will have folks thinking.
A
So where do folks go right now to learn more about ODF, data services, that whole gamut of things? I dropped one link in here that I found, OpenShift Data Foundation from the technologies section, but it's just a high-level overview. I'm assuming the docs and everything have been updated? What else?
C
Exactly. So we just updated our access.redhat.com site; let me just fetch the link for you. We try to make it more obvious there what we talk about, what data services is. And yes, you'll find the documentation there; you'll learn how to do it, and there's going to be a lot more material out there in the next couple of days that will talk about data services, how it is positioned, all the things that I talked about earlier.
A
So what are you most excited about, looking forward? Now folks, this is us talking about the future: there are no dates, no times being promised here, just keep that in mind; future talk is happening right now, so I'm not saying that this is promised in the next release, or promised ever. What are you most looking forward to as part of this change? I know it's bringing people together, which is always good. It's changing people's...
C
So you have an application, and the object storage can enable you, by adding new features, to actually do that. And now that we have the object bucket notifications, we can deliver that on all platforms, no matter where you run. Previously, maybe you only had that available in AWS, not on bare metal or anywhere else; now ODF follows you wherever you want to go.
C
The underlying technology is already in Ceph, so it's nothing new. It's not like we go out and say, okay... it's quite complicated to synchronize data across an internet link, and sometimes you want to do it synchronously.
C
So it's updated immediately on both sides. Sometimes you want to do it asynchronously. That difficult part is already handled, and it has already been used by customers, so even though it's a new ODF feature, it's not like you have to be careful or afraid to use it. But we want to make the user experience great.
C
Right now it's at that phase where Chris Blum can do it, and then we want to get it to the stage where Chris Short can do it, because it's easy, it's in the UI, and we have specific Kubernetes DR objects that we can use to describe how we want to do the synchronization. That's what I'm looking forward to, and that's also an area where talking about data services takes us.
A
So, talking hybrid here, let's think hybrid, since you've mentioned cross-cloud, or on-premises to cloud. What are the advantages of putting ODF across a fleet of clusters where data scientists can access it easily? There's a team over here, a team over there, one big bucket of data that they use: what is that experience going to be like for everybody using it? Like, if I'm pulling up a Jupyter notebook as a data scientist?
D
Well, for data scientists, if you work inside your Jupyter environment, you're already one layer up, so you shouldn't be concerned about storage. And there are different things you can do there. I guess the main interesting point brought by ODF is that it brings all three different types of storage you will need to make data science or data engineering happen. Okay, I'll take a first example.
D
There is a team in Ontario that I helped build a data science platform for their COVID-19 research. It's a loose group of 300 researchers from different organizations, the different ministries and agencies in Ontario, and they grouped together as a community to work on the data that was available for COVID-19. Short story: they were kind of fed up with the way the government was publishing the data, which was not really useful... well, the data was useful, but not for researchers, because it was not raw data.
D
It was not updated in the right way, so they took it upon themselves: okay, we'll do this data aggregation, data scraping, and recreate data sets that we can really work with. So I helped them set up this Open Data Hub environment, this data science platform environment, and they had these specifications: they wanted to be able to share notebooks, and they wanted to be able to share data between each other. How do you achieve that?
D
Normally, when you launch a notebook with JupyterHub, it's connected to your storage, but that's your storage, your own stock. But with ODF, oh no, we also have file system storage with CephFS. That means we are able to have those RWX volumes, meaning volumes that you can connect to multiple pods at the same time, and from this you can build a shared library, a shared library of notebooks or a shared library of data. That's the first step.
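In Kubernetes terms, such a shared volume is just a PersistentVolumeClaim with the ReadWriteMany access mode backed by CephFS; a minimal sketch follows (the claim name and size are made up, and the storage class name is the usual OCS one but may differ in your cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-notebooks
spec:
  accessModes:
    - ReadWriteMany                    # RWX: many pods can mount it at once
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-cephfs
```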
D
The second step is: oh yeah, but we want to be able to access all that data from many different points, and interconnect all those things together. Then object storage is much more suited for this kind of thing, and generally we tend to see more and more people shifting to object storage for this exact reason. It's easy to work with now; it's built into most of those scientific or data science libraries. And because of this disconnected mode, you know, it's not something...
D
It's not a file system that you mount on your server; it's only an HTTP request that you can make from wherever you are in the world. So this disconnection between your notebook, or the container or VM that is processing the data, and the storage makes it really well suited for data science environments. And that's also brought by ODF, because ODF has object storage as well. And then, at some point, you will need a database. Oh, for a database...
D
I would use block storage, because I need this block approach and an intensive-workload approach. Well, it's still ODF. So you see, that's where I find it interesting, because, granted, there are many different storage vendors that have fantastic offers in block or in object storage, but usually they don't have this fully integrated approach.
D
All across the board of storage, which is what you get in ODF. As we said, even you, Chris, are able to deploy it in a few clicks, and then you have file, block, and object, and then you are able to do mostly whatever you want, depending on the use case. Because data science and data engineering are exactly about this: you're always reinventing something, because the context changes, or you want to use new things or test new things. It's really different from a standard application.
D
Let's say I'm an insurance company and I want to do the architecture for my new application. Well, I will work a few months on my architecture; I will say, okay, I need this type of storage; I will go out, buy it, and then I bring everything in, and it will stay the same for five, ten years, right? Okay. That's not true in data science. In data science, what you are implementing now did not exist six months ago and will be obsolete six months from now.
D
So if you don't have this agility, being able to pick and choose the different types of storage that you need, or recreate architectures easily by just using PVCs (persistent volume claims) or object bucket claims, or things like that, it begins to get really, really difficult to work with. So again, I think the best thing is ODF being fully integrated into OpenShift.
D
That totally makes it the platform of choice to set up those data science environments. Plus, you're totally agnostic of the real infrastructure that is underneath, meaning whatever you are creating in AWS or Azure as a test: you know, you're trying your things just to learn more, or maybe you have a subscription to RHODS to begin to use Open Data Hub and OpenShift Data Science, and okay...
D
You see it fits my need, but I want to be able to do something on-prem. Yeah, you can totally do the same thing on-prem, because you're not tied to the specific storage that is brought by AWS, or, when RHODS will be on Azure, you're not tied to the specific storage that will be brought by Azure. So again, it's about flexibility, and I guess that's our main strength here.
D
Yeah, I can show you some of the things I'm doing. Let me share my screen.
D
Basically, a notebook is a web interface that connects you to a kernel, a kernel being the engine that will run your code. Okay, so here we can see I'm in my environment, so again a fully web environment, and I can see that I'm connected to a Python 3 kernel. That means whatever I run inside my notebook will be run against this kernel, and this kernel doesn't run on my computer; it's running on the cluster, on the OpenShift cluster, in the container that I've launched.
D
That's the first advantage of setting up this data science platform on top of OpenShift, because that means you can bring to your users the full capabilities of a cluster. I could do this from my iPad; it would work exactly the same way, but the code that I run will run on this cluster, maybe with 8 CPUs and two GPUs and 32 gigabytes of RAM, whatever I don't have on my iPad. It will still run. And here is the way it works with notebooks:
D
You enter your code into cells like this one, and this is a Python cell, this is Python code, okay, and you are then able to run those cells independently. So I will run the first one: I click on Run, and I have the result here. "This is what you entered: Hello world." Very basic, but it has run only this cell.
D
Now I want to run the other one. Perfect, and then it has run the same function, the function that I had created in my first cell, but with a new text. Okay, so it's an interactive way of developing your Python code. This is basic, and of course you can take notes: you have cells with code and cells with Markdown, and you can create your environment.
D
Yeah, that's why it has become so popular with data scientists. And it's called a notebook because that's exactly what you would do as a researcher doing experiments. You have your research notebook and you take notes: okay, here I'm running experiment number one with these parameters. You run the things... I don't know what you do; you're in chemistry, you mix up different liquids and see what happens, and you write the results there. It's exactly the same thing I'm writing here. Okay.
D
Let me first switch back to this view to give some explanation.
D
Okay, here it's working in this way: I have X-ray images that I am sending into a bucket, an object storage bucket. But because this bucket has been enabled with notification, every time I am sending a new image, it will send a notification to Kafka, to a Kafka topic. Nice, okay. And I have here in my OpenShift environment (everything runs in OpenShift in my environment) this, which is a Knative Eventing component.
D
It's listening to this Kafka topic, and whenever some message comes in, it will send this message here: I have a Knative Serving component with a serverless function, in which I have built my model, my AI model, that is able to recognize the risk of pneumonia, and in this container I'm making this risk assessment.
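A rough sketch of what such a serverless handler might look like, assuming the S3-style event is forwarded as JSON by Knative Eventing; the field names follow the S3 notification format, and the model call is stubbed out, so this is illustrative rather than the code from the demo.

```python
import json

def assess_pneumonia_risk(bucket: str, key: str) -> str:
    """Stub standing in for the AI model inference in the demo."""
    return "unknown"

def handle_notification(body: bytes) -> dict:
    """Parse an S3-style bucket event and run the (stubbed) risk assessment."""
    event = json.loads(body)
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # In the real demo this downloads the X-ray and runs the AI model;
    # here the stub stands in for the prediction step.
    risk = assess_pneumonia_risk(bucket, key)
    return {"image": key, "risk": risk}
```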
D
Okay, and then I will save the results into a database. But here you see we have this workflow where we go from: okay, I'm sending my data to my data repository, which is an object bucket, which is taking part in this overall architecture, where it sends a message to Kafka and then to my risk assessment container. And we can see it live on this dashboard we have here. I started the generator a while ago, so I'm sending all those images into my object bucket.
D
I have here this counter that will count the images coming in. Then... (I think you need to zoom in a little.) Yeah, but it will mess up the layout a little bit.
D
Okay, but let me describe it; it's what I described here. I'm putting everything into my object storage, it's sent to a Kafka bus, and here I have my container that is doing the risk assessment. I have this counter.
D
The model is not always able to recognize exactly if there is a risk or not, so for further processing, the images are first anonymized and then sent to another process. And here I have all that data about the last images that were recognized, and so on, with the images themselves. But it's just to illustrate how you go from this one (oops, this one), which is your data science development environment. And here again, you are leveraging different things.
D
You are leveraging the block storage, because that's where my notebook is residing. You are leveraging object storage, because that's where the data set, with my 6,000-something raw images to train the model, resides. And that's a small data set; sometimes the data set that you have to work with is 500 terabytes of data.
D
Of course you don't put that on your USB key, and that means you have to have those bigger environments. That's where OpenShift plus Ceph, with its scalability, comes into play, because you can have those 500 terabytes of data residing with no problem in Ceph. And then you can have hundreds of data scientists using this central data set in object storage.
D
So again, that's what I find really interesting with the business proposition that we are making here. It's the same OpenShift plus ODF platform that you can use both for your day-to-day data science development and also for application production. You don't change your environment, and it's totally portable. So let's... yep.
A
That's incredibly powerful, right? To train models and be like, okay, taking this a step further: this patient had COVID, this patient didn't, what's the difference? We're going to have to get through this pandemic; there's going to be some aftermath. Something has to happen for these people that are dealing with the after-effects of COVID, and research is being done there.
A
My wife just told me the other day that some group in Europe developed something from mRNA, just like the COVID vaccines, but it's pandemic-agnostic; it doesn't matter. So it's like: okay, great, how did you do that? What data did you consume to figure out that you could create a vaccine to fight any coronavirus?
D
Yeah, you know, that's why data science has been on the rise for the past few years: because now we have the capabilities, the processing power, the techniques; we have everything to be able to train those models, to do real AI/ML. The mathematics part of this is really old, 30 or 40 years old, but until the mid-2010s we didn't have the real means to be able to leverage that. That's not true anymore.
D
That is easy, but there are many other things that were tried for COVID-19. For example, someone trained a model where you just cough a little bit on the phone, and it's able to detect if there is a risk or not. Here it's the same: it's about having those thousands of samples of people coughing, and training a model to be able to detect what the human ear cannot.
D
Obviously. So these are the tools that we are bringing, that have been brought into the world, that for the past few years were reserved for some specialists, and it was really difficult to use them, really difficult to implement.
D
Now it's a little bit more mainstream, and by bringing it on top of OpenShift, it's even more mainstream, because it's the standard platform that you may already have in your enterprise. Most customers I'm working with already have some OpenShift installation or some OpenShift knowledge, and now they are interested in this data science thing: oh yeah, we have this data, and maybe we think it will be useful; how can we do this? Well, you already have OpenShift.
D
C
Yeah, you notice, like, I have a bigger data set than I expected. Because of OpenShift, you can use a machine set: you scale it out with a different instance type that is bigger. You don't need to touch anything, because OpenShift is handling all the installation, and once you're done, you can get rid of it again. Yeah.
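Scaling out with a bigger instance type looks roughly like this. The MachineSet name, namespace defaults and instance type below are placeholders, not from the show, and the manifest is abbreviated to the fields that matter here; a real MachineSet also carries selectors, labels and a full providerSpec.

```yaml
# Abbreviated, hypothetical MachineSet sketch -- names and instance type
# are placeholders; only the fields relevant to this discussion are shown.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: bigdata-workers            # placeholder name
  namespace: openshift-machine-api
spec:
  replicas: 3                      # raise for the big job, drop to 0 afterwards
  template:
    spec:
      providerSpec:
        value:
          instanceType: m5.4xlarge # a larger instance type for the bigger data set
```

With something like this in place, scaling out and back in is a single command each way, along the lines of `oc scale machineset bigdata-workers -n openshift-machine-api --replicas=3` before the job and `--replicas=0` once it's done.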
D
And I have customers I'm working with, they're doing exactly that. They have those huge processing jobs to do periodically, every 24 hours; you know, it takes tens of machines to be able to run that. But of course, as it runs in the cloud, they don't want to keep it, you know, running.
D
Now it's part of the workflow: at the beginning of the process they will just increase the machine set, it will spawn some new instances, then they will launch the process using those data science tools, you know, Spark and the rest. They will do the processing, it takes a few hours, and then, when everything has been done, they just, you know, scale down the cluster, and they save a lot of money.
A
I
mean
this
is
really
reminding
me
of
a
time
where
I
worked
for
a
financial
services
like
marketing
company
and
the
data
science
team.
We
were
having
so
many
problems
with
like
infrastructure
and
all
this
other
stuff
right,
like
oh,
my
model,
didn't
finish
running
before
the
spot
instance
shut
off,
and
now
I
wasted
all
that
time
and
money
right
so
openshift
like
it
puts
all
the
power
in
the
people's
hands
is
what
it
feels
like
right
like
I
don't
have
to
worry
about
some
other
team
or
some
other.
You
know
configuration
touching
my
workloads.
D
Right, like, and it's, you know, really close. We've known for a few years now all the benefits that OpenShift can bring to development, okay, in general: all this flexibility, agility and everything. It's about bringing the exact same advantages to data science. It's really well suited, now that most data science tools will run in containers.
D
That's perfect, and when you add Ceph with ODF to the mix, then you bring the scalability and the performance that you need for data science, because it's not only about, you know, storing a little data here and there. Now, more and more, people are talking about petabytes of data, and petabytes of data that have to be processed in as small a time as possible, meaning you have to have performance on the storage part, and that's where Ceph shines.
D
You know, especially with the predictability of performance: this perfectly straight line, where the more capacity you add, the exact same performance you get. That's really important in data science. You don't want to be like, okay, now that I'm reaching over one petabyte of storage for my specific stuff, the performance is totally dropping because the storage is not able to cope, to keep up with it. We don't have those kinds of issues with Ceph, so it's kind of bringing the best of both worlds, storage and Kubernetes, to data science.
D
That's
why
I'm
so
excited
you
know
to
work
with
it.
It's
yeah,
perfect
patch,.
C
Yeah,
but
so
in
my
daily
life,
I'm
not
actually
handling
a
lot
of
big
data
or
I'm
not
wearing
lab
coats
or
anything.
So
one
thing
that
I
want
to
mention
about
jupiter
hub
is
it's
not
just
to
to
do
what
jim
showed
us?
You
can
also
do
regular
development
in
it,
and
maybe
chris
you
can
share
in
the
chat,
a
link
that
I
just
said.
C
There's
like
a
list
of
all
kinds
of
kernels
that
you
can
use.
You
don't
mention
it
in
the
beginning.
The
kernel
is
the
language
that
you
write
in
your
notebook
and
there
are
kernels
for
pretty
much
anything
I
like
to
to
see
that
there
are
go
kernels,
so
you
can
write
your
go.
Applications
in
the
jupyter
notebook
in
your
browser,
share
it
with
anyone
or
one
thing,
that's
very
popular
and
that's
pretty
cool.
C
Is
you
have
an
ansible
kernel
so
if
you've
ever
written
an
ansible
playbook,
you
know
that
it's
hard
like
you,
you
write
it
and
you
want
to
have
it
so
that
you
can
repeatedly
run
it.
You
want
to
test
it.
You
want
to
document
it.
You
can
start
writing
your
ansible
playbook
in
jupiter,
notebook
and
test
it
in
there,
and
then
you
can
immediately
see
what
it
does,
what
the
output
is
and
all
of
that
that's
pretty
cool.
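A playbook you might iterate on cell by cell could start as small as this. The play below is a generic sketch, its hosts, task and message are made up for illustration, not one of the playbooks mentioned on the show:

```yaml
# Hypothetical minimal play to iterate on in a notebook cell --
# everything here is illustrative, not from the episode.
- name: Notebook-friendly demo play
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Print a message so the cell shows immediate output
      ansible.builtin.debug:
        msg: "Hello from a notebook cell"
```

Running the cell executes the play and shows the task output right below it, which is exactly the write, run, inspect loop being described.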
D
It's funny, because, you know, hardcore developers will always swear by their own IDE, you know, but when you come from a different background, or you're not, you know... I'm not a full-blown developer, that's not what I do. I have the same approach as Chris, you know, taking the best tool depending on what you want to do, and for Ansible, I've never done this before, but I have tons of Ansible playbooks to rewrite to deploy those demos into our RHPDS, so I totally see the point there. Oh no.
C
And you can document it in full markdown, so it's also great if you want to teach someone a certain language, or Ansible, whatever. There's also a bash kernel, so if you want to teach all those millennials what you can do in bash, then you can write a notebook, make it fancy with the markdown, and tell them exactly: hey, this is a for loop and that's how you do it. They can run it and immediately see what it does, what the output is.
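A notebook cell of the kind being described might look like this; the loop and its items are just an illustrative teaching example:

```shell
# A cell for a bash-kernel notebook: a basic for loop, with the
# "this is a for loop" explanation living in a markdown cell above it.
for fruit in apple banana cherry; do
  echo "processing ${fruit}"
done
```

Running the cell prints one `processing ...` line per item, so learners see the output directly under the code.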
A
So for folks that aren't aware, I have an intern this summer, and I'm very happy about that, because he gets to take notes and tell me what I'm doing wrong, because he has production, like, this kind of production experience in his background. Or not this kind, but movie production experience. So I'm sure he's blushing or whatever in the background, but yeah, I like talking about my intern. So yeah, some mood music as I'm searching for data here... AI/ML, is that...
C
Stuff in it. I like them, because I have this app on my phone: when I was a student I didn't always have everything in my kitchen, so sometimes I only had a scale. So I wanted to know, okay, how much does 100 grams of flour weigh, right? Yeah.
C
Milliliters of milk, is that, like, whole milk?
A
Exactly. All right, I mean, let's not belabor the point; we are approaching the top of the hour. Is there anything else we want to talk about before we sign off? We don't have any questions in chat, or at least I haven't seen any. I hope I haven't lost any by just not looking at YouTube and Twitch directly. Okay, no, I haven't. All right, so, yeah, anything you want to sign off with?
D
I
would
reiterate
that
you
know
for
the
the
part
I'm
working
on,
which
is
data
science
and
data
engineering.
The
important
thing
you
have
to
consider
when
building
the
thing
is
the
platform.
Okay,
it's
not
the
tools
only
by
themselves.
The
tools
are
easy
to
figure
out,
but
it's
a
platform,
and
here
running
those
kind
of
workloads.
D
You
know
aiml
or
statistical
workloads
or
pure
data
analysis
on
top
of
openshift,
with
everything
that
got
with
it,
you
know
odf
all
the
other
components
that
we
have
several
ass
and
and
so
on,
that
that
makes
a
great
platform.
So
that's
that
would
be
my
takeaway
from
this
beautiful.
A
I appreciate your time today, as always. Later on the channels, at 11 o'clock Eastern, 1500 UTC, we're going to be talking about the value of GitOps, and we're going to have some guests on, so please tune in for that. And until the next data science, or rather data services, office hour, we will see you then. Stay safe out there, everybody, for real.