From YouTube: Data Science as a Managed Service Audrey Reznik (Red Hat) OpenShift Commons Gathering 2021
Description
OpenShift Commons Gathering 2021
Data Science as a Managed Service on OpenShift
Audrey Reznik (Red Hat)
https://commons.openshift.org/index.html#join
Okay, I'm standing between you guys and lunch. My speaker notes didn't come up, so I'm going to wing this so that we can get done on time and everybody can eat. So, good morning. My name is Audrey Reznik. I'm a senior principal software engineer and data scientist with the Red Hat OpenShift Data Science team, and I'm going to talk to you about:
What's the deal with managed services and model delivery? If you're a data scientist, or if you work with data scientists supporting them, you'll know that when they create a model, there's a lot more to it than just creating the model. You want to be able to get data to the model, you want to be able to deploy the model and monitor it. So we're going to go into some of those items. We'll take a look at a model's role in an intelligent application.
We'll take a look at who uses managed services, and surprisingly, it's not just the data scientists when we're talking about intelligent application creation. Then we'll kind of look at how managed services help you along with model delivery, and where you find them. And finally, I'm going to click through a very quick demo of the Red Hat OpenShift Data Science platform and the managed services that are available on it.
So when we take a look at intelligent applications, we have to take a look at the model's role in them. Now, intelligent applications by themselves are not just one small thing: they are a distributed system. There are things that work in conjunction all the way from data verification to serving infrastructure and doing some configuration. And when we go ahead and take a look at these intelligent applications, we'll see that the model code is just a very, very small part of that. The model code, or the model itself, has to be able to make its way through this distributed infrastructure, so it has to have a way to interact with things such as feature extraction. It has to be able to interact with some of the analysis tools. And you look at that and go: wow, that could be really complicated.
So when we take a look at managed services, we can divide them into four groups, or four categories, and within those four categories we actually have a number of personas that are going to interact with them. If we take a look at the first category, we want to go ahead and gather and prepare the data. That means we're going to look at data storage, data lakes, data warehousing, and stream processing, and it's really our data engineers that are going to get totally excited about this category of managed services.
Then we go ahead into actually developing the model. So when we go ahead and develop the model, we're going to bring the data scientists in, and they're going to go ahead and, you know, create the model and work with the algorithms that they need to solve the particular business problem that they're trying to solve.
Then, once the model is deployed, we have to monitor it: is it giving us some of the answers that we thought we were going to get, or do we have to correct it and retrain it? That's where both the data scientists and the application developer or machine learning engineer will come together. Now, having all of these services, and having them available for everybody, can actually be a nightmare for IT operations, right?
You want to give your users the latest bells and whistles, but at the same time you want some sort of platform, or some sort of services, that you know you can be, how do I say it, very comfortable with, that you can depend on, and that you know are not going to help create any outages when they're actually being used to help create a model.
So let's go and take a look at kind of the model life cycle and where these managed services fit in. Now, remember I told you there were four. We wanted to kind of extract and transform the data. So, instead of building something ourselves, with the Red Hat OpenShift Data Science team we said: well, wouldn't it be cool if we could just go ahead and invite a whole bunch of different vendors, open source vendors, in? That way, you have a lot of choice.
So, when you're extracting and transforming the data, yeah, you could go ahead and use Apache Kafka streams to go ahead and pull in some of your data. But wouldn't it also be cool to use somebody like Starburst Galaxy so that you could go ahead and curate your data? You really want to unlock the value of your data by making it very fast and easy for you to access that data across the hybrid cloud.
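As a rough sketch of what that access pattern can look like: Starburst exposes a Trino-compatible SQL endpoint, so from Python you could query curated data along these lines. The host, catalog, schema, and table names below are placeholders for illustration, not details from the talk.

```python
# Minimal sketch of querying a Starburst/Trino endpoint from Python
# (host, catalog, schema, and table names are placeholders).
import trino

conn = trino.dbapi.connect(
    host="starburst.example.com",   # placeholder endpoint
    port=443,
    user="data-engineer",
    http_scheme="https",
    catalog="hive",                 # placeholder catalog
    schema="vehicles",              # placeholder schema
)

cur = conn.cursor()
cur.execute("SELECT plate_number, seen_at FROM sightings LIMIT 10")
for row in cur.fetchall():
    print(row)
```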
Next, we want to take a look at creating models. We want to be able to use a Jupyter notebook or something like that for some exploration, but maybe, at the end of the day, we're really interested in what Anaconda has to offer, because they might have an extensive set of data science packages or libraries that we could use in our Jupyter notebook projects when we're going ahead and doing some of the experimentation. Coming up next, another one of our independent software vendors, or ISVs, is IBM Watson Studio.
Now, when you're done with your model and your testing, and everything's done with your experimentation, what you want to do is actually go ahead and deploy those models as actual services. So you can use an ISV such as Seldon Deploy, and it's going to really help you simplify and accelerate the process of deploying and managing your machine learning models.
Now, this whole path that you're seeing, this curvy path, is kind of the model operations life cycle. I want you to keep that in mind, because we need to see where these data services, sorry, managed services, would actually live. So we're going to start with the Red Hat managed cloud platform. We want to have a platform that is very stable.
Now, on top of that, we have what we call our ISV managed cloud services. These are independent software vendors, such as Starburst Galaxy or Anaconda, that we have brought into our Red Hat OpenShift Data Science platform, so that you can use some of the OpenShift services that they have. And then we have customer-managed ISV software. So if you wanted, say, for instance, to take a look at quantization in a model, or to go and take a look at inferencing, you could use Intel OpenVINO, which, I apologize.
So this whole Red Hat OpenShift Data Science offering actually sits on AWS, so it's a cloud offering right now. And what I'm going to do is probably go through a demo. I think I have enough time for that; at least I won't have to have it as a live demo. But one thing that I wanted to mention about this entire platform is that we have depth and scale, basically without lock-in. The capabilities that we have are really in conjunction with Red Hat and our service partners that we brought into this ecosystem.
Okay, this is going to be very quick; I'm glad I'm going to be clicking through it. One of my colleagues actually worked with the London city metro, and the London city metro wanted to be able to monitor cars within the metro area. They wanted to be able to recognize license plates and see: is that car able to park here? Does that car actually have a tag so that it can use these certain metro ways? Is this car carrying somebody who did something bad that we want to track?
Okay, you get the idea. So here I have a picture of a car. What we're hoping the machine learning model will do is take that license plate, read it, and actually grab the plate numbers. Then, once we get those plate numbers, we can use Apache Kafka to go ahead and store that information, possibly generate an amber alert. In the meantime, we'll be pulling a lot of those license plates into our various warehouses and also into our vehicle registration database. And, of course, the City of London's metro services can then perform more business analytics on the data that we've gleaned.
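As a small sketch of the Kafka piece of that flow, publishing a recognized plate to a topic could look something like the following. The topic name, broker address, and message fields are placeholders for illustration rather than details from the talk.

```python
# Minimal sketch of publishing recognized plates to Kafka
# (topic name, broker address, and event fields are placeholders).
import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="my-kafka-bootstrap:9092",              # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize dicts as JSON
)

event = {"plate": "LC63 ABC", "camera": "metro-cam-12", "timestamp": "2021-10-20T09:15:00Z"}
producer.send("license-plate-events", value=event)             # placeholder topic
producer.flush()                                               # make sure the message is sent
```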
So what does this look like if we are trying to use this Red Hat OpenShift Dedicated platform? Well, because the Red Hat OpenShift Dedicated platform sits on top of AWS, you are going to need a cluster to actually use Red Hat OpenShift Data Science, or what we affectionately call RHODS, and I know I'm going to hell for giving you the acronym there.
So we're going to click on the RHODS menu option, and then what you're going to see is basically a menu of the managed services, and, of course, there will be managed software available for you to actually work with in the background. You'll see that I've chosen one of those items, which is JupyterHub. Notice the other ones: when I go ahead and hit the Explore icon, these are all the different managed services that I can go ahead and choose to use, and there's plenty of documentation.
So I'm going to go ahead and choose JupyterHub, and what that's going to allow me to do is go into a Jupyter notebook image. I'm going to take this notebook image and then basically wrap it up in a container so that I can deploy it on OpenShift. But I want to customize it for myself. So the first thing that I'm going to do is say: okay, if I'm doing some machine learning, what am I going to be working with?
Am I going to be using just a standard data science package, which may contain things like NumPy or pandas or scikit-learn, or am I going to want to work with something such as this license plate detection, where I may have to use a lot of the PyTorch libraries that are available? So I'm going to click on PyTorch.
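As a rough illustration of the kind of work that PyTorch notebook image supports, here is a minimal inference sketch on a single image. The model file, input image, and preprocessing steps are hypothetical, not taken from the demo.

```python
# Minimal PyTorch inference sketch (model file, image, and preprocessing are hypothetical).
import torch
from torchvision import transforms
from PIL import Image

# Assume a trained model was saved earlier with torch.save(model, "plate_detector.pt").
model = torch.load("plate_detector.pt", map_location="cpu")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # size is an assumption
    transforms.ToTensor(),
])

img = Image.open("car.jpg").convert("RGB")   # hypothetical input image
batch = preprocess(img).unsqueeze(0)         # add a batch dimension

with torch.no_grad():                        # no gradients needed for inference
    prediction = model(batch)

print(prediction)
```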
Now, if I want to go ahead and deploy this application, you're not going to deploy this application as a Jupyter notebook. I know people have done that; please don't do that. What we want to do is really package the model as an API, and in this case we're going to use Flask to help us accomplish this. Then we'll go ahead and launch our server to see if we've been able to successfully deploy something internally. Now we'll test that Flask app; we have a status of OK, and now we're ready to go back into our OpenShift Dedicated environment, where we first launched the RHODS platform from, to see if we can actually go ahead and deploy this on OpenShift.
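To make the "package the model as an API" step concrete, here is a minimal Flask sketch along the lines described. The endpoint names, the predict_plate helper, and the port are assumptions for illustration, not the actual demo code.

```python
# Minimal Flask wrapper sketch (endpoint names, helper, and port are assumptions).
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/status")
def status():
    # Simple health check, like the "status of OK" test mentioned in the talk.
    return jsonify({"status": "ok"})

@app.route("/predictions", methods=["POST"])
def predictions():
    image_bytes = request.get_data()        # raw image bytes sent by the caller
    result = predict_plate(image_bytes)     # hypothetical helper that runs the model
    return jsonify(result)

def predict_plate(image_bytes):
    # Placeholder: in the real application this would run the trained model.
    return {"plate": "UNKNOWN"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)      # port is an assumption
```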
Sometimes this can be very hard for a data scientist, because they like to save all of their code on their local laptop. But you, as the machine learning engineer working with this data scientist, are going to encourage them to check their code in to Git. Yes, you are. All right, so now, from the Git option, we're going to have some other things that we can do, such as the resources and advanced options.
What we're really interested in is making sure that we click on the routing option, because we want the route of this API. We need to be able to access the API from another location, so that route is very important. We're going to go ahead and copy it, and probably test it within a browser to make sure that we can actually hit that API.
And then, if you want to, you can go back into a Jupyter notebook, and then, using that route, you can use either curl or invoke a web request to actually see if you can hit that API successfully. Then, to test your deployed AI/ML application, you can take that route, go back into a Jupyter notebook, put in the actual API address, and give it an image.
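For example, a minimal way to exercise the route from a notebook might look like the following. The route URL, endpoint paths, and image file are placeholders, not values from the demo.

```python
# Minimal sketch of hitting the exposed route from a notebook
# (route URL, endpoint paths, and image file are placeholders).
import requests

route = "https://my-model-route.apps.example.com"   # the OpenShift route you copied

# Health check, equivalent to testing the route in a browser or with curl.
print(requests.get(f"{route}/status").json())

# Send an image to the prediction endpoint and print the response.
with open("car.jpg", "rb") as f:
    response = requests.post(f"{route}/predictions", data=f.read())
print(response.json())
```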
So all of this is on Red Hat managed cloud services, and again, this demo was concentrating on the Red Hat OpenShift Data Science portion. Remember, what we're trying to do with Red Hat OpenShift Data Science is to have a platform that's fairly open, so that if you have a specific open source vendor that you like, or if you have specific requirements where you want to use not only Red Hat products but open source products, you should have the choice to be able to do that.
So what did we learn today? Well, we learned that managed services, and in particular managed services for data science, are really important to a data scientist. They're just not going to sit there with their model. They have to have some way of actually going ahead and deploying their model, training it, and testing it, and they have to be able to do it in such an easy manner that they can accomplish that task themselves. And, of course, IT operations will be there to help them with that journey.