From YouTube: OpenShift Commons ML Briefing: KubeFlow On OpenShift with Subin Modeel and Will Benton (Red Hat)
Description
Subin Modeel and Will Benton (Red Hat OpenShift) demo Kubeflow on OpenShift to the Machine Learning on OpenShift SIG of OpenShift Commons
A: Recently, some folks have been working to get it up and running on OpenShift, and I've asked William Benton to do a brief introduction for those who aren't aware of what it is and what its relevance is for us, and then Subin to take us through a demo of what he's been able to get going on OpenShift so far. So with that, William, over to you — sorry, Will, over to you.
B: The more people use them, the more data they have to provide improved functionality. And if you think about what these applications look like, they look a lot like contemporary applications, except that they're also dealing with data: they're training predictive models based on that data, and they may be dealing with a wider range of data sources than a conventional database-backed application. So all of the things you have to deal with in a conventional application still apply, but it's not just a matter of scaling out a web proxy or replicating a SQL database.
B: These are going to be turned into apps by machine learning engineers or app developers. Put another way, we see one of the big problems with turning machine learning into a production application: the data scientists are working in one environment and handing off to other teams that are then going to port it over to work in another environment. And I bet those of you who work with data scientists, or who are data scientists, have been in the position in the past of getting a notebook from a colleague that either doesn't run or doesn't produce the same results that your colleague expects — and I can say this because I've gotten those notebooks from colleagues, and I think I've also given those notebooks to colleagues. Another problem with operationalizing machine learning is that it might depend on specialized libraries and frameworks, and not everyone is going to have those installed — frameworks that aren't going to be packaged for distribution.
B: Another problem, in addition to the monitoring and scale challenges you have with conventional apps, is that intelligent apps also need to monitor the performance of models. A trained model captures something about the data it was trained on, but if new data drifts from the training data, we can get silent failures. So we need some way to monitor the performance of our models in addition to the performance of our application.
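The drift monitoring Will describes — detecting when live data has moved away from the training data — can be sketched with a simple distribution-distance check. This is a minimal, illustrative sketch (plain Python, a two-sample Kolmogorov–Smirnov distance, and a made-up alerting threshold), not anything shown in the talk:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

def drifted(training_sample, live_sample, threshold=0.2):
    # Flag a feature as drifted when the KS distance exceeds a
    # (hypothetical) alerting threshold.
    return ks_statistic(training_sample, live_sample) > threshold
```

In practice a check like this would run per feature on a sliding window of production inputs, alerting before the "silent failures" mentioned above show up in business metrics.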
Now, OpenShift can make some of this easier by providing a really nice workflow for reproducible builds and reproducible deployments, from a Git repository through a source-to-image builder. You can basically ensure that if you use your notebooks with a source-to-image pipeline, your colleagues will be able to reproduce them and run them on OpenShift in the cloud. But you still don't get past the issue of having a common library of these frameworks that are packaged.
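The source-to-image flow described here could be driven by a BuildConfig along these lines. This is a hedged sketch: the names, repository URL, and builder image are illustrative assumptions, not from the talk.

```yaml
# Hypothetical BuildConfig: repo URL, builder image, and names are
# illustrative, not from the talk.
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: notebook-s2i
spec:
  source:
    git:
      uri: https://github.com/example/notebooks.git   # your notebook repo
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: s2i-minimal-notebook:latest             # an S2I notebook builder image
  output:
    to:
      kind: ImageStreamTag
      name: my-notebook:latest                        # reproducible notebook image
```

Because every build starts from the Git source and a fixed builder image, colleagues pulling `my-notebook:latest` get the same environment rather than a notebook that "doesn't run or doesn't produce the same results."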
B
So
we'll
talk
a
little
bit
more
about
that
in
a
second,
but
I
want
to
introduce
coop
flow
at
this
point,
and
the
idea
is
for
coop
flow
is
to
sort
of
view
for
machine
learning
frameworks
on
kubernetes.
What
kubernetes
does
for
application
registration
in
general?
The
idea
is
that
we
have
custom
resource
definitions
for
Jupiter
hub
and
so
that
you
could
have
a
multi-user
collaborative
notebook
tensorflow
so
that
you
can
train
conventional
or
deep
learning
models
and
tensorflow
serving
so
that
you
can
actually
deploy
those
models
as
components
or
production
application.
B: Kubeflow is a new project that was founded at Google. It was just announced last month at KubeCon, but it's already attracted a ton of excitement and attention. The promise of Kubeflow is really that all of these frameworks — and some of the hardware drivers these frameworks need to get good performance — are going to be packaged up so that you can run them either on your laptop in a local Kubernetes cluster, or in the cloud at scale.
B
So
if
you've
been
following
machine
learning
on
open
ship
for
some
time,
you
might
have
heard
of
efforts
like
rat
and
licks
do
which
a
number
of
people
on
this
call
that
are
involved
with
or
the
piece
a
program
and
the
really
nice
thing
about
coop
flow
is
group
flow
is
another
another
spin
on
these
approaches.
So
kupo
uses
custom
resource
definitions
which
weren't
available
when
we
started
working
on
that
emilich
style,
and
it's
it's
another
way
to
get
this
sort
of
the
sort
of
capability
into
kubernetes
and
open
ship.
B
So
we're
really
excited
that
that
more
people
are
interested
in
machine
learning
entities
and
open
ships
and
we're
involved
in
this
community
and
we're
looking
at
how
we
can
take
the
work
that
it's
already
been
done
and
integrate
it
and
benefit
from
the
Google
community
as
well.
So
with
that
I'll
hand
it
back
over
to
Matt.
C: Okay, so what I have set up is a single OpenShift cluster with a GPU: I have a single node in this OpenShift cluster, and I have an NVIDIA GPU in this cluster. I really wanted to show a demo of one of the components of Kubeflow, the TFJob (TensorFlow job) operator: an application training on the GPU, saving the model into a volume, and then doing TensorFlow Serving from that particular volume.
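The last step of the pipeline Subin describes — serving a saved model out of a volume — could look roughly like this Deployment. It is a sketch under assumptions: the image tag, model name, model path, and claim name are illustrative, not from the demo.

```yaml
# Sketch: serve a saved model from a persistent volume. Image tag,
# model name/path, and PVC name are assumptions, not from the demo.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
spec:
  replicas: 1
  selector:
    matchLabels: {app: tf-serving}
  template:
    metadata:
      labels: {app: tf-serving}
    spec:
      containers:
      - name: serving
        image: tensorflow/serving:latest
        args: ["--model_name=demo", "--model_base_path=/models/demo"]
        ports:
        - containerPort: 8500            # gRPC prediction endpoint
        volumeMounts:
        - name: model-store
          mountPath: /models
      volumes:
      - name: model-store
        persistentVolumeClaim:
          claimName: model-pvc           # the volume the training job wrote to
```

The design point is that training and serving only share the volume: the TFJob writes a SavedModel under the model base path, and the serving pod picks it up without the two workloads knowing about each other.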
C: You can see that it will create a ConfigMap and a deployment for the TFJob operator, so you get the TFJob operator pod running in OpenShift. So, coming back to OpenShift: you can see here that I have the TFJob operator pod, and if I look at the logs, I should be able to see that the controller has started properly. What I'm going to do next is submit a CRD instance, which is our TFJob.
C: So when you submit a TFJob, this is how the template would look. You need to pass in an image in the replica spec, and you can optionally provide the NVIDIA GPU resource limits. If you don't have a GPU, you don't need to pass this; but if you do have a GPU, just pass it so that the GPU is allocated to that particular pod. I also have with me some templates for a distributed job.
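A TFJob manifest of the shape described here — an image in the replica spec plus an optional NVIDIA GPU resource limit — could look like the following. This is a sketch against the early `kubeflow.org/v1alpha1` API; the job name and training image are assumptions, not from the demo.

```yaml
# Sketch of a v1alpha1 TFJob as described in the demo; job name and
# image are hypothetical.
apiVersion: kubeflow.org/v1alpha1
kind: TFJob
metadata:
  name: sample-tfjob
spec:
  replicaSpecs:
  - replicas: 1
    tfReplicaType: MASTER
    template:
      spec:
        containers:
        - name: tensorflow
          image: example/tf-matmul:latest   # your training image
          resources:
            limits:
              nvidia.com/gpu: 1             # omit this block on clusters without GPUs
        restartPolicy: OnFailure
```

As Subin notes, the `nvidia.com/gpu` limit is what makes the scheduler place the pod on the GPU node and expose the device to the container; without it, the same job simply runs on CPU.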
C: This is a template for a distributed application doing nothing much: it just has a master, workers, and the default parameter servers. And I have one instance of a sample job, which just has an image doing a matrix multiplication inside. So I will go ahead and just create the job.
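The sample job's workload is essentially a matrix multiplication. A standalone sketch of that kind of toy computation, in plain Python rather than TensorFlow so it runs anywhere (the demo image itself runs TensorFlow on the GPU):

```python
def matmul(a, b):
    """Naive matrix multiply over lists of rows: the kind of toy
    workload the sample TFJob image runs inside its container."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# Example: a 2x2 product like the one whose result shows up in the pod logs.
result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```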
C: Running `oc get tfjob` will show you that our sample job has been created — this is a CRD instance. I come back to OpenShift and see in the operator logs that that particular job has been submitted and the operator is creating the pods. And if I look at the available pods here, I can see that the sample TFJob pod has started and has already computed. If I go to that particular pod and look at the logs, I can see the matrix multiplication output shown here.
C: The distributed job is taking a little bit more time; I think that's related to the computations involved in it. So that's about it for my demo, showing one of the components of Kubeflow, the TFJob operator, working on OpenShift. Internally, it is using a service account which has cluster-admin privileges; that's how it works. Thank you so much.