Description
David Aronchick (Google) gives an introduction to Kubeflow to the Machine Learning on OpenShift SIG of OpenShift Commons.
A: So I'll try to be brief. This is an introduction to Kubeflow. Again, to be very, very crystal clear: this is the OpenShift SIG, and it is not my intent to drive this, but we've collaborated with Red Hat to help make Kubeflow work excellently on OpenShift. Obviously that's a work in progress, but it is absolutely part of our core goals that we make this happen. This is just a brief introduction to what we're doing in the Kubeflow project.
A: At a high level, everyone's heard about ML. It's very attractive because it really helps you solve a lot of problems where it's very hard to describe what the answer is. There's an interesting example: at Google we actually used TensorFlow, and some of the DeepMind work, to make massive improvements and cost savings.
A: This is what it looked like before we turned the ML control on. We saw a huge reduction in overall cost, and then we turned it off and the cost came back. So again, we're very, very excited about ML generally, and we want to help bring it to market. The problem is that, for all the magical ML goodness in the world, most people are over here, and in between the two there's a lot of pain. The pain could be adapting your existing processes.
A: It could be understanding whether or not ML actually has an impact on you, and whether or not your problems are actually solvable via ML. So what we really want to do is help people bring ML into their business with as little disruption as possible. We want to do that in a cloud-native way, and by cloud native we mean these core components: composable, portable, and scalable. In this case, composability means the following.
A: Obviously, building a model is just the very smallest part of solving your problem with ML. In fact, when you look at it, it's all the things around ML that end up taking up the majority of your work, and you want to be able to compose those components so that you can use the tools that make sense for you. We are not fans of delivering a single stack and saying only that stack works. We know that every enterprise and every use case is different.
A: We can skip past that. And then finally, you want scalability. ML workloads tend to be extremely bursty: when you're training, you want every possible cycle you can get, and then, when you're done, you want to shut everything down. Beyond that, you want to be able to scale in a roughly linear fashion, so that if you're not training fast enough, or you're not able to support the load, you can directly scale up the machines.
A: When we looked at these problems, we basically said: what's pretty good at solving these? Containers and Kubernetes. It already supports high scale. It is also highly portable: it runs in many, many different locations, on-prem, in the cloud, on OpenShift, no matter where it is, it supports those things. Highly portable means you can take workloads and, as long as they're containerized and described properly, they run everywhere. And it's very scalable: it goes up to thousands of nodes, including things like accelerators.
A: So that's great, except that if you want to run ML on Kubernetes, it often requires understanding a lot of things, and that's very painful, and certainly not something in a data scientist's core job description. That's why we have introduced Kubeflow. The idea with Kubeflow is that we want to make it easy for everyone to develop, deploy, and manage portable, distributed ML on Kubernetes everywhere.
A: So this is our core vision, and we are a loosely bound community; we're happy to change anything, including the vision, if the community decides. As I mentioned, it is highly composable: it allows you to create these components and wire them together using nothing more than YAML. It's very portable: it works everywhere that Kubernetes does, and uses only native Kubernetes APIs; we don't make any changes. And finally, it's quite scalable. We're just getting started: right now, in the box, we have JupyterHub.
A: We have a TensorFlow training controller and a TensorFlow Serving deployment, and then the wiring to make it work on Kubernetes everywhere. So in the box are just these components out of the overall pipeline or stack we were talking about. We are actively talking to folks everywhere.
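To make the serving piece concrete, here is a minimal sketch of how a client might call a model hosted by a TensorFlow Serving deployment. This assumes TF Serving's REST predict API; the model name and in-cluster host below are hypothetical placeholders, not taken from the talk.

```python
import json

# Hypothetical placeholders -- adjust to your deployment.
MODEL = "my-model"
HOST = "http://tf-serving.kubeflow.svc.cluster.local:8500"

def build_predict_request(instances):
    """Build the JSON body the TF Serving REST predict API expects:
    a list of input rows under the "instances" key."""
    return json.dumps({"instances": instances})

def predict_url(host, model):
    """Assemble the :predict endpoint URL for a served model."""
    return f"{host}/v1/models/{model}:predict"

# A client would POST `body` to `url` (e.g. with the requests library).
body = build_predict_request([[1.0, 2.0, 3.0]])
url = predict_url(HOST, MODEL)
```

Because the serving deployment is just a Kubernetes service, the same request shape works whether the model runs on-prem, in the cloud, or on a laptop.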
A: You already mentioned, and I want to point out, the great data science being done by Pachyderm, and we're working with them as closely as we can to try and help integrate. I should also say that all of these steps are not necessarily required to run in Kubeflow or on Kubernetes. We are perfectly okay with, and in fact expect, that many of these components will stay outside of our process, as is, for the foreseeable future. Kubernetes doesn't care where the resource ultimately lives; excuse me, Kubeflow doesn't.
A: So, using Kubeflow: this is what's necessary to set it up. There's some boilerplate here; the stuff at the top is basically just initializing a few variables. The "ks" stands for ksonnet. You do have to install that; it's just a packaging system. Then you install the components that you like.
A: First, you just use this registry that we have hosted, and you can install packages. In this case, we're installing three packages: core, serving, and TF Job. Then, using the namespace, you deploy these components to your ultimate destination, whatever Kubernetes cluster you're looking at, whether that's on-prem, in the cloud, or your laptop. You apply it and now Kubeflow is up. Let's say you don't like TensorFlow; that's fine. You cross out one line and you install, say, scikit-learn instead.
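The slide the talk describes might look roughly like the following ksonnet sketch. The registry path, package names, component name, and environment name are assumptions for illustration, not verbatim from the slide.

```shell
# Initialize a ksonnet app and point it at the hosted Kubeflow registry
# (registry path is an assumption for this sketch).
ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflow

# Install the three packages mentioned in the talk.
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job

# Generate the core component and deploy it to whatever cluster your
# current kubectl context points at (on-prem, cloud, or laptop).
ks generate kubeflow-core kubeflow-core --namespace=kubeflow
ks env add default
ks apply default -c kubeflow-core
```

Swapping a component is the same motion: remove one `ks pkg install` line and install a different package in its place.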
A: Maybe you don't like TF Serving; that's fine too, you would swap that out. So again, this is the vision, certainly, but unfortunately those packages, and the little pip magic there, don't yet exist. We are working as closely as we can with various folks to get things just like that into the system.
A: So that's it. Yes, that is it for now. Our goal is really to take this in whatever direction the ML community and the OpenShift community would like us to take it. Our goal is really to solve the boring, gross, annoying ML problems that are out there right now, so that folks can work at a higher level. So what's next? We have been doing a number of community meetings.
A: We would love to have you, if you'd like, in those community meetings for Kubeflow, to bring the OpenShift perspective and what it's like running on OpenShift. Really we're just nailing down the governance proposals. We've moved it out of the Google GitHub repo; that was a frequent request. It is not in the kubernetes org; it's in its own organization and repo right now. It is entirely open source. It is Apache licensed.
A: I mentioned already other popular toolkits: Spark ML, XGBoost, scikit-learn. We want to do auto-scaled serving. We want to do TensorFlow Transform and Model Analysis. And we really, really want this third bullet point, which is: if you are trying to use it and finding problems in any way, shape, or form, or you're trying to contribute and it just doesn't line up, please reach out to me, reach out to the community. We want this to be incredibly open and useful from day one.
A: The next major milestone, we think, will be KubeCon EU; that's the beginning of May, and we hope to reach a fairly stable state by that time. We're not using labels like alpha and beta, but I would certainly consider things very early right now, and by May we hope to see much more production use.
A: I personally know of at least eight sessions that have been submitted relative to Kubeflow at KubeCon, and we'd love to meet up and talk. And then finally, tell us whatever direction things are going; we want to do this in a very, very community-driven way. We also want to make sure, and I come from the Kubernetes world, that we do this in a very not-Google way.
A: Nothing made us happier at Google than when more than 50% of the contributions came from outside Google. I would love nothing more than to have the same in Kubeflow as soon as humanly possible, whether that's other cloud providers like Azure or AWS or DigitalOcean, or you name it. It could be ISVs, people building inside Kubeflow, OpenShift, Red Hat, all those various folks.
A: So, for example, with Jupyter, we want to make sure that Jupyter communicates with whatever ML framework is being used in a standard way, and that's not on Jupyter to do. It's up to us as a community to say: okay, if you're going to respond and be an endpoint for Jupyter, you should make this available, or you should use Kubernetes-native service discovery.
B: Yeah, that helps put it in perspective somewhat. I'm going to take another tack and just say: if I'm a machine learning data scientist, I might use PyTorch, scikit-learn, TensorFlow, NumPy, SciPy, pretty much a variety of things. And from an end-user perspective, and this is not the streaming-data, production, Spark kind of workflow, but me at my desk doing analysis on data, there are things like Paperspace where you can pretty much just click a button.
B: You get a whole GPU with everything already pre-configured. That kind of stuff: DigitalOcean, for example, now offers a machine learning instance. I think there are two different groups: there's the machine learning end-user data science folks, and then there's the "okay, I have to stand this up in production" folks, for, let's say, I was Netflix, with movie stuff coming in all the time. I'm just trying to see where Kubeflow fits in all of that.
A: So, explicitly, our goal is to reduce the difference between those two as much as humanly possible, because I've heard exactly what you said many, many, many times. The problem is that when that data scientist finishes their experimentation, they often have to rewrite, throw out, etc., their ML model in order to port it to the production one. PyTorch and Caffe are a perfect example: most folks that do work in PyTorch end up using Caffe as the production ML framework, and we're trying to reduce that as much as possible.
A: But going beyond that, today, I think the standard experience would be, and let me share this again: today a data scientist might execute the same number of commands, but instead of using something like Kubeflow they'll do a bunch of pip installs. They might use some apt-get installs and things like that, just to make sure the right packages are there, and then they have to do their own coordination: oh, things are talking over localhost, they're on this port.
A: They have to do a whole bunch of stuff. Our hope is that we get this level of stuff down so simply and so cleanly that a data scientist, who certainly is not going to spend a lot of time doing this, but is doing that today, can spin up a production stack on their local laptop, such that it does represent what the standard is for the enterprise, or for the particular job they were doing, with all these various components.
A: They don't care if it's TF Job and scikit-learn and PyTorch and so on and so forth, but they're all done in the way that the organization has said: okay, this is what we want to get to when we get to production. That reduces the overall amount of change and friction and pain when it comes to taking that model that they trained and got running locally to production. So that's really our goal.
A: It is not our goal to make every data scientist into an IT ops person. Absolutely not. That is exactly why we're using Kubernetes: to provide that layer of abstraction from everything under the hood, and really get it to nothing more complicated than just "pip install foo".