Description
A Deep Dive into Kubeflow Pipelines - Senthil Raja Chermapandian, Ericsson
A Machine Learning model is only a tiny piece in a series of multiple processing steps executed as part of an ML workflow. A pipeline is a description of an ML workflow, including all the components in the workflow and how they combine in the form of a graph. Kubeflow Pipelines (KFP) is an open-source project that helps to run Cloud-native ML pipelines on Kubernetes. While most previous talks on KFP have focused on Data Scientists and Data Engineers, this talk will dive deep into KFP, covering its architecture, platform components and how the platform components work together in executing the workflow.
While you might have heard of or watched several talks about Kubeflow Pipelines before, I presume most of those talks were highly data-scientist or data-engineer focused, meaning the focus was on how to write pipelines more efficiently, how to build components from scratch, or how to convert a Python function into components and eventually build a pipeline. That is fine.
But today I'm going to talk to you about Kubeflow Pipelines more from, let's say, an ML engineer's point of view, an MLOps engineer's point of view, or even a DevOps point of view. So today I'll try to cover how Kubeflow Pipelines is composed, what components come along with Kubeflow Pipelines, how these components interact with each other, and eventually how these components execute a pipeline that is submitted to Kubeflow Pipelines.
So let's get into the talk. I am Senthil. I work as a Principal Software Engineer at Ericsson, and my job at Ericsson is primarily to architect cloud-native AI/ML platforms. These are platforms that are highly distributed in nature and use Kubernetes as the underlying platform for compute and other resources. Apart from work, I take time to pursue other aspirations of mine.
I am the maintainer of an open-source project called kube-fledged. This project is an operator that helps you cache container images directly on the worker nodes of a Kubernetes cluster. I am also an occasional speaker; I would say I am not very active in speaking, but whenever I talk, I love to talk about Kubernetes and cloud-native technologies, and very recently I have also picked up an interest in talking about MLOps. I am a tech blogger as well.
You can read my blogs on Medium, although nowadays I am not that active in tech blogging due to my preoccupation with organizing Kubernetes Community Days Chennai. I am fairly active on social media sites like Twitter and LinkedIn, so do check out my profiles on those platforms.
The agenda for today is very simple: I'm going to talk about ML workflows and the various ML pipelining tools, then pick out Kubeflow and talk about the platform components that make up Kubeflow Pipelines. I'll be talking at length about the Kubeflow Pipelines architecture; that is where I will cover the various components that make up Kubeflow Pipelines and how they interact with each other, and I'll try to dig deeper into Kubeflow Pipelines.
Each component in a pipeline has its own distinct set of inputs and its own distinct set of outputs. The input can be a very simple parameter like a string, integer, or float, or it can be a huge dataset stored somewhere in a data store. Similarly, the output can be a very simple file, or it can be a huge dataset that is, for instance, pushed into Kafka or stored into MinIO object storage, whatever it may be.
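To make that concrete, here is a minimal sketch, not taken from the talk, of a component with simple parameter inputs and a small file output; it assumes the KFP v1 SDK, and the function, values, and base image are hypothetical.

```python
# Minimal sketch of a KFP component: simple parameter inputs, one file output.
# Assumes the KFP v1 SDK; function name, values, and base image are hypothetical.
from kfp.components import create_component_from_func, OutputPath


def scale_value(value: float, factor: float, result_path: OutputPath(str)):
    """Multiply a value by a factor and write the result to an output artifact file."""
    with open(result_path, "w") as f:
        f.write(str(value * factor))


# Wrap the Python function into a reusable pipeline component.
scale_op = create_component_from_func(scale_value, base_image="python:3.9")
```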
Today I am going to focus on one single tool, which is called Kubeflow. Kubeflow is, by the way, an open-source project that provides you not only with pipelining capabilities but also with the whole gamut of features and functionality you would expect from an end-to-end machine learning platform. For instance, there is KServe, which takes care of serving models in production at scale and provides features like A/B testing,
multi-armed bandits, and things like that. Kubeflow also provides development capabilities, where you can use Jupyter notebooks and various machine learning frameworks to develop your model, and it provides capabilities for training your model, retraining your model, and so on.
Google ran TensorFlow models internally; we know that TensorFlow is a very popular and widely used machine learning framework, and once a TensorFlow model is developed, you need to run it. Google was internally using some of the features that you find today in Kubeflow to run their TensorFlow models.
In fact, it began as just a simpler way to run TensorFlow jobs on Kubernetes; it aimed to remove the complexities associated with running TensorFlow jobs on Kubernetes. That is how it all started, and since then Kubeflow has expanded into a multi-architecture, multi-cloud framework for running end-to-end machine learning workflows.
What I mean by end-to-end is that it caters to each and every step of a typical machine learning lifecycle, starting from data exploration, or even from defining your model accuracy and metrics criteria, up to deploying the model and monitoring it in production. So it offers an end-to-end platform, and Kubeflow provides components, as I said earlier, for each and every stage in the ML lifecycle: for exploration, for training, for deployment, for monitoring, for retraining, and so on.
So what are the installation options available for Kubeflow Pipelines? You can install Kubeflow Pipelines as a standalone platform, or you can choose to install the complete Kubeflow platform and then use only the Kubeflow Pipelines part of it. There is also a third option: you can consume Kubeflow Pipelines as a fully managed service.
When we talk about Kubeflow Pipelines, it is predominantly built of four components. First and foremost, you have a user interface for managing and tracking the various machine learning experiments, jobs, and runs, and there is a core workflow engine that performs the hard work of executing the workflow.
We will talk later about what this engine is made up of. A third important feature of Kubeflow Pipelines is that it provides an SDK for you to write your pipelines and even to build reusable components, so that these components can be used across different pipelines. So it provides you with an SDK, and there is also a REST API.
If you want to consume KFP in the form of REST APIs, that is available; if you want to do it using the SDK, that is also possible; or if you just want to use the UI, submit jobs via the UI, and then look at the artifacts and so on, that is also possible. KFP additionally provides some built-in notebooks for you to easily interact with KFP using the SDK.
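As a hedged illustration of the SDK path (not shown in the talk), submitting a run programmatically typically looks like this; the host URL, experiment name, and pipeline function are hypothetical placeholders.

```python
# Hedged sketch: submitting a pipeline run with the KFP SDK client (v1-style API).
# The host URL, experiment name, and pipeline function are hypothetical.
import kfp

client = kfp.Client(host="http://localhost:8080")  # KFP API endpoint (assumed)

run = client.create_run_from_pipeline_func(
    my_pipeline,                 # a function decorated with @kfp.dsl.pipeline
    arguments={"factor": 2.0},   # pipeline parameters
    experiment_name="demo-experiment",
)
print(run.run_id)
```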
Let's spend more time on this slide, where you see the architecture of Kubeflow Pipelines. At the top you have the UI, which is served by the pipeline web server, and the UI itself has several capabilities. For instance, you can submit a pipeline in the UI, and once the pipeline is submitted and you have run it,
you can see the history of the runs in the UI along with various metadata. You can, in fact, drill down deeper into the job history and see which steps were executed, what the input for each step was, what the output was, and even where that output is stored. You can use this for debugging and things like that, and there is also a capability to visualize the run.
For instance, if you are training your machine learning model with various hyperparameters, you can visually see how the model performs with those different hyperparameters. So the UI caters to a wide set of features; that is one good thing about Kubeflow Pipelines. Underneath, you have the orchestration system, the primary engine that performs all the hard work necessary for executing a Kubeflow pipeline.
On top of everything you have the pipeline service. The responsibility of the pipeline service is this: whatever pipeline you submit to KFP, it is the pipeline service that interprets it and parses it. It understands the Python DSL that is defined for writing pipelines; it parses the DSL and then eventually compiles it.
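To make the DSL-to-YAML step concrete, here is a hedged sketch of defining a tiny pipeline and compiling it with the SDK compiler; the component, pipeline, and file names are hypothetical, and the KFP v1 SDK is assumed.

```python
# Hedged sketch: a toy two-step pipeline compiled to the workflow YAML package.
# Assumes the KFP v1 SDK; names are hypothetical.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def echo(msg: str) -> str:
    """Trivial step: print and return the message so it can feed the next step."""
    print(msg)
    return msg


echo_op = create_component_from_func(echo, base_image="python:3.9")


@dsl.pipeline(name="demo-pipeline", description="Toy two-step pipeline")
def demo_pipeline(message: str = "hello"):
    step1 = echo_op(msg=message)
    step2 = echo_op(msg=step1.output)  # consuming step1's output creates a DAG edge


# Compile the Python DSL into the YAML package that is submitted to KFP.
kfp.compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```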
The pipeline service compiles the pipeline code and prepares the pipeline YAML; that is its job. And whatever the pipeline service does, at every point in time it makes sure to store the metadata into the metadata database, which, by the way, is a MySQL database.
Once the pipeline service has determined what actions or tasks have to be performed for a particular pipeline run, it goes ahead and creates the necessary Kubernetes resources required for executing the pipeline. In KFP, each and every step of the pipeline is executed as a Kubernetes pod: there is a container image, and each container runs within a Kubernetes pod. So essentially, whatever Kubernetes resources are necessary to execute the pipeline are created by the pipeline service, and the pipeline persistence agent persists all of these Kubernetes resources.
Let's move on. Underneath the orchestration system you will have a bunch of orchestration controllers; Kubeflow Pipelines is built in such a way that it can support multiple orchestration controllers. The primary controller used for task-driven workflows is Argo Workflows, which is itself a separate CNCF project for executing workflows. You will also see instances where ML pipelines are written directly in Argo Workflows using YAML constructs, whereas in Kubeflow Pipelines you have a pipeline service and an SDK.
Let's move on to choosing an Argo Workflows executor. As I said earlier, Kubeflow Pipelines runs on Argo Workflows, so Argo Workflows is the primary workflow engine that actually executes the ML workflow. You can either use the Docker executor for Argo Workflows or the very latest Emissary executor, and, by the way, the Emissary executor is the default executor from version 1.8.0 onwards.
The Docker executor, for instance, supports only the Docker container runtime, and we know very well that in version 1.24 of Kubernetes the dockershim has been removed (1.24 is already out), which means the Docker executor can be used only if you are on an older version of Kubernetes. And from a security perspective, the Docker executor needs privileged access to the Docker socket on the host,
so it is not preferable to use such an approach in production. The Emissary executor, on the other hand, supports any container runtime and is also more secure. So moving forward it is going to be the Emissary executor by default, and it already is the default executor from version 1.8.0 onwards.
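If you need to switch executors on an existing installation yourself, one hedged way, assuming the Argo controller reads its configuration from a ConfigMap named workflow-controller-configmap in the kubeflow namespace (verify both names in your install), is to patch that ConfigMap, for example with the Kubernetes Python client:

```python
# Hedged sketch: selecting the Emissary executor by patching the Argo Workflows
# controller ConfigMap. The ConfigMap name, namespace, and key are assumptions;
# check your installation before applying anything like this.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core_v1 = client.CoreV1Api()

core_v1.patch_namespaced_config_map(
    name="workflow-controller-configmap",   # assumed Argo controller ConfigMap
    namespace="kubeflow",                   # assumed KFP installation namespace
    body={"data": {"containerRuntimeExecutor": "emissary"}},
)
```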
Now, some other notable features of KFP. I wanted to cover these because they will help you understand KFP more deeply. It provides out-of-the-box multi-user isolation for pipelines, and by the way, this is available only in the full Kubeflow deployment; it is not yet available in the standalone KFP deployment.
Basically, this feature allows you to separate the Kubernetes resources of multiple users. You can create multiple profiles, and each profile is mapped to a Kubernetes namespace. So if you create a user profile, then when that particular user runs a Kubeflow pipeline, whatever resources are created for that pipeline run will get created only in that particular namespace.
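In multi-user mode the SDK client can be pointed at a user's profile namespace so that runs land there; a hedged sketch, with the host, namespace, experiment name, and pipeline function as hypothetical placeholders:

```python
# Hedged sketch: targeting a user's profile namespace in multi-user mode.
# Host, namespace, experiment name, and pipeline function are hypothetical.
import kfp

client = kfp.Client(host="http://localhost:8080", namespace="alice-profile")

client.create_run_from_pipeline_func(
    my_pipeline,                   # a @dsl.pipeline-decorated function
    arguments={},
    experiment_name="alice-experiments",
    namespace="alice-profile",     # the run's resources are created in this namespace
)
```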
So this provides you with isolation when you are sharing a Kubeflow instance with multiple users. Another good feature is step caching. We saw that a pipeline is executed in multiple steps. Let's say you create a pipeline run and then recreate the run, this time just modifying the hyperparameters alone, and let's assume this modification is specific to one particular step. With step caching, the steps whose inputs have not changed are not executed again; their cached outputs are reused.
This also makes efficient use of the pipeline's resources, and you can control when cache invalidation should happen and when caching should be disabled, or you can enable or disable the caching feature altogether.
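For illustration, one hedged way to control caching per step in the v1 SDK is through the task's caching strategy; setting the maximum cache staleness to "P0D" means a cached result is never considered fresh, effectively disabling the cache for that step. The component and pipeline names are hypothetical.

```python
# Hedged sketch: per-step cache control (KFP v1-style SDK assumed; names hypothetical).
from kfp import dsl
from kfp.components import create_component_from_func


def echo(msg: str) -> str:
    return msg


echo_op = create_component_from_func(echo, base_image="python:3.9")


@dsl.pipeline(name="caching-demo")
def caching_demo(message: str = "hello"):
    cached_step = echo_op(msg=message)   # reuses a cached result when inputs match
    fresh_step = echo_op(msg=message)
    # "P0D" staleness: the cached result is never fresh, so this step always re-runs.
    fresh_step.execution_options.caching_strategy.max_cache_staleness = "P0D"
```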
Another feature that was recently introduced in the SDK v2 is the pipeline root. This essentially represents an artifact repository where the pipeline stores its artifacts. Originally only MinIO was supported, and only the MinIO instance packaged along with Kubeflow Pipelines; that was the only way to store your artifacts. Now you have different options: you can keep the bundled MinIO or bring your own, you can use any S3-compatible object storage, or you can even use GCS, Google Cloud Storage.
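A hedged sketch of pointing a v2-style pipeline at an external artifact store via the pipeline root; the bucket URI and pipeline name are placeholders, and the steps themselves are omitted.

```python
# Hedged sketch: setting the pipeline root to an S3-compatible bucket (v2-style SDK).
# The bucket URI and pipeline name are hypothetical placeholders.
from kfp import dsl


@dsl.pipeline(
    name="pipeline-root-demo",
    pipeline_root="s3://my-ml-artifacts/kfp",   # every step's artifacts land here
)
def pipeline_root_demo():
    ...  # steps omitted in this sketch
```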
So let me end the slide show, and before I open the UI, let me show the list of pods running for a Kubeflow Pipelines installation. Here you can see MinIO, which is the artifact repository; the MySQL database, which is the metadata store; and the workflow-controller, which is the Argo workflow controller, since this installation has only the Argo workflow controller. This is the pipeline service, which accepts the pipeline and then creates the various Kubernetes resources.
This is the pipeline persistence agent, which persists all the Kubernetes resources, their inputs and outputs, everything, into the ML metadata store. The scheduled-workflow component is used whenever we need scheduled workflows rather than one-time workflows; when we have scheduled workflows, the scheduling is taken care of by this component. And you have a bunch of other components, which are all UI-related: the pipeline UI, the pipeline viewer CRD, as well as the pipeline visualization server.
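As a hedged aside, a scheduled (recurring) run is typically created through the SDK client along these lines; the host, experiment, package path, and cron expression are hypothetical placeholders.

```python
# Hedged sketch: creating a recurring (scheduled) run via the SDK client.
# Host, experiment name, pipeline package, and cron schedule are hypothetical.
import kfp

client = kfp.Client(host="http://localhost:8080")
experiment = client.create_experiment(name="nightly-training")

client.create_recurring_run(
    experiment_id=experiment.id,
    job_name="nightly-train",
    cron_expression="0 0 2 * * *",            # assumed 6-field cron: daily at 02:00
    pipeline_package_path="demo_pipeline.yaml",
    enabled=True,
)
```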
Along with the installation of KFP, some default pipelines are installed, and today I am going to use one such pipeline, which is the pipeline I explained using this slide. This is how it looks graphically, and I am going to run it by clicking on Start.
Once I do that, a new run is created. I can click on this run, and it shows me a visual graph of the progress of that particular run. As you can see, this step has completed, and it produced two output artifacts: one is a table, which was stored in the artifact repository, and the other is the logs. We can also see the Kubernetes pod that was created for executing the step.
The initial model training has also completed, and we can see in this step that the initial model, as well as the dataset, are sent as inputs to the step, and the output of the step is the trained model along with the model config plus the logs. That is what we see as the output. Again, we can see the pod that was created, and we can also see the logs produced by the container that ran this step.
The pipeline has run successfully, and now, if we come here, we can see all the pods that were created by Argo Workflows in order to execute the pipeline. For every step in the pipeline you will see a corresponding pod, so you can also use kubectl commands to look at the pods, the logs produced by these pods, and the events they produced, the same information that you saw in the UI.
This is a very simple pipeline of the kind we typically find during the model exploration and model development phase, and we saw that Kubeflow Pipelines was able to execute it successfully. That is pretty much what I intended to cover. I really hope that you enjoyed the talk and that its content will be useful, and by the way, if you have any questions about this talk, feel free to post them as text questions in the corresponding Slack channel, and I'll.