Description
Michael Hausenblas (Red Hat OpenShift) discusses the KAML-D (Kubernetes Advanced Machine Learning & Data Engineering Platform) open source project with the Machine Learning on OpenShift SIG of OpenShift Commons. Learn more at https://github.com/kaml-d/design
A: If you look at a concrete situation: someone might be using, I don't know, R or even Python, it doesn't matter, and, you know, the data scientists put together a wonderful model. They have all their parameters sorted, something that works on their machine. And then come the data engineers and developers who actually write the application, in PHP, in Rails, in JavaScript, in Go or whatever, and they very often need to reimplement the machine learning model. You know, the data scientist had that already working, and that causes friction.
A: That means that, if something changes, if they decide to iterate and have an even better model, with, you know, a better cost function or whatever, the data engineers and developers need to play catch-up with that, and KAML-D really focuses on that: bridging that gap, helping data scientists on the one hand and engineers and developers on the other to work together better. So, at a very high level, typical use cases would be:
A: Who has not done this: copying a data set, appending a version one or the current date or whatever, doing that again, adding something, removing something, sharing that with others. And there is a solution for that, right?
In code we use Git. Nowadays we typically use distributed version control systems like Git that take care of that automatically: whenever we say, take a snapshot of the current code base, an entire trail, an entire set of histories, is built up, and we can go back in time. The same is possible for data sets, and I'm gonna go into that in a moment.
If you're a data engineer or a developer, then you benefit from the unified way KAML-D handles the data sets and the models. And again, think of: you might be implementing version one while the data scientists are already working on version two, or we're refining the models, we're taking a different approach or whatever, and you want to be able to pick that up as quickly as possible; and also for bugs, to a lesser degree. And, you know, it provides the guidance to run, to operationalize, let's say, the machine learning features.
This is a very, very early phase; as you can see, I haven't got any running code yet. The UX is more or less as follows: you essentially have four tabs, and that might well change.
A: I might actually, after a discussion with Graham, who presented, I think two months ago, on JupyterHub on the Commons video channel, I might actually swap that out; I might actually make these tabs part of JupyterLab, so it's actually turning it around. But essentially you have these four tabs.
You have the Data tab, where the user, mainly a data scientist, would upload the data set; you can point it to somewhere on the web or just upload it from your local drive. And then you have two ways, essentially at the metadata level, to search for data sets.
A: And anyone who has been working in a real-world situation knows there tend to be many data sets you're dealing with, so you can very quickly find the relevant data set, and whenever you want to, you can, and that's this green checkmark here, essentially take a snapshot of the data set. This is a copy-on-write snapshot and essentially only captures the diff.
A: So if you, for example, have a CSV file and say, for whatever reasons, I want to only take half of it, or you have this typical training and test split, 70/30 or whatever, you can do that with one mouse click. And the second tab would essentially be, currently the idea is having the Hub there or, as I said, if I turn it around in terms of UI, it would be the other way around, but essentially having the Development tab, mainly for data scientists, to essentially put together the model.
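[Editor's note: the one-click 70/30 split mentioned above amounts to something like the following few lines of Python. This is purely an illustrative sketch, not KAML-D code; the function name and the fixed seed are made up for the example.]

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the data rows and split them into train/test subsets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = rows[:]          # copy, so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example: 10 dummy rows (e.g., parsed from a CSV file), split 70/30.
rows = [[i, f"value-{i}"] for i in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 7 3
```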
A: The third would be the Deployment tab, which is mainly for data engineers and developers and ops folks, and Observability, where essentially Grafana and Jaeger will be embedded. And in terms of components, as you can see, there is pretty much everything besides the actual workbench, which, as I said, I'm now pretty convinced I'm gonna do as a plug-in into the Lab.
Everything else is there, right? So we can run it on any platform where you're using Kubernetes or, in our case obviously, OpenShift, as the runtime environment.
A: Dotmesh is essentially the thing that takes care of the snapshots, essentially what Git does for code. And then you have two parts for the metadata hub, which is Presto DB, a distributed query engine that exposes a SQL interface against any kind of storage, block storage, whatever, and Elasticsearch, which captures the other bits of the metadata.
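[Editor's note: the metadata-level search described here boils down to SQL queries over data-set metadata. KAML-D uses Presto DB and Elasticsearch for this; the sketch below uses Python's built-in sqlite3 purely as a stand-in for the SQL interface, and the table and column names are invented for illustration.]

```python
import sqlite3

# In-memory stand-in for the metadata store (Presto in KAML-D's design).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets (name TEXT, fmt TEXT, row_count INTEGER, owner TEXT)"
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?, ?)",
    [
        ("finance-transactions", "csv", 120000, "alice"),
        ("clickstream-2018", "parquet", 5000000, "bob"),
        ("finance-budget", "csv", 800, "alice"),
    ],
)

# A metadata-level search: all CSV data sets whose name mentions "finance".
hits = conn.execute(
    "SELECT name FROM datasets "
    "WHERE fmt = 'csv' AND name LIKE '%finance%' ORDER BY name"
).fetchall()
print([n for (n,) in hits])  # ['finance-budget', 'finance-transactions']
```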
A: On the right-hand side you see, if you're familiar with Kubeflow, that's essentially Kubeflow, so no big surprise there. And just a final note here: depending on your role, you would probably not see all the tabs. A data scientist might only see Data and Development; a data engineer or developer might see Development and Deployment; ops folks might see all of them, or Observability and Deployment. So, depending on your role, you would see different tabs there, and that's pretty much it.
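[Editor's note: the role-based tab visibility just described is essentially a mapping from roles to tab sets. A minimal sketch follows; the tab and role names are taken from the talk, but the mapping itself, including which roles see which tabs, is hypothetical.]

```python
# Which of the four tabs each role sees; assumed from the description above.
TABS_BY_ROLE = {
    "data-scientist": {"data", "development"},
    "data-engineer": {"development", "deployment"},
    "developer": {"development", "deployment"},
    "ops": {"deployment", "observability"},
}

def visible_tabs(role):
    """Return the set of tabs a user with the given role should see."""
    return TABS_BY_ROLE.get(role, set())

print(sorted(visible_tabs("data-scientist")))  # ['data', 'development']
```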
A: Right, right, right, not everyone might know that; that's a really good point, thank you very much. Dotmesh is essentially, as far as I know, the first representative of a new kind of mesh, in this case a data mesh. You might have heard about service meshes like Istio or Conduit and others, but Dotmesh is a data mesh. And a data mesh essentially means the following.
A: In the same way you're externalizing functionality into the service mesh with respect to services and networking, you're externalizing functionality in terms of data, in terms of snapshotting data, into the mesh, and the mesh takes care of that. So technically, Dotmesh in Kubernetes is a FlexVolume that just transparently works there.
You don't really notice it; you're just using, you know, your normal volume there. And there is an API; working together with the Dotmesh folks there, with Luke, there's an API, a Python API, essentially.
A: It allows me to say: take a snapshot of that volume, or whatever, and then I can rewind back in time. So you can imagine that at some point in time you would have a tree-like structure: if I click on that finance transactions data set, I would see this tree-like structure of the snapshots that I've taken, or whoever in that context has taken. Does that make sense? Any questions regarding Dotmesh?
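[Editor's note: the actual Dotmesh Python API is not shown in the talk, so the class below is a hypothetical, in-memory illustration of the snapshot-and-rewind idea, not the real API: each commit freezes the current state, and rolling back restores an earlier snapshot, which is what builds up the tree-like history mentioned above.]

```python
class SnapshotVolume:
    """Toy illustration of snapshot/rollback; NOT the real Dotmesh API."""

    def __init__(self):
        self.state = {}        # current contents of the "volume"
        self.snapshots = []    # list of (message, frozen state)

    def write(self, key, value):
        self.state[key] = value

    def commit(self, message):
        """Take a snapshot of the current state and return its id."""
        snap_id = len(self.snapshots)
        self.snapshots.append((message, dict(self.state)))  # copy = freeze
        return snap_id

    def rollback(self, snap_id):
        """Rewind the volume to an earlier snapshot."""
        _, frozen = self.snapshots[snap_id]
        self.state = dict(frozen)

vol = SnapshotVolume()
vol.write("train.csv", "v1")
first = vol.commit("initial import")
vol.write("train.csv", "v2")
vol.commit("cleaned data")
vol.rollback(first)
print(vol.state["train.csv"])  # v1
```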
D: Michael, I love the way you framed the whole process into the three groups, because I think that was what I was struggling to articulate the other two times that I have sat in on the Kubeflow stuff. Because, looking at it as a data scientist, or somebody who's interested in scientific reproducibility: as a data scientist I want to be able to run whatever I want to run, in R, Python, Julia, and install the libraries that I need to get my work done and access...
B: I had a brief question about, I mean, people tend to draw this diagram, I've seen incarnations of this diagram before, and they tend to draw it waterfall-style, like you did. But, I mean, as a, you know, model jockey, if I'm playing that role, I mean, I'm also very driven by what, you know, data engineering can give me, and I'm just wondering if there's any consideration given to, kind of, the arrows that might be going in the other direction, where I can, you know, train on stuff that I don't know about.
A: Right, right, right, no, absolutely; that's an excellent point, and you're absolutely spot-on, yeah. The arrows obviously go both ways; it's not a, you know, one-way street from one side to the other. I think I hinted at that a little bit in the description: it's iteration, right? So you're going back and forwards and you get feedback, like, someone deploys that in production and goes, like, yeah, that seems to work in that region, but not really there; can we somehow, you know, update the data set or the model or whatever?
D: I just think, following up on what Eric just said: the ability, as a data scientist, to access different data stores, which might have different authentication barriers. You know, in terms of: I'm at this university, I may be able to, you know, access this data, but not, you know, the one over here at this other university, or whatever; HIPAA compliance, things like that. You know, I think the data scientist wants a view into that data engineering level; they just don't want to be the person to necessarily do it.
A: I'm not aware of any open source metadata project that actually does that; hence these bits with Elasticsearch and Presto DB. So if anyone has anything there... yeah, my goal is always to implement as little as possible, so if anyone has any project that I can use there... Yes, but I am...