From YouTube: Kubernetes Machine Learning WG 20180524
D
As such, we've been trying to attend all of the SIG and working group meetings this season, in order to just let people know where we are and what we can do for you. Let me go through some of the doc here. One thing is: we are kind of continuously trying to tweak how we handle repo and workflow management for the repos in the kubernetes namespace, including changes to labeling, changes to Prow bot commands, to issue triage guidelines, and many other things.
D
A statement and a question. The statement is that for a lot of those things, there are going to continue to be changes over the next three or four months. You'll hear from us ahead of time on those: we post them to the contributor, SIG leads, and kubernetes-dev mailing lists ahead of time, so that people can voice objections before we actually push changes.
D
But it does mean that, if you're concerned with those sorts of workflow changes, you need to be following at least one of those mailing lists. And the question is: do people in the SIGs and in the machine learning working group have particular concerns or feedback about things like the PR, issue, and docs workflow the way that it stands within kubernetes? Are there things that we should hear about and take into account?
E
That's true, I guess. I mean, there are definitely many projects that we have here, yeah, and there are things that we're interested in collaborating on as a community, but they don't live inside of kubernetes core and they don't live inside of a kubernetes SIG's repo. So yes, well, but it's unclear that that's necessarily the direction that anyone's even going anymore, yeah.
D
Or, depending on people's available free time: we do get funding for programs like Outreachy and Google Summer of Code and other internship programs. And I'll tell you, from an applicant standpoint: if we say, hey, we have funding for this machine-learning-related internship position, we will get a lot of applications.
D
So if you have anybody who has the time to do that kind of mentoring, and that's a lot more time, you're talking about a minimum of five hours a week, probably more, then as that rolls around, for whatever the next seasons are for those various programs, we'd certainly be interested in having more folks participate in those, because it is a very attractive thing for students, folks from unconventional backgrounds, and others interested in internships.
B
So, let's see. This is kind of inspired by a conversation between Vish, Kenneth, Scott, and me, and the email thread we started last week, around what kind of concrete things a working group like this would actually go and target, and whether the resource management working group would be any kind of inspiration.
B
It was a bunch of people with common interests, where work needed to get done across different SIGs; that was kind of the inspiration for one of the first working groups, and we thought that the machine learning working group would be somewhat the same. So, just to snag this out of Vish's email:
B
This is a pretty good summary, I think: our goal is to identify gaps and extend kubernetes appropriately for ML workloads. So, in case there's any confusion between, for example, what the different IaaSes do, mentioning just Kubeflow as being one of them, and what we would take on here:
B
A lot of the IaaSes come from the top down, to make sure that everything is great, and to some extent they're doing, I won't say stopgap solutions, but they need to figure out a way to make it a good experience. The machine learning working group would be bottom up: figuring out how we can enhance existing abstractions, whether there is space for new ones, and how we latch ourselves to ongoing efforts that can help ease the machine learning use cases and those pain points.
B
That's a bunch of the things we'd like to see worked on. Our algo teams are submitting thousands of jobs; they come from a background with HPC schedulers and are used to predictable, very large-scale job scheduling, and the best answer they have in kubernetes is the Job abstraction, and there are many things left to be desired within that one. Then again, when there's a limited amount of accelerators, which tend to be used for training, fair sharing tends to be an issue.
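The fair-sharing concern mentioned above can be made concrete with a small sketch: given a fixed pool of accelerators and per-team demands, compute a max-min fair allocation. This is a hypothetical illustration of the problem being discussed, not any Kubernetes API; the function name and weighting scheme are assumptions.

```python
def fair_share(pool: int, demands: dict) -> dict:
    """Max-min fair allocation of `pool` accelerators across teams.

    Repeatedly gives every unsatisfied team an equal split of what is
    left, capping each team at its demand.
    """
    alloc = {team: 0 for team in demands}
    remaining = pool
    unsatisfied = {t for t, d in demands.items() if d > 0}
    while remaining > 0 and unsatisfied:
        share = max(remaining // len(unsatisfied), 1)
        for team in sorted(unsatisfied):
            if remaining == 0:
                break
            give = min(share, demands[team] - alloc[team], remaining)
            alloc[team] += give
            remaining -= give
        unsatisfied = {t for t in unsatisfied if alloc[t] < demands[t]}
    return alloc
```

With 8 accelerators and demands of 10, 2, and 10, the small demand is fully satisfied and the rest is split evenly, which is the behavior plain first-come-first-served Job scheduling does not give you.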
B
Gang scheduling as a theme has already been talked about in SIG Scheduling. A lot of the patterns that we see in the job operators for TensorFlow, PyTorch, and Caffe could potentially be helped by extensions within kubernetes. There's also stuff related to runtime, and these are incomplete lists, by the way; I think already now we have ideas for two or three more runtime-related topics, but yeah.
B
You know: loading hundreds-of-gigabytes to terabyte-size data sets easily; having solid support for accelerators; having ways where jobs could do checkpoints without having to be opinionated about which framework is coming up with them. So there are a lot of different topics that are not tied to any one kind of IaaS on top, which is where people have had to come up with solutions, and they're coming up with different ones, so it basically creates siloed environments on top, and that is not in the spirit of kubernetes.
A
Some of these aspects will help even HPC workloads. So in that sense, you want to do minimal work as much as possible, while at the same time enabling kubernetes to become really friendly for this. I'm pretty confident there's probably more in the core layers, but this seems to be a pretty good list.
B
We could definitely do that. So, if you were on the call around source-to-deployment, we spent a full hour talking about this; this is the first half of it. Some of the things we saw were: people are copying a lot of pod templates and job templates; some people are solving problems for themselves that they don't end up upstreaming, so we've seen more than one solution to certain problems; and some data scientists are still having problems that other people have solved.
B
So it's kind of a mundane, practical problem, but I think it is a fairly common one. This is where we discussed at least five or six different ways of doing it, so that is just to mention that part of it. Whether that should be in kubernetes or not, that's something we can begin to discuss.
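The template-copying problem described above can be sketched in a few lines: instead of hand-copying pod and job manifests, render one from a few parameters. The manifest shape below is a simplified stand-in for a Kubernetes Job, kept minimal for illustration; the helper name is an assumption.

```python
def render_job(name: str, image: str, command: list, gpus: int = 0) -> dict:
    """Build a minimal Job-like manifest dict from parameters."""
    container = {"name": name, "image": image, "command": command}
    if gpus:
        # GPU requests go under extended-resource limits in real manifests.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [container],
                    "restartPolicy": "Never",
                }
            }
        },
    }
```

A data scientist would then only touch the parameters (image, command, GPU count) rather than a copied YAML file, which is the pain point tools like MLT, Skaffold, and Draft are aimed at.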
B
Okay, so yeah, I talked about templating here because Vish listed it, so I just wanted to mention it. And yes, we have our own project within this space called MLT. We know that Skaffold could be used; Draft could be used; there are two others, forge.sh and Red Hat's source-to-image project, as well. Two or four weeks ago, we had a pretty elaborate talk about this.
B
This problem in particular, yeah, same thing. I'm really just pulling in things that were presented back then, where what data scientists tend to care about is: make a change to my Python file.
B
That's another issue that we've seen internally: for different machine learning workloads, there is a very different requirement for the amount of acceleration you need behind them. Some workloads can deal fine with one or two accelerators (these are just to illustrate some of it), but if some bigger models need, for example, four, they can end up waiting for a long, long time before we see those jobs being scheduled. So the only alternative right now is for that to...
B
And queue management, at least what we see: people use the Job abstraction just like they would have done with Slurm or Moab, and compared to those tools there are very few mechanisms for doing introspection and for doing accounting on those. People start something, and they don't have a good idea of when it's going to start; they don't have a good idea of who is using what; they don't know where their things are running. So yeah, at least internally.
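The introspection and accounting gap described above can be sketched with a toy queue that records who submitted each job and can answer "who is using what", which the plain Job abstraction does not surface. This is entirely hypothetical, not a real Kubernetes or HPC API.

```python
from collections import defaultdict

class JobQueue:
    """Toy job queue that keeps the accounting the Job abstraction lacks."""

    def __init__(self):
        self._jobs = []  # each entry: {user, name, gpus, state}

    def submit(self, user: str, job_name: str, gpus: int):
        self._jobs.append({"user": user, "name": job_name,
                           "gpus": gpus, "state": "pending"})

    def start(self, job_name: str):
        for job in self._jobs:
            if job["name"] == job_name:
                job["state"] = "running"

    def usage_by_user(self) -> dict:
        """Accelerators currently held by each user (running jobs only)."""
        usage = defaultdict(int)
        for job in self._jobs:
            if job["state"] == "running":
                usage[job["user"]] += job["gpus"]
        return dict(usage)
```

Slurm and Moab answer this kind of query out of the box; the point in the discussion is that something similar would need to be layered on top of kubernetes Jobs.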
B
You know, yeah, be warned, maybe we get to a Job version 2 or something like that; but it's just to explain that the problem is really there. People schedule many, many jobs, or people try to use priority, but it's only applied at admission. So in all that, there are a lot of things in terms of the experience that I think could be enhanced.
B
That is, based on some kind of budgetary measure of who is funding the cluster. Some of those concerns we've been looking for answers to as well, trying to build controllers on top of quota, but in general, pushing some of these down so they become almost like a de facto standard would be amazing.
B
We've seen a lot of kind of specialized job operators come up. There's one for TensorFlow; it's been forked for early versions of MXNet; we forked it for PyTorch, and that was contributed to Kubeflow. But a lot of them follow the same pattern, so there might be an opportunity to find a way to enhance either the Job abstraction, or some abstraction within kubernetes, so that it does the heavy lifting.
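The shared pattern across those operators (TensorFlow, MXNet, PyTorch) is a set of named replica groups. A hedged sketch of that common shape as a plain data structure follows; the field names are illustrative and are not taken from any real CRD.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicaSpec:
    """One named group of identical pods, e.g. workers or parameter servers."""
    replicas: int
    image: str

@dataclass
class DistributedJob:
    """Framework-agnostic description that the per-framework operators share."""
    name: str
    framework: str                      # e.g. "tensorflow", "pytorch"
    replica_specs: dict = field(default_factory=dict)

    def total_pods(self) -> int:
        # Every framework-specific operator ultimately creates one pod per
        # replica in each group; only the wiring between groups differs.
        return sum(spec.replicas for spec in self.replica_specs.values())
```

If a common abstraction like this did the heavy lifting, a framework operator would only need to fill in the framework-specific wiring (environment, rendezvous), which is the opportunity being described.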
B
So we make sure that whatever we come up with is scalable and robust and, maybe more importantly, has the same experience across different frameworks, so people don't have to learn a new CRD to do something special for TensorFlow versus PyTorch if there's no good reason for it. Does that make any sense? Please speak up if there is anything where I'm making gross assumptions or going over things pretty quickly.
H
One observation is that, you know, you brought up these alternate schedulers, Slurm and Moab from HPC, but there are others, specifically in data analytics, like Spark and Storm, that I think are commonly used, maybe not so much for the training aspect of machine learning but for executing it later. Those are the ones I've encountered more frequently at conferences involving data analytics with machine learning and AI.
E
So, Storm is really about stream processing more than anything else. It's true stream processing; they added micro-batching after the fact, but it's really meant for processing streams of data and kind of giving you real-time views, which are good for certain types of machine learning, but for learning and training it's probably not the way you want to go.
H
Right, yeah, they're more for applying it after you train your neural net: getting it to run and give you results. And there are lots of people interested in running those workloads on kubernetes. I don't know if you consider that out of scope for this working group or not, but there's interest in the topic.
E
We already have a lot of work that's gone into running Spark on kubernetes. The workloads team at Google, among others, did a lot of work to make kubernetes a first-class scheduler inside of Spark, in the same way that YARN, Mesos, and Spark standalone are, and we also have the Spark Operator to kind of give Spark jobs first-class support in the kubernetes API. So there's a lot of work there already, but that's, again, still not quite the right way for...
E
Getting things upstream in Spark is extraordinarily difficult for major changes, right; it took over a year, maybe a year and a half, to get the work done to integrate kubernetes. So in terms of understanding use cases and so forth, we could do that, but if we really want to change the way Spark works, or try to modify Spark to better suit ML or stream processing, I don't know if that's something we want to bite off.
B
So this is one area where we already did one open source project, the kubernetes volume controller, but this is a general problem, and even with the kubernetes volume controller there are some things that we'd like to aim to enhance or to improve in terms of the experience. For example, for the demo that we have for our tool, the brain scans data set is about a terabyte, so it is a non-starter to start copying things around.
B
So how do you make that a better experience? Because it's going to be something that, even for a straight-up kubernetes Job, you'd like to have wired in in some way. So yeah, I don't think I have more for this one. Scott, do you have anything else you want to mention?
B
Yeah, so, you know, I think, from a long history of working on kind of pluggable device support: we've bought ourselves a lot when it comes to turning these things on in kubernetes, so we do have the device plugin API, but this is more like a call-out for continued investment in it, you know, hardening it, getting things like pluggable accelerator metrics in, so it's not so single-accelerator-specific.
A
Okay, yeah, that's correct: Google has chosen to expose Tensor Processing Units as endpoints, but Google also exposes GPUs, and they are not exposed as endpoints, they are exposed as devices. And you are correct in that, if you're spinning up your own kubernetes cluster, you would have to deal with these things. It's just that you can't take kubernetes, or whatever you get as part of the core release, and say now all of these work; we have to install a few extensions.
E
They all use a similar pattern, as we've said before, and then there's also stream processing, which is kind of similar sometimes. It's something that people continue to ask for, but that we decided we're probably not going to do in core. So it's something we'd like to do as an extension, and there could be a common extension that could be leveraged across the board, yeah.
A
Eventually, once this starts taking over, there will be more and more utilization demands, and also, this is pretty difficult to achieve given the current state, where VMs are sort of the boundary for most of the... So I guess one of the themes would be: how do we work with existing multi-tenancy efforts, and what sort of features do we have to enable at the node or at the cluster level, in order to make kubernetes machine learning clusters multi-tenant ready, while at the same time not sacrificing performance, right?
A
Machine learning really, really cares about performance, so even five percent performance drops on jobs are actually problematic, and whenever we add a VM layer in, there is going to be a non-trivial amount of performance overhead. And so I don't know what the right model is for this space. I think for the pure services space, the performance hit is most likely okay for many apps and the added security is much more important, but I'm not sure how much that matters here in this space.
H
All I can suggest, and I don't want to turn this into a promo, but just full disclosure, I do work for a vendor, VMware: hypothetically, and I'm not announcing any products, but hypothetically, if there were to be virtualized GPUs or accelerators, I assume that might be something that you'd be interested in.
B
Is it like multi-tenant use? Maybe. But yeah, Alex, I guess the most common thing we've seen has been bare metal on-prem and virtualized cloud setups, but it would be interesting to either do studies of what people do, or of what people would like to do: if someone is trying to string together a, you know, multi-tenant appliance, how do people go about doing this? So maybe it's a bit more of a hypothetical, but I'm sure that this is related to your work with GKE, okay.
B
Yeah, and that use-case request also comes from us, where you don't want to have a different experience based on different frameworks that basically follow the same pattern, when it comes to scheduling things and keeping things up, or sharing certain pieces of information across these jobs so they can get in touch with one another.
A
Yeah, I mean, to add to what Ken is saying: the general guidance is not that the community has to stop working on these kinds of things. It's just to not have the community think of everything as core, in the sense that we now have the tools and the extension points that we need to prove out new APIs and make sure they're solid and robust enough to be supported for many years to come. And so, from that point of view, I don't think Ken was saying no.
A
He wasn't saying we should not do it, but rather: let's not go change the core constructs or add core APIs; let's build it as an extension and then make sure that it works really well before we bring up that conversation, right? I feel like this secondary conversation about core is probably not that important, because we all want kubernetes to succeed for ML use cases, and if we happen to ask people to install one or two add-ons, that's probably not the end of the world.
E
The thing is this: if you're going to do something as a core resource now, and it's not saying they're never going to take any new core resources, you really have to bring a very strong argument that we have to do it as a core resource, and here's why. And for something like Job, it's kind of already been proven that batch workloads can be done as an extension, so we don't really have a strong argument to do it in core first and evolve it there.
B
Right, so I think we can figure out the details of this, but it's maybe just a call-out from us. I think we saw this pattern with Mesos, where a new framework is easy to make, but making a good, stable, and scalable one is really, really tough. So if there are going to be a lot of operators of varying quality, users are better off having a common abstraction. That, again, is a bet and a hypothesis, but maybe worthwhile testing.
A
Not necessarily a separate API: if you just keep a history of jobs, that by itself is like an indicator. Right now, Jobs has a whole bunch of issues, but as was said, it's v1, so we can't really change it. So the only alternative we've got is to write extensions to work around the limitations of Jobs.
A
Thinking in terms of more and more kubernetes-native abstractions is probably a powerful and important thing we have to do. I found that once your pods get deleted, the job doesn't know what happened previously, or a pod gets deleted and the job goes berserk; that's not going to work at scale, yeah.
A
And, like, I think Nick had a slide about that previously, right, which is job output tracking: you want to record what the inputs are and have some unique identification for those sets of input parameters, yeah, and then you have a set of outputs, and being able to go from inputs to outputs through an ID, that's something that's not easily possible today with Jobs, and I would expect that to be something important even for other custom controllers.
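The input-to-output tracking idea above can be sketched simply: derive a stable ID from a job's input parameters, so outputs can be looked up from inputs later. This is a hypothetical helper illustrating the discussion, not part of the Jobs API or any real controller.

```python
import hashlib
import json

def run_id(params: dict) -> str:
    """Stable ID for a set of input parameters (order-insensitive)."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

class RunTracker:
    """Maps input parameters to the outputs a run produced."""

    def __init__(self):
        self._outputs = {}

    def record(self, params: dict, output_uri: str):
        self._outputs[run_id(params)] = output_uri

    def lookup(self, params: dict):
        # Same parameters, in any key order, resolve to the same run.
        return self._outputs.get(run_id(params))
```

Because the ID is derived from the canonicalized parameters, asking "what did this exact configuration produce?" works even after the Job object itself is gone, which is the gap being pointed out.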
B
I think we saw something similar in the Kubeflow community, where there was talk about history servers as well, and it just seemed to be a wider issue, right: if you want to have a way to keep track of objects and their history forever, is there a way to plug that in? It's probably something you have on the side that does backups of it and then stores it somewhere.
E
Here are my primary concerns with this: it's not clear to me that the results of an experiment are closely tied to the life cycle of a kubernetes cluster. I'd imagine that you might turn clusters up and turn them back down as experiments finish, over the lifetime of multiple experiments, but you'd want to be able to store the results of an experiment, and that's outside of the scope of the kubernetes cluster, yeah.
E
That's my only concern, really. And then the other thing: you may have multiple clusters turned up simultaneously with experiments that are being run, and they may want to be able to share the results and the collection of experimental data as a global resource. So I don't think it's as simple as saying we can build an abstraction inside of a kubernetes cluster. I don't know; I think it's a harder problem than just saying we can build some abstraction that handles that particular case well.
B
We have some ideas for solutions to this, but it's mostly about being aware of there being a problem, and I think it's a good indication that, if it's a tough problem, people tend to work around it for a bit. So yeah, I think it's just acknowledging that whoever pulls this in today will have to deal with some of this, and it's yet another one of those things where it's easy to get into a silo if there isn't a good way of meeting at some kind of common, shared format.
A
For example, just dumping the historical states of objects: you can think of really simple solutions and not overcomplicate it to start with. I think, at the end of the day, people just want to run something, and if the cluster goes berserk, or the Jobs API goes away, I still want some way to go look at what happened previously.
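The "dump historical states" idea above can be sketched as an append-only log of object states that survives the object's deletion, so you can still see what happened after a pod or job is gone. Purely illustrative; the class and its interface are assumptions, not any real history-server API.

```python
import time

class HistoryStore:
    """Append-only record of object state transitions."""

    def __init__(self):
        self._log = []  # entries: (timestamp, object_name, state)

    def record(self, name: str, state: str, ts: float = None):
        self._log.append((ts if ts is not None else time.time(), name, state))

    def history(self, name: str):
        """All recorded states for an object, in insertion order, even if
        the live object has since been deleted from the cluster."""
        return [(ts, state) for ts, n, state in self._log if n == name]
```

Backing the log with durable storage rather than an in-memory list is the natural next step, which connects to the point below about a history service outliving any one cluster.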
E
It would be more like a generic history service that's up and running, where you don't really care whether the underlying implementation is based on kubernetes; it runs as a service, and, you know, it's something that's backed by multiple types of durable storage, so you can put it on GCS, S3, et cetera.
E
Yeah, so, I mean, there are some challenges, particularly with TensorFlow Serving, particularly if you want to try to get it to work on GPUs, and in how we deal with it as an abstraction. Also with running experiments: it's not necessarily the same use case that you'd have for typical deployments, like staged, canary-type deployments that you do with serving workloads on kubernetes.
E
Auto-scaling becomes a bit of a problem, in particular with GPUs; it's not clear. With a CPU-bound workload, and TF Serving as a CPU-bound workload, you can pretty much auto-scale off of CPU utilization as a good trigger. GPU duty cycle is probably not a good trigger; request latency might be a better trigger, or total QPS, which is probably tied directly to request latency and batching.
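The trigger discussion above can be sketched as a QPS-based sizing rule: scale serving replicas off total QPS against a measured per-replica capacity, instead of GPU duty cycle. The function name, capacity parameter, and bounds are illustrative assumptions, not a real autoscaler API.

```python
import math

def desired_replicas(total_qps: float, qps_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Replica count needed to serve `total_qps` at the measured
    per-replica capacity, clamped to [min_replicas, max_replicas]."""
    if qps_per_replica <= 0:
        # No measured capacity: fall back to the upper bound to stay safe.
        return max_replicas
    needed = math.ceil(total_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

For example, 950 QPS against replicas that each handle 100 QPS rounds up to 10 replicas; a CPU-utilization trigger would need no such application-level signal, which is why GPU-bound serving is the harder case.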
E
There are just a lot of things that we could probably do better, and potentially in a more general way. And then, ultimately, a question about TF Serving: it works for tensorflow out of the door, right, but there's nothing that stops you from providing other plug-ins for TF Serving to serve other different types of models. I think there's an XGBoost one in the community already, and there's a Caffe2 one, yeah. So, I mean, looking...
B
Sounds good. So I think the last one here is, we don't have time for this, but we tried to start, you know, a list here, where for each one of the slides we can go and vote and comment and do whatever, and figure out which SIG is relevant for it, and if there are ongoing proposals, we could link those too. So maybe in two weeks' time we'll be in a better position to figure out what is less ambiguous, and what we could do now versus what requires a lot of effort.
B
Sounds good. I don't have anything else. I hope to see people kind of chime in with, you know, what they care about, and then we could figure out which topic to attack first, and we can get a bit more focused. You know, today we were all over the place, but that was on purpose; so, like, we're gonna start funneling and get focused.