From YouTube: Kubernetes WG Batch Weekly Meeting for 20221222
A
Yeah, hello everyone. Today is December 22nd; this is the Batch Working Group biweekly meeting. We have one item on the agenda for today. Before we go through that, I'd like to remind you that this meeting is recorded and will be uploaded to YouTube, so please make sure that you adhere to the Kubernetes code of conduct. So, yeah, as I was mentioning, there is one item on the agenda that I guess Aldo and Kante would like to discuss, which is the Kueue roadmap.
A
We have a PR, I think, discussing that, but I think Aldo prepared a short presentation. So I'm not sure, Aldo and Kante, did you have an agreement on how you're going to proceed with today's meeting?
B
Okay, can you hear me? Yeah, I'll show a little roadmap for Kueue in the next year, and then maybe, yeah, you know what I mean.
B
Yeah, I can see it. Okay.
B
We are coming to the end of 2022, so we have a rough plan about what we want to do in the next year. There are two parts to the roadmap: one is about the features we think we can finish in the next year, but that depends on the time and the priorities.
B
Hopefully we can get some feedback, like suggestions and feature demands, just to make sure that we are on the right track. Okay, so during the sharing, if anyone has any questions, speak up and we can talk about them in detail. Okay, I'll go through the list one by one.
A
I think Aldo put together a presentation; maybe it's easier to go through that, yeah.
C
Okay, so I already put a link in the meeting notes. Thank you, Kante, for bringing up the topic. I wanted to give an overview of what we have been doing in the last release.
C
So let me just start. We are planning a v0.3 release, and the estimate is mid-January, and we have two ongoing items. One is preemption. If you're familiar with the scheduler, there is preemption in kube-scheduler too, but that preemption is based on pods: when there is need for space, we preempt individual pods. In Kueue we are trying to do the same, but at the job level.
C
So this is atomic preemption, let's say, and there are two modes of preemption in Kueue according to our design: one is within your own resources, and one is among the resources that you share with the rest of the teams, which we call a cohort. If you open these slides from the link that I shared, you can open the issues, but for now please let's just stay here. Yeah, if you're interested, follow the links.
C
So that's one feature. The other feature we've been working on is a short-term form of all-or-nothing, which basically is just one-after-another scheduling: we only schedule one job, we wait for its pods to be running, and then we proceed with the next job.
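The one-after-another behavior described here can be sketched as a small simulation (job names and the readiness check are illustrative, not Kueue's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    pod_count: int
    running_pods: int = 0  # updated as (simulated) pods come up

def schedule_sequentially(queue, all_pods_running):
    """Admit jobs strictly one at a time: start a job, wait until every
    one of its pods is running, only then move on to the next job."""
    admitted_order = []
    for job in queue:
        # This job is admitted alone; in a real cluster its pods start now.
        while not all_pods_running(job):
            job.running_pods += 1  # simulate one more pod reaching Running
        admitted_order.append(job.name)
    return admitted_order

jobs = [Job("training-a", pod_count=3), Job("training-b", pod_count=2)]
result = schedule_sequentially(jobs, lambda j: j.running_pods >= j.pod_count)
```

Because every job blocks the whole queue until all of its pods run, this is simple but slow, which is why the feature is configurable.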
C
But this is a short-term solution, and it is configurable: you can disable it if it's not something that works for you, because it can be very slow. So that's the ongoing work. Then, thanks to our contributors in the community, we've completed two more important tasks: we now have a performance test that is easily reproducible, and we have end-to-end tests, thanks, I think, to Kevin Hannon.
C
That's just for you to have an idea of what we're focusing our efforts on. But of course, if you have your own ideas, please feel free to bring them up. So, can you go to the next slide, please?
C
So, yes, in some of these plans we have some developers allocated, but there will likely be minor tasks, or bigger tasks, where we would need help.
C
But let's first go through the priorities we already have some allocation for. So, cooperative preemption: this is a second step, or an improvement to the preemption we have, where we want to include information about jobs. The jobs can include information about when was the last time they did a checkpoint, and this is useful information for prioritizing what can be preempted: likely we want to preempt jobs that more recently did a checkpoint.
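The checkpoint-based prioritization described here could be sketched like this (the function, job names, and timestamps are hypothetical illustrations, not Kueue's design):

```python
def pick_preemption_victims(candidates, needed):
    """candidates: (job_name, last_checkpoint_time, quota_used) tuples.
    Preempt the most recently checkpointed jobs first, since they would
    lose the least work, until enough quota has been freed."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    victims, freed = [], 0
    for name, _checkpoint_time, quota in ordered:
        if freed >= needed:
            break
        victims.append(name)
        freed += quota
    return victims

candidates = [
    ("job-a", 100, 4),  # checkpointed long ago: would lose the most work
    ("job-b", 900, 4),  # checkpointed very recently
    ("job-c", 500, 4),
]
victims = pick_preemption_victims(candidates, needed=8)
```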
C
So
that's
that's
that
task
graduate
apis
to
Beta.
This
is
important
for
production
Readiness.
Once
we
we
move
into
a
beta
API,
we
we
will
promise
longer
longer
support
for
it
and
backwards
compatibility.
C
So that's an important step towards production readiness. Then, as I mentioned, we have a performance test, and we want to start paying attention to it and incorporate learnings from there to improve performance. There are easy tasks here, for example increasing the number of workers in several of the small controllers, which will give us better performance, and introducing parallelization, I mean CPU parallelization, and so on and so forth.
C
Those are easy tasks that can be done to improve performance, and more granular work can be done with profiling, which we want to add as well. And another thing we want to work on is some form of meta job. As you can see, the name is not established, that's kind of the idea, but the intention here is to have an aggregate job in which you can define multiple pod specs, for example to support launcher-worker paradigms.
C
We want the meta job to create multiple jobs, and then the meta job will be handled as a single unit by Kueue. This is the kind of thing that could possibly make it all the way down to Kubernetes core, but I think it's fair to start in a project which has more velocity, such as Kueue. And that would be the idea: we start this in Kueue, we experiment, we iterate, and then possibly, let's say in 2024 or later, we propose it to upstream. And then there are a few things we want to encourage, both in upstream and in Kueue.
C
In upstream, I don't know if Wei is here, but we worked on scheduling gates, scheduling readiness gates, which is a very useful building block to block scheduling. This is particularly important for Spark applications, because in Spark you could create a launcher pod, and then the launcher pod would create a lot of pods without asking for permission; it would just create the pods.
C
We want some control over that, and the solution we proposed, along with Wei, is to add this scheduling gates feature in kube-scheduler. So this is an important building block that will be very helpful for Kueue. And yes, it's currently in alpha; we want to encourage the beta release in 1.27 and the integration of Kueue with it. And this last item is where we would like help from the community.
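The gate semantics described here can be sketched as follows (the gate name and the helper are hypothetical; only the `schedulingGates` field name mirrors the Pod spec):

```python
def is_schedulable(pod_spec):
    # A pod is held back while its schedulingGates list is non-empty.
    return len(pod_spec.get("schedulingGates", [])) == 0

GATE = "example.com/queue-admission"  # hypothetical gate name

pod = {"schedulingGates": [{"name": GATE}]}
blocked_before = not is_schedulable(pod)

# The admitting controller removes its gate once quota is granted,
# and only then does the scheduler consider the pod.
pod["schedulingGates"] = [g for g in pod["schedulingGates"] if g["name"] != GATE]
schedulable_after = is_schedulable(pod)
```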
C
We want to start integrating Kueue with multiple existing frameworks, in particular MPIJob, TensorFlow job, Spark, as already mentioned, and Ray; Ray has some similarities with Spark in terms of how it works. So, yes, these are the frameworks that, as far as we know, are top of mind for multiple customers from multiple communities.
C
But of course, if you have your own, please bring it up. One important thing to note is that, to integrate with Kueue, we need a few hooks: we need a hook to basically suspend pod creation from the jobs. So that's kind of the first step; in each of these APIs we need to add this configuration, this field, and then Kueue can integrate with it. So that's kind of what we want to achieve in H1.
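The suspend hook described here can be sketched as a toy admission loop (the field name mirrors batch/v1 Job's `spec.suspend`; everything else is illustrative):

```python
def try_admit(job, remaining_quota):
    """Unsuspend the job if its request fits the remaining quota;
    return the quota left over after this decision."""
    if job["suspend"] and job["request"] <= remaining_quota:
        job["suspend"] = False  # the job controller may now create pods
        return remaining_quota - job["request"]
    return remaining_quota  # stays suspended: no pods are created

queue = [
    {"name": "big", "request": 8, "suspend": True},
    {"name": "small", "request": 2, "suspend": True},
]
quota = 9
for job in queue:
    quota = try_admit(job, quota)
```

A job created with the field set keeps all of its pods from being created until the queue flips it, which is exactly the hook each framework API needs to expose.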
C
But we have a few ideas, for example bulk provisioning. This is something we are already thinking of designing, but we don't have a complete proposal yet. The idea is that cluster autoscaler can provide an API where we can request a bulk of nodes, or a bulk of resources, and then the cluster autoscaler can respond.
C
So that's bulk provisioning. And then another idea is to work on budgets. As you might know, Kueue is basically a quota system, which means that at every point in time you cannot surpass a given amount of resources; budgets are more about what you consume over a period of time. So that's something we want to achieve. Next slide, please.
C
These ideas that I presented are coming from our view of things, but if we are missing something, we would like to know. If something else is top priority for you and you have the resources to work on it, you're more than welcome to come, discuss, and present your designs. But we already have some smaller or bigger features that we know are important but don't have enough resources to work on, and they are basically up for grabs. So one interesting task, or even design, is partial admission of jobs.
C
So you can say things like: my job ideally requires 10 workers, but I'm okay having only five. So depending on how much quota is available, or, once we have budgets, how much budget is available, you can start at a given size.
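The partial-admission idea can be sketched as a small function (the parameter names are made up for illustration):

```python
def partial_admission(desired, minimum, available):
    """Return the worker count to admit, or None if even `minimum` won't fit."""
    if available >= desired:
        return desired       # full admission
    if available >= minimum:
        return available     # shrink the job to what the quota allows
    return None              # keep the job queued

full = partial_admission(desired=10, minimum=5, available=12)
partial = partial_admission(desired=10, minimum=5, available=7)
blocked = partial_admission(desired=10, minimum=5, available=3)
```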
C
So that's one. pprof: this would be very useful for improving performance. All of the Kubernetes core components have a pprof endpoint, so maybe some ideas can be grabbed from there. And then there are some non-coding tasks, such as setting up the documentation.
C
We've been hearing from multiple of you, in the chats in Slack or from some user experience research we've been doing, that the documentation is not the best, so we would definitely welcome help there. And another point is a kubectl plugin: kubectl has some limits in terms of how useful it can be for CRDs; it's hard to surface some complex information, such as states. So a kubectl plugin would be helpful.
C
Another problem we have in Kueue is that most of the information about a job is not contained in the job; it's contained in a separate object, which is called the Workload. If you want to know why your job is not admitted yet, why it is not running yet, you have to look at a different object. So having that available in a single listing, through a kubectl plugin, would be super useful for ease of use. There is also a web-based dashboard, and Grafana samples.
C
We had a contributor share some Grafana samples with us. I'm not sure if you can have Grafana templates, if there's such a thing, but those would be useful, or a web-based dashboard to see all the information about the jobs.
C
So these are all up for grabs. Again, there are two things here. This is, again, our vision of things.
C
Kante is suggesting that we actually publish the roadmap in the repository, for all people to have a permanent view of it, and that's owned by the community, right? So if you have ideas: we will try to put most of what we have in these slides into the pull request.
C
But if there are some priorities for you, please join the discussion in the pull request. And at this point I'm opening up for questions, or handing back to Kante, if you have more things to add.
B
Totally; the presentation covered most of the features we plan for the roadmap. There's one point about multi-cluster support: some users from the community asked for multi-cluster in the past. We think it is far off for us, because we still have plenty of features to improve in the meantime, so I think, yes, it's a long-term feature.
C
Yes, I think, given all the features we already have for 2023, it might not be possible to start working on multi-cluster in 2023, but it's definitely something on our minds. And I don't know if we have folks from G-Research here, but we are in conversations about how to synchronize the efforts, because that's the area G-Research's Armada focuses on.
E
I was curious about Kubeflow. I've been noticing that there's a lot of progress going into trying to combine all those operators into a single training operator. Are we focused on integrating one by one with each of these operators? I guess I'm not going to be the one doing the Kubeflow integration, but I'm curious, because when I read the threads and stuff, it seems like there's a lot going on in the Kubeflow community around consolidation of those operators.
C
Yes, I think, given how the architecture of Kubeflow is today, with the Kubeflow training operator, it's relatively easy to do the integration for all of them at the same time, except for MPIJob, which is a separate project for now.
A
So I have some thoughts on this. I've been looking at all these APIs, MPIJob, TFJob, PyTorchJob, and my hope is that we don't really need to integrate with any of them directly, and we can mostly replicate that using the multi-job API. Long term, I think we can come up with a solution, basically a single API that would enable you to deploy all of these training workloads. But I don't want to discourage people from integrating if they wish to do so; that path is possible. I'm just commenting in general about that common training operator that Kubeflow has.
A
So there are multiple APIs, but the implementation is basically replicated; there are really minor differences between them. It's mostly just making it convenient to set some environment variables that these different libraries use. And so, yeah, my hope is that we will have that multi-job.
C
Okay. Everything, in general, is a question of timelines, because some people might want support for TFJob in 2023, and once meta job lands, it's probably going to be at least 2024 before it's possible to migrate to that API.
C
So we will probably need a midterm solution, which is direct integration.
C
But again, it's up to the community: right now, what's the highest priority for the community?
E
It seems to me, in general, the problem that I'm dealing with also is all these different CRDs. You have Spark, Ray, and I know I've looked at Dask and a few others, and their pattern is very similar, but there's also a lot of, I guess, business logic built into some of those operators, sorry, some of those CRDs. So I do like the idea of having kind of a meta job or multi-job API, but also, I think, for the short term...
D
I was just going to go back to the inquiry about Armada and just point out that Kevin and I are here if someone has questions about Armada. Now you have an audience, or if you want to wait till the presentation next time, that's fine too.
D
It's probably a little too much to get into. I didn't catch who was asking about multi-cluster, but feel free to reach out and check out armadaproject.io, because that's definitely the space we're playing in; we've got a lot going on there.
E
So, one question about the scheduling gates: it says encourage in the 2023 plan. It is already in alpha for 1.26, so what more items are you interested in for that one?
C
So, first, graduation to beta, because that means enabled by default, and the integration: I mean, the API is there, but Kueue is not using it yet. And I think the primary case, or the first case, which will need this feature is Spark.
C
So probably the Spark integration would be the first user of it.
A
So we will have this documented in the repo, I guess. Please feel free to check our open issues as well, suggest any enhancements, and communicate your priorities to us too. Yeah, it should be an exciting H1.