Description
Meeting Notes: https://docs.google.com/document/d/16CEsBSSGm3sMpvB_cFnKnqqi1OxhIcyX3lVwBpIyMHc/edit#heading=h.ouxlycri8nmz
Discussed status of self-hosting dependencies: DaemonSet surge updates and kubelet checkpointing.
A: Okay, hello and welcome to the October 11th, 2017 edition of the self-hosting / HA breakout meeting for SIG Cluster Lifecycle. Let's see, we had started talking briefly before the recording about surge updates. I see that Aaron is now here, so maybe we can continue that discussion, Aaron. I don't know if Diego's going to show up also.
A: So I was just mentioning to Tim that Brian gave a comment on the issue last night saying that he didn't think we should add it, effectively, and Tim and I were talking about the fact that we had chosen to use DaemonSets for the master components, but maybe we should reassess that decision in light of Brian's comment. So I guess there are sort of two paths forward. One is to go talk to Brian and Kenneth and try to get SIG Apps...
C: Well, the original motivation behind using DaemonSets was to work around the scheduling snafu that existed with not having the taints be schedulable when the network wasn't there yet, right? And I believe that's all fixed now; it all exists, or is in place. So I don't know if that condition, which was the primary driver behind choosing DaemonSets... I don't know if that's still a constraint that we actually have anymore.
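For context, the scheduling behavior referred to here is, as I understand it, that DaemonSet pods automatically tolerate the not-ready and network-unavailable taints, so they can land on a node before its network is up. A minimal sketch of such tolerations (taint keys per upstream conventions, not something stated in the meeting):

```yaml
# Tolerations of the kind the DaemonSet controller adds automatically,
# letting a control-plane pod schedule onto a node whose network is
# not ready yet.
tolerations:
- key: node.kubernetes.io/network-unavailable
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
```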
B: The phased approach kind of complicated that a little bit, but the stuff that we ran into with using Deployments is essentially this: you want these components to be spread across nodes in an HA situation, but if you're bootstrapping onto a single node, or it happens that they all end up on one server (which is actually pretty common), you want to start using anti-affinity rules on your Deployment. So you're saying: okay...
B: And so essentially, our anti-affinity allows pods of the same object but different versions to coexist on the same node, but it won't allow pods of the same version of the same object on a node. So during an upgrade, we create a new Deployment with a label that changes, so that even though it's technically the same component (the scheduler, let's say), it'll be able to be co-located during the upgrade process.
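The rule being described might look roughly like the sketch below; the `k8s-app` and `version` labels are illustrative, not taken from the actual manifests. Same-version pods of the component repel each other per node, while the upgrade's new Deployment carries a different `version` label and so can co-locate:

```yaml
# Illustrative sketch: same-version scheduler pods are spread across
# nodes, but a new Deployment whose pods carry version: v2 does not
# match this selector and can share a node during the upgrade.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          k8s-app: kube-scheduler
          version: v1
```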
C: To take a step back, this is the conversation about using Deployments, but with DaemonSets the only problem that I'm aware of is if you have a DaemonSet of one. If you have a DaemonSet of more than one, this problem doesn't exist at all, because it doesn't matter if you kill the one that's there and start a new one. It only matters for a DaemonSet of one on the upgrade, so it's a special condition, right?
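A sketch of why the DaemonSet-of-one case is special: with the rolling update strategy, the controller deletes the old pod on a node before starting its replacement, so on a single node the component is briefly gone. Field names follow the DaemonSet API; the example itself is illustrative:

```yaml
# With RollingUpdate, the old pod on each node is deleted before its
# replacement starts; on a one-node cluster that means downtime for
# the component -- the special condition discussed above.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
```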
C: I think we're doing a lot of rigmarole and workaround for what is essentially going to be a single one-off case, which we already actually have code to handle. So I'd want to talk with Lucas to verify that, you know, the one-off code that we've written for upgrading a single node.
C: We might want to maintain that for a longer period of time, but the DaemonSet solution works fine for most HA scenarios where you go to zero and come back, because you'll have multiple controller managers and they'll just bounce, and you'll have multiple schedulers and they'll also bounce, and you'll have multiple API servers and they're just load-balanced. So I don't see a problem there other than the one-node scenario, yeah.
C: Yes, I'd have to think through the scenarios here where you might want to do the conversion scenario, right? Where, if it's a single node and you're doing an upgrade, it's an on-the-machine type of process, right? So kubeadm could do all the work for you, doing A, then B, then C, in that order, for the upgrade process. But this problem goes away again once you actually have high availability.
B: The other thing is that we're speaking about this in terms of upgrades, but ideally, it being self-hosted, if you want to change a flag on your scheduler or controller manager, you're going to edit it, and that's the same process as an update. And so if we're saying that this logic of being able to safely update lives in a tool like that, then that tool becomes the only mechanism for making those kinds of changes to that component.
B: I think there are enough options here that I want to continue to say yes: we do it today and it works. But that's where DaemonSets would have been nice in this case. Another thing that we've internally floated is just writing a babysitter controller that sits there, and all it knows how to do is extract the pod spec out of a Deployment object (specifically the controller manager or scheduler) if they don't show up for too long, and then just inject it into the cluster.
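The babysitter idea could be sketched roughly as below, working on plain manifest dicts instead of a live API; the component name, labels, and timeout are all hypothetical, not from the meeting:

```python
import copy
import time

# Hypothetical threshold: how long a component may be missing before
# the babysitter intervenes.
MISSING_TOO_LONG_SECONDS = 120

def extract_static_pod(deployment: dict) -> dict:
    """Turn a Deployment's pod template into a standalone Pod manifest."""
    template = deployment["spec"]["template"]
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": copy.deepcopy(template.get("metadata", {})),
        "spec": copy.deepcopy(template["spec"]),
    }
    name = deployment["metadata"]["name"]
    pod["metadata"]["name"] = f"{name}-babysitter"
    return pod

def needs_rescue(last_seen_ts: float, now: float = None) -> bool:
    """True when the component has not been seen for too long."""
    now = time.time() if now is None else now
    return now - last_seen_ts > MISSING_TOO_LONG_SECONDS

# Example: a stripped-down scheduler Deployment manifest.
deployment = {
    "metadata": {"name": "kube-scheduler"},
    "spec": {
        "template": {
            "metadata": {"labels": {"k8s-app": "kube-scheduler"}},
            "spec": {"containers": [{"name": "scheduler",
                                     "image": "example/scheduler:v1"}]},
        }
    },
}

if needs_rescue(last_seen_ts=0, now=1000):  # missing for 1000s: rescue it
    static_pod = extract_static_pod(deployment)
    print(static_pod["metadata"]["name"])   # kube-scheduler-babysitter
```

A real version would watch the API server and write the extracted manifest into the kubelet's static-pod directory, but the core operation is just this template extraction.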
A: I guess the update strategy seems a lot cleaner to me, because you don't have fragmentation between single-master and multi-master scenarios; you're doing exactly the same thing in both cases. It would also make it a lot easier to switch between the two, right? If you start out with three masters and you decide you really only need one, and you go down to one, then unless you add extra things to your cluster to make it sort of safe to run a single master, you are now in a very precarious situation, so...
C: Our top priority in the workloads area today is [inaudible]. This seems complicated; it's arguable whether we should ever do it, but we should not do it in 1.9. I am flummoxed, though, by the idea that we have these notions of being able to have feature-gated items to enable these types of things, right? And if it's completely behind a feature gate, I do not understand why this would be a problem.
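The kind of feature gating being referred to is, if I recall the kubeadm flags of that era correctly, along these lines (the gate name is an assumption, not from the meeting):

```shell
# Illustrative: enabling the alpha self-hosting behavior behind a
# feature gate at init time (gate name assumed, per ~1.8/1.9 kubeadm).
kubeadm init --feature-gates=SelfHosting=true
```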
A: I talked with Don yesterday briefly and mentioned, as here, that it's considered P0 from our side, and that I'm talking with you tomorrow. My plan today, once I finish up some work, is to start my rebase and change all the names to bootstrap/checkpoint, because I don't want to have any more arguments; the arguments originally started from the generic "checkpoint" name, and I think, for the sake of expedience and just to get things done, I'm just going to call it bootstrap checkpoint, unless there's any ambiguity other people care about.