From YouTube: Kubernetes sig-aws 20170714
Description
Recording of kubernetes sig-aws meeting held 2017-07-14
C: You know, any nodes, anything running, that could access the storage... We kind of tracked it down when we found out that we were installing the clusters without, you know, any cluster ID tags (they needed the kubernetes cluster tag), and we had thought that being isolated to its own VPC would handle that. But it appears it's not, and certainly looking at the code, it looks like at best you can get isolation of an availability zone per user, possibly even one cluster per user.
C: Without, you know, possibly having issues like this. So I kind of wanted to bring up that the cluster ID seems pretty critical if you're ever going to be running more than one cluster in AWS, and it kind of seems to make sense to make it a required field, period, in Kubernetes: if you run it in AWS, you need to have this, even if it's only a single cluster.
A: Yeah, I can speak to some of the history here. Originally the tag didn't exist at all, and it turns out that you need this tag to differentiate different clusters in the same account. Certainly without the tags, things don't work very well; for example, not tagging your subnets will also cause problems with some providers. So tagging is definitely highly, highly recommended. Yeah.
A: Yeah, and so I think the only reason that we would not make it mandatory is because, like, if we were to start from zero today, I think we would make it mandatory across the board. The issue is that making it mandatory in theory breaks anyone's cluster that doesn't have those tags set, even if really they should have them set, yeah.
A: So that's sort of what we've done in the past where we've made breaking changes: have a flag which people can set. We can possibly debate the exact sequence, whether it defaults true or false and all those things over the releases, and the points of the deprecation policy. Basically, you know, start off with it being easy to fix, and then ratchet it down at some point.
A: But I think that the deprecation flag is a good idea, if anyone has any... So, in other words, the proposal would be: we make the cluster name tag required, and maybe some other tags, in 1.8, let's say. If you have the tag, no problem; if you don't have the tag, your cluster will fail to start, but the workaround is simply to add a flag.
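The gating proposal just described can be sketched roughly as follows. This is an illustrative Python sketch, not the real kube-controller-manager code; the function name `check_cluster_tags` and the `allow_untagged` escape hatch are invented for the example, though the `kubernetes.io/cluster/` tag prefix is the real one discussed later in the meeting.

```python
# Hypothetical sketch of the proposal: fail startup when the cluster ID tag
# is missing, unless an explicit legacy opt-out flag is set. The flag would
# be deprecated from the start, giving users a release or two to add tags.

CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"

def check_cluster_tags(instance_tags, allow_untagged=False):
    """Return the cluster name found in the instance tags, or raise.

    instance_tags: dict of tag key -> value, as returned by the cloud API.
    allow_untagged: the proposed (deprecated-from-the-start) escape hatch.
    """
    clusters = [k[len(CLUSTER_TAG_PREFIX):]
                for k in instance_tags
                if k.startswith(CLUSTER_TAG_PREFIX)]
    if clusters:
        return clusters[0]
    if allow_untagged:
        return None  # legacy mode: continue without cluster isolation
    raise RuntimeError(
        "no %s<name> tag found; tag your resources or set the legacy flag"
        % CLUSTER_TAG_PREFIX)
```

With the tag present, startup proceeds; without it, the cluster fails fast unless the operator consciously opts into the old behavior.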
D: Speaking of legacy, I have the 1.7 source checked out and was just looking at the tagging in the AWS provider, and it's got this const, TagNameKubernetesClusterLegacy, whose value is "KubernetesCluster". So yeah, I think in the QuickStart we're already doing one level of legacy by naming the tag wrong; I just now realized that this is a problem. So, yes, there's...
A: There are two different tags there, yeah. The original one was KubernetesCluster, I believe; yeah, okay, capital C. The problem with that is that it doesn't allow for subnets, for example, to be shared: you can only have one tag with a given name. So the new ones are kubernetes.io/cluster/&lt;cluster-name&gt;, and then you can say equals shared, or equals owned, to have some notion of ownership. But the idea is that you can have two clusters that are sharing a subnet, the level three...
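A minimal sketch of the two tagging schemes just described. The legacy tag name and the `kubernetes.io/cluster/` prefix match what the speakers describe from the 1.7 AWS provider; the helper function itself is illustrative, not the provider's actual code.

```python
# The legacy scheme uses a single fixed key, so a resource can only ever
# belong to one cluster. The newer scheme puts the cluster name in the key
# itself, so several clusters can tag the same subnet, each marking it
# "owned" or "shared".

LEGACY_TAG = "KubernetesCluster"          # one value only: cannot be shared
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"

def clusters_using(tags):
    """Map cluster name -> ownership ("owned" or "shared") for a resource.

    Only new-style tags are considered; the legacy key carries no
    per-cluster ownership information.
    """
    return {k[len(CLUSTER_TAG_PREFIX):]: v
            for k, v in tags.items()
            if k.startswith(CLUSTER_TAG_PREFIX)}

# Two clusters sharing one subnet, which the legacy key cannot express:
subnet_tags = {
    "kubernetes.io/cluster/team-a": "owned",
    "kubernetes.io/cluster/team-b": "shared",
}
```

Here `clusters_using(subnet_tags)` reports both clusters, which is exactly the sharing case the single legacy key made impossible.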
A: The level two legacy is that I've been told Terraform can't currently create tag keys with slashes in them, or something like that, something dynamic; there's some issue now with dynamic Terraform tags. That means that people are not entirely happy about that new form either. So the beat goes on.
A: But if you pass a flag to, I guess, kube-controller-manager (I don't know if we need to enforce it across the board, but just kube-controller-manager), then it will... although I don't know if that's possible. If you pass the flag, it will ignore that check and allow you to continue for a limited period of time, presumably 1.8 and maybe 1.9.
C: I think that's what Justin is suggesting: by default, the flag would require the cluster tag, so the first time you upgrade and try to run it, it's going to error out, and there would be a flag that you can set to run it in the old mode. But you're going to have to, I think, acknowledge the issue, because...
C: It would be significantly more work to set the cluster tag, in my mind, than to set the option on whatever daemon would need it, because you have to go find, you know, your load balancers, your nodes, your persistent volumes, whatever else your cluster has used and created, and tag everything.
A: I think you raise a good point, and I don't know how hard... I mean, I guess the instance would be the only one we would error out on. The issue is, if there's someone out there who, for whatever reason, can't tag their instance with the kubernetes cluster tag, we would basically have no out unless we gave them the flag. But I agree; I suspect it wouldn't be too hard to have the flag placed on the instance. That's...
A: I think that makes sense, and I think also that we say that if you have to use this flag, let us know why you're using this flag, because it is, or will be, deprecated immediately from the start. So we'll start the clock on turning off that flag, and I think that's reasonable.
A: That's it; I think there's pretty sure to be more, but I think technically there is a deprecation policy, so I don't know if this would actually be a breach of our deprecation policy. It's odd! It's an odd one. We can ask whoever the release manager is, or we can ask the architecture people, or... I don't even know who we would ask. I can.
C: You know, there's been some discussion, lots of different ideas, but nothing has headed towards a conclusion yet, and I just kind of... I think we might need to have the overall working group or SIG, or whatever they're going to end up calling it, kind of decide how to move forward on it, and make sure we get enough people with buy-in from the cloud providers. I don't think... yeah...
C: Sure, yeah, we can start it. So the issue is that, at least for AWS, when an instance is stopped in AWS, it is removed from Kubernetes, so any pods that were running on it get rescheduled and all that kind of stuff. And then, if you restart the instance, obviously it comes back up, joins the cluster, and has all of its data on it. In other cloud providers, there are some where, when you stop the instance, it just becomes not available, and it stays in Kubernetes.
A: It would... yeah, yeah. I mean, the kubelet should treat that like a hot plug, which apparently happens in the real world, that someone will plug in more RAM or something, so the kubelet should be able to update the node status and, in theory, everything should just work. I'm sure, or I suspect, it won't just work out of the box, but I don't think that's a guiding concern, because that would just be a straightforward bug, I think, or new behavior that we need to accommodate.
A: I think what would be interesting is volumes; local volumes, I think, are the more problematic ones, or the ones worth thinking about. I guess, like, if you have a local volume on a node and I stopped it, do I want to bring it back with the same node ID, so that the same local volumes persist? Whatever the logic is there, I think, will be a guiding one. I don't know what other cases...
A: I think that's a good point. I think, though, there's a general... you know, the idea that the nodes on AWS are also very ephemeral, right? And I'd love to see us integrate, or continue our integration, with the cluster autoscaler more, so that the notion of nodes doesn't really matter as much anymore, in my mind. But I guess the initial question is: why do people stop nodes?
E: Say the region, or an availability zone, is no longer available; what would happen then? I think they get terminated... But, looking at it, I only ever look at the Kubernetes screen; I don't ever look at AWS, right, because I trust the Kubernetes screen. So if something now happens to the cloud and those nodes go down, what does my Kubernetes screen show?
F: Isn't there a difference between an instance that has been stopped by the user, which is different from there being an issue in AWS where a zone is not responding, right? So I would imagine that within Kubernetes, the AWS controller, if it loses contact with an AZ for whatever reason, is not going to suddenly mark all of those nodes as deleted.
A: There are, so there are three states, I guess. An instance can be terminated, right, which is when AWS or the user shut it down forever. There is stopped, which is a sort of temporary suspension; you're not charged for it, nothing is actually running, it just sort of remembers...
A: ...the configuration. AWS remembers the configuration and can restart it, and that's why I believe we can change instance types. And then the third case is that something has gone wrong with the control plane and you can't reach an AZ. A number of years ago, I remember seeing that AWS instances would just disappear from the describe-instances list in that situation, and I believe it's fixed. What would currently happen is that those nodes would be deleted, because AWS would say, "I have no knowledge of these nodes."
A: The mitigation against that is that if more than some fraction of your nodes disappear at once, I think we don't evict, we don't delete the nodes. I mean, that's what I think happens, but honestly, I think we actually saw this about two years ago, in Europe somewhere, as well. It's certainly a scary edge case. I mean, your nodes on AWS, or the cloud, are supposed to be ephemeral, so they should... I don't know, that's where, I think, the sort of... yeah, this might...
E: I mean, I'll tell you how I got to it. I was doing a demo of GlusterFS on top of AWS, and I was trying to show how, if I take one AZ down, the pods are still running, and they all said "Ready," and everybody was like, "yeah, what's the problem?" So I was like, no, no, I shut down a whole bunch of nodes; trust me, I did.
C: That was actually what kind of spawned it. So we had a customer that had that kind of thing here. They had a system that they were trying to use to monitor the nodes in Kubernetes, querying Kubernetes, and they had nodes go down (they would get stopped or whatever) and they disappeared. So they could never monitor when a node had an issue, right, because from Kubernetes' perspective it just stopped existing.
A: It might be a general cluster health thing. We have node-problem-detector now; maybe it's like a cluster problem detector, which is, you know: a third of your nodes are gone, we don't have any coverage in this AZ, or you only run a single AZ, or every single pod is on one node, for example. Those sorts of things, warnings that every single pod is on one node.
C: It was something they tried to do. It didn't work out very well, but when digging into it and looking at what was going on, it kind of brought up this question of: OK, we've got these inconsistencies across cloud providers, so now what? In order to come up with any consistency, we need to figure out what a node is, and what determines a node, and all of this, before we can start saying...
A: Yes; NotReady is entirely done by the kubelet and the kubelet heartbeats. It's entirely a Kubernetes concept. But what we have in addition on AWS, and on all clouds, yeah, most likely, is a list of instances, and if the instance disappears, or if it is not running on AWS in the describe-instances list, then we will delete the node. That is the node controller, in kube-controller-manager.
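The reconciliation just described, together with the mass-disappearance mitigation mentioned a little earlier, can be sketched like this. This is a rough illustrative sketch, not the actual node controller code; the function name and the 0.5 threshold are assumptions for the example.

```python
# Sketch of the node controller's cloud reconciliation: compare the nodes
# Kubernetes knows about against what the cloud's describe-instances call
# reports as running, and delete nodes the cloud no longer reports.

def nodes_to_delete(known_nodes, running_instances, max_fraction=0.5):
    """Return the node names to remove from the cluster.

    known_nodes: node names currently registered in Kubernetes.
    running_instances: instance names the cloud reports as running.
    max_fraction: safety valve (illustrative value): if more than this
        fraction of nodes vanished at once, assume a cloud API problem
        rather than real terminations, and delete nothing.
    """
    running = set(running_instances)
    missing = [n for n in known_nodes if n not in running]
    if known_nodes and len(missing) / len(known_nodes) > max_fraction:
        return []  # too many disappeared simultaneously: do not delete
    return missing
```

A single missing instance gets its node deleted; most of an AZ vanishing from the API at once trips the guard and leaves the nodes alone.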
D: It's not necessarily an old/new split either. I mean, there are situations where, if I was running locally or something like that, and it was on-prem, and I wanted to shut down machines to save power or something like that, then I would imagine a node being stopped is a totally legitimate state. True.
D: Would it make sense... I mean, I'm looking at the code right now, and I'm seeing there's only three states for a node phase: pending, running, and terminated. Would it make sense for Kubernetes to learn a concept of a stopped node, and then you guys would just clearly do that, and other cloud providers that don't have a concept of stopped would put it straight into terminated? It's...
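The idea on the table could look something like the following. This is a hypothetical sketch of the proposed behavior, not the real node lifecycle API: the enum and the `node_action` decision function are invented, and "keep-unavailable" stands in for whatever keeping the Node object while marking it NotReady would actually look like.

```python
# Today a provider can effectively only report "exists" or "gone", so a
# stopped EC2 instance is handled like a terminated one and its Node object
# disappears. A distinct "stopped" state would let AWS keep the Node object
# (marked unavailable), while providers with no stop concept keep mapping
# everything non-running to terminated.

from enum import Enum

class CloudInstanceState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"        # the proposed new state
    TERMINATED = "terminated"

def node_action(state):
    """What the node controller would do with a node in this state."""
    if state is CloudInstanceState.TERMINATED:
        return "delete"              # gone forever: remove the Node object
    if state is CloudInstanceState.STOPPED:
        return "keep-unavailable"    # keep the Node, mark it NotReady
    return "keep"                    # pending/running: nothing to do
```

A provider without a stop concept would simply never report `STOPPED`, so it falls through to the existing keep/delete behavior.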
A: So you're saying that Kubernetes... no, that's interesting, right, because...
D: Yeah, it's painful. It's kind of a contrived scenario, a little bit, but it's still probably worth expressing, especially because, if we're in a situation where we need to reconcile the behavior of different cloud providers and try to force them into the same bucket when it's really not the same thing, this would be a nice way of slowly fixing that. Yeah, I'll start on that.
D: Okay, if we have time for another topic, one thing I wanted to bring up real quick was that, with the 1.7.1 release that just came out, kubeadm now learns a node-name flag, which is important. We were having issues with 1.7 on our QuickStart, where nodes would join trying to advertise their short node name, without the fully qualified DNS name, and some other part of Kubernetes... the details are escaping me right now.
D: But some other part of Kubernetes was anticipating that the pre-allocated node was going to have the long FQDN, and they weren't agreeing with each other. To disambiguate it for Amazon's case, because I guess this was hitting everybody that was trying to use AWS, kubeadm, and 1.7, I think they added a node-name field where you could just specify what you wanted to name the node, for kubeadm, both for init and for joining. So we're going to be doing that on Heptio's end.
A: Kops has done that before, but what we had literally this morning is that kube-proxy also has a similar flag and also thinks about the same thing, and there's a PR which is going to make it a serious problem if they don't match, so we're also doing that in kube-proxy. This is a long-standing issue, where the root of it is that it used to be that the node name had to be resolvable from the master, and was how the master reached the kubelets; that is now fixed with some flags.
A: You can set the resolution order for how the master, or the API server (I think it's a parameter for how the API server talks to the kubelets), reaches the nodes, and if you prioritize the internal IPs, then basically everything works as you would imagine it should have always worked. The node name doesn't really matter anymore, other than that they have to match, yeah. But I would love to see a better node name.
D: See, I think what we're going to do now is use the FQDN everywhere for our node names and just hope that works; cross our fingers.
A: I would love at some point to get the node name to be the instance ID, but I've also had feedback that people like it being the longer FQDN, so they can map it to an internal IP. Even though it's the internal IP and not the external IP, people like that as well. There are other problems here, which is, if you have a custom domain name, or DHCP domain names in your VPC, I think that's correct...
A: You get into all sorts of problems, which is another reason why it's so frustrating, or so complicated, to deal with this. But I think, for now, we've got all the problems fixed and we'll probably just live with it. Yep, yeah, I'd love at some point to get a better node name, but I don't think it's going to happen anytime soon.
A: Though, yeah, it was two hundred... it was hitting a pagination limit, or actually a limit on the number of filters you can have, but anyway, those bugs should be fixed. It certainly hasn't had the coverage that we would like to have on it yet, so any feedback will be very welcome...