From YouTube: Kubernetes SIG Testing - 2021-11-02
Description
https://bit.ly/k8s-sig-testing-notes
NB: a portion of the meeting was edited out to avoid disclosing the specific vulnerabilities that were discussed in the meeting
A
Hi everybody, today is Tuesday, November 2nd. You are at the Kubernetes SIG Testing bi-weekly meeting. I am today's host, chair of SIG Testing, Aaron Crickenberger, also known as spiffxp at all the places. This meeting follows the Kubernetes code of conduct, which means we're going to be our very best selves to each other. It's also being recorded so that it can be posted publicly to YouTube shortly.
A
Agenda: I've got a quick heads-up, and then we've got some suggestions from Chao and Arnaud to talk about.
A
Is there anything we should be paying specific attention to in the next two weeks? Stuff that typically comes up around this time: any major test changes, or anything that attempts to change a bunch of container images. We've made great progress in migrating to community-hosted images.
A
The other thing we try to keep an eye on, that's not necessarily code changes for Kubernetes, is things like workflow changes for the project.
A
So if there are any major, potentially disruptive workflow changes or Prow changes, we may want to consider whether we should try landing those now, or pause on them, or adopt a little bit more of a careful migration and rollout strategy.
A
That's a really great question; I'm gonna look that up real quick. I think it's at kubernetes/sig-release. Typically, I think, code freeze these days lasts until the release actually happens. So code freeze will happen Tuesday, November 16th, then test freeze. Test-only changes after code freeze: if your plan is to land your feature and then land your tests, don't do that, that's not good; but it's for things like conformance tests, or tests that have been hanging out waiting for approval for a while.
A
And then my impression is it's kind of a lull until sometime next year. I think I had seen some traffic on Kubernetes issues about maybe using this time for any large-scale breaking changes or messy refactoring that we might want to land.
B
Yeah, and now Claudiu is in the meeting. He has some PRs to implement pre-pulling of images in the e2e tests, and I think that's interesting, but I don't know if all of you are aware of them, and I don't know if he will need more help to land this before code freeze.
D
That's a good point. Basically, the idea with pre-pulling images: I already saw that for node tests the images are being pre-pulled, and basically I'm proposing something similar for regular e2e tests.
D
This is because we have been seeing some flakes in the Windows CI: some tests expected the pods to start up in one minute, but if those tests somehow end up being the first ones to run, it might take longer than one minute to pull and unpack the Windows images; they tend to be a bit bigger than Linux images. So that pull request; I'm gonna link it... I think, yeah, Antonio already linked it, thank you. It basically introduces an optional argument; by default it's false.
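The flake mode described here can be sketched with a toy model; the durations below are hypothetical illustrations, not numbers measured from the actual CI:

```python
# Toy model of the flake: the first test scheduled onto a fresh Windows node
# pays the image pull + unpack cost inside the pod-startup timeout window.
STARTUP_TIMEOUT_S = 60  # the one-minute startup expectation some tests assume

def pod_startup_seconds(pull_s, start_s, image_cached):
    """Seconds until the pod is running; the pull is skipped when cached."""
    return (0 if image_cached else pull_s) + start_s

# Hypothetical numbers: large Windows images can take minutes to pull/unpack.
cold = pod_startup_seconds(pull_s=150, start_s=5, image_cached=False)
warm = pod_startup_seconds(pull_s=150, start_s=5, image_cached=True)

print(cold, cold > STARTUP_TIMEOUT_S)  # 155 True: first test to run flakes
print(warm, warm > STARTUP_TIMEOUT_S)  # 5 False: pre-pulled image is fine
```

Pre-pulling simply moves the pull cost out of the timed window, which is why it only matters for whichever test happens to run first on a node.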
D
And I do have another pull request which might be useful for us. From what I observed, if a test fails, we don't really see anything that happened in the pod, whether there were any issues there, especially if they are networking issues. A lot of the services being started up, like the agnhost-based services, log every request, which is extremely useful, but we don't see that in the case of failed tests.
D
So if we could actually see that information, we might get some more in-depth view as to what happened for that failure. Did that request even reach the pod or not? Did it fail with some inter-pod communication?
D
We don't know, because we don't see any of the pod logs in that scenario. So I will also send a pull request for that. In that one, by default we get the logs from the top five pods in that test. In most cases that's more than enough, and in other scenarios, in which the test spawns too many pods, more than five, labels could be added to the most important pods that should be logged, basically.
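A minimal sketch of the selection rule being described; the function name and the label key are assumptions for illustration, not the actual PR's API: collect logs from at most five pods by default, but always include pods explicitly labeled as important.

```python
DEFAULT_LOG_LIMIT = 5  # default number of pods whose logs get collected

def pods_to_log(pods, limit=DEFAULT_LOG_LIMIT, label="logs-on-failure"):
    """Pick which pods' logs to dump after a failed test.

    `pods` is a list of dicts with "name" and "labels" keys. Pods carrying
    the marker label are always included; the rest fill up to `limit`.
    """
    marked = [p for p in pods if label in p.get("labels", {})]
    rest = [p for p in pods if p not in marked]
    return [p["name"] for p in marked + rest[: max(0, limit - len(marked))]]

pods = [{"name": f"pod-{i}", "labels": {}} for i in range(8)]
pods[6]["labels"]["logs-on-failure"] = "true"  # mark the important pod
print(pods_to_log(pods))  # pod-6 plus the first four unmarked pods
```

With fewer than five pods the limit never kicks in, so everything gets logged; the label only matters for tests that spawn many pods.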
A
No, I think this totally makes sense. I feel like we could just turn it on and sort of see what it does. The only impact I can think of is an increase in the size of the artifacts that we store, or maybe there's gonna be some flakiness in actually retrieving the logs for these pods; it could be that the pods end up going away before we're able to retrieve the logs from them.
A
So the one I wanted some folks' eyes on was an update to the approve plugin. We had this person come by SIG Testing a while back and sort of walk us through a demo of adding granular approval.

A
Let's see: it would empower more people, I think, to approve files. It's targeted at getting rid of the root-owners, or root-approvers, issue for kubernetes/kubernetes: the fact that there are very few people in the root approvers group, and we kind of don't want to add more people to it; but for those people who are over-privileged, we want them to be able to say "I am approving just this specific part of it," and we can also allow people to approve just specific paths.
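The granular approval idea can be sketched as a path-coverage check; the data shapes and helper below are illustrative, not the plugin's actual implementation:

```python
from fnmatch import fnmatch

def unapproved_files(changed_files, approvals):
    """Return the changed files not covered by any granted approval.

    `approvals` maps an approver to the path globs they approved,
    e.g. {"alice": ["test/e2e/*"]}: alice signs off on just that part
    of the PR instead of needing root-approver rights for all of it.
    Note fnmatch's "*" matches across "/" here, unlike shell globbing.
    """
    patterns = [pat for pats in approvals.values() for pat in pats]
    return [f for f in changed_files
            if not any(fnmatch(f, pat) for pat in patterns)]

changed = ["test/e2e/windows/density.go", "cmd/kubelet/app/server.go"]
print(unapproved_files(changed, {"alice": ["test/e2e/*"]}))
# the kubelet file still needs someone to approve the cmd/ part
```

The PR merges only once this list is empty, so several narrowly scoped approvers can jointly replace one root approver.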
A
Cases
here,
but
the
idea
is
this
was
discussed
in
brexit
testing
a
while
ago
and
then
I
think,
was
iterated
on
and
initially
it
had
been
done
as
a
separate,
approved
plug-in.
The
functionality
has
all
since
been
gated
behind
a
flag.
A
Freeze
and
stuff,
like
I
think,
if
we
wanted
this
deployed
kind
of
now,
ish
or
this
week-ish
would
be
the
time
I
would
want
to
see
it
so
that
we
would
have
enough
time
to
revert
it
as
we
start
to
see
sort
of
an
oncoming
flood
of
code,
reviews
and
tr
pushes
and
people
trying
to
get
their
stuff
landed
before
code.
Freeze,
like
I
approve
of
the
idea
in
principle,
but
I
need
people
who
have
a
little
better
granular.
C
I have some thoughts on this. First of all, it's the first I'm seeing of this, which is a little unfortunate, because I kind of had some opinions about the approve plugin earlier, namely that we should completely rewrite it rather than continuing to add more functionality to it. It is very confusing, and huge amounts of the code are vestigial and, you know, could be refactored into things that make far more sense. And I'm a little alarmed that this PR is 6,000 lines of changes.
A
Yeah
I
mean-
I
don't
know
I
I
I
think
I
want
I
want
some
pushback
like
I
have
this
impulse
of
this
has
been
sitting
out
since
march
and
I
think
alvaro
took
a
look
at
it
back
in
march,
but
I
don't
know
how
much
we've
actually
engaged
with
this
since
then,
and
yes,
because
of
the
large
implications,
I'm
wary
of
having
this
land
so
close
to
code
freeze,
so
I
think
maybe
alvaro
or
cole,
if
you
guys
could
sort
of
what
I'm
trying
to
figure
out
is.
A
Is there value in doing that? Because if I were this contributor, I would find it frustrating that, after nine-plus months, the feedback from the people they've been trying to get attention from is "actually, we've got to redo the whole thing entirely."
C
I
wonder,
if
that's
you
know,
I
I'm
not
sure
how
we
should
handle
that
I
guess
like
I,
I
alvarez
looked
at
this
a
little
more
than
I
have
so
maybe
he
maybe
you
know
like
how
much
the
other
two
commits
are
like
I'm
just
looking
at
the
two
commits
that
are
supposed
to
be
the
you
know,
the
changes,
not
the
copy,
and
it
still
looks
very
massive.
C
Do
you
know
how
much
that
is
actually
changes
and
how
much
of
it
is
stuff
that
should
have
been
put
in
that
first
commit.
A
Yeah,
I
I
think,
like
I'd,
have
to
go
back
and
dig
through
the
meeting.
That's
the
recording
to
know
for
sure,
like
I
do
recall
this
person
showing
up
and
actually
like
walking
us
through
a
demo
of
the
proposed
functionality.
I
think
maybe
the
part
of
the
process
here
that
was
lost
is
how
we
typically
go
through
a
design
dock
or
a
cap,
or
something
to
really
walk
through
the
implementation
and
the
implications
of
this,
and
I
feel
like
if
I
didn't.
A
Up
at
the
meeting,
this
pr
looks
an
awful
lot
like
dropping
a
multi-thousand
line.
Pr
as
the
first
step
of
engagement,
which
typically
doesn't
go
very
well,
so
it
could
be
what's
missing.
Here
is
like
a
document
that
sort
of
lays
out
the
design,
decisions
and
choices
that
were
made,
and
it
could
be
that
breaking
up
the
commits
to
make
them
more
digestible.
A
You know, making the regex stuff better in a different plugin is a thing.
A
The
only
other
thing
I
can
think
of
about
the
big
change
prior
to
code.
Freeze,
since
our
note
is
here,
it's
a
quick
kicks
in
for
hat
on
the
scalability
jobs.
We've
migrated
over
the
five
thousand
node
performance
test
over
the
k-10.
We
haven't
migrated
over
the
5
000
to
correct
this
job.
Yet
I
was
wondering
if
there
was
anything
specifically
that
was
holding
us
up
there.
G
Not
really,
I
think
I
want
to
leave
that
decision
to
see
scalability,
because
I
talked
to
one
of
the
leads
and
what
he
told
me
is.
They
need
some
kind
of
bandwidth
to
do
the
migration
and
babysit
the
job
in
case
there's
some
failure.
So
I
did
everything
to
prepare.
The
migration
now
is
just
an
approval
from
them.
A
Okay,
that's
everything
that
I
could
think
of
without
my
head
for
groceries
and
thank
you.
How
do
you
for
antonio
for
bringing
some
other
things
to
the
table
with
that?
I'm
going
to
hand
over
to
chow
to
talk
about
github,
dependable.
H
Thank you, Aaron. So, as I asked about a couple of days ago, we found out that we can turn on GitHub Dependabot alerts without modifying the code base, so I turned it on, and I was not prepared for the fact that we have 17 alerts. I'm looking at the page right now; I can copy-paste the link into the meeting notes.
H
So
do
you
guys
mind
if
I
share,
or
I
think
aaron
you
can-
you
should
have
permission
to
open
the
link
yeah.
Let
me
let
me
allow
you
to
share
all
right.
A
Okay,
that
moves.
C
We're
talking
about
migrating,
the
cage
brow
instance
to
the
working
group,
infra
instance.
Is
that
correct
that
to.
C
I
would
imagine
that
the
next
steps
would
be
to
migrate.
You
know
sets
of
jobs
or
individual
jobs
at
a
time.
Is
that
kind
of
the
approach.
A
So
then,
so
the
thinking
here
is
we've
already
migrated
a
bunch
of
community
jobs
over
to
a
community
of
build
cluster.
Let's
just
pretend
for
a
second:
I
could
wave
a
magic
wand
and
proud.kate's
dot.
Io
suddenly
doesn't
run
into
kate's
crowd
project
and
instead
runs
in
a
multi-tenant
app
cluster.
A
It's
going
to
attempt
to
schedule
jobs
to
google
and
it's
going
to
attempt
to
some
of
those
jobs
that
it's
running
are
jobs
that
have
nothing
to
do
with
the
kubernetes
project.
So
if
I
look.
A
Jobs
perspective
like
we
got
to
decide
whether
we
should
continue
to
kick
off
non-kubernetes
projects
from
crowdcase.io.
A
Basically, the last time I thought about this deeply, I had concerns that we needed to make sure of: if we were to stand up, like, we have a Prow service cluster running over in k8s-infra, stood up for prow.k8s.io, what would we have to do to flip domain names, or whatever, such that that became prow.k8s.io?
A
Infrastructure-wise, I feel like we have to decide whether we would want that Prow instance still capable of scheduling to the google.com-owned build cluster, or whether we would say no, it can't schedule to that cluster; everything has to run in the community-owned build cluster. And that leads us to the decision tree on which jobs should be migrated over or not. Then there are the Google Cloud Storage resources, like k8s-testgrid and kubernetes-jenkins, that currently live in google.com.
A
And then the thing I don't even have listed here, I'm just realizing out loud, is all of the workflow around continuously deploying prow.k8s.io and stuff. I don't know; it all relies on the use of a k8s-prow project to host all the images, which is also google.com-owned.
A
It's
unclear
to
me
whether
we
need
to
add
privileges
for
the
community
owned,
for
instance,
to
write
to
that
sorry.
At
this
point
I
feel
just
like
I'm
rambling,
but
I
agree
with
darno
that,
like
a
migration
plan
needs
to
be
developed
and
I
keep
losing
context
and
not
having
enough
time
to
actually
like
get
the
context
page
back
in
and
focus
on
it
and
come
to
it
in
sort
of
a
rigorous,
step-by-step
thing.
I
But there are a few things that jumped out when we talked about this before. The budget is going to be the big one, especially once we start scheduling and putting logs on community-owned stuff, because I'm pretty sure our runway will just explode; we won't be able to cover it for the year, right?
A
Basically,
I
kind
of
that-
that's
maybe
almost
an
orthogonal
thing
to
me
at
this
point
I
feel
like
we
already
lost
our
runway
with
the
artifact
hosting
costs,
but
yeah,
let's
say
ballpark.
It
doubles
our
ci
costs
if
we,
if
that's
assuming
moving
over
proud
on
case.io,
also
means
moving
over
all
of
the
other
jobs
that
it
currently
runs
right
for
reference.
I
don't,
I
think
it's
ballpark
like
300
and
something
jobs.
C
So
these
1700
remaining
jobs,
though
I
doubt
that
many
of
them
are
for
just
google
purposes
right,
like
most
of
the
like
very
few
of
those
are
going
to
be
things
that
are,
you
know,
shouldn't
be
running
in
that
cluster
right
yeah.
A
lot
of
it
is
like
kubernetes.
A
Ziggs
repos,
like
cluster
api,
all
the
cluster
api
repos,
all
the
cloud
provider
repos
like
there's
a
lot
of
valid
stuff
that
hasn't
been
migrated,
I
sort
of
in
terms
of
if
we
go
back
to
your
suggestion.
Cole,
like
think
about
just
the
jobs
that
we
should
migrate.
Thinking
that,
like
running
jobs,
is
the
majority
of
the
sea
ice,
not
necessarily
the
control
plane
or
the
service
cluster.
Then
yeah.
We
migrated
over
the
jobs
that
are
principally
aimed
at
release,
blocking
and
merge
blocking
jobs
for
kubernetes
kubernetes.
A
We haven't really opened the floodgates to jobs for all of the other hundred-and-something repos that make up the Kubernetes project. Part of that's just from a billing perspective: it's really difficult to segregate out billing on a per-job basis, so we wouldn't be able to wave a magic wand and say, "Well, SIG Cluster Lifecycle, you get a budget of this many thousands of dollars, and it's up to you to make sure that you don't run too many jobs and blow your budget."
A
Off on a tangent here: the motivation was to see if there's a way we can migrate just the service cluster, because the idea here is: great, we're spending the community's money on jobs that run in the community-owned build cluster, and that's cool, but I feel like people are less incentivized to contribute to and help support Prow, because they can't touch the Prow control plane; they can't see the same logs that all the on-call people can see.
A
This
is
all
kind
of
in
service,
of
allowing
more
community
members
to
be
able
to
join
an
on-call
rotation.
That's
basically
best
effort
and
to
be
able
to
help
out,
and
so
the
fact
that
proud
runs
in
a
google.com
only
project
is.
A
Like, the security stuff I'm thinking of is: I have a service cluster outside of google.com that has, you know, the config to be able to schedule to a build cluster, which (you all can correct me if I'm wrong, but last time I went through this) required setting up basically an admin-level user.
A
So
a
gke
cluster
outside
of
google.com,
more
or
less,
has
full
control
of
the
gke
cluster
inside
of
google.com.
So
it
could
schedule
whatever
it
wanted
inside
that
gke
cluster,
you
know
kind
of
depends
on
like
what
workload
identity
bindings
would
be
present
inside
of
the
google.com
cluster.
How
far
something
would
be
able
to
get
from
there
as
far
as
what
it
could
or
could
not
access.
E
I
was
saying
I
thought
that
was
actually
like
a
technical
blocker,
the
like,
like
being
able
to
grant
permission.
A
Would there be a dance we could do where, let's say, we leave all the jobs there, and they still schedule to a google.com-owned Kubernetes cluster, or a google.com cluster that runs largely Kubernetes jobs but some non-Kubernetes jobs, and we were to swap domain names? So now we have prow.k8s.io, that's the Kubernetes thing; it's running the Kubernetes release-blocking jobs, and we encourage people to migrate more of their jobs to run on it; and then what's left is, like, Google's Prow instance.
C
Having
both
temporarily
sounds
good
to
me,
I
would
say
that
we
probably
shouldn't
switch
the
name,
though,
until
we
are
able
to
deprecate,
because
I
think
that'll
just
cause
more
conflicts
and
issues
right
if
we
can
keep
the
name
consistent,
at
least
until
we're
ready
to
make
that
the
other
source
of
truth.
That
might
be
easier.
A
Well,
I
keep
feeling
like
when
I
walk
away
from
these
discussions.
It's
like
well,
I
guess
I
should
go
write
a
doc,
but
I
feel
like
arnold
is
holding
me
accountable.
The
fact
that
I
keep
not
doing
that
so
I
can
take
another
crack
at
this,
but
I
think
I'm
gonna
definitely
get
cole
and
ciao
to
take
a
look
at
it
this
time
and
see.
If
I
don't
know,
maybe
we
could
tee.
A
The main thing we haven't discussed here is workflow-related stuff, when it comes to the continuous deployment of Prow and its images and all that; I feel like that needs a little bit of thought as well.
C
Yeah,
I
think
we
have
pretty
clear
paths
forward
on
all
of
that.
I
agree
that
we
need
to
flesh
it
out,
but
yeah.
I
think
that
all
of
that
there's
no
no
technical
barriers
there
that
I'm
aware
of
okay.
A
All right, well, unless anybody's got anything for our last minute, I'm gonna call it. Okay, thanks everybody, have a happy Tuesday, and I'll see you all in two weeks.