From YouTube: Kubernetes SIG Testing 2018-08-21
A
Okay, hi everybody. Today is Tuesday, August 21st. My name is Aaron Crickenberger. This is the Kubernetes SIG Testing weekly meeting. It is being recorded publicly and will be posted to YouTube later today, assuming I remember to do that. On the agenda today we've got Steve, who wants to talk about disruptive tests on AWS with kops, and Timothy St. Clair, who wants to talk about issue routing and tags.
B
Yeah, so basically, some feedback that I got from some engineers is that it's basically impossible to write any disruptive tests on AWS, because the way the kops setup works, all the nodes get put into auto scaling groups. So basically, if you want to disrupt the cluster and take a node down, the actual AWS auto scaling group will terminate that instance and start a new one for you, very handily, and I wanted to know what the larger direction there was.
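For context, here is a minimal sketch of the auto scaling group behavior being described, written against the plain AWS SDK for Go rather than anything in kops; the instance ID and the terminateNode helper are hypothetical.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// With ShouldDecrementDesiredCapacity set to false, the ASG keeps its
// desired capacity and immediately launches a replacement, so a "node
// down" disruption heals before a test can observe it. A disruptive test
// would have to pass true (or detach the instance from the group) to
// keep the node gone.
func terminateNode(instanceID string, keepNodeGone bool) error {
	svc := autoscaling.New(session.Must(session.NewSession()))
	_, err := svc.TerminateInstanceInAutoScalingGroup(&autoscaling.TerminateInstanceInAutoScalingGroupInput{
		InstanceId:                     aws.String(instanceID),
		ShouldDecrementDesiredCapacity: aws.Bool(keepNodeGone),
	})
	return err
}

func main() {
	// Terminate a hypothetical worker node; the ASG will replace it.
	fmt.Println(terminateNode("i-0123456789abcdef0", false))
}
```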
B
Because I did see those tests. I think there was a fairly large PR that merged, and it was merged with a bunch of hand-wringing about the fact that they couldn't actually test the functionality that was going in. Is that something that's just never going to change in kops? Do we have any plans on allowing this, or on disruptive tests?
C
As someone from SIG Cluster Lifecycle: for kops you'd have to talk with Justin, he's pretty much the main maintainer for kops going forwards, and it's behind the times. I've long been frustrated by the fact that it is in the blocking PR jobs and that it is not periodic, because it is out of date, right?
C
So there's this weird conundrum. I'd much rather put kubeadm, which lives on master and actually is released with master and all the other artifacts in the mainline, in place to do mainline testing for a lot of these other things, and push that into AWS, along with the cluster API implementation, which folks are working on. It makes a ton more sense for the long haul, for supportability, or if you want a blocking PR job, than that. But that's kind of been my opinion there.
A
It takes forever, and so we want to actually move in a direction where no clouds are involved in blocking presubmit tests, and they are all postsubmit tests. But where we are today is that those two give us sufficient coverage of e2e tests, and we're working on a solution that gives us comparable coverage of the e2e tests, but we're not there yet. So I agree with your opinion that the state of today is bad, but I'm not sure that we really have any appreciable place to go right now.
A
That's probably a larger discussion, and I really hate being the blocker for it. I have an interest in trying to write the document that describes the path: how do you submit your results? How do you submit postsubmit results? How can we verify that you're generating good signal? How can we make sure that that's worth blocking on? Things of that nature, and we're not quite there yet.
C
It should also be said: all over the code there are GCP-specific tests inside of the end-to-end test suite, and you could simply wrap that portion with an issue and reference that issue, so that it can be fixed over time as folks tread into cluster API. Whether they have a certain incantation that uses ASGs or not, that's an implementation choice, not necessarily a limitation of the cloud provider itself, but a limitation of how kops is using the provider, yeah.
A
It could be that there are a lot of tests that were shoddily written, that are skipped unless the provider is GCE, because the person who originally wrote the test only had a GCE cluster to test it on, and nobody with access to an AWS cluster has taken the time to make sure that the test is written well and meaningfully, such that it triggers an appropriate fault on the AWS cluster and that recovery happens similarly. So as for what the plan for disruptive tests is, I'm not sure where I would go.
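To make the pattern concrete, here is a minimal sketch of the provider gate being described, using the k8s.io/kubernetes test/e2e framework roughly as it looked at the time; the test itself is hypothetical.

```go
package e2e

import (
	. "github.com/onsi/ginkgo"

	"k8s.io/kubernetes/test/e2e/framework"
)

// Tests written this way silently skip on AWS, even when the behavior
// they exercise is provider-agnostic.
var _ = framework.KubeDescribe("Node disruption [Disruptive]", func() {
	f := framework.NewDefaultFramework("node-disruption")

	It("should recover after a node is taken down", func() {
		// Only runs when the suite is invoked with --provider=gce or gke;
		// it is skipped everywhere else.
		framework.SkipUnlessProviderIs("gce", "gke")
		_ = f // ... delete a node and assert the cluster recovers ...
	})
})
```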
C
The problem with that is that some of the suites are stood up for a given provider, and when a test fails, there should be a canonical ordering for who to route the tests to, and this occurs for a bunch of different things. We haven't really thought through how to properly route things, so sometimes the CI signal lead, the poor soul, has to go to all the SIGs to try and find the right person and then eventually hits paydirt, but we should probably do better.
C
Yes, I do, and I can link it in the meeting notes. This specific one was the cluster upgrade tests that have been failing for GCE for a long time, and it has nothing to do with SIG Cluster Lifecycle. In fact, it doesn't even have anything to do with GCE; it's deep down somewhere in the storage code. So it's orthogonal.
C
It
almost
belongs
in
six
storage,
so
the
that
routing
is
basically
the
poor
CI
signal
person
was
poking
us
forever
and
we're
like
we
kind
of
ignored
them
for
a
while,
because
it
we
weren't
seeing
in
other
Suites
and
then
they
came
to
the
cig
meeting
and
we
dug
into
it
and
were
like
no.
This
has
nothing
to
do
with
us
at
all,
so
that
routing
process
which
probably
be
refined
and
thought
about
so.
A
To provide some context on that: I helped write the routing process for CI signal, and the way that I wrote it, there are two SIGs. In fact, I talked about this at the community meeting last week, but just to rehash it: there should be two SIGs that they go contact. First, take a look at the SIG that owns the job, who's in charge of the health of the job as a whole, and then the one in charge of the health of the individual test case.
A
That said, cluster lifecycle is put in this really unfortunate position, because upgrade tests are almost always where all of the failures are with release-related stuff. But I'm more interested in figuring out how we can work through that quicker, as opposed to having people be ignored with no real reasons given.
C
I think it was just a matter of people not having enough bandwidth. People looked at it, they gave feedback, then it wasn't progressing the way they wanted it to, so they showed up to the SIG meeting, and that's when progress was made. So I think the problem in general is that there's a ton of signal inside of the system; on average I get upwards of 150 emails a day, easily, so getting signal out of the system is the challenge.
C
In the OWNERS file, SIG Cluster Lifecycle is not the owner, and the alias shouldn't be the owner, because those folks don't maintain it, right? To me it's all about ownership and maintainership of code. By definition, SIG Cluster Lifecycle does not maintain or own that code. We maintain and own all of the stuff around kubeadm and cluster API and that stuff, but no one from the SIG actually owns and maintains that code, so none of the leads do. Only Robert Bailey is left as a maintainer from Google. So to route things properly, again: because there are two control plane paths that are totally parallel, the tests themselves are useful, but the jobs exercise the stand-up completely differently, and because of that we should probably route via job first and then test second.
A
Okay, I think that's fair, and I think you should probably go propose this through SIG Release, because, generally speaking, we've fallen back on this wonderful situation where we're leaning on a human to do the triage. I've had a dream for a while of automating and scripting that away, but I think that is how the process would work best.
A
But I agree: I'm not super happy that right now we're living in this kind of in-between state where I'm not sure it's clear anymore even which SIG owns which job. So in that sense, perhaps we're due for another sorting of jobs amongst SIGs. And I think it's a perfectly reasonable argument to say that I'm not sure the entire cluster directory is owned by Google, but I certainly think the cluster/gce directory is owned by Google. I'm not sure about things like juju or kubernetes-anywhere, but yeah.
A
So, next up. Yeah, you know what, I'll go ahead and share my screen just so I can walk along here. I wanted to give you a heads up on some of the automate-all-the-things work I've been doing. I gave a heads up about this at contributor experience and again at the community meeting. Basically I'm trying to take another push on this issue here: where we're at with merge automation for every single kubernetes repo in every single actively managed kubernetes org.
A
It
gets
merged
and
then
finally,
are
any
of
the
branches
protected
automatically
by
our
branch
protector,
which
runs
daily,
so
I've
been
rolling
through
on
all
of
this,
where
we're
at
right
now
is
I.
A
lot
of
our
automation
depends
upon
certain
labels.
A
couple
of
the
plugins
are
written
in
not
the
greatest
fashion
where,
if
the
label
doesn't
exist,
it
just
goes
ahead
and
creates
it.
So
I
am
pretty
creating
all
the
labels
in
all
the
orgs
I
have
a
pull
request
out
there.
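Roughly, pre-creating labels boils down to the GitHub API call sketched below; the real tooling in kubernetes/test-infra is label_sync, and the org, repo, label names, and colors here are only examples.

```go
package main

import (
	"context"
	"fmt"

	"github.com/google/go-github/github"
	"golang.org/x/oauth2"
)

// ensureLabel creates a label in one repo; label_sync does this across
// every repo in every managed org, reconciling against a config file.
func ensureLabel(ctx context.Context, gh *github.Client, org, repo, name, color string) error {
	_, _, err := gh.Issues.CreateLabel(ctx, org, repo, &github.Label{
		Name:  github.String(name),
		Color: github.String(color),
	})
	return err
}

func main() {
	ctx := context.Background()
	tc := oauth2.NewClient(ctx, oauth2.StaticTokenSource(&oauth2.Token{AccessToken: "<token>"}))
	gh := github.NewClient(tc)
	// A few of the labels the merge automation depends on:
	for _, name := range []string{"lgtm", "approved", "cncf-cla: yes"} {
		if err := ensureLabel(ctx, gh, "kubernetes", "test-infra", name, "00ff00"); err != nil {
			fmt.Println("create label:", err)
		}
	}
}
```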
A
I kind of need to go through and update this, but there's one repo to be deleted, contrib, which I have proposed gets owned by SIG Architecture, under the guise that they can retire this repo that really shouldn't have existed for years and years at this point. It's one of those things where we all love to say that, oh, contrib's dead, yeah, but pull requests will keep showing up in it. So I think that if SIG Architecture is the SIG in charge of making sure the project's code is well organized, they are the SIG for it.
A
Generally, no, although it can get somewhat spammy on some PRs. I don't know, I haven't taken the time to go collect data on that. Okay, I've got so much less contingency. Yep, next thing: I already pasted about this in the sig-testing channel. There's a pull request I would love to see get merged today, where somebody, hippy, has gone ahead and modified the e2e framework so that it includes the test name in the user agent string.
A
I know the example here shows file name and line number, but I was asking that perhaps we just trim it down to the test name, complete with those tags that Tim was talking about earlier. With this, the user agent gets dumped into the audit log; we can scrape the audit log and generate API coverage information, so we can see exactly which tests are hitting which API endpoints. So if folks from this SIG can take a look at this and help push it through, I would greatly appreciate that.
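To make the mechanism concrete, here is a minimal sketch, assuming client-go, of how a per-test user agent can be attached to API traffic so it shows up in the apiserver audit log; the clientForTest helper and the name format are hypothetical, not the shape of the actual PR.

```go
package e2e

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// clientForTest returns a clientset whose User-Agent carries the test
// name, so audit log entries can be attributed to individual tests.
func clientForTest(base *rest.Config, testName string) (*kubernetes.Clientset, error) {
	cfg := rest.CopyConfig(base)
	cfg.UserAgent = fmt.Sprintf("%s -- %s", rest.DefaultKubernetesUserAgent(), testName)
	return kubernetes.NewForConfig(cfg)
}
```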
A
The other thing to discuss: in chatting with Tim last night, I kind of want to try and be lazy about conformance and reuse the fact that we have all these conformance jobs that are continuously running and posting results every six hours. I feel like I really ought to be able to click on one of these jobs, take these files, and hand them over to the CNCF in order to make sure that these jobs are proof of certification, or, you know, of passing conformance tests.
A
The thing we discovered last night is that there were a bunch of tests that were getting run that Sonobuoy doesn't run, and it turns out that Sonobuoy has a bunch of additional regular expressions that it uses to skip tests. It skips all the tests that have "Kubectl client" in their name; it skips any tests that are flaky or disruptive, have a feature tag in them, or are alpha. I get it, these are all fantastic criteria.
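Paraphrased as code, the filtering being described works roughly like the sketch below; the exact regexes live in the Sonobuoy repo, and these patterns are reconstructed from the discussion, not copied from it.

```go
package main

import (
	"fmt"
	"regexp"
)

// Focus on conformance-tagged tests, then skip the extra categories
// mentioned above: kubectl client tests, flaky, disruptive, feature-gated,
// and alpha tests.
var (
	focus = regexp.MustCompile(`\[Conformance\]`)
	skip  = regexp.MustCompile(`Kubectl client|\[Flaky\]|\[Disruptive\]|\[Feature:.+\]|\[Alpha\]`)
)

func shouldRun(testName string) bool {
	return focus.MatchString(testName) && !skip.MatchString(testName)
}

func main() {
	fmt.Println(shouldRun("[sig-cli] Kubectl client should support exec [Conformance]")) // false
	fmt.Println(shouldRun("[sig-network] DNS should resolve services [Conformance]"))    // true
}
```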
A
Honestly, if there's a test that's tagged as conformance but it's flaky, it shouldn't have been tagged as conformance in the first place. My concern is that this has allowed us to end up in a situation where the list of tests that Kubernetes defines as conformance is generated by code here in the kubernetes repo, and that code goes through and walks through all the source and looks for anything with the conformance tag. That's basically about it.
A
It
doesn't
bother
to
exclude
any
tests
that
have
flaky
or
any
tests
that
have
slow,
I'd
love
to
demonstrate
that
to
you
here,
but
it
actually
strips
out
all
tags
when
it
dumps
out
the
test
names.
But
so
there
are
tests
in
this
list
that
we
treat
is
the
authoritative
list
of
conformance
tests
that
sound
boy
isn't
running,
and
this
is
an
issue
because
sauna
boy
is
listed
as
the
way
to
run
conformance
tests.
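As a rough illustration of the walk-the-code approach just described: the real generator in the kubernetes repo parses the sources properly, while this simplified grep-style version only sketches the idea, but it makes the gap obvious, since nothing in it excludes flaky or slow tests.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// Collect every It(...) name carrying the [Conformance] tag. Note there is
// deliberately no check here for [Flaky] or [Slow]; that is the problem
// being raised above.
var itRE = regexp.MustCompile(`It\("([^"]*\[Conformance\][^"]*)"`)

func main() {
	_ = filepath.Walk("test/e2e", func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") {
			return err
		}
		src, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		for _, m := range itRE.FindAllStringSubmatch(string(src), -1) {
			fmt.Println(m[1])
		}
		return nil
	})
}
```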
C
Yeah, we should do that. The question on the kubectl stuff is whether it will affect OpenShift's ability to run it, respectively. There are some issues that we've uncovered along the way, and we've already fixed a bunch of them. I know dims is on, but one we fixed a long time ago: you couldn't even introspectively run the tests at all. That's been fixed because it uses the in-cluster clients; that was around the 1.9 time frame. So this is just legacy that should be fixed, and I think it's just that no one has gone through and fixed it. I know that mml was originally the person from Google who was working on this stuff, and we had been conversing about fixing some of the things, and we fixed some of them; we just haven't gone through and updated again. So I think I'd be happy with making it jibe. Ideally, though, this isn't really Sonobuoy.
C
This
is
actually
the
coop
conformance
container
and
I
also
talked
with
Matt
a
while
ago,
and
I
would
like
to
have
that
published
as
part
of
the
build
artifacts
of
upstream
so
everybody's,
using
the
same
coop
conformance
container
with
the
same
reg
X's
and
the
same
relative
process
right.
So
anyone
could
take
that
particular
container.
They
couldn't
take
the
Coons
container.
They
never
could
because
it
had
issues,
it's
very
Google
specific
to
the
test
and
fro.
D
So I had a question about this. I learned last week, or this week, I don't remember any more, when a colleague of mine was running conformance tests on some VMware stuff and on master, that Sonobuoy apparently doesn't work against master; it only works on specific tagged releases. If things get switched over to an image that uses Sonobuoy, is that not going to work with this?
C
Again,
this
all
real
life
isn't
necessarily
so
nobly.
So
nobly
is
a
raptor
runner
right.
It's
a
basically
executor
framework
for
how
to
run
many
get
the
artifacts.
This
execute
the
coop
conformance
container.
Ideally,
if
folks
want
to
use
it
against
master-
and
that
was
also
part
of
animals,
mats
and
I
objective
was
to
get
that
container
into
what's
called
into
upstream.
So
that
way
upstream
publishes
this
the
container
as
part
of
the
release
process
itself,
and
then
son
of
we
can
execute
for
anything
right.
A
Yeah, so it's all wrappers upon wrappers upon wrappers. Ultimately there's an e2e test binary that also needs a kubectl binary, and I think there's ginkgo in there too. So there's ginkgo that calls the e2e test binary, but then there's a shell script that calls ginkgo to call the e2e test binary, and so on.
F
Yeah, so with kubetest, yes, it's possible to do what we want. It's not that bad right now. I pasted a gist which essentially starts local-up-cluster, but instead of running local-up-cluster, you can just run the tests, just by changing the command-line parameters to kubetest. It's self-contained and it's easy to run; it's just that we probably have to streamline the UX a little bit more and publish how exactly to do it.
D
Extending kubetest to be able to use an image to stand up the infrastructure: I've poked around at that. I haven't had the bandwidth to really work on it, but maybe the same process can be leveraged, or, if there's going to be work on this, at least be aware of it. So maybe it could be extended to also be used to stand up the infrastructure.
C
The only thing is, I would prefer to be looped in on any canonical container that gets published there. There are fixes that we have inside of the kube conformance container that allow it to clean itself up afterwards, because of the cleanup: right now part of the testing infrastructure destroys your provisioning, and if you're doing this on someone's premises, being a good steward is super important. We have that auto fix in there, and that's important, yeah.
D
I mean, I would figure that would be a two-part thing, right: one is standing up infrastructure, and the other is running and cleaning up tests. It could all be a single image, but I would think it would be two images. You'd have the infrastructure image and then the test image, or multiple infrastructure images; ideally they could be interchangeable, yeah.
A
Finally, we're over time, but I did want to make sure we got to Patrick's things. Who is the keeper of the clusters, specifically secrets related to getting access to other clouds? That would, at the moment, be the test-infra folks, specifically the GKE EngProd team here at Google. I have longer-term dreams, codified in issues, of making on-call a thing that everybody can do.
A
The
way
we
get
to
that
is
to
have
these
clusters
stood
up
in
a
Google
cloud
account
that
is
managed
by
the
CN
CF,
so
that
non
google.com
people
can
be
added
to
that
Google
cloud
account,
but
in
the
meantime,
I
think.
If
you
get
in
touch
with
us
and
make
sure
we
are
made
aware
that
these
are
important
issues
to
get
merged,
we
should
be
able
to
work
with
you
on
that.
A
We
typically
have
one
person
who
is
on
call
for
testing
for
us,
I
think
it's
you
can
find
out
who
it
is
and
we
find
out.
If
this
is
right.
Now
it's
not
on
call.
Maybe
it's
on
call
yeah.
So
today
this
week
you
can
go
to
that
URI
and
you
can
go
find
out
that
you
need
to
poke
Erik
theta
has
the
point
of
contact,
San,
Liu
or
then
the
elder
may
also
be
of
use.