From YouTube: Kubernetes SIG Testing 2017-11-14
Description
Meeting notes: https://docs.google.com/document/d/1z8MQpr_jTwhmjLMUaqQyBk1EYG_Y_3D4y4YdMJ7V1Kk/edit
A
All right, hi everybody. Today is Tuesday, November 14th. This is SIG Testing's weekly meeting. I am Aaron Crickenberger, and this is being publicly recorded and will be posted to YouTube eventually. You'll note I'm still behind on posting last week's recording; I should get to that later today, assuming I actually have it with me. Otherwise, it's going to have to wait a couple of weeks.
A
So I wanted to talk briefly about a proposal that I finally got some time on, that was initially wordsmithed by Jace, around how we define the criteria for release-blocking jobs and merge-blocking jobs. I've linked it in the agenda, but I'm going to go ahead and share my screen so you can all see what I'm talking about. Okay. So this proposal here: the TL;DR is that I'm trying to admit this is just a first pass, and I'm trying to put a stake in the ground for today.
A
This isn't intended to be the forever, engraved-in-stone thing, but we want enough actual numbers that we can sort of objectively identify when things are misbehaving, as well as when things are valid to be considered release- or merge-blocking, and we want to have some human review so we understand that everybody's acting in the collective best interest here. So I went based on metrics that I can easily glean by looking at Testgrid. I can pretty quickly figure out whether or not a job qualifies: by looking at the test duration-in-minutes graph on Testgrid, I can pretty easily see whether a job averages finishing a run in less than 60 minutes. I can figure out pretty quickly if that job runs at least every two hours. I can figure out if that job passes 90% of its runs in the past week, based on the summary tab. Whether or not the job is capable of passing three times in a row against the same commit is a little bit harder.
A
A job shows up as failing if it fails ten times in a row. So based on all these things, I can pretty quickly see whether or not a job is the ideal candidate for release blocking, or whether it's misbehaving. And then, if it's misbehaving, you know, I'm trying to go triage and poke humans to fix and resolve the situation, but eventually we're going to reach a point where maybe it's time to talk about removing it.
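As a rough illustration, the criteria above could be checked mechanically against per-run data pulled from Testgrid. The thresholds in this sketch come straight from the discussion (an average run under 60 minutes, a run at least every two hours, a 90% pass rate over the past week, and ten consecutive failures meaning the job shows as failing); the Run type and the fabricated example data are illustrative assumptions, not Testgrid's actual schema.

package main

import (
	"fmt"
	"time"
)

// Run is a simplified, hypothetical view of one job run.
type Run struct {
	Started  time.Time
	Duration time.Duration
	Passed   bool
}

// JobHealth summarizes a job against the proposed release-blocking criteria.
type JobHealth struct {
	AvgDuration      time.Duration // average run duration
	AvgInterval      time.Duration // average gap between run starts
	PassRate         float64       // fraction of runs that passed
	ConsecutiveFails int           // failures in a row at the end of the window
}

// evaluate expects runs ordered oldest to newest.
func evaluate(runs []Run) JobHealth {
	var h JobHealth
	if len(runs) == 0 {
		return h
	}
	var totalDur, totalGap time.Duration
	passed := 0
	for i, r := range runs {
		totalDur += r.Duration
		if r.Passed {
			passed++
			h.ConsecutiveFails = 0
		} else {
			h.ConsecutiveFails++
		}
		if i > 0 {
			totalGap += r.Started.Sub(runs[i-1].Started)
		}
	}
	h.AvgDuration = totalDur / time.Duration(len(runs))
	if len(runs) > 1 {
		h.AvgInterval = totalGap / time.Duration(len(runs)-1)
	}
	h.PassRate = float64(passed) / float64(len(runs))
	return h
}

func main() {
	// Fabricated stand-in for a week of results: a run every two hours,
	// 45 minutes each, with an occasional failure.
	var runs []Run
	start := time.Now().Add(-7 * 24 * time.Hour)
	for i := 0; i < 84; i++ {
		runs = append(runs, Run{
			Started:  start.Add(time.Duration(i) * 2 * time.Hour),
			Duration: 45 * time.Minute,
			Passed:   i%12 != 5,
		})
	}
	h := evaluate(runs)
	ok := h.AvgDuration < 60*time.Minute &&
		h.AvgInterval <= 2*time.Hour &&
		h.PassRate >= 0.9 &&
		h.ConsecutiveFails < 10
	fmt.Printf("%+v\nrelease-blocking candidate: %v\n", h, ok)
}

The "passes three times in a row against the same commit" check is left out of the sketch because, as noted above, it needs per-commit result data that is harder to pull out of the summary views.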
A
So to that end, I just opened up a pull request to talk about removing all the GKE-related jobs, because they have been failing for about two weeks now, and the evidence is just showing that, even though I've seen some people working on it, we still have all these jobs failing for two weeks. So if we want GKE to be a release blocker, we need to incentivize people to, you know, gather enough resources to actually work the problem. But I recognize this is an iterative process, so I want to make sure we get the right discussion from the right stakeholders. The key thing here is I want to make sure this pull request gets looked at by the owning SIG, which would be SIG GCP, and by SIG Release, since they kind of own the dashboard.

The criteria for merge-blocking tests are basically the same; it's just slightly different metrics. It seems like most of our presubmit jobs actually run in less than 40 minutes on average. Some of them spike up a little bit; verify runs a little high. And they flake about 20% of the time, regardless of which commit they're... sorry, not flake, fail. They just straight up fail about 80% of the time, and that's expected: it's pull requests, people submit code that will fail tests, because people are human. But this is all stuff that I can glean from Testgrid. Ideally, moving forward, you know, we want faster or, better, smaller tests, like Eric was talking about.
A
The other ideal thing, and I think I mentioned this during stand-up the other day (I guess I'll stop sharing here), is that it seems pretty likely that Testgrid is a good approximation that I can look at as a human, and that a lot of people in the release burndown have been trained to look at. But I can also start identifying, like, the flakiest tests within those jobs. And while we have a dashboard that does that for presubmits today, because we get a lot of traffic there, it could still be useful for, like, the CI-signal person to go say, hey, we noticed that these jobs are still flaking in the CI tests. So I just wanted to run all of that by this group to make sure that seems sane, and I linked the doc in the proposal.
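One way the flakiest tests within a job could be surfaced is by looking for tests whose verdict flips across runs of the same commit: the code didn't change, but the result did. This is only a minimal sketch of that idea; the TestResult type and the example data are assumptions, not the shape of the real Testgrid or CI data.

package main

import (
	"fmt"
	"sort"
)

// TestResult is a hypothetical per-test result from a single job run.
type TestResult struct {
	Commit string // commit under test
	Name   string // individual test name
	Passed bool
}

// flakiest returns test names ranked by how many commits saw that test
// both pass and fail, a simple proxy for flakiness.
func flakiest(results []TestResult) []string {
	type key struct{ commit, name string }
	sawPass := map[key]bool{}
	sawFail := map[key]bool{}
	for _, r := range results {
		k := key{r.Commit, r.Name}
		if r.Passed {
			sawPass[k] = true
		} else {
			sawFail[k] = true
		}
	}
	flips := map[string]int{}
	for k := range sawPass {
		if sawFail[k] {
			flips[k.name]++
		}
	}
	names := make([]string, 0, len(flips))
	for n := range flips {
		names = append(names, n)
	}
	// Most flake-prone first.
	sort.Slice(names, func(i, j int) bool { return flips[names[i]] > flips[names[j]] })
	return names
}

func main() {
	// Fabricated example: TestB flips on both commits, TestA is stable.
	results := []TestResult{
		{Commit: "abc123", Name: "TestA", Passed: true},
		{Commit: "abc123", Name: "TestB", Passed: true},
		{Commit: "abc123", Name: "TestB", Passed: false},
		{Commit: "def456", Name: "TestA", Passed: true},
		{Commit: "def456", Name: "TestB", Passed: false},
		{Commit: "def456", Name: "TestB", Passed: true},
	}
	fmt.Println(flakiest(results)) // [TestB]
}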
B
Okay. It might be something outside the scope of this SIG, but it's possibly something to throw at ContribEx, so maybe; I'm not sure if they're the right owner. I feel like we might need to do more to make sure that SIGs are aware that they need to have someone, or several someones, monitoring the test failures, and that that handle or team...
A
All those things: it's something I'm trying to do as loudly, and with as much visibility, as I can. I'm always wary of the fact that SIG GCP is new and nascent. That's why I went to their introductory meeting and gave them the heads-up about this. I know they're understaffed, so that's why I raised it during the community meeting, where, ideally, some Google product managers are sitting there publicly saying yes, we will commit resources to this, so I'm holding them accountable there.
A
The next step would be to have some mailing-list traffic on the kubernetes-dev mailing list, and I'm generally pinging the actual Slack channels as well. So I agree: I can't just, like, open pull requests and say I've done it. I've got to be as vocal about it as possible and recognize that we're all trying here. But I will definitely raise it at ContribEx tomorrow, because I've been kind of a big vocal opponent of the GitHub teams with all these prefixes on them and stuff. Each SIG needs a point of contact to get in touch with when tests fail, and so having a unified team across each SIG seems right, speaking for me as a human. And if we find that this works out, it's not too hard to then have, like, some script take the SIG owner out of, you know, wherever the test config data ultimately lands, and actually automatically do these notifications. I also...
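If the job config does end up carrying an owning SIG per job, the notification script mentioned here could be very small. This is a hypothetical sketch only: the JobConfig shape, the field names, the example job names, and the notify stand-in are all made up for illustration, since where the test config data ultimately lands is still open.

package main

import "fmt"

// JobConfig is a hypothetical shape for a job entry once it carries an
// owning SIG; the real format is whatever the config data ultimately lands as.
type JobConfig struct {
	Name string
	SIG  string // e.g. "sig-cluster-lifecycle"
}

// notify is a stand-in for whatever mechanism gets chosen: a mail to the
// SIG list, a Slack ping, a GitHub issue, and so on.
func notify(sig, message string) {
	fmt.Printf("[to %s] %s\n", sig, message)
}

func main() {
	jobs := []JobConfig{
		{Name: "ci-kubernetes-e2e-gce", SIG: "sig-gcp"},
		{Name: "ci-kubernetes-e2e-kops-aws", SIG: "sig-cluster-lifecycle"},
	}
	// Jobs currently showing as failing, e.g. scraped from Testgrid.
	failing := map[string]bool{"ci-kubernetes-e2e-kops-aws": true}

	for _, j := range jobs {
		if failing[j.Name] {
			notify(j.SIG, fmt.Sprintf("%s has been failing; please triage", j.Name))
		}
	}
}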
A
That's a separate thing I'm going to go try and chase down. And it turns out, in talking with folks, AWS doesn't actually own kops; SIG Cluster Lifecycle owns kops. So I'm going to be opening up another PR to ask everybody if they think that is correct. But like I said, my goal here is to try and have as many of these discussions in a pull-request-driven workflow as possible, because eventually, I think, anybody should be capable of nominating their job.
B
Release 1.9 is coming; that's Thursday. I have been looking at what we did for the past releases and talking to Zen, who did those, and I think mostly Zen is looking to reorganize the naming, finally, to be unified with what the jobs are, so stable1, stable2, and so on for each of the other releases, and something that will be in before then, which will...
A
That's so freakin' peaceful. Yeah, all the release dashboards have slightly different sets of jobs. I'm going to be dropping off the face of the earth Thursday and Friday, but you can tag me on issues related to that. I'm interested in this, as is the CI-signal person, because I was just chatting with Eric Chang, who was CI signal for the last release, and, like, I don't know where I should be looking for upgrade jobs. There are, like, a whole bunch of different dashboards that have the word upgrade on them, and some of them seem redundant. So, like, maybe we can just kind of hash out what we think those are supposed to be. I will totally take your guidance on what you think the standard should be to stamp stuff out, and if I need to, I'll help with that definition.
B
One thing to add on the blocking jobs: I didn't see for sure if this is in your doc, but I think at some point we might want to clarify a bit more when they're allowed to run. Because I think, for example, if we can get it working again, something like cross should probably block if it's triggered, but it doesn't necessarily run against every PR. It does run in CI. Okay, it's a much longer job, but, like, if you change the build, we want to make sure the build still works. Right now it kind of works on trust: it triggers, and people pay attention to it and try not to merge while breaking it, but it's not enforced yet. But I think in the future there are a few tests like that, where we have a pretty good idea of whether they need to run or not, and it's going to be slower, and we still might want to block on it. It's a lot slower, yeah.
A
That's where I wasn't... I mean, I question whether or not this gets mixed in with Tide and status contexts on GitHub. There seem to be some presubmits that aren't blocking, so I'm wondering if it makes sense to break up the dashboards into a presubmits-blocking dashboard and a presubmits-non-blocking dashboard.
A
I can probably PR that, maybe today. Yeah, and if I can't PR it, I'll just file an issue. You mentioned pull-kubernetes-cross; that's actually been failing for the past day. I wasn't sure if I should... I think...
A
So these next two are kind of related. SIG Release wants to know if we have any major migrations planned in the next weeks, and whether or not we should think about setting any kind of freeze for test-infra before we do that. Do we think we could... where do we feel we are on potentially getting Tide rolled out before code freeze? I don't...
A
I mean, so the thing I put in there, like I thought when this was raised last time: you know, I think we kind of agreed that just freezing in general out of paranoia is bad, because it hampers the productivity of this team. Small incremental changes are better than letting huge things build up. So it's more about evaluating what we're planning on doing against whether that could be seen as high-risk or destabilizing the queue. So I would just ask you, maybe, as, like, the test-infra guy: if you happen to catch wind of some migrations you're doing and you think, hey, that might be a little disruptive, maybe we should hold off on that until after code freeze, you know, elevate it. And these are the sorts of things that'll start coming up kind of daily during the release burndown period. Like, I think the biggest question for me was whether or not Tide was actually happening.
D
I mean, most of the plugins, besides the submit queue, are not going to be terribly disruptive, I don't think. It should be pretty easy to move, and since we have enough other repos, I've been able to canary them on test-infra and other, you know, other smaller repos. So it'd be pretty safe to do for at least some of the plugins. I also...
B
I think in general test-infra has very much gotten into a place where we're heavily trying things out, not against anything that affects k/k in particular, before we switch over. So I think we should be safe to do some migrations that have been canary-tested like that. But I don't think we'll be doing anything super disruptive, and we definitely won't be doing anything that hasn't been extensively canaried.
B
I do think a Tide migration would actually be relatively safe. I think the things that are going to break are just that it's fundamentally designed differently. Like, even now, batch merges are not that reliable; they're pretty easy to break. So, you know, if batch merging was acting up a bit, I think we'd just be told, oh yeah, we...
D
We're already running on two repos, I think, but I think the other one is Federation, which has, like, nothing in it right now. As soon as we have... what's that...
E
I was talking with him this morning, and it seems like they're a little bit smaller, yeah. But I mean, like, the other blockers for you guys, having the contexts be green, don't matter for us, for Origin, because, like, that's already happening for our repos. So as soon as we have a new one, I think we're happy to stress-test it on ours, which has, like, more throughput, I think.
B
We need... I think maybe we should just add a Prow plugin that allows people to clear those, because someone can come in right now, drop a slash-test on some random test, and then you're going to have the status for that and you can't get rid of it, right? Maybe we could allow, like, owners or something to clear those.
E
Didn't we have an experimental refresh plugin that was doing that? Like, run the tests again and ignore the stale stuff, something in that vein. I think either it's that, or it was reconciling stale contexts or something of the sort, but, you know, maybe that would be an appropriate place to put that code. I think it's still a work in progress from a PR this summer. Oh...
B
Stopping Tide from, like, checking the contexts, because...
B
Yeah, I'm not sure where exactly makes the most sense, but I think we could have a pretty simple thing somewhere where some component is allowed to wipe away contexts for jobs that, you know, maybe somebody wanted to see, or forgot to set skip-report for. I mean, maybe we don't need owners for this. We, you know, try to enforce not reporting to GitHub, but somebody's going to have a stray status, and that's going to block. If we could just wipe away that status quickly... I mean, today they...
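For what it's worth, the "wipe away a stray status" idea is mechanically simple against the GitHub API: statuses can't be deleted, only overwritten, so clearing one means posting a fresh terminal state over the stale context. The sketch below shows that using go-github; the repo, SHA, and context name are placeholders, and whether this belongs in a Prow plugin, in Tide, or somewhere else is exactly the open question here.

package main

import (
	"context"
	"log"
	"os"

	"github.com/google/go-github/github"
	"golang.org/x/oauth2"
)

func main() {
	ctx := context.Background()
	ts := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")})
	client := github.NewClient(oauth2.NewClient(ctx, ts))

	// Placeholders: the PR head SHA and the stale context somebody triggered
	// by hand and now cannot get rid of.
	owner, repo := "kubernetes", "kubernetes"
	sha := "0123456789abcdef0123456789abcdef01234567"
	staleContext := "pull-kubernetes-some-optional-job"

	// Statuses cannot be deleted, so "wiping" one means overwriting it with
	// a state that merge automation will treat as resolved.
	_, _, err := client.Repositories.CreateStatus(ctx, owner, repo, sha, &github.RepoStatus{
		State:       github.String("success"),
		Context:     github.String(staleContext),
		Description: github.String("context cleared; not required for merge"),
	})
	if err != nil {
		log.Fatalf("overwriting status %q: %v", staleContext, err)
	}
	log.Printf("cleared %s on %s/%s@%s", staleContext, owner, repo, sha)
}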
A
That might be valid. We're kind of over time here; I think, like, this is something we should hash out in an issue or a breakout or something. I would normally keep the meeting going, but I kind of have to run. I guess my other... the other thing that popped into my head about potentially breaking migrations would be the approval handler that we've been canarying on other repos. Is that something we would anticipate turning on for k/k, or are we going to stick with mungegithub?
D
I'm planning to turn that on for k/k. It's been working everywhere that I've deployed it so far; the only difference is that it's faster and less buggy. There were three repos that I wanted to apply it to, and I made PRs for that, and I've applied it to two of them today. I'll do the last one, and then once that's in, I'll do k/k later this week.
B
That would be my argument: the only reason we might still want to consider finishing hashing out this Tide stuff and making it happen is that the mungegithub toolchain is quite buggy and stops us from doing things like making cross blocking. It would be great to migrate to Tide, which is much smaller and easier to debug.
A
Yeah, I really want to see that happen, but I just want to make sure we're diligent about keeping things predictable, even if they're buggy. So I do like the idea that we've canaried the different handlers or plugins and then we migrate those over, because that means, you know, when somebody complains about a problem, we can't necessarily say oh, that's mungegithub, it's dead code, we're not fixing it; we actually have the power to fix it, because it's Prow. Anyway, this actually ran way longer than I thought it would.