From YouTube: Deflaking Kubernetes Tests
Description
@liggitt walks us through finding and fixing test flakes
Notes: https://gist.github.com/liggitt/6a3a2217fa5f846b52519acfc0ffece0
(taken from Kubernetes SIG Testing - 2020-08-25)
A
So my goal for today is to help you get into the mindset of deflaking and fixing flakes, then show you some ways to find things to fix, and give you some tools and techniques to make you more effective at fixing flakes and avoiding them in the first place. So I thought I'd start with the mindset in general.
A
A flake means that we have a problem, and the problem can be in one or more places, but a failing test is not a thing we want, and building that into our mentality as a community will help us a lot. So, thinking about where the problem is: it could be in the thing that's being tested. That's the ideal, right? If a test fails, we really want that to be a good signal that we have a thing we need to fix.
A
That's a problem in code that we ship and run. But sometimes there's a problem in the test itself: the test is making bad assumptions or is written in a fragile way. And then the third possibility is that the thing running the test has a problem, like infrastructure issues. As developers, the temptation is to assume that our code is perfect and our tests are perfect, and the problems are always in the infrastructure, and sadly, that has been more or less true at various times.
A
But we've worked really hard over the past months to improve CI infrastructure consistency, to make it a better signal when a test fails. So the goal is, in an ever-increasing way, that when a test fails, it means there's a problem in the thing being tested or in the test itself, and it really needs to be looked into. Then the next temptation is to assume that flakes are a test-only issue: if the test is timing out, well, we should just increase the timeout on the test.
A
So, some examples. If you're seeing a flake in a test, and you add a timeout or a poll or something and the flake goes away, make sure the thing that you are polling for or waiting for is actually supposed to be asynchronous.
A
That's not always the case: we've discovered times where, by making a test poll or wait, we were actually changing what the test was verifying. Another example is lengthening timeouts; I have a couple of examples here. This is an example of a test which was depending on garbage collection, and it was running in our e2e tests. In our e2e tests we run a lot of things in parallel, and new API types show up and disappear, and when new API types show up and disappear, that can actually put garbage collection into a backoff state.
A
Briefly, garbage collection says, "I need to clean up this thing, but this thing doesn't seem to exist anymore; I'm going to wait for 30 seconds and resync." So it's not unexpected that garbage collection would sometimes take 30 seconds longer than other times in an e2e test, and a test that depends on garbage collection, in our parallel e2e tests, should tolerate a delay like that. So in this case, adding a timeout was appropriate.
A
In another example, an operation that we expected to be very, very fast was actually very, very slow. By digging into the logs, we were seeing that a particular operation we expected to take on the order of a second was taking 15 to 20 seconds. If we had just blindly added a one-minute timeout toleration to that e2e test, we would have missed that we had a pretty severe performance bug. So the fix... let me see if I can find the fix for that.
A
I
think
this
was
where
he
fixed
it
yeah.
So
by
fixing
the
bug
he
reduced
the
run
time
of
this
method
from
sometimes
takes
15
seconds
to
consistently
takes
about
two
seconds.
A
So adding timeouts in the appropriate places is fine, but we want to make sure we understand the root cause before we do that. And then the last thing I'll call out: just make sure that the changes you make to the test are still testing what you expect.
So this was one we discovered recently, where there was a flaky test in the plugin watcher. By changing the test to initialize things in a different order, we could make the test run consistently, but by initializing things in a different order,
A
we actually weren't exercising reality. In reality, the kubelet starts, and it could start before or after or during plugin registration, so our test shouldn't care what order we start things in; it should be resilient to any order. Michelle did a really good job of noticing that the initial fix was actually breaking what we were supposed to be testing, and we ended up fixing a real bug, so this turned into a bug fix
A
instead of a flake fix. All right, so now that we have that mindset, how do you find flakes to fix? You would think this would be easy, as much as we complain about flakes, but sometimes it's actually kind of hard to find things that are actually important to fix.
A
Good places to start: issues that people have already reported. We have a label, kind/flake, so look for issues that have already been reported; that can help you see if someone's already working on it, or how much it's getting mentioned. If people are saying "yep, I saw this, I saw this, I saw this," that's one place to look, and you can filter these by SIG label to see flakes relevant to your SIG.
A
This actually looks better than it has in a long time, which is excellent. It used to be you would open these up and there would be like five or six really bad flakes in each job. That is less the case now, which is great, but that's one place to look.
A
Here's our GCE containerd testgrid. You can make this super small, and then you can see which tests have been failing. It looks like we have a variety; there's not one test that's repeatedly failing, except this one, which we already have an issue for. But this can be a good way to identify, once you zoom out and see several weeks' worth of runs,
A
if you see one test failing repeatedly; that could be a good place to start. You can also filter this down by anything in the test name, but SIG is a
A
good thing to filter on, so if you're looking for things specific to your SIG, that's a good way to find them. And then, lastly, there's the triage board. I love this; it just got rewritten in Go, it's much faster now, and it's one of the most powerful tools that I use. It lets you filter by SIG.
A
Now, these are the SIG titles associated with the tests, so that doesn't always show you exactly what you want, but it can be a good starting place. But then it also lets you filter on failure text, or the job name, or the test name, or any combination of those things, and then exclude specific things. So if I wanted to find something related to that kubectl
A
flake about "standard…", I can put in failure text, find how often it's happening, and then jump down and see all the specific jobs where it's failing, and then even links to specific instances.
A
So that's a good place to start. As an example, I went through some of the SIG Auth-attributed failures and found some really noisy tests that had been marked flaky and just kind of ignored for a long time, and actually cleaned those up. So the SIG Auth filter signal is much clearer now, and we'll be working on getting these cleaned up as well. All right, so what are good things to put in a flake report?
A
Let me pull this over and we can talk through some of the helpful things to include. Whether something is failing in multiple jobs: we see this especially in our end-to-end tests, where we have different variants of them (different container runtimes, different network setups). If something's failing in multiple jobs, that's helpful to know. If we're seeing something fail in only one variant, that's also helpful to know, because it might be something specific to that variant.
A
If there's more than one test that is having the same failure text, that's helpful; the triage board is great for figuring this out. Then: specific links to the testgrid queries, and the reason for the failure, so that when you search GitHub for some random text from a failure, you find it, plus links to the triage board and the specific failed examples. All of these are super helpful for helping someone who wants to dive into fixing this get context right away, and most of this is in the flake issue template.
A
But I thought maybe a specific example of what good things to put in there would be helpful. All right, great. So now we've found the test that's flaking that we want to fix. I thought I would go briefly through ways to reproduce flakes in each different kind of test.
A
One thing I really like about unit tests is that you can reproduce them locally. So I am in my kubernetes folder, and there is an open issue about a flake for this unit test. If I just run that test, sadly, it passes. Now, you notice it says that result was cached.
A
You have to watch out for the Go cache: it will cache test results. You can bypass that by passing an uncacheable argument, like how many times you want the test to run (-count=1). So that is no longer a cached result, but it still passed. So we still haven't reproduced the flake; it's passing for me.
A
So what's the next thing we can try to reproduce this flake? The race detector, and I have a link to the discussion of that. It will actually rewrite the code when it compiles it, to sort of put in delays or detect races.
A
My favorite tool this year is the stress tool, and I linked to that; I already have it installed. What it lets you do is build a binary for the test. So: go test, I want to build it with race detection enabled, and instead of telling it what test I want to run, I'm going to give it the -c argument, which tells it to compile the test, and that is going to create this test binary in my current directory.
A
So now I have a binary which I can run standalone, and I can give it this same argument. When you run it standalone, you have to give it the -test.run argument.
A
I can run it standalone and it still passes. But now, if I stress it, it's going to run a bunch of instances in parallel, over and over and over. So in about five seconds it ran almost 300 instances of that test and, as you can see, it's flaking immediately; the thing that we're observing reproduces.
A
That's super, super useful, and it's just sitting there running it 300 times, and so now we have a reproducer. Now we can start to dig in and try to figure out where the problem is. So that's what I love doing for unit tests. For integration tests, you can actually do really similar things. Most of our integration tests expect an etcd instance to be started, and you can do that pretty simply by just starting etcd in another tab.
A
And
then
can't
you
can
run
the
the
integration
tests
using
the
same
method,
either
directly
with
your
test
or
you
can
build
them
into
a
binary
and
stress
them.
The
question
was:
where
is
the
stress
binary?
I
linked
to
it
right
there
and
you
can
install
it
with
that.
Go,
get
command.
A
So for just simple flakes, where we're seeing an integration test failure, that same approach works pretty well. One interesting issue that I ran into was a deadlock or a timeout that would fail the entire package in an integration test. When that happens, it barfs out about ten trillion goroutines, and you have no idea which test even failed.
A
The question was why it gets compiled into a binary. It's actually compiling the tests for that package into a binary, and the reason is so that we can invoke it with stress. So it's not recompiling the tests every time; we build the tests once, and then we invoke them a bunch of times in parallel.
A
This is something that we've seen on a few of our packages. We don't see it super often, but I wanted to talk about how you could track it down. So this was the SIG Scheduling integration test that was deadlocking and timing out, and this is the way that we tracked it down.
A
When we just stressed the whole package like this, we could see it was completing 10 runs of the package at a time, then nine runs at a time, then eight, then seven, then five. I think the test runner defaults to eight or ten in parallel. Gradually, the test runners were getting hung up on some deadlock, and fewer and fewer were completing at a time, and then, finally, after two minutes, we got a timeout.
A
The way that I broke this down was to stress individual tests. So first I would just run one test in the package and see how long it takes normally. If it took, say, a tenth of a second normally, then I would stress that one test and give it, generously, a hundred times as long as it normally takes to complete. This way I didn't have to wait two minutes for the timeout to happen.
A
If it was going to time out, it would time out after 10 seconds. So I'm running one test, giving it a 10-second timeout, when normally it takes a tenth of a second or whatever (you have to figure that out per test), and then I stress it in parallel. Basically, I let each of these run for 20 or 30 seconds, and once one was happy for 20 or 30 seconds, I said:
A
"Well, that's probably not the culprit," and so I just went one by one by one. And this was the culprit: this one took a tenth of a second on average, but when I stressed it with a timeout of 10 seconds, after 10 seconds I was immediately getting timeout flakes. So that gave us a particular test to look at, and once we had a particular test to look at, our job is much smaller. We're just trying to break the problem down and find where the particular problem is. So, looking at that test:
A
It was timing out on this wait. So then we started adding debug logging to the places where we could return early, or see the thing that's going to unblock the wait, and once we did that, the issue was pretty quick to resolve. It turns out we weren't waiting for all of our caches to be synced before we were starting the test, and so the event we were waiting for in the test happened before we kicked off the wait.
All right. So now, everybody's favorite topic: e2e tests.
A
Okay, so remember we said the problem in a flake could be the thing that's being tested, right? Well, the bad news is, for an e2e test,
A
the thing being tested is pretty much everything. On the one hand, that's good, because we do actually want to make sure our system works when you run the whole thing; we've all seen the comic of "unit tests pass, integration tests fail," with the two drawers that each open individually but can't be opened at the same time. But the bad thing is that an e2e test can fail because of something completely unrelated.
A
So, an example we ran into today: a gluster volume subpath test. You look at the test title and you're like, "oh man, we must have a gluster problem, or a subpath problem, or a volume problem." But nope: the problem was that the namespace the test was using got deleted by something, and so the setup for the test failed because the namespace was being deleted. So just be aware that you can't just look at the title of an e2e test.
A
You
actually
have
to
dig
into
what
the
problem
is,
so
the
takeaway
is
prefer
unit
and
integration
tests.
If
those
are
sufficient
to
test
the
thing
you're
looking
at
and
then
yeah,
don't
assume
the
title
of
the
ede
test
identifies
the
problem.
So
the
steps
that
I
follow
for
deflating
an
ede
test
first
step
is
just
gathering
information
right.
So
this
link
is
gonna,
go
stale
because
we
reap
our
artifacts
from
ede
runs.
But
hopefully
you
get
the
idea.
There's
a
lot
of
things
we
capture
from
ede
runs.
A
This
artifacts
tab
is
your
friend
under
here
we
have
the
build
log
which
is
all
of
the
output
from
the
test
when
it
was
running,
but
then
under
artifacts
we
capture,
tons
and
tons
and
tons
of
logs.
So
for
the
control
plane.
We
capture
logs,
like
the
api
server
audit,
which
will
tell
you
in
detail
every
request
that
was
made
and
who
made
it
and
what
order
it
was
made
in
and
don't
forget
that
there's
archived
rotated
versions
of
them.
These
are
big.
A
But
if
you
need
to
know
what
order
things
happened
in
they're,
very
useful,
the
api
server
log,
the
controller
manager
and
the
scheduler,
those
are
the
main
logs
that
you
normally
care
about
on
the
control
plane
and
then
for
each
node
and
most
of
our
ed
tests
set
up
three
node
clusters.
We
capture
the
container
runtime
logs,
so
that's
either
a
docker
log
or
container
d.
A
We
capture
cube
proxy
and
we
capture
cubelet.
Those
are
the
main
things
you
might
care
about
for
for
most
ede
issues.
So
once
you
have
those
things
gathered,
the
next
step
is
to
filter
and
correlate
that
information.
So
if
your
first
step
is
to
kind
of
pick
likely
candidates
like
the
the
things
you
know
that
interact
around
this
issue
might
be,
the
test
is
doing
something.
So
you
care
about
the
test
log
and
the
api
server
log,
and
then
the
controller
manager
is
going
to
do
something
and
the
cube
is
going
to
do
something.
A
So
if
you
only
look
at
your
namespace
you'll
miss,
maybe
the
root
cause,
if
you're,
trying
to
if
you're,
trying
to
figure
out
how
a
particular
object
got
into
a
particular
state,
a
pod
got
into
a
particular
state
or
something
then
filtering
just
to
that
pod,
or
that
namespace
is
probably
reasonable.
A
So
you
can
filter
the
logs
for
the
relevant
things
I
like
to
keep
timestamps
at
the
beginning
of
the
logs
and
then
right
after
the
timestamp
put
something
that
identifies
the
component
and
then
merge
all
the
files
into
one
file
and
sort
by
time,
and
so
you
end
up
with
something
like
like
this.
Let's
see,
if
I
can
find
there,
we
go
so
this
was
when
we
were
trying
to
debug
a
garbage
collection
issue.
So
you
see
the
timestamp.
So
this
is
the
api
log
api
log
cube
controller
manager.
A
I
thought
I
put
cubelet
in
here.
Maybe
it
was
just
those
two,
oh
yeah
and
then
so
this
was
the
output
from
the
ede.
So
this
was
the
test
code,
that's
running
so
when
the
test
started.
Looking
for
the
thing
to
go
away,
maybe
I
put
maybe
that
was
it,
but
but
you
get
the
idea
you
take
the
logs
from
the
relevant
components.
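Mechanically, the merge-and-sort step can be as simple as tagging each line with its component right after the leading timestamp and sorting the combined lines, since RFC3339-style timestamps sort chronologically when sorted lexically. A small sketch with made-up log lines (the helper and its format are assumptions, not the tooling from the talk):

```go
package main

import (
	"fmt"
	"sort"
)

const tsLen = len("2020-08-25T17:04:05.000000Z") // 27 bytes of timestamp

// mergeLogs inserts the component name right after each line's leading
// timestamp, then sorts the combined stream; because the timestamps sort
// lexically in chronological order, the result is a single timeline.
func mergeLogs(logs map[string][]string) []string {
	var merged []string
	for component, lines := range logs {
		for _, line := range lines {
			ts, rest := line[:tsLen], line[tsLen+1:]
			merged = append(merged, fmt.Sprintf("%s %-23s %s", ts, component, rest))
		}
	}
	sort.Strings(merged)
	return merged
}

func main() {
	merged := mergeLogs(map[string][]string{
		"kube-apiserver": {"2020-08-25T17:04:06.000000Z DELETE /api/v1/namespaces/e2e-1/pods/p"},
		"kubelet":        {"2020-08-25T17:04:05.500000Z syncing pod e2e-1/p"},
	})
	for _, line := range merged {
		fmt.Println(line)
	}
}
```

The kubelet line prints first here even though it was added second, because the sort interleaves the components chronologically.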
A
So this is an example of a bug in runc, which was actually an old version of runc configured on this particular job. The clue to this was the line numbers in the message: in the error message, we see process_linux line 449 and then stuff happening, and if we look at the version of process_linux that we have in kubernetes, that line number doesn't match where that message comes from. So tracking that line number down actually pointed at the runc component,
A
just because of which log it showed up in. So matching up line numbers can be super helpful. And then, finally, if you are trying to figure out which branch is being taken, or what timing issue is happening, and we don't have log messages: feel free to add them. Adding debug logging to track down a flake is totally acceptable, so I'd like to do an example of that. All right, so now you've reproduced the flake, and you've found, maybe, sort of the area where it's happening.
A
What
are
the
types
of
things
you
can
do
to
to
look
for
and
to
sort
of
force
the
flake
to
happen?
So
the
first
thing
does
the
test
assume
that
something
that's
happening.
Asynchronously
is
happening.
Synchronously,
so
is
the
test
gonna
do
something
and
then
immediately
check
a
condition
when
really
the
thing
that's
going
to
make
that
condition
pass
might
not
run
right
away,
and
then
there
are
ways
to
stimulate
this.
A
So
if
the
test
is
kicking
off
a
go
routine
or
the
component,
that's
being
tested
is
kicking
off
a
go
routine,
put
a
sleep
at
the
top
of
the
go
routine
and
that
will
simulate
the
go
routine.
Taking
a
while
to
start-
and
you
I
mean
I
say
it
a
second,
but
it
could
be
five
seconds
or
whatever
try
that
and
see
if
that
makes
the
flake
happen
reproducibly.
A
These are all the types of places where we normally do asynchronous things. So if you have a watch event handler,
A
do things like that to see if it makes the flake reproducible. A normal pattern is to observe watch events and then queue up work to do, so try doing the same thing in the worker. If you put the sleep at the beginning of the worker, that simulates a worker that's bogged down and is going to react slowly to work; and if you put it at the end of the worker, that simulates a worker that does work but then gets distracted with other things before coming back to pick up new work.
A
This is sort of similar, but sometimes a test will do work and assume that it can complete its work, when there's a background process running that's going to do conflicting stuff. This was a good example, where the test was doing some setup:
A
It
was
creating
a
service
object
and
then
creating
an
endpoints
object
and
most
of
the
time
that
was
fine,
but
we
actually
have
a
controller
that
when
you
create
a
service
object,
we'll
create
endpoints
objects
for
you
in
the
background,
and
so
if
the
test
setup
lost
the
race,
the
test
would
get
an
already
exist
error
when
it
was
trying
to
set
up
its
endpoints
and
we
could
trigger.
B
A
A couple of rules of thumb. Tests that assume things are going to be fast: something that takes a second or less locally could take a few seconds in CI environments, for a couple of reasons. CI environments normally have more resource constraints than a powerful local dev machine, and often we run multiple tests in parallel. So maybe it happens really fast when you run just your test, but if you run 10 or 15 or 20 tests in parallel, things slow down a little bit.
A
So unless your test is specifically a performance or timing test, don't put super-tight tolerances on it. wait.ForeverTestTimeout is set to 30 seconds; that's a reasonable thing to use for, quote-unquote, "things that should not take very long." That's useful when we don't want a test to hang for 10 minutes before failing: we want it to actually fail quickly, but we don't care, for the purposes of the test, whether it takes one second or five seconds or ten seconds.
A
Another thing we see a lot of is assuming deterministic output. These are just your friendly reminders that map iteration in Go is non-deterministic, so if there is a list being compiled, or a set of steps being done, by iterating over a map, those are going to happen in non-deterministic order. So either sort and compare, or tolerate any order.
So there's a link to an example of that. This was a fun one that we found: sometimes we have things that will do random allocation.
A
We can also request a specific IP, and so we had a test that was creating one service randomly and then creating another service with a specific IP, and one out of every 256 runs, the randomly allocated IP would be the same as the static IP which we later requested, and we'd get a conflict. So just be aware if you're mixing those. In this case there was actually a bug that we could fix to improve things for everyone, so that kind of goes back to: where should we make the fix?
A
Is
it
a
test
only
issue,
or
is
it
a
a
a
real
bug
we
should
fix.
So
in
this
case
it
was
a
real
bug
we
could
fix
and
then
the
last
one
I
was
going
to
call
out
if
you're
using
a
fake
client-
and
you
have
like
an
informer
watcher
on
it,
it
can
do
a
read
list
in
a
rewatch
at
any
point,
and
so,
if
you're,
making
fake
client
calls
and
then
expecting
like
exact
actions
to
be
output,
those
can
get
interleaved
spuriously
with
the
informer.
A
In
the
background,
so
it's
better
to
look
for
the
specific
things
you
wanted
to
happen
instead
of
just
asserting
exact
matches.
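A sketch of that idea, using a plain slice of recorded action strings (hypothetical; real fake clients record richer action objects than strings):

```go
package main

import "fmt"

// containsAction reports whether want appears anywhere in got, tolerating
// extra interleaved actions such as an informer's own re-list/re-watch.
func containsAction(got []string, want string) bool {
	for _, action := range got {
		if action == want {
			return true
		}
	}
	return false
}

func main() {
	// Recorded actions: the informer's list/watch calls are interleaved
	// with the one update the code under test actually performed.
	got := []string{"list pods", "watch pods", "update pods/p1", "list pods"}

	// Fragile: asserting that got is exactly ["update pods/p1"] flakes
	// whenever a re-list lands. Robust: assert the action you care about.
	fmt.Println(containsAction(got, "update pods/p1")) // true
	fmt.Println(containsAction(got, "delete pods/p1")) // false
}
```

The same shape works for asserting ordering between two specific actions while ignoring everything in between.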
A
Once you know the tools and kind of get a workflow down for the gathering and the filtering and correlating, just that bit takes, you know, five minutes; it takes a while to get that workflow down. But once you have something correlated, it really varies: sometimes the issue will jump out at you immediately.
A
Sometimes,
like
you
saw
the
one
where
we
had
to
add
more
debug
logging,
because
we
didn't
have
enough
information
about
the
timing
stuff
there.
There
is
no
usual,
it
could
be
five
minutes.
It
could
be
a
month.
B
Right, I totally understand it's kind of a long tail for some of these things, but it's just sort of a gut check. I feel like I've seen you and some other folks go through an impressive number of these lately, so it does feel like there's a bit of a rhythm, at least as far as uncovering some of the lower-hanging fruit in unit and integration tests.
A
Yeah, the unit and integration tests are way, way easier and faster to figure out, just because of the rapid make-a-change, reproduce; make-a-change, reproduce cycle. So those you can actually normally resolve, or at least root-cause, within a couple of hours. Sometimes, once you find the root cause, the root cause is "this test is fundamentally wrong and we need to rewrite it," and that can be tricky, but root-causing unit and integration issues is quicker.
B
A
Yeah, I mean, the more realistic the setup of the test is, the better. So if you can use the same constructors to set up the controller or the component that are really being used when we run the thing in production, that's nice. Sometimes we'll see issues where the setup code was faulty, and we were sort of hacking together fake clients and artificially running goroutines and waiting for cache syncs and informers in totally different orders than happen
A
When
you
run
the
component
for
real
and
so
the
more
you
can
use
the
normal
constructors,
the
better
thinking
about
like
behavioral
testing,
like
we're
gonna
we're
gonna
trigger
some
input,
either
by
calling
a
go
function
directly
or
by
creating
some
api
object
and
waiting
for
the
component
to
observe
that
we're
going
to
trigger
some
input,
and
then
we
have
some
expectation
of
behavior,
the
more
you
can
limit
the
test
just
to
the
inputs
and
the
expected
behavior
the
better.
A
Instead
of
sort
of
this
extremely
fragile,
like
I
expect
this
call
to
be
made,
then
this
call
would
be
made.
This
call
will
be
made.
It
must
happen
in
this
order.
It
must
happen
with
this
timing
and
like
for
a
functional
test.
That's
probably
not
what
we
care
about
like
we
want
an
invariant
of.
I
create
a
thing,
and
then
this
happens
to
that
thing,
and
so
the
more
you
can
scope
the
test
to
just
those
things
better.
B
Okay, since you mentioned integration tests, just one FYI: part of the reason that the flake query you linked looked way better than it has in a while is because integration tests don't show up in it right now. It has to do with some of the prow jobs using one mechanism
B
for the repos they clone, and that means that some of the data that's consumed by our flake queries doesn't account for those. So for integration tests especially, the triage tool is the better place to go looking for which integration tests are failing most often right now, which may be a hint or a clue into the flakiest tests that we should look at addressing.
A
I'll point out, too, that the triage board and testgrid, for some jobs, do not differentiate which branch. So if you're looking at failures that you found via the triage board or via testgrid, especially for pull request flakes, always make sure that the flake was actually running on a pull request against the master branch.