Description
Discussion on finding a way to enforce more reliable testing practices
A
A few others wanted to sit in on the topic of red master. I mean, I've been talking with a lot of companies here at KubeCon just to see how they're handling their pipelines, and it's not completely shocking to find out that everyone has the same problem. Obviously, right, everyone has a large test suite and very flaky tests. One discussion I had with one of the companies was very interesting, and I thought it made a lot of sense.
A
What do they do with tests that are flaky? They confirm that they're flaky, and once they are, they move them out of the general test suite and into a separate bucket labeled "flaky tests". They don't even run those flaky tests. It is then up to the team to either create a new test to replace the flaky one, or to write integration tests specifically for the thing the flaky test was covering. And they're saying that everything else they've tried, right, like disciplining the developers, automatically reverting, only deploying green pipelines, all of that at some point just collapsed under pressure. The only thing that is working so far for them is taking that immediate action of moving the test into the flaky bucket and then requiring a change. Basically, I thought that was kind of interesting, and I think in theory it would resolve quite a lot of our problems as well, if we do it that way. Yeah.
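A minimal sketch of the quarantine flow described above, assuming a simple test-selection step in CI; the names (`QUARANTINE`, `select_tests`) are invented for illustration and are not the company's actual setup:

```python
# Hypothetical "flaky bucket": once a test is confirmed flaky, it is
# moved out of the main suite into a quarantine list and excluded
# from the normal CI run entirely.
QUARANTINE = {"test_login_retry", "test_async_upload"}  # confirmed-flaky tests

def select_tests(all_tests):
    """Split the suite into the runnable set and the quarantined set."""
    runnable = [t for t in all_tests if t not in QUARANTINE]
    flaky = [t for t in all_tests if t in QUARANTINE]
    return runnable, flaky

runnable, flaky = select_tests(
    ["test_login_retry", "test_checkout", "test_async_upload", "test_search"]
)
print(runnable)  # only these run in the pipeline
print(flaky)     # these wait for a replacement test from the owning team
```

The important part of the scheme is the second list: it is visible, assigned back to the team, and nothing in it counts as coverage until a replacement lands.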
B
The problem is, I think, that once these tests are removed or marked as pending or whatever, there's no real incentive for people to fix them, because in their mind it's like: oh, you know, my code is in master, my feature is done, quality or delivery will fix it. Right, like, quality is supposed to do the tests and blah blah blah. And of course people say no, it's not supposed to be that way, etc., but...
A
Right, but you know, I think there is an incentive, because we do have error budgets. So if something does break, and you have an issue assigned to your team to handle a spec that was removed because of its flakiness, and you have a failure, that automatically takes it out of your error budget, right. So...
B
My concern is basically: I don't want to end up where, say, six months from now, we have this pile of like 500 flaky tests, and nobody has any clue what they do or why they randomly fail. Eventually some specific group of people has to solve that problem, and it's probably not the people who wrote the tests.
B
People have the same problem with GitHub pull requests, and so I was reminded of what people do with Rust, for example, and some other projects: they have a bot where they assign a merge request to that bot, and the bot will then keep rebasing until there are no conflicts and the pipeline is green, and then it will merge it. The underlying idea is that if you don't use merge commits, you can have more conflicts, so you just let a bot do all the rebasing instead of doing it yourself.
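A rough sketch of that merge-bot loop (similar in spirit to bots like bors); the `MergeRequest` class and the `rebase` callback are made up here, and a real bot would shell out to git and wait on CI instead:

```python
from dataclasses import dataclass

@dataclass
class MergeRequest:
    has_conflicts: bool
    pipeline_green: bool
    merged: bool = False

def bot_merge(mr, rebase, max_attempts=5):
    """Try to land `mr`; `rebase` re-runs rebase + CI (callback for illustration)."""
    for _ in range(max_attempts):
        if not mr.has_conflicts and mr.pipeline_green:
            mr.merged = True  # no conflicts and pipeline is green: merge
            return True
        rebase(mr)  # rebase onto latest master and re-run the pipeline
    return False  # give up after a few attempts and hand back to humans

def fake_rebase(mr):
    # Stand-in for "git rebase + wait for CI"; here we just pretend
    # the rebase resolved the conflict and the pipeline passed.
    mr.has_conflicts = False
    mr.pipeline_green = True

mr = MergeRequest(has_conflicts=True, pipeline_green=False)
print(bot_merge(mr, fake_rebase))  # merged once rebased and green
```

The error-budget check mentioned below would slot in as one more condition before the merge step.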
B
You could hook that into this system, where the bot will say: I'm not going to merge this, because you consumed your error budget. Because I think if you leave that up to people, they're not going to check the documents. It's true, I don't even know where we have the current number for the current error budget recorded. And so you could do that. The thing is, that's what I tried to do with Danger, because, like, oh yeah, okay, we can do that: the idea I had was to just run a periodic pipeline.
B
I'm sure there are like five issues about that, actually, dating back two years, where it seems they created issues that sort of stack on each other. I think one issue is sort of a generic lint API, where you can just have, like, a message with an indicator like red or green, or something like that, kind of similar to the security reports and the JUnit output we have. And then there was a second one where I think people wanted, like, generic build output.
B
A user has to update the existing comments. So I like this idea of something similar to the security report or whatever: it's just a thing you click, and it lists all the offenses, and those might be Markdown or whatever. Yeah, and then maintaining it on the instance side is super easy: just look at the existing list and echo it.
B
Or something else, I don't know, but it's important that they don't talk to the API, because then it won't work for forks. And so then the CI, which watches the build, once it's done, just checks: does it have a particular artifact file, for example? Then we ingest that and turn it into output. So that could be something like: oh, a build just writes, like, lint.json in the root.
A
Right, but we also have to think about, obviously, our values, like how we actually do things. So what I'm proposing here is: maybe we want to schedule this item for ourselves and work on this, and get to a situation where we can use it. I'm telling you right now, just last night I had a discussion with one of the customers where I explained what we are doing with the commit messages with...
A
There is interest out there, even for the very basic thing. So let's talk about it. I mean, I know that you're going on vacation next week, but after you're back, maybe we can talk about scoping this a bit and finding out what we can do to satisfy the minimum of our requirements to handle master, and then we can do an implementation, right, you and Robert? I don't think that should be difficult, you know.
B
And so I think, at the core, ignoring all the details and the bikeshedding and whatever, it's just: you write lint.json, whatever. And I think the format I had in mind was just very straightforward: it's like an array of objects. Each one is just a type, which is, I guess, warning, error, or info, something like that, and a message. That message can be Markdown, though we'd probably have to sanitize the crap out of it, so you don't get people doing funny XSS injections there. And then, every time a build completes and that artifact is present, we ingest it. I think you can implement that basic idea in a couple of days. Yep, probably most of the time will be spent fighting the frontend code, trying to get it to show properly, I mean.
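A minimal sketch of what ingesting that lint.json artifact could look like; the field names (`type`, `message`) follow the format floated above, which is only a proposal, not a settled schema, and the HTML-escaping stands in for whatever real sanitization the renderer would need:

```python
import json
import html

ALLOWED_TYPES = {"error", "warning", "info"}

def ingest_lint_report(raw):
    """Parse a lint.json artifact: an array of {type, message} objects."""
    entries = json.loads(raw)
    report = []
    for entry in entries:
        if entry.get("type") not in ALLOWED_TYPES:
            continue  # drop anything that doesn't match the schema
        report.append({
            "type": entry["type"],
            # crude sanitization: escape HTML so "funny XSS injections"
            # can't make it into the rendered output
            "message": html.escape(entry["message"]),
        })
    return report

raw = '[{"type": "warning", "message": "unused var <b>x</b>"}]'
print(ingest_lint_report(raw))
```

Keeping the contract this small (write a file, CI ingests it) is what makes it work for forks: the job never has to authenticate against the API.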
B
Because then we could get rid of Danger. My biggest problem with Danger is not even just the fork issue; it's the fact that the code is completely untested. It's all just some random script, and it evaluates in a weird context, which means, say, if you require it, everything's global and it starts running it. It's awful, yeah. And the second thing is: we had Philippa, who, for some Vue.js project in GitLab, implemented...
A
Let's talk about that once you're back, and I think that would allow us to think about the first iteration of some rules, to say, like: okay, you have a flaky bucket right now. If we add this rule to the repository, your team is no longer going to be able to merge without an additional approval, and that additional approver needs to sign off on your merge into master, because your bucket is too full, or you broke master too many times, or something like that.
B
The underlying idea is that if you have, let's say, 50% code coverage and there's a bunch of flaky tests, and we remove those, your code coverage goes down. Effectively, nothing at that point can be merged unless something actually increases that code coverage again, ideally by fixing the flaky tests. Which I think is probably easier to implement, in the sense that we already have the code coverage data there.
B
So it's a matter of: oh, if we had a threshold per project, and if coverage is lower than that, no merge. People are probably not going to like it, but that way you sort of kill two birds... oh no, wait, I'm not supposed to say that, it's not politically correct. Somewhat off topic: somewhere I read that people were trying to replace sayings like "killing two birds with one stone" with politically correct versions. Anyway, you get two things there, like, in one.
B
People are going to argue that the minimum should be a hundred percent, but we cannot achieve that, so we set it to, like, 40 percent, and we will always have more than that, so it basically becomes useless. Yep. Whereas if you say: oh, code coverage cannot... yeah, basically, code coverage cannot decrease. That's it. Simple.
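The "cannot decrease" rule is small enough to state as code; this is a sketch under the assumption that the target branch's coverage number is already available (as discussed above), and `coverage_gate` is an invented name, not an existing API:

```python
def coverage_gate(master_coverage, mr_coverage, tolerance=0.0):
    """Allow the merge only if coverage did not decrease.

    Unlike a fixed threshold (e.g. "at least 40%"), which everyone
    clears trivially, this compares against the target branch, so
    removing flaky tests blocks merges until the coverage is restored.
    """
    return mr_coverage >= master_coverage - tolerance

print(coverage_gate(50.0, 50.2))  # coverage went up: merge allowed
print(coverage_gate(50.0, 48.7))  # quarantining tests dropped it: blocked
```

A small `tolerance` is the one knob worth having, so that rounding noise in the coverage tool doesn't block unrelated merges.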
B
So we have two things: the code coverage rule and the whole "no merging into master when it's red". I know you could do it with a bot, but I think GitLab should be able to do that; it's something so simple to implement in GitLab itself that it's kind of stupid to require people to write a bot to do it.
B
Looking, for example, at bugs per team, because there's always been sort of this meme of "look how frontend breaks everything", with people pointing at each other, basically. And it doesn't help that when I start digging through commits and merge requests, the pattern is sort of confirmed: like, oh, they add, like, five hundred lines of JavaScript, or Ruby for that matter, you know, with, like, two tests. I think we need to start slowly looking at: okay, what can we do about sort of the human side of things?
B
Right. Then I was looking at, like: oh, this is another button, but with a dashed outline instead of a fixed line. And I saw some people reporting issues, and so people see that and say GitLab breaks the frontend all the time, blah blah blah. But I think the core problem there is that we don't really have tools for testing visual changes. Okay, you can write a test...
B
No, exactly. Because you can test that an element has a certain class or ID, but as far as I know, you can't test that certain CSS rules apply; our tools just don't support that. And what I'm reminded of is the Dolphin emulator, the emulator for the Wii U: to test graphical changes, they have a continuous integration setup where they basically take screenshots and compare those with some algorithm to see how different they are.
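The comparison step can be sketched very simply; real setups diff rendered PNGs, but here frames are plain grayscale pixel lists so the idea stays self-contained, and the 1% threshold is an arbitrary example, not anyone's actual setting:

```python
def pixel_diff_ratio(before, after):
    """Fraction of pixels that changed between two equally sized frames."""
    assert len(before) == len(after), "frames must have the same size"
    changed = sum(1 for a, b in zip(before, after) if a != b)
    return changed / len(before)

baseline = [0, 0, 255, 255, 128, 128]   # screenshot from master
candidate = [0, 0, 255, 0, 128, 0]      # screenshot from the branch

ratio = pixel_diff_ratio(baseline, candidate)
print(ratio)  # 2 of 6 pixels changed
# fail the visual test if more than, say, 1% of pixels changed
print(ratio <= 0.01)
```

The hard part in practice is not the diff but keeping the baselines fresh: any intentional visual change has to re-record its screenshot, or the suite itself becomes flaky.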
B
And that's what you could do here. You'd get, like, a ton of screenshots, depending on how fine-grained you make it. But it's either that, or we have to somehow figure out how, with the browser API, you can retrieve the CSS rules applied to an element, because then, for example, for these buttons, you could say: oh, all these buttons in this area need a fixed line around them instead of dashes.
B
For the lint stuff, there are issues; there's, like, all of them, actually, because a lot of people want it. So I think what I will do, probably tomorrow, is just dig through some numbers and see how bad things really are. And then I have Monday until Thursday; I think I can actually, probably on Monday, just implement this, like, without the frontend hooks. It's not that difficult.
A
Then raise an issue in the framework issue tracker, and put all your data there, collect all your data there, and please put all the related issues there, so we can estimate what kind of things we can actually cover and not cover. So yeah, cool, awesome. Thank you very much for your time.