Description
Today Kyle joined us to catch us up on what Engineering Productivity has been up to, how they might use some existing test features, and what the new repeated failed test counter does.
Testing group playlist: https://www.youtube.com/playlist?list=PL05JrBw4t0Kq53VUOvTk3VdXN79PA0SXT
Last ThinkBig discussion: https://www.youtube.com/watch?v=mV9jCk5Znhw&list=PL05JrBw4t0Kq53VUOvTk3VdXN79PA0SXT&index=3
A
This is the Verify Testing monthly internal customer call for January 27th, 2021; it's our first one of 2021. I'm gonna just vocalize real quick a couple of the roadmap deck changes of note. Code Testing and Coverage moved up from minimal to viable maturity in January. Super excited about that, and we are in active development on that Code Quality epic that we've been in development on for a while, to resolve those open dogfooding issues and also move its maturity up to viable. We've been getting great feedback from customers, from Twitter, from the forums, from people who want those same kinds of features. They solve problems for them in the code quality space, so really excited about that.

The other thing I wanted to call out was ThinkBig discussions. We do post the recordings to the team playlist; I'll put a link to that in the description of this video. Our latest discussion was with Package about getting some team-specific data out of the monorepo. In this case they were interested in understanding: what's the test coverage for the area that we cover, or that we're responsible for? We do have a follow-up issue on deck for that in 13.10. The team will be producing a coverage report for our stuff out of the monorepo, and then we'll be circulating that with some engineering managers: imagine this was your data, how would you feel about it? So, figure out what problems that solves, or if there are still problems out there we need to address with that type of capability, and after that manual effort we can figure out how to build that in. That's everything for me from a high-level view. Ricky, it looks like you're the first one to vocalize.
C
Yeah, the succinct summary is we're kind of at a pause on it. The recall rate was really low and we were facing a situation where we'd have to kind of endlessly adjust and change. I think it was around 85 percent, so that would mean, like, what we saw was: in 15 percent of pipelines we had the minimal jobs pass and the full jobs, whatever the full suite was that ran for that MR, fail. We were hoping for somewhere north of 95.
C
That 15 percent difference wasn't something we could move forward on. What we're looking to do with the information is start to focus on mean time to failure and see how we can use that data to just accelerate failure overall, kind of like what we did with the FOSS impact and other things there. I'll say that, going forward, mean time to failure is going to become a much bigger Engineering Productivity metric that we focus on, over duration and the other ones that we traditionally tracked, duration and cost in particular.
C
As far as how you can help: like I said, we're really on pause, so it's hard to say. We kind of set it aside and we're revisiting after we look at priorities for the next quarter and how to fit in the work to accelerate failure. So I don't have anything great for you all on this, unfortunately.
B
Is there anything that you can think of, from your efforts and the work you've done on this, that might be worth commoditizing and incorporating into the product in some way? I know you talked about how we experimented and it didn't really work out, but now we're talking about mean time to failure and how we can accelerate that, and you got some data from what we were doing before. So is this something that we can build into the app somehow?
B
Can we introduce a template or an image or something that we can, you know, sell people?
C
I'm sure there's some value to customers that can be gotten from this; it's just nothing is coming immediately to mind. Let me prompt Albert and see what comes to mind for him. He usually has a lot better insight on that than me, yeah. So I'm going to ask him in an issue and CC you all on that right now.
B
One thing that comes to my mind is the work that you had done to cancel the pipeline in flight in order to make the fail-fast work. I feel like incorporating that into the product in some way would add a lot of value. Talking to my friends and ex-co-workers who are in the industry, a lot of them are looking into that type of thing.
C
Yeah, so I can at least point you to that. It was all done through the API, so just API calls. Let me take the action to provide you with that, because you're right, I think that was done as a part of the FOSS impact work, so that's one of the reasons it didn't come to mind.
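For reference, a minimal sketch of that kind of API-driven cancellation, not the actual internal implementation: a quick job that, if it fails, calls the pipelines API to cancel everything else still running. The job name, smoke-test command, and CANCEL_TOKEN variable are assumptions for illustration.

```yaml
# Illustrative .gitlab-ci.yml sketch only; assumes a token with API access in CANCEL_TOKEN.
fail-fast-check:
  stage: test
  script:
    - bundle exec rspec spec/smoke        # hypothetical quick subset
  after_script:
    # If this job failed, cancel the rest of the running pipeline via the API.
    - |
      if [ "$CI_JOB_STATUS" = "failed" ]; then
        curl --request POST \
          --header "PRIVATE-TOKEN: $CANCEL_TOKEN" \
          "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/cancel"
      fi
```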
C
Ideally, well, my thinking, I should say, is: it would be great if there was almost like a short-circuit setting, like allow_failure, so that if this job fails, just stop everything else. Or, like, cancel... not cancel, because then the status recording gets a little weird, but that's how we implemented it. Ideally, just halt all other jobs in progress, because we want to know about this failure right now and everything else beyond this point doesn't matter. Maybe that's a little excessive, but let me take...
B
For what it's worth, that does seem like a feature that's general enough to package as a product. You know, it's very unconcerned with what the job is; it just says, you know, if something fails, stop the presses. That might be amenable to sort of repackaging in a general way. I think that makes sense, thinking about it from, like, a CI template perspective: we can produce something that I think will be easy for people to pick up and get started with, at least.
C
So this, what we're talking about here, like short-circuiting the pipeline, already existed, so I wouldn't say it's different than the pipeline today. It can be something that's harvested, though, and essentially the pattern just gets reused in the template, is how I see it. Is that what you were... I guess, Ricky, maybe I should...
A
So my question is: if I have a job that fails today, it kills the pipeline. Where is the savings? Like, I'm not understanding how it saves you runner minutes if the pipeline is going...
C
...to halt. So with DAG, yeah: with the needs implementation that we have in our monorepo pipeline, we have jobs running in lots of different stages simultaneously. So, and my understanding might be wrong here, if you're not using DAG and you have a job fail, everything stops at that stage in the pipeline. But with needs, you can have things running way far out, and you have to just wait.
C
So you have a job fail 10 minutes into the pipeline, but because of everything else that's running, you actually get that feedback about 30 minutes later. Gotcha, okay. And when I say feedback, I mean, like, the email; the status on the MR would be failed, you'd have a job status with the circle with the X in it, but everything else would look like it was still going.
A
That makes a lot of sense, because if you have a test job that's just, like, a smoke test and that fails, you want to kill those long-running tests if you're running everything in parallel. You're like, whoa, don't run that hour's worth of tests, stop right now. Hopefully someone would say that runs a stage earlier, but...
C
Yeah, and so you could do exactly what you described, and we could actually configure needs to set up the dependencies like that. Our problem is there's a needs limit, I think it's like 50 per job, and since we use parallel on all of our test jobs, the monorepo is very limited in how we can implement needs. When we start talking about, like, our spec jobs or anything that would depend on them, we go over the limit, gosh.
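A rough sketch of the needs-based wiring being discussed, where a cheap smoke job gates the expensive suites so an early failure never starts them; job names are hypothetical, and the comment about the limit reflects the roughly 50-needs-per-job cap mentioned above.

```yaml
# Illustrative sketch only; job names are made up.
stages: [test, report]

smoke:
  stage: test
  script: bundle exec rspec spec/smoke

# Starts only after the smoke job passes, so an early smoke failure
# never burns an hour of runner time on the full suite.
rspec-full:
  stage: test
  parallel: 10
  needs: ["smoke"]
  script: bundle exec rspec

# Anything that in turn needs the parallel job effectively needs all 10 copies,
# which is how a heavily parallelized monorepo pipeline runs into the needs limit.
summarize:
  stage: report
  needs: ["rspec-full"]
  script: ./scripts/summarize-results.sh   # hypothetical
```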
B
And this is definitely something that will be increasingly useful as you have more parallelization. Like, in a fairly standard job-after-job CI file configuration you're not going to get anything out of this if all your jobs run sequentially, and if all your jobs run in parallel you'll see the most possible benefit. So where you fall in that spectrum will decide how useful this feature would be, yeah.
C
Yeah, so, to your point, I think maybe it's not something that is important to a large... I don't know our customers very well, but we may not have a lot of customers that have high parallelization like we do, where they'd actually get value out of this.
B
I think this goes back to conversations that James and I have had several times where, yes, the majority of our customers, stats-wise, may not see the benefit, but the ones who do are gonna be the bigger customers. They're gonna be the people that are using the product to its fullest and are probably having lots of engineers participating and are very concerned with the speed of their pipelines.
B
Yeah, yeah, I'm kind of interested now in working that into the needs or the rules syntax somehow, where, like, oh, if this job fails, then you should, you know, pull the plug on the whole pipeline, kind of thing. Having that as part of the GitLab CI YAML configuration is kind of compelling, yeah.
C
And, yeah, maybe if we can get the needs limit raised. I'm not sure; I just think that's something that hasn't been re-looked at. The problem, I'll say, is kind of that the magnitude of people who would probably want this would continue to decrease.
A
So I was curious, Kyle: we talk a lot about testing and unit testing and trying to get that feedback loop faster. What about other areas of testing, like accessibility or the browser performance?
C
Well, I can take that back to the other QEMs and see, because I think that's just a blind spot that we have right now, yeah. Let's see what feedback they have. I think we're always open to that, but we want to be very careful with what we add and the feedback it provides to developers.
C
You know, we get feedback in both directions on that. We're like, hey, this is great that we're doing this now, but for every positive piece of feedback we tend to see a lot more confusion, or negative feedback telling us to communicate better. But one more thing on that: we do have the ability to run jobs at a lesser frequency.
C
So there are certain things that we delegate to run, like, every night. DAST, for example: we run a specific pipeline that runs all of the DAST scans and reports to the security team for analysis and audits. We can always start with something like that; I'm just not sure what we'd do from an action perspective with the results, yeah.
A
It might be interesting to do that for accessibility and just drop the results into the accessibility Slack channel. You know, even if it's only occasionally, as a baseline, yeah, and like: here's four pages, go scan those four, even if one of them is just a perpetually open MR, and see how things are behaving, yeah.
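A rough sketch of that lesser-frequency pattern: a job that only runs on scheduled (for example nightly) pipelines and drops a summary into a Slack channel via an incoming webhook. The scan script, page list, and SLACK_WEBHOOK_URL variable are assumptions for illustration.

```yaml
# Illustrative sketch; runs only when the pipeline was started by a schedule.
a11y-nightly:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    # Hypothetical scan of a handful of representative pages.
    - ./scripts/run-a11y-scan.sh https://example.com/ https://example.com/some-mr > a11y.json
    # Post a short summary to Slack; the webhook URL lives in a CI variable.
    - >
      curl -X POST -H "Content-Type: application/json"
      --data '{"text": "Nightly accessibility scan finished, results attached to the pipeline"}'
      "$SLACK_WEBHOOK_URL"
  artifacts:
    paths: [a11y.json]
```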
A
Okay, that's interesting. The other thing that I'm hoping to get some insight into, because we continue to see an increase in the number of jobs that are generating data for the metrics report, the custom metrics: trying to get a better understanding of how we're using that at GitLab and where the holes are in that, because I think we've had one issue in the last 18 months that relates to that feature.
C
Yeah, I'm not as familiar with those. You say custom metrics, right? Yeah.
C
Yeah, yeah, this is really embarrassing, yeah. I don't really know; let's see if I can find any more on it.
B
That's... no, that's part of the problem, yeah. But the thing about metrics reports is, when I learned what it did, I was like, wow, this is super cool, you could literally do anything with this. So all you're doing is outputting a text file in your job, with the OpenMetrics format, like the Prometheus metrics format, in it, and then it compares that with the base pipeline's version of that report. So you could put literally any metric you want in there, manually or automatically or whatever, and then it'll compare it against the base pipeline in the merge request widget.
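A minimal sketch of what that looks like in a job; the metric names are invented, and the only real pieces are the OpenMetrics-formatted text file and the metrics report artifact declaration that drives the MR widget comparison.

```yaml
# Illustrative sketch; metric names are made up.
bundle-metrics:
  stage: test
  script:
    # Write any numbers you care about in OpenMetrics / Prometheus text format.
    - echo "webpack_bundle_size_bytes 1234567" > metrics.txt
    - echo "rspec_suite_duration_seconds 845" >> metrics.txt
  artifacts:
    reports:
      # Compared against the base pipeline's version in the merge request widget.
      metrics: metrics.txt
```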
C
Where can I learn more about this? It sounds like there's definitely some value, and, again, I'm sure it's in the deck, I'm sure I've seen this like six times, and I'm just, again, very, very embarrassed here.
A
I don't think I've ever actually expanded it on an MR, so I'm not sure what's in here or what is useful, yeah.
B
So this is just... this looks like it'll be some measurements from Prometheus; it looks like it was generated around megabyte usage and stuff like that when it was building the bundle. So that's one example of something you can use it for, but, like I said, it's just OpenMetrics format, anything. So you could, I don't know, you could put in, like, how much electricity the runner used when it ran, or, like, literally anything.
C
Yeah, let me bring this up with the team and see; maybe this is a place where we can surface some of this. So, can you fail something on the custom metrics report, or do we have to, like, read it? Like, it's essentially just information presented to everyone, yeah?
A
It just dumps into a text file, I think, and then does the comparison that way, so you could fail on it, but it would require some custom scripting to do it.
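A sketch of what that custom scripting could look like, as one possible approach: the same job that writes the metrics file also enforces a hard limit and exits non-zero when it is exceeded. The bundle path, metric name, and threshold are all made up.

```yaml
# Illustrative sketch of failing a job on a custom metric; values are hypothetical.
bundle-size-gate:
  stage: test
  script:
    - size=$(du -b public/bundle.js | cut -f1)   # hypothetical bundle path
    - echo "webpack_bundle_size_bytes $size" > metrics.txt
    # The metrics report itself only informs; failing the job is up to the script.
    - test "$size" -le 2000000 || { echo "bundle too large, $size bytes"; exit 1; }
  artifacts:
    when: always   # keep the report even when the gate fails
    reports:
      metrics: metrics.txt
```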
C
Yeah, okay. Like, one of the things that comes to mind is there's a lot of Danger logic for, like, the front end, like webpack size, so being able to compare that to a baseline and then putting that information in... I mean, it would just be shifting from Danger into something that's standard in the product and therefore maybe, like, harvestable and usable. But, yeah, let me see what could be done here, cool. I too have never expanded that, though, for the record.
B
And when we fix the metrics widget it'll be even more important. There you go. But I think it's still doing the wrong comparison, like code quality was doing in the past. Drew, is that right? That is right, as far as I know. I recently worked on an issue that I found out was mostly fixed, like, a week ago, so I'm currently hesitating to speak to the state of any problem. But yes, as far as I know.
C
Yeah, I was gonna say, I think the action there is just for me to, like, ask the team and see. I'll read up on the feature and then ask the team about different uses that we could have for this. Much appreciated.
The
things
to
consider
is,
if
you're
looking
to
use
those
features
as
a
part
of
like
moving
them
to
viable.
There's
that
option
I
talked
about
where
we
can
run
it
on
a
different
frequency
and
do
something
with
the
with
the
data
and
then
yeah.
The
other
action
was
getting
with
albert
on
what
could
be
harvested
from
the
dynamics.
C
I feel like I never have anything fun for you that I could bring; it's always like... yeah. If there's something more I can do to help prepare, or bring something to the meeting, let me know. Okay.
C
Yeah, yeah, that's definitely true, because I kind of help, like, manage the team and also try to manage the backlog for Engineering Productivity; it's kind of very, very hard. But one of the things I was going to ask about: code quality. So there's a team member who's looking to become a maintainer on the team, and I'm hesitant to just, like, spell it all out because it's recorded, but part of the feedback that they got from the maintainership program was maybe doing some development on features.
C
Since our team doesn't work as much on features... they've done development for you in the past on things, if that's a clue as to who this might be, but with the code quality work, it was something that came to mind as he and I were talking about work streams that we're looking to dogfood and that need some feature development. If there's work that's holding you back, back-end work on that, or work that's, like, not prioritized but needed...
B
Sorry, I was just taking some notes there, yeah. We can absolutely do that as we go; nothing springs to mind at the moment. We are having some technical issues with the limitations of our shared runner Docker configuration, particularly the Docker-in-Docker configuration, where we're at the point now where we're not sure if we can speed up the code quality job at all, because of the way the caching layers work with the shared runners. It pretty much needs to pull all the Docker images from scratch every time the code quality job runs on the shared runners, instead of being able to cache them. So that's kind of... Drew, forgive me for summarizing your work, does that sound about right? Sounds exactly right. And I think we talked about this a little bit for our internal project: being unable to cache images is the price we're paying for operating in a strictly disposable environment, because caching is not disposing of things. So as long as the shared runners prioritize that disposability, we're not going to get caching, and so it's a use-case-specific trade-off between those things, as far as we can tell right now, and we don't have a good middle ground for that.
C
So if, like, we don't use the shared runners, is there the potential that customers, or us... like, we have the issue to look at changing the configuration, essentially to overcome this problem for our private runners, right? Yes. Okay, that makes sense; that helps with, like, kind of the background on that.
C
If we can prove that out, then we can have, like, a quantifiable benefit for our own use case, to say: here's the improvement that we saw based on the scale that we run it at, and kind of reaffirm that. The other thing that I guess comes to mind that's kind of related: in Q1 we're working with infrastructure to try out some new machine specs for our private runners.
C
I think it's more just a general fleet for our private runners at the moment, but we are going to test different specifications to see the performance on a job-by-job basis. So we'll have the data to know: oh, our spec jobs are maybe CPU-bound, so these ones perform better; other jobs are maybe memory-bound, so, yeah, these types of machine specs will be better.
B
Just for... sorry, just for clarity's sake: we're talking about different levels of VM, because our providers offer different speeds of CPU with different RAM allocations or SSDs or something, right? Yeah. Sorry, that's... well, that's... I guess, yes, that's what I meant: this size VM for these runners for these jobs, and this different VM for other runners for other jobs.
C
Yeah, and that's not to say that that's not the decision we'll make going forward, but I think it starts with gathering the data, and then we can assess the cost and the value we get out of the complexity of managing different runners for different jobs.
C
Is this separate from moving our runners, like our hosted runners, to Kubernetes? Yeah, I think this is separate from that.
C
The issue might be related to upgrading some of the components that have already migrated to Kubernetes to different machine specs. There's a lot rolled into this, and one of them is looking at our own private runners. So if you read through the description on the issue, there's a lot there, but the Engineering Productivity part is just the runners.
C
I am curious, actually; a question comes to mind. So the team, sorry, our Engineering Productivity team, is really passionate about trying to do something better than we are with, like, the specs inside of the monorepo.
C
What are some things that we can leverage inside the testing functionality, that maybe you don't see us leveraging, that could help reduce, or at least measure, the frequency of flaky specs better on a test-by-test basis?
B
We have, but in the wrong context; you've looked at it from, like: well, we have all this stuff in the database, can we use it? And not at what we're using it for yet. So my question is... I'm going to flip it and answer your question with a question. We're starting to put in the results of tests that failed, specifically, so we're starting to log, like: okay, this is a test case, and when it fails...
B
...we log it in the database: when it failed, what pipeline it belonged to when it failed, what time it was, and blah blah blah. So we're starting to aggregate that data in the database and have it long term. What can we do with that that will help you most? Because we're kind of just like: well, we're gonna make a little notification badge in the MR that says this has failed before in the default pipeline, and that's all we're doing with it right now.
C
Yeah, so I would answer that as: the question that always bothers me is, how frequently does a specific test fail, on a pipeline basis? Ideally we'd be able to look at both master and MR pipelines, but it's: how often is it failing, and then, almost, how often over time, so that we can see, is this related to just, like, a broken master, where there's a spike on one day and then it just drops off, or is it, like, consistent?
B
Right, but it's for, like, one test. So if one test fails 10 times in one day, that's the spike; but if one test fails once every week, then that's the more spread-out thing. But the thing that the feature does do right now is: if the whole pipeline just blows up and something went horribly wrong, we don't bother logging that every test failed, because it's probably unrelated to the tests; it's probably a misconfiguration. So we don't log that.
C
Maybe that data is already available with API calls. Or I should ask: is that sort of data... so it sounds like we're tracking it. I've looked at test failures from, like, an end-to-end test perspective, and I know we capture the information on the GitLab pipeline. Can we get that sort of failure-rate information from what's extractable, like what's available in the API?
B
So right now the problem is manifold, because we were very concerned about the scalability of storing this in the database.
B
With that, we were kind of of split minds on this. I was like: how useful is it if we just tell people that, oh, this test failed 20 times on any pipeline that ever ran in the last 14 days? Is that more or less useful than: this test failed 20 times on the default branch in the last 14 days?
B
So that's what we're looking at right now, over a 14-day period; we're not currently aggregating it. And then, further, because we're concerned about the scalability of the feature, we're not actually storing the whole test name in the database, we're just storing a hash of the test name, so it's a fixed length, because we were worried about people having really, really large test names in their files and having that in the database and causing issues.
B
Yeah, so with what we're storing right now, it's actually not too bad, with all the caveats that I've just explained: only logging test failures on the default branch; not logging any failures if there are more than 200 in the whole pipeline, so it'll just ignore it if there's more than 200; and, further, we're also looking to probably purge that on a 14-day period currently; and also we're not storing the full name.
B
So, given all of that, we're actually not concerned with the expansion that we're seeing. We're seeing an amount of increase that I know the exact number of, but this is a recording, and it's not concerning to us. But who's to say what that rate would be if we included every pipeline and not just the default branch pipeline, and is it worth it to investigate that avenue, from your perspective?
C
I don't think so, right now. We are still, like, in exploratory mode of what data we want to capture; we'd want to capture what's available in the product, and really what we're looking to do with it is make more informed decisions about what to focus on from a flaky-spec perspective, what to automatically quarantine, what kind of signal boost to give to EMs and say, hey...
C
...this spec fails in two percent of all pipelines. Because we're seeing a larger impact to developer productivity than we anticipated based on legacy failures; like, our merge request success rate at the pipeline level is 65 percent on average, which was surprisingly low to me, and if we can cut flaky specs out of that, we can raise it up even higher, so that feedback is more actionable.
B
So what kinds of things are you doing, or have you been looking at, for pulling that more to the left? Right, like, it's one thing for that number to be high, for it to be failing in the app, right? But what have we done already to try to make it easier for engineers to run the tests locally, so that it's failing there instead of failing on machines that cost money?
C
Yeah, that's what we're looking to harvest from the dynamic spec analysis piece, where we'd essentially use that mapping, tie it into Lefthook, which is kind of our default pre-hook tooling right now, and run the tests that are most applicable to the files that you've changed, to be able to empower people to do that very easily locally. That's our goal on shortening the feedback loop. I think we're looking to just take more automated actions based on the frequency of flaky failures, with the data we were really talking about.
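A rough illustration of that Lefthook wiring: a pre-push hook that runs only the specs most related to the files being pushed. The helper script and its mapping are assumptions standing in for whatever the dynamic spec analysis would produce.

```yaml
# lefthook.yml -- illustrative sketch only.
pre-push:
  commands:
    related-specs:
      # Hypothetical helper that maps changed files to their most relevant specs.
      run: bundle exec rspec $(./scripts/related_specs.rb {push_files})
```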
C
Okay, let me just refine this; like, I really just brought this up off the cuff, so, like, that sounds great. The team is really passionate about this, we have a lot of priorities, and we don't have a good plan on what we want to do. What's the smallest thing we can do to add value, from a "reduce the frequency of flaky specs interrupting the developer experience" perspective?
B
Yeah, so we're hoping that that widget provides some value in that context as well, because now, in the test failure widget, if a test fails in an MR pipeline and it's also failing in master, in the default branch, then you should get that notification, like: hey, this has failed 10 times in master in the last 14 days. So that should be at least an indication, where engineers are looking at it, that might tell them that it's not their fault that the pipeline failed.
C
Yeah, yeah, let me... yeah, I'll definitely make sure to kind of signal boost that to the team and say, here's a way for us to get some information based on... what was that, again?
B
I was talking to Eric about this a little bit: because the expansion of those tables is so much below where we were worried about, we can absolutely, probably, add the full name of the test in there, so you could query it a little bit more effectively and build some data from that as well, yeah. And then I think we're also talking about, like, rolling it up. James, we had an issue, or a conversation, around, instead of just purging the data after 14 days, rolling it up into a summary table and then storing that for a longer period of time.
C
So now I'm curious: do you have feedback from customers that are at a similar scale, or, like, a similar monorepo strategy as we are, on where to take this test failures feature? That maybe... yeah, okay.
A
Okay, so this feature just rolled out with 13.8, so we know that a lot of those monorepo folks are self-hosted, not on GitLab.com, so it's going to take a while for them to actually get this in their upgrade cycle.
C
Yeah, I will look and see what could be helpful for us with the feature, after really refining our need and what we're trying to do, and seeing how the functionality aligns with that, and get...
B
The issue? Don't worry about that; I was gonna make it if you hadn't made it already. So thank you for making the issue.