From YouTube: Monthly Testing Internal Customer Call - July 2020
Description
An in-depth discussion on the next steps for TestFileFinder and the design for Test History.
Links:
https://docs.gitlab.com/ee/user/project/merge_requests/fail_fast_testing.html
Test History Design: https://gitlab.com/gitlab-org/gitlab/-/issues/223737/
A
This is the internal customer meeting for the Verify Testing group for July 2020. I have the first point on the agenda, which is just a general update to the roadmap deck. There are really no changes of note in the progress on the epics. We have some longer-lived epics right now that we've made a ton of progress on in the last month.
A
We haven't delivered anything of note that I wanted to call out, so mostly I wanted to say thanks again to Grant for contributing the load performance testing MVC, which introduced a whole new category for us, which is just amazing, so I really appreciate that. Then I really wanted to take some time with Kyle, Joanna, and Mack to talk through how things have gone with our identify-failures-fast template, the TFF gem. I really think we could dedicate some time to that.
A
I do want to, before we adjourn today, show some designs that we're starting to do solution validation on for the test history, both in the JUnit report and at a project level, to make sure that this group is aware of those as we're making progress in iterating on them. So don't let me forget about that before we adjourn. But let's go ahead and jump in: how are things going with identify-failures-fast, Zeph? Do you want to vocalize your point?
B
Yeah, so I tried to do a simpler integration, one that Albert had identified as a possible target environment to run this in, and that was the customer portal. We did get it enabled and ran it for several days, and captured quite a few data points.
B
In the end, it was burning up quite a few more CI minutes than everyone was comfortable with from a value perspective, mainly for two reasons. One, they weren't actually having that many failures: of all of the pipelines that were executed, it only caught three failures, so it was somewhere under five percent. The other issue was that they had just optimized and streamlined their test pipeline.
B
For this I had a generic cost of five to six minutes just from the environment perspective, and then the number of RSpec tests was obviously minimal, usually measured in seconds as to how many of those were executed. So we did end up backing it out. Currently we're trying to figure out how to integrate it into the specific job itself.
B
We can't do anything direct; it's basically going to be stealing the logic and trying to do it in a bash script, to check to see if it fails. If it does, we'll go ahead and fail the job; if not, we'll pass on to the rest of the RSpec tests.
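The short-circuit described here could look roughly like the sketch below. This is hypothetical, not the actual script: `map_to_spec` is an illustrative stand-in for the file-to-spec mapping that the tff gem performs, and the `CI_*` variables are GitLab CI's predefined ones.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the short-circuit described above: before the full
# RSpec suite, run only the specs that map to the changed files; if any of
# them fail, fail the job immediately.
set -euo pipefail

# Naive stand-in for the tff gem's mapping:
# lib/foo/bar.rb -> spec/foo/bar_spec.rb
map_to_spec() {
  case "$1" in
    lib/*.rb)  echo "spec/${1#lib/}" | sed 's/\.rb$/_spec.rb/' ;;
    app/*.rb)  echo "spec/${1#app/}" | sed 's/\.rb$/_spec.rb/' ;;
    *_spec.rb) echo "$1" ;;
  esac
}

# Only meaningful inside a GitLab CI merge request pipeline.
if [ "${CI:-}" = "true" ]; then
  target="origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME:-master}"
  specs=$(git diff --name-only "$target" | while read -r f; do
    map_to_spec "$f"
  done | sort -u)
  if [ -n "$specs" ]; then
    # A failure here fails the job before the rest of the suite runs.
    bundle exec rspec $specs
  fi
fi
```

In a pipeline this would run as the first step of the RSpec job, so a failure in the mapped specs fails the job before the full suite starts.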
B
So that's where we've landed at the moment. I'm on rotation this week for our pipeline triage, so I'm trying to spend some time on that, but it's been a little difficult.
A
Sure. I just want to make sure I understand: for the project, implementing the template just as is was taking up more minutes than it was saving, and is that as against a scheduled pipeline?
B
Yeah, that was for MRs and merge trains, right. That's how the template is actually set up; it's not for just your regular scheduled runs. It's basically for when anybody submits a change. So what we're learning from that is that there's a threshold at which this becomes valuable enough, and I think in long enough running pipelines, and pipelines with enough changes where we have a better chance of capturing issues, it definitely could provide value.
B
The RSpec jobs, I think there were 270 tests, off the top of my head.
B
And I think it looked like a promising target to begin with, because their pipeline was almost twice as long initially, but they had some caching issues that they figured out were taking up a lot of time, and they got that cleaned up just as we were integrating this in.
C
I just had one follow-up: were you able to look for the reverse of what you mentioned happening? You said that there were three times where the job caught a failure and killed the pipeline. Were there any times where it didn't catch a failure and it failed later in the pipeline?
B
Right, and funny enough, all of the failures were with a particular developer, so this is the other thing that I learned from it as well: I think it's also going to depend on the development process in general that all the developers are following. If everyone's running their RSpec tests regularly, locally, before they're submitting changes, we're not going to catch a bunch of failures, especially if they're running all the jobs similar to the way we are. Either all the other developers were just excellent and this one developer is not, or this one developer was just going ahead and pushing it, letting the pipeline catch the failures instead of running the tests locally.
D
Yeah, so I was just going to ask in the comment, and I've linked to that down below. I didn't realize you're on rotation, so I could just add it to the feedback issue, because I thought you did a great job summarizing it. But when it says six minutes was added, that's just because there was a new stage? So the total time for the pipeline to complete was six minutes longer for the majority of the cases, just to begin?
B
Well, I mean, it was six additional minutes compared to how long their pipeline was running previously. Yeah, okay.
B
Okay, should I reword that in a way to make it clear?
D
No. Sometimes I talk about job minutes, but there are other jobs that are running simultaneously, so it may not impact the total pipeline runtime. The pipeline runtime might be 50 minutes, but the six minutes might be consumed within a stage where there are other jobs running.
A
In addition, you found that there were at least three cases where it didn't catch something it should have, and three cases where it caught something we didn't expect it to catch.
B
There were three cases that caught something we didn't expect it to catch. I don't know if it should have caught the other three cases, because we're talking about running the entire test suite for those errors to have been captured.
D
So what I was thinking, and I can bring this up: there are Danger, frontend tests, linting, RSpec, and RuboCop jobs all in the test stage for this project, and really we just want to short-circuit the RSpec job, because those are the tests that are being failed fast. For the other jobs, in theory, you could use DAG to start them earlier, so Danger could provide a review comment; things could happen to get some of the feedback. But you're still going to block some jobs, which were previously not blocked, to get that feedback.
C
Yeah, it would be kind of neat if, almost like if you had a context with cancel in Go and you spun up a bunch of goroutines, you could pass the cancel context to all the other jobs that were running and cancel them mid-flight with the fail fast. It'd be neat if you could run that job.
D
I'm sure we could, so that's interesting. I didn't think about that with what we were looking to do with the next evolution of TestFileFinder, but that might be something we consider with Engineering Productivity, because we should be able to make an API call to stop the other jobs, at minimum. That might be something to start with. Yeah, it's interesting; I didn't consider that.
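The "API call to stop the other jobs" idea could be sketched with GitLab's jobs API roughly as below. This is a hypothetical sketch: the `API_TOKEN` variable and the `RUN_CANCEL` guard are assumptions, `jq` is assumed to be available on the runner, and `CI_JOB_TOKEN` may not have permission to cancel jobs, so a personal or project access token is assumed instead.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: when the fail-fast job fails, cancel the sibling jobs
# in the same pipeline through the GitLab jobs API.
set -euo pipefail

# Build the cancel endpoint for a job id (CI_API_V4_URL and CI_PROJECT_ID
# are standard GitLab CI predefined variables).
cancel_url() {
  echo "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/jobs/$1/cancel"
}

if [ "${RUN_CANCEL:-}" = "1" ]; then
  # List running jobs in the current pipeline and cancel every one except us.
  curl --silent --header "PRIVATE-TOKEN: ${API_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/pipelines/${CI_PIPELINE_ID}/jobs?scope=running" |
    jq -r '.[].id' |
    while read -r job_id; do
      [ "$job_id" = "${CI_JOB_ID}" ] && continue
      curl --silent --request POST --header "PRIVATE-TOKEN: ${API_TOKEN}" \
        "$(cancel_url "$job_id")" > /dev/null
    done
fi
```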
A
Next up, Albert is actually picking up in 13.3 the issue that we were going to start in 13.4. I need to jump back into TFF with Drew and see where we're going next with it. Really, our map is to support Kyle and his team in making use of the gem, so they're going to really be helping us drive the revamp for this.
D
And the reason why I volunteered Albert is just because it'll help us advance on OKRs faster, versus waiting until 13.4. Albert was looking for some product development experience too, so things kind of matched up, I think, from a need and desire perspective.
A
Yeah, so we'll be looking for the next thing, then, in 13.4 to pull forward. I just haven't looked at the epic today, or recently, to know what that next thing is, and definitely we'll be leaning on Kyle to get the feedback of: hey, how can we help next? Where's the next problem that we can go help you solve?
E
Okay, so then what is the Ops team's effort on this in Q3? Is there any other iteration that we can do versus waiting for the EP team? It sounds to me like the EP team is going to be working on it, and it's not going to be a product-facing thing yet. Is that right?
C
The product iteration on the TFF is to get an example in the docs; that's the first MVC. It's like: this is how you could use this thing reasonably. Then, if the EP team is working on the configuration mapping, in parallel we could work on examples for other languages, or we could work with the Runner team, because I know Elliot is interested in this a little bit.
C
So if we could get that working for his Go projects, that would be another small iteration that we could take toward furthering the TestFileFinder specifically.
E
Okay, Kyle, let's catch up offline, because I'm a bit concerned about the headroom of the EP team, and mostly on our side. Because both the Ops quality team and your team have had some experience in this, I just want some clarity on who should be owning and doing what. Then I'd like us to move closer to a functional ownership, if we can. But yeah, just some concern about headroom on our side. Thank you.
D
Yeah, and the short version is we're facing a point where either we diverge more from TestFileFinder, where Albert or someone on the team would spend time building up what's called a gitlab-projects test file finder to do the same function, or we just partner and help add the feature to the product that we would then leverage. So that development work was going to be done by the EP team one way or the other, to work towards our Q3 OKRs.
E
Yeah, it sounds to me like our dogfooding momentum will be on pause for a bit, because we're waiting for improvements that the quality team has to ship, right? Is there anything that we can be proud of, that this thing has added value to customers, that it reduced the test time? The first MVC gives us feedback, and it's totally fine; it's great that we get feedback now. But how can we...
E
How can we expand this to more projects and not have to wait? Because we only tried it with one project. Are there any other areas where we could just drop this in and it's good, it's helping reduce the time to run tests?
B
I think I can help with that going forward for other projects, but I haven't given up on the customer portal. What I want to know is: if we're not using the template as is, we're not just including this as perhaps a customer would try to do, then maybe we're just wiring in the gem on the back end for short-circuiting this RSpec job, as Kyle described it aptly.
E
So dogfooding means it's available to customers out of the box. If it's in the documentation and customers can configure it the same way we are using it, I think that counts. It might not be a one-switch thing, but if it's close to one switch, where you add these three statements from the default documentation and get immediate value out of the test optimization, I think that's a win. Then we slowly reduce the gap in the next iterations: instead of a switch plus, like, four lines of customization, we slowly remove those customizations, and in the end you just flip a switch and it's automatically baked into your pipeline. You can proudly say that if you don't use this feature, you're going to have, like, 20% more runtime, and just be proud of that. That's where the end goal should be.
B
Also, would it make sense for me to continue with the customer portal and work on that RSpec job with that integration, and maybe supplement the documentation from a customer perspective, so they could understand how to use the gem? Or should we look for another project that would be an easier fit?
E
I'll leave that for you and James. Because this is the product direction, as long as we're moving towards progress there, I'll leave room for creativity in how you want to do it. But it should be part of the release, and it should be in the documentation. I think that's the most important thing.
D
Oh, I just linked to the summary, so we kind of just talked through everything that was in there. I will port that over to the feedback issue that you linked to above. And then I did find a short-circuited pipeline example that I linked to as well, just for reference, and that can show you; you can look around in the project and see the change in the pipelines before and after, if need be.
A
All right, so I just want to make sure, Nick. It sounds like next steps are: you and I are going to sync and pick a direction for continuing to go down the project with the template, versus pulling out the gem and documenting how to use that to short-circuit.
A
I think it's worthwhile to also look at other projects that maybe have bigger suites, where this might be more helpful, and to just better understand our use case and where it's applicable. We'll then circle back with the next steps within the TFF project of where we're going roadmap-wise, which I'll work on between now and our next call.
A
Did I miss anything in there that we talked through that just flew by me? Great. Anything else that we didn't talk about regarding TFF, or the efforts that we're making there?
B
That'll probably be covered with how you're thinking about expanding in future discussions. I'm assuming we are going to discuss applying this to other test frameworks as well. Yeah.
A
I'd say that in the customer interviews we've done, and the recruited interviews we've done, there hasn't been confusion about this, but my take is that they're really not sure what this is, so they don't know to be confused. They don't know what to expect on this screen yet. For the MVC, we may just pull this out and then really focus on history on the next screen.
A
When did they fail, in sequence? Some of the resounding feedback we got was: that's great, but which one failed when? It wasn't clear to a customer, or to a user, what they really needed to care about. They were really confused about the 9 out of 10 and the green, like: wait, one of those is red and one of those is green.
A
So: I don't know if I need to care about this or not. As we talked through it, we realized that what this really shows you is that this one failed, the previous nine, or sorry, backwards: this one passed, the previous nine all failed, so you had a failing test that just got fixed, and you don't really need to care about this one. But this one, maybe, if this was 1 out of 10: this one failed 10 times previously and it's passed now, so you really should care about that. So Juan's looking at a couple of different options to display this. One is even combining the two, so that it would almost look like a pipeline, with check marks and x's showing the past history.
A
We may condense how many appear on this screen, or even pull it into another detail screen where you would see the history; we're looking at a couple of different options here. The vision, though, is that if you have a test that failed, you could see: what are some of my past runs? Is it a flaky test? Is it a brand new failure of what's been a rock solid test so far?
A
Correct. I know we talked about that quite a bit on this call, and for our quality folks, pipeline would be the most helpful, I believe, was my takeaway from that. On the technical side, we haven't decided yet where we're going to grab the history from, so I'm not sure.
D
Yeah, sorry, what I was thinking of is the pipeline context. So if it's a merge request pipeline, it should be: how many times did this test run in different pipelines within that merge request? Or if it's a pipeline that ran on master: what are the last N number of master pipelines, and how did this test behave there? Not the last pipeline, because there should just be one, maybe two, executions of that test within the pipeline.
D
There might be multiple pipelines for an MR, or multiple pipelines for a branch, so I'm a little confused on that.
C
Yeah, we haven't settled on the actual technical implementation, because we kind of need to know what people are expecting. It's going to be a challenge either way, just because there's a ton of data here. Take storing a JUnit report for this parsing: each pipeline run right now on gitlab.com produces something like, I don't know, 30 megabytes of JUnit report that we need to parse in order to make this page. So if we store historical information about test executions, and we try to do that in the database...
C
...we're going to have a problem right away with that. So it's really going to be a matter of: how can we do this in a clever way, where we can maybe only store failures and test runs? And again, it depends what we want. Like Kyle was saying, if we're just really looking at merge requests, then: okay, in this merge request, this pipeline ran three times, and this test failed one out of those three times. We could do that.
C
But if you look at the last five times it was run, or the last 10 times it was run, that might be more valuable, because that's a better indication of whether that test is flaky or not than just having a smaller sample set. You could even eventually boil that down to summary statistics: overall, this test fails 33% of the time, so maybe it's not super important that it failed right now, right?
E
Right. Joanna, maybe it would be good to run the whole team on the test session thing that we did. I mean, maybe we wouldn't have to save the stack trace anymore in our issues, and could just use this, if it lands.
B
Well, I think that'll work if we're capturing it at a large enough context, because part of the point of the stack trace captures that we currently have in test cases is to give a more complete history, to enable us to really dig down and figure out where this actually started.
E
Yeah, so what I'm saying is, I think the team is on the right track. The next iteration: in addition to saving the results, the historical stack trace would be really valuable, and we can add on from there. So yeah.
A
I think where your historical stack trace, and being able to track that data, gets interesting is this view at the project level. So this would be at the project level; this is kind of that first iteration towards: here are your flaky tests, here are tests and the last 10 times that they've run, here are the most failures, here the most skips. Potentially there could be another view, as Juan has on the design here. We haven't talked through it, but there's probably an interaction here, maybe not in the MVC, but in a follow-up.
A
That could be a really slow interaction for the user, because you're going to have to load up all of those individual JUnit reports as you click in. But if you're doing the research, that's probably a penalty you're willing to pay, to individually load those JUnit reports and figure out: hey, when did this start failing?
C
It shouldn't be that slow, because since we did the refactoring where we're only loading one report at a time after you click in, it really takes, depending on the size of the JUnit report, one to two seconds to load it in, and then you can interact with it from there. It's not great; ideally you'd like to have 100 milliseconds or something when you click in, in order to load it, but we're parsing a file on the fly, so it's a little bit expensive.
E
Okay, James, would you mind clicking on the test case issue 613 under your point? Yeah. Oh, here, let me just put it in the chat for us.
E
So this is our historical view. Would you mind expanding the label bar? Great, so you just go all the way down. The reason I think it should be by project is that you see the results we track here: each of these environments is actually grouped by project CI structure.
E
So if I want to know the flaky tests in production, it should be scoped towards that historical context, and if you want to look at the flaky tests in staging, you scroll down a little bit; there should be some more, right? The next thing is staging, correct. So this would decouple this altogether, because right now everything is in one place, and this test case is really long.
E
Since we just keep adding discussion points: then you scroll down, and I think there are, like, 100-something comments that are automated by the bot. So I think separating this out into different buckets of staging and production, which is by project, kind of confirms that it should be at the project level, the historical context.
E
We don't use environments in our own structure right now; that's kind of where it's blocking. We're not using the concept of environments in our deployment, so each project is tied to a GitLab environment: you see canary, you see production, you see staging.
B
From a scope level, I think it's better to see the tests across environments as well, because if it's passing in one environment and failing in another, you've got an indicator that we have an issue with a particular environment, as opposed to a test. So I like the broader scope there, and not limiting it to just how many times this test has passed or failed in nightly alone.
E
I think they could end up being the same place. We just can't use that granular grouping of environments yet; we will just be using the different projects to look at different deployment environments on our end. But I think for the broader use case, grouping it by environments makes sense. Do we have a concept of environments in CI right now, or in the release?
E
I think that can be, like, two or three iterations in the future. I think if you solve this at the project level, that addresses like 80 percent, and then, when we have headroom, you can think of how you want to glue it together into, like, a CI analytics dashboard at a group level in the future. But in the immediate form, I think scoping the historical result of a test to the project makes sense, because, I guess, as Juan said, pipelines can run multiple times, right?
A
And I think where we're going to end up (Ricky's not going to hear this, probably) is that this design is going to split into two different issues, or it needs to split into two different issues pretty soon, because these designs are really targeted at two different users who have two different problems. It sounds like yours is more of the team lead user, who's looking at the project and, historically, what everything looks like overall, versus the developer who's maybe looking at just my branch, and maybe back to my target branch: how is this test behaving? Did I break it, or is this historically just a flaky test? That's in the context of my merge request, not in the larger context of how this test is performing historically, because they just want to know: is this always a flaky test, or is this something I broke and need to fix? So we may end up storing the data differently for the two different use cases. I don't think we're going to land in a place where we can store the same data that's used in both views. I don't think so; the smarter engineers may, you know, find a great way to do that.
C
I think we could, but we'll want to think about it some more.
F
Yeah, either way, it seems that we all agree that flakiness is a concept that lives at the project level, right? It's not a concept that lives at the branch level. You know, if you break tests in an MR, that's expected; but if they are consistently failing throughout the project history, then that means the test is flaky, and that's likely what we're trying to expose here: is this test flaky or not? So that makes a lot of sense.
C
Yeah, where my head's at right now is: when we're running partial sets of our tests dynamically, depending on the changes in the pipeline, how does that affect the statistics we're gathering about the runs for each test? So if this test only ever gets run when we run the whole suite, and it usually never gets run when we're doing partial-pipeline MRs, how does that affect our statistics for test runs?
C
I think we ran into this problem already, Kyle, when we were talking about coverage, because we're not generating complete coverage on every MR pipeline anymore: because we don't run all the tests, we're not going to spend the extra cycles to get the coverage report. So that problem, I think, is going to be pretty interesting, because maybe this test failed once over the last 10 times, but maybe it only ran four times today when all the other tests ran a hundred times. You know what I mean?
D
Yeah, I think where we're moving to is: the full test suite usually runs in master, and it's going to be as limited as possible in the MR. So in that context of project, as long as you're looking at where the pipeline is the most complete, which in a GitLab project, I think, would be the main branch, the master branch, that sounds good; you'll be able to detect these there.
E
I think these are really great points, and I don't think we should think of it in terms of missing data. I think historical test data should be treated as, you know, a glass half full: you didn't run it, that's fine, we just won't have the data. So we would just list whatever we have, and the branches can just be a filter later on. Where: hey, this test ran, like, a thousand times last month...
E
...and these are the branches that it existed in, and then you can add more dimensions later on as a filter mechanism. But the primary dimension, or character, of the test lives at the project level. So: unit test one ran, like, a million times last quarter, and it passed 80 percent of the time, and 20 percent of the time it failed, and these are the branches that it failed in, which product area, which product team is failing this test.
B
Just a quick question on that. I agree at the project level, especially for us; it makes sense. James, when you were mentioning the group level, were you thinking of maybe a company that uses microservices, with each microservice in its own project, kind of situation?
A
Yeah, we've heard from customers that they want, and this is even more of an aggregated view of all of our tests: how many are passing, how many are failing, and so rolling it up to that level, but also then being able to dig into that in between. So if you have a project setup where: this is our project for this environment, this is our project for that environment, or this is our project for Linux, this is our project for Windows.
C
So one thing we heard from Elliot is actually the reverse of what we're talking about with breaking things out. That was specifically about coverage, but I think this is applicable to tests too, because we're talking about grouping, and grouping is important.
C
So how can we facilitate both of these things, when we're trying to group things by project, but now we also maybe want to group things by, like, directory inside of a project?
A
We wrote up an issue, and I can link that back in here as well, or back into the discussion for that issue that Elliot brought up.
A
We have a few more minutes. Any other topics that we should cover while we have the group together?
A
Today? All right, I don't see anything else in the agenda, and I see everybody else on mute, so I'm going to say no. Thank you, everyone. Thank you, Ricky and Joanna, as always, for taking notes; appreciate it. This will get uploaded, unfiltered, a little bit later today.