From YouTube: A/B testing Think Big - Initial meeting
Description
https://gitlab.com/groups/gitlab-org/-/epics/2966
We discuss our initial overall idea of the new A/B testing product category.
A
We briefly discussed what was already there: we met up with Unleash, and there was a call with the Unleash folks. A performance A/B test is going to be their top goal of '21 — we didn't really discuss that yet, but I take that as something I read somewhere — and Unleash plans to move into that direction as well. They have variants, which is kind of like a limited version of A/B testing, and we got into some details around how that works for them; it's currently restricted to fixed percentages in Unleash, and this feature is stable.
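For illustration, a rough sketch of the fixed-percentage variants being described. The YAML rendering here is hypothetical — Unleash configures variants through its API/UI — but the name/weight structure mirrors Unleash's variants feature:

```yaml
# Hypothetical YAML rendering of Unleash-style variants with fixed weights
feature: checkout-button
variants:
  - name: control        # baseline flow
    weight: 50           # fixed percentage of traffic
  - name: blue-button
    weight: 50
```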
A
So we are seeing Unleash as partners instead of competitors: we give them great exposure and contribute back to the open source project. So, are you still there with it, or [unclear]?
A
Yeah, so let's jump into the next point, which is where we want to take this meeting. Ultimately this was intended as a brainstorm meeting — you requested it in my PTO management issue — so what exactly are your goals? What is your expectation towards the deliverable coming out of this meeting?
B
So I actually set up a whole day for you — tomorrow, I think — to just deep dive into the research outcomes.
B
Open up some implementation issues, come up with a Think Big proposal, similar to what you did with flow A in the three-year vision. So I want to come up with a big picture first — what we want to achieve, a little bit about how it's going to look — before we drill down into the small nitty-gritty details.
B
So if we zoom out for a minute, let's just talk about the goal of A/B testing, which is, you know, to experiment with different variants in your code and to get feedback from these different variants, in order to make a conscious decision about which flow of the code you're actually going to keep over time, right? And when we're talking about A/B testing, we're talking about experiments as a title. We already did one brainstorming session with the developers, and some things came up from there, like: experiments are bound in time.
B
Okay, so these are the questions — the big questions that I think we need to answer before going into A/B testing — and then we have a whole new topic, which is new personas.
A
Existing [ones too]. So the idea for tomorrow is that we're gonna create like a happy path, right? We're gonna figure out, all right: how is this flow gonna work?
B
Yeah, so we have — I would say we have a definition phase.
B
A decision phase.
B
And we also discussed the ongoing one: we have collaboration — discussion with the team.
B
Okay, I'm back — sorry, my internet has been giving me problems all week. Okay, so where were we?
B
You keep breaking up in the middle of the sentence; it's hard for me to understand the question.
A
Should we — I think it's better if we disable the video; we don't need it. Okay.
B
So we want collaboration. This is where, in the middle of the experiment, the team has a place to discuss it. If you need to change something in your experiment, like adding a variant or changing a percentage, this is where you would do it.
B
So again, when you're monitoring your ongoing experiment, sometimes you need to change the percentages, or change your variants, or totally get rid of a specific variant. So this is the place — it's what I call the ongoing tracking phase.
B
Then we have getting metrics and analysis. Probably someone technical is going to change something in the code, but someone like a product manager is going to oversee the metrics and analysis and say: okay, no one's actually hitting this blue button, let's get rid of it. And then they'll probably open an issue for a developer.
A
Yeah, I think this is one of the difficult things, depending on how you want to approach it — how are we gonna support it? From the research, if I remember correctly, they want to have a single metric they can drill down into, to see: all right, what is the health of my experiment? And this is what we're gonna need to connect to whatever they are using to collect the data, right? Are we going to support one single tool, or...?
A
Yes, we are using this internally. When I think back on unit tests, which have a standardized XML format, I believe — if the system finds such an XML file exported as an artifact of the pipeline, it will see that and say: all right, this is a genuine test report, I can present this information regardless.
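For illustration, a minimal `.gitlab-ci.yml` sketch of the mechanism A describes: a job exports a JUnit-style XML report as a pipeline artifact, and GitLab detects it and presents the test results (the job name and pytest runner are placeholders):

```yaml
# Job exporting a JUnit-style XML report as a pipeline artifact;
# GitLab picks up the `junit` report and renders it in the UI.
test:
  stage: test
  script:
    - pytest --junitxml=report.xml   # any runner that can emit JUnit XML works
  artifacts:
    reports:
      junit: report.xml
```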
B
Yeah, so what I think we need to do is get help from the Monitor team, because they're agnostic to any metric system — they use Prometheus heavily, but they can connect to any of them. So I would assume it's very similar in that sense.
A
All right. Will you be included in such conversations? I mean, are they aware that we're probably going to need to make use...
B
...of that? This is just an assumption that I have. I haven't spoken to anyone from the Monitor team yet about it, but I plan to ask for their help.
B
You can probably also use the Growth team, because the Growth team is using Snowplow, and they are also using feature flags and doing some kind of experimentation. So we need to figure out how they're doing it and then just copy-paste, I guess. The Growth team is also a really good resource.
A
What are they trying — what are they tracking? I actually have a lot of information on that. How do the metrics come into play? I've also had a discussion with one of the engineers about how the Growth team is using, you know, Flipper — how that flow works from a developer point of view — which is, I think, pretty interesting. It might actually make a lot of sense for you to watch the recording of one of those.
A
Yes. In terms of the A/B testing research, there are still a couple of interviews to be done — this is one of them — at least to be done and to be tagged as well. In some of the earlier interviews my note-taking was less than optimal, so I have to redo the note-taking for the other ones. They're all done, and I think [they're useful] especially if you want to, you know, connect with the Growth team.
A
There is an audit happening; they're getting the metrics analysis to some extent. So this is interesting — digesting the data. How will it be presented? Do you have your reports?
B
I'm not sure it has to be part of the MVC, but if we're talking about a Think Big: some of the interviewees mentioned that they have to export reports that prove the results of an experiment and why they made a decision to leave something in. So I think it would be really convenient if we could create such reports — at least graphs to be exported into a spreadsheet or something like that — so that they can later be used directly from the tool.
B
The format doesn't really matter — it can be Excel, it can be PDF, it could be whatever — but it should contain the data of the analysis, the graph, you know, the name of the experiment, the duration of the experiment, the results, and the final decision of which variant was chosen at the end of the day.
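A hypothetical sketch of what such an exported report could contain, using exactly the fields B lists above (all names and numbers are illustrative, not an existing GitLab format):

```yaml
# Hypothetical exported experiment report (field names and numbers illustrative)
experiment: blue-button-checkout
duration: { start: 2021-01-04, end: 2021-01-18 }
results:
  control:     { clicks: 1204, conversions: 25 }
  blue-button: { clicks: 1831, conversions: 62 }
decision:
  chosen_variant: blue-button
  rationale: "higher conversion, not just more clicks"
```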
B
It's just something that came up from some of the interviews, and I thought it would be useful.
A
[To] explore — I think digesting the data is one of those important things. What do you think about... you've got a lot: you've got your main metrics, there could be side metrics, and you've got the metrics you're going to base your decision on, which are not directly, like...
A
So let me give you an example. Say there's an experiment going on — a website with two different buttons, like your theoretical example — and we're measuring clicks, but the click is not the thing that is gonna persuade us that one is better than the other, right? It's: did we get a greater conversion, because it was part of some kind of e-commerce workflow?
A
So in that sense you would want to say: all right, in flow A, indeed, the only difference was this button — but did they actually buy more products? Did they buy a larger amount of product? Did they spend more money? So then you'll have to derive that from the clicks, and, you know, it's not as easy as saying: oh, in this experiment 51 people clicked the button. Yeah, but did it actually help them, you know, buy the product, yes or no? So that...
B
...goes back to the first issue, which is defining what the goal of the experiment was, because I think you need to know what you're measuring in order to make a good decision. If you're counting clicks, that should be the decision point. But if it's not, then why even bring it forward, if it's not interesting?
A
Yeah — what I'm trying to say is that the main health metric of the experiment is not always exactly the same as the direct metric you are monitoring. Many clicks don't always mean the health of the project is going well. But yeah, it is part of the definition phase. I'm just wondering how much finesse or granularity we are going to offer our users in defining the main metric to be presented, to say: all right, it's good or not.
B
Yeah, I see what you're saying. I think for the Think Big we definitely need to take this into consideration, because, you know, the number of clicks will tell you how many people actually interact with your new thing, which is interesting — but yes, it doesn't necessarily convert to revenue. So I guess in the goal setting we would probably need to define the decision metric, which would be revenue, and then you would have a place to define supporting metrics, maybe even in a YAML file.
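Following B's suggestion of a YAML file, a hypothetical sketch of such a definition with a decision metric and supporting metrics (all keys are illustrative; no such GitLab file exists yet):

```yaml
# Hypothetical experiment definition file (all keys illustrative)
experiment: blue-button-checkout
goal: increase checkout conversion
decision_metric: revenue      # the hard metric a winner is chosen on
supporting_metrics:           # observed, but not decisive on their own
  - clicks
  - cart_size
variants:
  - { name: control, weight: 50 }
  - { name: blue-button, weight: 50 }
```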
A
Check. What would you say is more part of the definition phase, from your point of view?
B
So I think, you know, in order to make a decision — or even if we wanted to automate a decision — you need to have a hard metric that gives you a clear winner. So if we're measuring revenue, you want to see what converted into the highest revenue.
B
So you need to find a way to measure and compare the variants and choose the winner. And if you're talking not about a comparison measurement, but about seeing, I don't know, that the number of clicks went up by 10% — that's the end of the experiment, or something like that. So there are different ways to end an experiment: it can be that you reach your goal, it could be duration.
B
If you remember the interview we had with Booking.com, they mentioned that they run experiments for two weeks. They have a set date: every experiment runs for two weeks and then it's done; they collect the data after two weeks and decide based on that. So it can also be duration, regardless of, like, a season.
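Extending the hypothetical definition sketch above, the two ways of ending an experiment that B mentions — reaching a goal or a fixed duration — could be expressed as stop conditions (purely illustrative):

```yaml
# Hypothetical stop conditions, extending the definition sketch above
stop:
  any_of:
    - metric: clicks        # goal-based: end when clicks are up 10%
      increase: 10%
    - duration: 14d         # or time-boxed: a fixed two-week window
```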
A
Let me think. So, the definition phase — do you also consider... I always think, you know, part of the discussions happen there as to how we're gonna do this in the product, like what the experiment changes are gonna be. There's the development part of it as well, right?
A
For experiments it's: goal setting, so you know what you're measuring; and development of the experiment, which is setting up tracking and setting up the different variants.
A
And I would actually say, you know, part of this is also the design and discussion leading up to that.
B
So in my mind it was always a single feature flag, and I think we can do something really nice. I know you hate it when I go solutionizing, but I think we could do something really nice where a combination of experiments, which are flags, could be tied under one epic — like the mother of experiments.
A
So how would you see that working across A/B tests? Like, I'm thinking of a flow here. Say you have an e-commerce flow where you're buying a product, and you have to, you know, authenticate with your bank — which is a different product — and then come back to the original e-commerce website, which leads you further into buying the product and finishing up, making you aware that they sent you a confirmation email.
A
So in this situation the A/B test would go beyond just the initial project you're developing for, and perhaps you're also part of the bank's thing. In that case a single feature flag will not tie to a single experiment — an experiment can span beyond a single feature. Or do you think I'm misinterpreting this?
B
In that sense there isn't a difference between A/B tests and feature flags: every feature flag relates to a single feature, so I would say an A/B test relates to a single feature. Having said that, you can have multiple feature flags that are turned on and off with different strategies in different environments, and you need to manage them all at an instance level or at an environment level.
A
Somehow. So let me give a different example, just to test the waters here: say you have a microservice setup of your application.
B
Yes. So, regardless of the fact that it can span different projects — again, what's interesting is the environment level. If the projects are all deployed to the same environment, it's really important to see them all at once, but it's also important to measure each one of them individually. And I think, especially if you're talking about a microservice architecture, the ability to silo one of those experiments is really important, because you can also decide that one of them is finished and achieved its goal, but the others have not.
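A hypothetical sketch of how one experiment could group flags across microservices at the environment level while each stays individually siloed, as B describes (structure illustrative; not an existing GitLab feature):

```yaml
# Hypothetical: one experiment grouping flags across services in one environment
experiment: checkout-redesign
environment: production
flags:
  - { project: storefront,       flag: new-checkout-button, status: running }
  - { project: payments-service, flag: new-bank-redirect,   status: concluded }  # siloed: done
```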
A
Okay, that actually makes sense, especially if it spans... yeah, an A/B test can be so big. On the other hand, if we go group level immediately with A/B tests, it does mean that you always need to have a group in order to do an A/B test — which I would say makes sense. But, you know, what if you want to support a single project, with the A/B test within that single project?
A
You would need a group as well. But on the other hand, I think this is something that shouldn't be too much of a problem, because most of the companies we're targeting with this would be larger than a single project, right?
B
Yeah. So what's really important — and we didn't mention this in the user flow — is how to view all the experiments that are currently running in my projects.
A
For the definition phase, like, you know, the discussions around what the experiment is gonna contain — what are you gonna measure — and then the development of the experiment, of course.
B
[If we go] into details and not look only at the high level, then we also need to take into account here — I'm putting this under the questions section — what should we do during a [code] freeze or incidents?
A
Feature flags go beyond, like, deployment interactions, right? It's already...
B
For feature flags we actually opened an issue to disable them when there's an incident going on, because when you're handling an incident you don't know if something is going wrong because someone's playing with a flag or something. So we'd disable everything — we haven't [built that] yet; there's an issue for it.
A
Is that desirable as a thing? I would wonder — we have the feature flag types, right, the different types: some of them are, you know, features that should be around forever, some are access-limiting, some are, you know, short-lived — and this would just say: all right, there's something going on, disable all of them. That would mean that half the project will not be running as it was anymore.
B
So let's say you did an experiment and then you changed it along the way, and now, you know, the users are seeing something totally different. So I'm wondering — I don't know if this is a must.
A
Okay, cool. So we have the definition phase, we've got the ongoing tracking phase, and then we have the decision phase — and this is looping back into that discussion around the experiment brief, I would say. Yeah — do you want to get back into that brief, or that discussion place, where you're going to document what has been decided upon?
B
I think so. I think the experiment needs to end, too — okay, so you make a decision and [feed it] back, right?
A
You still there? Yeah, yeah — that seems good. How about this: we now have some initial flavor, and we make a Figma document tomorrow where we, you know, create initial steps, similar to the three-year vision. We're gonna detail this out a little bit further — how this looks, which subflows there are — and ideally set it up a little bit with a job to be done.
A
Sounds [good]. So let me set up that Think Big document for tomorrow, and I'll add it to the meeting [agenda]. Cool.
A
[On the] document, by the way — a small request. Tomorrow we also have the three-year vision review, and when I was looking back into the document of the three-year vision, there were these notes that we discussed briefly; they were mispositioned. So I was wondering: could you do a small review — give it like 15 or 20 minutes of your time — and write a little piece at each of the steps of flow A? See: all right — hey, this is what I'm thinking, these are my thoughts.