From YouTube: Community Engineering Hangouts. Mar 24, 2021
Description
Agenda:
- Predictive Test Selection by Soumya Unnikrishnan
- Mocking 3rd-Party Services for Testing Infrastructure by Alex Kolesnyk
A
Okay guys, so welcome to the Community Engineering Hangouts. Today we have two really interesting topics connected to testing: automated test selection models and strategies, and how to deal with some infrastructure problems. Okay, so go ahead with your presentation.
B
Hi everyone, my name is Soumya and I'm part of the quality engineering team. What we do here is work on building solutions and tools for improving quality processes for the product teams here at Magento. I'm here to present a short talk on the work that we have done so far on the predictive test selection research.
B
So we've based this research on papers published by Facebook and Google on how they are taming their continuous testing process, especially when they're seeing a high feature churn rate and an ever-increasing test pool size.
B
We are also seeing that the test infrastructure costs to run these ever-increasing test processes keep increasing as well. So we've been researching ways to implement a change-based test approach, which means running the tests that are relevant to the code change instead of exercising all tests on every change.
B
So if we can estimate this, we can rule out tests that are extremely unlikely to fail on a given code change. What we did here was use standard machine learning techniques to train a predictive model with a large data set containing test results on historical code changes, collected by our MTS platform, our continuous integration platform.
B
So this model then selects tests based on the probability score of a test failing on a new code change. I'll talk a little bit about the proof of concept that we worked on. What we did was collect three months of functional CE builds, which have these MFTF test logs, from the Jenkins job archive database. We specifically looked at certain job types, like MTS API and existing PRs.
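The talk doesn't show code for the selection step itself; as a minimal sketch, it could look like the following, where the scikit-learn-style predict_proba call and the 0.01 threshold are assumptions for illustration, not details from the talk.

```python
# Hedged sketch: keep only tests whose predicted failure probability on
# this code change clears a threshold; everything else is skipped.
def select_tests(model, X_candidates, test_names, threshold=0.01):
    # predict_proba returns [P(not fail), P(fail)] per (change, test) row.
    p_fail = model.predict_proba(X_candidates)[:, 1]
    return [name for name, p in zip(test_names, p_fail) if p >= threshold]
```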
B
The reason why we only considered these job types for the PoC is that we could easily find the information for extracting pull request numbers from these logs, and from the pull requests we could get the change list information from GitHub. Other supporting data sets that we used were the module dependencies of MFTF tests, as well as a module-to-domain mapping.
B
So this essentially formed the raw data that we worked with for this PoC.
B
So from the data we collected, we saw that very few of our tests actually fail, but those that do are generally closer to the code they test. From the data that we collected, we built a data set containing code change information, test information, and the test outcomes on those changes, to train the model.
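To make the shape of that data concrete, here is a hypothetical one-row example of such a training record; the column names are illustrative, not the actual MTS schema.

```python
import pandas as pd

# One illustrative training record: a (code change, test) pair plus the
# observed outcome used as the label.
records = pd.DataFrame([{
    "pr_number": 12345,                 # which pull request the build ran on
    "changed_files": 7,                 # file cardinality of the change
    "file_extensions": ".php,.xml",     # extensions touched by the change
    "dependent_modules": 3,             # modules depending on the touched files
    "test_name": "StorefrontAddProductToCartTest",
    "test_failure_rate": 0.02,          # historical failure rate of this test
    "intersected_modules": 1,           # modules shared by change and test
    "failed": 0,                        # label: did this test fail on this change?
}])
print(records.head())
```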
B
So we built a data set that contained features we extracted from the collected data. We categorized these features as change-level features, test-level features, and cross features. Change-level features were features related to the code change itself, which included the change history of files, the file cardinality, the number of dependent modules for a specific file, file extensions, and the number of authors. Test-level features were the historical test failure rates that we saw from the runs we've been doing over these past three months.
B
Cross features are features which are engineered from the change-level and the test-level features. So we looked at features like the number of intersected modules between the code change and the test, the number of intersected domains, and the number of common tokens in the file paths.
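A minimal sketch of those three cross features, computed as set intersections; how modules and domains are extracted from a change or a test isn't shown in the talk, so the inputs here are assumed to be precomputed lists.

```python
# Hedged sketch of the cross features named above.
def cross_features(change_modules, test_modules,
                   change_domains, test_domains,
                   change_paths, test_paths):
    def path_tokens(paths):
        # Split each path like "app/code/Magento/Checkout/..." into tokens.
        return {tok for p in paths for tok in p.split("/")}
    return {
        "intersected_modules": len(set(change_modules) & set(test_modules)),
        "intersected_domains": len(set(change_domains) & set(test_domains)),
        "common_path_tokens": len(path_tokens(change_paths)
                                  & path_tokens(test_paths)),
    }
```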
B
I'll talk a little bit about the model we trained to predict the test failures. We used a standard machine learning algorithm, a gradient boosting decision tree classifier, with very standard machine learning techniques. We did a 70/30 split of the data set, such that the most recent records fall into the testing data set and the remainder, the earlier records, fall into the training data set. This way we wanted to ensure that the model evaluation closely represents how the model is going to be used in production.
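The talk doesn't name a library; assuming scikit-learn, the chronological 70/30 split and the classifier could be sketched like this, with synthetic data standing in for the real feature matrix.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
# Synthetic placeholder features; columns stand in for the kinds of
# features described in the talk (change-level, test-level, cross).
X = rng.random((n, 4))
y = (rng.random(n) < 0.05).astype(int)      # ~5% of tests fail

# Chronological split: rows are assumed sorted oldest-to-newest, so the
# earliest 70% trains the model and the most recent 30% evaluates it.
cut = int(n * 0.7)
clf = GradientBoostingClassifier().fit(X[:cut], y[:cut])
p_fail = clf.predict_proba(X[cut:])[:, 1]    # failure probabilities
```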
B
So, in order to avoid any false positive predictions: we encounter test flakiness very often in our builds, so we wanted to try not to include such flaky tests in the training data, to prevent false predictions. Here the term flaky means that the test had at least one re-run in a build. In our CI system we have a concept of re-running a test, a maximum of three times, until it passes; it's a de-flaking strategy. So we wanted to exclude such tests from our training data set.
B
So what we did was set a threshold and remove such flaky tests from the training data.
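A sketch of that filter, assuming a per-build was_rerun flag in the hypothetical schema above; the actual cutoff used isn't stated in the talk.

```python
import pandas as pd

FLAKY_THRESHOLD = 0.1   # illustrative: drop tests re-run in >10% of builds

def drop_flaky(train: pd.DataFrame) -> pd.DataFrame:
    # A test is flaky in a build if it needed at least one re-run;
    # compute the fraction of builds where that happened, per test.
    rerun_rate = train.groupby("test_name")["was_rerun"].mean()
    flaky = rerun_rate[rerun_rate > FLAKY_THRESHOLD].index
    return train[~train["test_name"].isin(flaky)]
```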
B
So in machine learning, generally, when you're training a model, you also do some hyperparameter tuning of the model. What that essentially does is select the best configuration and the best features that the model was trained on, and use those for predictions.
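Continuing the earlier sketch, the tuning step could be a small grid search; the search space and the use of recall as the scoring metric are assumptions for illustration.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Hedged sketch: search a few common gradient boosting hyperparameters,
# scoring on recall since missed failures are the costly mistake here.
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [3, 5],
                "learning_rate": [0.05, 0.1]},
    scoring="recall",
    cv=3,
)
search.fit(X[:cut], y[:cut])    # X, y, cut as in the previous sketch
print(search.best_params_)
```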
B
What we found was that the best performing model, the model that was giving us the best evaluation metrics, considered features like test failure rates, file extensions, change history, the number of intersected modules, and the number of dependent modules as the strongest features in the training data set.
B
So, coming to the calibration of the model: we used standard machine learning metrics, like the recall score, to evaluate how the model did on the test data. So in our case, we used three months of data for this PoC; the first two months of data were used for training and the third month of data was used for testing.
B
So the test data is basically the data extracted from one month of pull requests submitted right after the time period of the data with which the model was trained. We looked at two metrics that we were interested in, which were test recall and change recall. Test recall indicates the percentage of test failures correctly predicted in the test data, and change recall indicates the percentage of build failures correctly predicted.
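These two metrics could be computed as below; the selected and failed columns follow the hypothetical schema from the earlier sketch, with selected marking tests the model chose to run.

```python
import pandas as pd

def test_recall(df: pd.DataFrame) -> float:
    # Of all tests that actually failed, what fraction did we select?
    failures = df[df["failed"] == 1]
    return failures["selected"].mean()

def change_recall(df: pd.DataFrame) -> float:
    # Of all failing changes, on what fraction did we select at least
    # one of the failing tests, so the build failure is still caught?
    caught = df[df["failed"] == 1].groupby("pr_number")["selected"].max()
    return caught.mean()
```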
B
So, from the promising results that we're seeing on the PoC, we are currently working on a high-level design of how we could integrate this into our MTS, our CI platform.
B
So a model meeting our criteria automatically replaces the one operating in production. The summary of it is that we save new training data as we receive it; when we have enough data, we train the model and we test its recall against the production machine learning model; and if we see that the accuracy of our model is degrading over time, we do some more feature engineering and improve the scores of that model.
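A toy version of that promotion gate, reusing the test recall idea from above; the function and variable names are illustrative, since the actual pipeline isn't shown in the talk.

```python
def recall_on(model, X, y, threshold=0.01):
    # "Selected" means the predicted failure probability clears the
    # serving threshold; recall is measured on the actual failures.
    selected = model.predict_proba(X)[:, 1] >= threshold
    return selected[y == 1].mean()

def maybe_promote(candidate, production, X_new, y_new):
    # Replace the production model only if the freshly trained
    # candidate's recall on new data is at least as good.
    if recall_on(candidate, X_new, y_new) >= recall_on(production, X_new, y_new):
        return candidate
    return production
```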
B
So some of the next steps that we are working on are doing more testing of this PoC, and we are also currently talking to some of the architects on the MTS side, the platform side, as well as Magento architects, to finalize a high-level design of how this strategy would be implemented with our existing CI infrastructure.
B
So the whole idea is that we would be reducing the number of tests that we are running on our pull requests and on user-requested builds, cutting it down by, say, even one-fourth if we are seeing good metrics, and then have builds that are scheduled on a specific cadence, every four hours, that will run all tests. So we still want to exercise all tests, but the frequency of it would be on a cadence rather than on every build.
B
So there are several strategies that are currently under discussion, and we'd love to keep this group updated when we have a formalized approach on how we will go about this project. We are also looking at, you know, flaky tests, which are one of the biggest factors slowing down our PR processing.
B
We are looking at ways to quarantine these flaky tests and not have them run as a part of our delivery process.
C
Now, do you see my screen? Yep. Good. So my presentation is not as cool and fancy as Soumya's, but I think it is very important, just like predictive test selection, for how we build our tests and infrastructure.
C
If we use, let's say, a PayPal payment integration, or an integration with YouTube videos, something like that, we usually create an account which points to a PayPal sandbox or a YouTube sandbox, where we can play and test that our application works correctly. And you could say: then what is the problem? Those sandboxes were created for exactly this specific purpose, so you can test that your application works correctly.
C
Well, the integration redirects us to a PayPal UI page, a PayPal web page, sorry. Sometimes our tests have been written to use a specific selector to click a button or do something, and then suddenly they just change this selector to something else. So the test will fail, and that causes a lot of maintenance issues. Basically, this is a problem for us, and there are many more problems we can face with those sandboxes when we have this connection to the outside world.
C
So what do we want to do? It's currently at the proof-of-concept stage; it hasn't been developed yet, but we look forward to seeing this implemented and working well for us. But what do we want to do? We want to mock those third-party services.
C
We decided to build a couple more Docker containers for our infrastructure, which will serve this. The main idea is that all requests which go outside of Magento will go through a proxy server. If the proxy server knows that a request should be mocked, it will go to the third-party service mock, and this service mock will give us the necessary response, the response we're waiting for. And it can be anything; it can be any format you can imagine, even HTML.
C
We mock it, and everything else, which is not mocked, or shouldn't be mocked, or which we haven't mocked yet but will do later, will go to the world wide web and will get data from the real sandboxes.
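The proxy's implementation isn't shown in the talk; as a minimal sketch of the routing rule just described, with placeholder host names and mock URL:

```python
# Hedged sketch: decide whether a request is served by a mock or let
# out to the real service. Hosts and the mock URL are placeholders.
MOCKED_HOSTS = {
    "google.com": "http://service-mock.docker:8080",
}

def route(host: str, path: str) -> str:
    """Return the upstream URL this outbound request is forwarded to."""
    if host in MOCKED_HOSTS:
        # Known integration: send it to the service mock container.
        return MOCKED_HOSTS[host] + path
    # Not mocked (yet): pass it through to the real sandbox.
    return f"http://{host}{path}"
```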
C
Do you see PhpStorm? Yeah, I can see it. Okay, so, as I mentioned, we will have a proxy server, which will filter those requests and redirect them to some services, and the service mocks themselves. We decided to use basic Docker container functionality, where you can specify this; in my example, I will use the selenium chrome debug container.
C
In this container we will try to open google.com and see how we can mock the google.com page. So this is the main configuration here, and basically you just need to execute a simple docker-compose up command and it will bring everything up and do the magic for you.
C
The idea is that when we start selenium chrome debug, you can specify environment variables, Docker environment variables, one called HTTP_PROXY, to say where this proxy is located. So, as you can see here in my configuration, I've got proxy-server.docker on a specific port, and there is my proxy Docker container; that's what I have here, I'll show you later. This will serve as the proxy server, and then I have my service mock, and I'm going to mock google.com.
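The compose file itself isn't reproduced in the transcript; a minimal docker-compose sketch of the setup being described might look like this, where the service names, images, and port are assumptions.

```yaml
# Hypothetical docker-compose.yml for the demo setup: a proxied browser
# container, the proxy, and the service mock (names/port illustrative).
version: "3"
services:
  chrome-proxied:
    image: selenium/standalone-chrome-debug
    environment:
      # Route this container's outbound HTTP(S) traffic via the proxy.
      HTTP_PROXY: "http://proxy-server.docker:3128"
      HTTPS_PROXY: "http://proxy-server.docker:3128"
  proxy-server:
    image: our-proxy-image        # placeholder for the talk's proxy container
  service-mock:
    image: our-service-mock       # placeholder for the Node.js mock app
```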
C
This is a Node.js application which creates a server that listens for these requests, and also, when we start up this service, we register all the service mocks.
C
I won't go through all the parts of the implementation. I will show you the main part, which is the most interesting for everyone: how we can define the mock responses we would like to see. So, as you can see here, you can specify the link
you would like to intercept, and you can say what you would like to respond with here. You can basically build whatever you want here; you can even put in your own logic, which will build the response based on the POST request you send to some third-party service. You can do whatever you want, but I've got a pretty simple example here.
C
I will respond that I'm a Google, and if I go to google.com/test, it will respond with my test Google. And, as you can see here, you can read GET parameters, as well as POST and anything else; I just don't have an example here for that. Let's run this and bring them up.
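The mock service in the demo is a Node.js app; as a rough Python (Flask) sketch of the same register-a-link, return-a-response idea, mirroring the demo's strings:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def root():
    # Canned response served when the proxy routes google.com here.
    return "I'm a Google"

@app.route("/test")
def test():
    # GET parameters from the incoming request can drive the response.
    return request.args.get("who", "my test Google")

if __name__ == "__main__":
    app.run(port=8080)
```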
C
You will see errors soon. Do you see my Docker dashboard right now? Yeah. Good. So, as you can see here, this is my main selenium standalone chrome debug container, which I use to execute functional tests and do other things, and those three are my mocking-services containers, including this separate selenium chrome debug.
C
So, what I want to show you: I'll connect to this one, which is my main container, and I'll go to Google Chrome.
C
Yeah, so this is my main one, which does not have any proxy set up here. And if we go to another one, which is on a different port, and we try to open the Chrome browser, and I know what's going to happen right now, it will show an error. And, right now, I don't know why this is taking so long. Yes, so: no internet connection.
C
There is my presentation. Another beauty of this approach, compared to other third-party tools: we investigated one, maybe you're familiar with it, called Mountebank. But the problem with that is that you need to go and configure Magento itself to work with Mountebank. So if you have an integration with google.com, you need to find the place where google.com is hardcoded or configured in Magento, change it to the link which Mountebank provides you, and then it will work.
C
In our case, you have to do zero configuration for your Magento application. It will work even with those URLs and configurations you've already put into Magento.
D
Hey, is there a way to log all requests that the proxy receives?