From YouTube: Ceph Testing 2018-09-19

Q: How do you go about detecting them, Danny?

A: There are a number of ways, but the one that we've implemented now uses machine learning, based on clustering and classification algorithms.

If you wanted to use this technique, and ideally if you want to use machine learning, then you would have to implement the machine learning core, choose the right infrastructure and the storage back end, a lot of those details, and that would take time. What we have here takes care of all of those steps; we already expose it as a service.

So all you need to do is upload your data using a REST API call and then get the analysis done. It's as easy as that. Here is the workflow, what the infrastructure looks like on the back end.

We use Ceph for the storage back end, to store your training data set, your test failures, tracebacks and so on, and the code itself runs on an OpenShift cluster.

Training the machine learning model is quite computationally intensive, so we trigger that as a separate OpenShift job. What actually interfaces with the API calls is OpenWhisk, the serverless platform: it receives your REST API calls to either train the model, or predict whether a test failure is a flake or not, or poll the status of a particular job, like was the training completed, what's the status of the prediction, things like that. So we use OpenWhisk for those.
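
As a concrete sketch of that train / predict / poll cycle, the snippet below drives a hypothetical deployment of the service over REST with Python's requests library. The host name, endpoint paths, and payload fields are all assumptions for illustration; only the three operations themselves come from the description above.

    import requests

    BASE = "https://flake-analysis.example.com/api"  # hypothetical host and paths

    # Historical failures, already tagged as false positive or not.
    historical_failures = [
        {"status": "failure", "traceback": "Traceback ...", "false_positive": True},
        {"status": "failure", "traceback": "Traceback ...", "false_positive": False},
    ]

    # 1. Upload the tagged data and kick off a training job.
    train = requests.post(f"{BASE}/train", json={"failures": historical_failures})
    job_id = train.json()["job_id"]

    # 2. Poll the status of the training job.
    print(requests.get(f"{BASE}/status/{job_id}").json())  # e.g. {"state": "completed"}

    # 3. Once trained, submit a new failure and get a flake probability back.
    pred = requests.post(
        f"{BASE}/predict",
        json={"failure": {"status": "failure", "traceback": "Traceback ..."}},
    )
    print(pred.json())  # e.g. {"flake_probability": 0.89}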

A: Let me show you sample training data. This is just one failure record in the training set; you would have a lot of other failures too. Obviously the status is "failure", and it has the failure information, like the traceback or any other error messages. Another important thing is the flag for whether it's a false positive or not. That's based on historical data: for each test you've recorded whether it was a false positive or not, and the model is trained based on that.
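
To picture the record being described, one entry in the training set might look like the following. The field names are assumptions; the three ingredients (a failure status, the traceback or error message, and the false-positive flag) are the ones named above.

    # One hypothetical training record; a real training set holds many of these.
    failure_record = {
        "status": "failure",
        "traceback": "Traceback (most recent call last): ...",
        "error_message": "Connection timed out while provisioning the machine",
        "false_positive": True,  # tagged from the team's historical data
    }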

A: So we assume that the developer considers that a false positive, but if you have any other way of doing it, that's fine too. All you need to do is send historical data that has already been tagged as being false positive or not, and once the training is done, you send in the new failures, whatever you see, and the model takes them and gives you back the results.

A: There you go. When the model gives you back the results, each of the failures that you sent in is annotated with a probability value saying, okay, this particular test has an 89% chance of being a flake. Now it's up to the individual team to decide on the threshold. Look at what the Cockpit team does: even if it's just above 50%, they tag it as a flake, they're confident enough to trust the analysis, and they just rerun the test instead of going into a deep-dive analysis. So that's the gist of the flake analysis.
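
In other words, the service only reports probabilities; the consuming team applies its own cutoff. A minimal sketch of that policy, using the Cockpit-style 50% threshold mentioned here (the result field name is an assumption):

    FLAKE_THRESHOLD = 0.5  # each team picks its own cutoff

    def handle_result(result):
        """Rerun probable flakes; send real failures to a human for deep-dive."""
        if result["flake_probability"] >= FLAKE_THRESHOLD:
            print("tagged as flake: rerun the test")
        else:
            print("likely a real failure: deep-dive analysis")

    handle_result({"flake_probability": 0.89})  # prints "tagged as flake: rerun the test"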

C: Is there anything in there that's sort of trying to pick out the right features? Because all the other stuff is, you know, similar between runs, but it's a little different because they're different tests, and so there are differences. I'm wondering how it handles that.

A: Right now we assume that the user is going to upload the data in a clean format, but I know that's not always possible, because data cleansing and data preprocessing need to happen. So I'm working on a library that does that, and I'll be happy to work with your team to see how we need to clean up your data and push it to the Ceph back end, and I can work on that.

B: Maybe, you know, what are we talking about? Are we talking about teuthology here? Yeah. So if we're specifically talking about teuthology, what we'd want to collect is possibly the entire job config, or at least large sections of it, because I think it would be useful to consider even things like what OS we ran on, and then grabbing the traceback plus as much context as possible.

A: Right now it's not triggered automatically; we just implemented it, and we're pushing it out to different teams. If it has to be triggered on, like, a cron job on a regular basis, that could be done as well. Right now the service just sits there idle.

C: At the end of the day, we have to provide a set of data to actually train it, right? So, stepping back a second, I think it's important to think about what we mean when we talk about flaky tests, because there are two main categories that I can think of. One is the pathological test failure that has nothing to do with the test itself. It's like, FOG failed to provision the machine, and I get those pretty frequently, or it times out and doesn't provision the machine, or sometimes there's a network failure or some other hardware issue in the lab that just makes the test fail. That has nothing to do with the test being run; we normally call those infrastructure failures.

So if we can identify those automatically, that would be great. But the other thing is that sometimes the tests are unreliable. We have bugs that don't trigger every time, so the test will pass 99 times, and the hundredth time it will fail, not because it's a bad test, but because it's just a hard corner case to hit. Those we don't want to ignore, right? Or sometimes we have a test that's just flaky: it'll pass, you know, nine times out of ten, but the tenth time it will fail because it's a poorly written test. I think those are things that we wouldn't want to filter out and train against, because we want the tests to be better; we want to see those failures so we know the test is bad. So we'd want to be really careful about what we submit to this thing. Yeah, and sometimes, you know...

A: No, it's pretty much up to the user to define what a false positive is. I mean, the Cockpit team deems anything that the developer merges anyway, even though a test failed, as a false positive. But how you present the historical data is up to you, and how you define false positive is up to the team. Say there are failures which you expected to fail because of the test setup or whatever; if you want to skip those and not record them, that's different.

C: It seems to me, sort of the minimum that we would need to do in order to take advantage of this, and we can talk about what that minimum is, I think the first thing would be: we would need a way to explicitly say that this test failure is a false positive, in the absence of inferring it magically from whether it's merged or not. So either some convention, like you touch a file in the test directory, or there's a button in Pulpito or something, right? A button that you click, "this was an infrastructure failure", and you mark it, something like that. Because once we have that, then we have something that can be trained against. And then I guess the second half of that would be...
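
As one way to picture the explicit tagging being proposed, here is a sketch of the touch-a-file convention; the file name, record format, and run path are all made up for illustration:

    import json
    import time

    def mark_false_positive(run_dir, reason="infrastructure failure"):
        """Drop a marker file next to the test results, so a scraper can later
        collect explicitly tagged false positives as training data."""
        record = {"false_positive": True, "reason": reason, "tagged_at": time.time()}
        with open(f"{run_dir}/false_positive.json", "w") as f:
            json.dump(record, f)

    # Hypothetical run directory.
    mark_false_positive("/a/teuthology-2018-09-19_run/12345")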

B: Yeah, that seems feasible. So we'd want, let's see, we need to figure out what sort of component would be submitting the data, right? So if we wanted to do it through Pulpito, for example, it might warrant a different service that Pulpito could ping and have it do it, or have it do it itself. Up until today it's just kind of...

C: Maybe I'm a little bit unclear on how the training versus querying works. Are they exposed as explicitly separate steps? Like, you explicitly provide a training set, train the model, and then you separately provide data points and say "is this a failure or not", and it doesn't actually coordinate the two, so it's stateless?

A: Yeah, you provide the entire data set, and I could probably do the splitting into training versus prediction. But the idea here is that once you train the model, then for future predictions you're sending in live data: as soon as a test fails, you're sending it in.

A: What kind of testing do you do? Is it like, you run the tests against, say, a pull request, or is it functional testing, or...?

B: So the tests that we're talking about, the teuthology tests, they're not run by, you know, GitHub hooks or anything, right? A human goes and merges a bunch of PR branches and then runs them through the system. That's what you're talking about, right, Sage? Yeah.

B: Another approach we could take, and I haven't been close to this in quite some time, but, you know, we run tests for PRs and we do nightlies, and humans look at those to a certain extent, like Yuri, who's here. We do have people that are pretty good at spotting what's an infrastructure failure and what's a test failure. It's a process that we kind of do already; it's not exhaustive, it's not perfect, but to build a set of training data we could...

What we could do, and this is just one idea, is say: whoever's looking at these tests, when you see an infrastructure failure, put it in this list, put it in this pile of things we've identified as infrastructure failures. Then we have a tool that goes and scrapes all of those and submits them.

C: Anyway, we say this unknown issue was caused by this other bug, and sometimes we do it over email, replying to the sepia list, but it's all very ad hoc; we don't have any actual tracking of that. Though if it were me, if we were making one user interface, it would be in Pulpito.

A: So I think once you do the clustering, we can also expose the clusters and do some high-level topic modeling on them that says what a cluster actually means. That could actually tell you whether it's, like, networking or security or some other kind of problem. Maybe that's like the secondary level of machine learning.
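
A minimal sketch of that clustering idea, grouping failure text and surfacing the top terms of each cluster as a crude topic label. This is a generic scikit-learn example with made-up tracebacks, not the service's actual code:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    tracebacks = [
        "Connection timed out waiting for machine to provision",
        "Machine failed to provision: image not found",
        "Network unreachable while fetching packages",
        "AssertionError: osd count mismatch in test_rados",
    ]

    # Vectorize the failure text and cluster similar failures together.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(tracebacks)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Label each cluster by its highest-weight terms, a crude stand-in
    # for the higher-level topic modeling mentioned above.
    terms = vec.get_feature_names_out()
    for i, center in enumerate(km.cluster_centers_):
        top_terms = [terms[j] for j in center.argsort()[-3:][::-1]]
        print(f"cluster {i}: {top_terms}")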

A: Not right now; it's a big server, so it's beefed up, so yeah, it should handle our stuff.

Q: How big is big?

A: I haven't crashed it yet.

D: My understanding is that, you know, you send the content that has all these tracebacks to the server, and you're trying to query against it to find the false positives. I'm just wondering about the tracebacks for infrastructure failures: for similar issues they're going to be the same, not different for every failure. So wouldn't everyone who is hitting that network failure see just the same failure, sorry, the same traceback, for those failures?

A: The training data would have only the false positives, so that basically trains the model to find clusters, groups of failures that are similar, and when you start submitting new failures, it could be anything, you can just send in all the failures, and you'll just find out the chances of each one being a false positive or not. Okay.

C: All right, I've made some notes on the pad. It still seems like the first step is just that we need to reliably start recording this data, and we probably need quite a bit of it before we can have anything meaningful, you know, garbage in, garbage out, before we have any meaningful predictions from the flake analysis.

A: Yeah, I mean, I don't have a specific number, but I'd like to see how well the model gets trained. You know, the more data, the better it is, like thousands of data points, but it's fine to start with hundreds, even hundreds are good. And then you can slowly keep appending, streaming in the data, and compare it to the training set.

A: The other initiative we have going is that we're going to push everything upstream, and we're going to deploy this service in the MOC, that's the Massachusetts Open Cloud. That one's almost there, so probably by this week that should be done. Once that is done, that would be a good point to...

C: Okay, I'd be ambitious if I said we'll do anything in a week, because I think before we can really start using this, we need to get data, which means we need to make changes to teuthology or whatever, which has to be prioritized along with everything else. Yeah. Okay, that's the tricky part.

C: I see other stuff on this agenda. This is the first time I've actually managed to join this meeting, so I don't know how you're normally doing it, but...

C: There are a couple of things. The dead jobs not getting logs is super annoying; that wastes a lot of time, so it'd be nice if that were fixed. But yeah, there's a whole list of stuff. This one would be pretty nice: I think adding this UI that lets us provide some meaning, like what the failure means, is it a false positive or is it associated with a particular bug, would be huge. Or "I give up, I can't tell what happened here, I don't know what the problem is"; all of that would be...

What is that going to do for you? I mean, for a human: right now we analyze the runs, and we just look at a failure and say, oh yeah, that's that one, I don't need to worry about it, and then that's it and we forget about it. But if we actually recorded that information, so that every time we actually linked the failure to the bug that caused it, if we did that, we would have an order of magnitude more data to work from.

C: I mean, if I'm being consistent and good, then every time I see a failure and figure out what bug caused it, I'll go look up that bug and paste the path to the failure in the bug ticket, so that when you're looking at a bug you can see if it's still causing failures or if it seems to have fixed itself or gone away. Having that process happen automatically would be so nice.
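
Automating that could be as small as appending a note to the tracker issue whenever a failure is matched to a bug. Assuming the bug tracker is a Redmine instance like tracker.ceph.com, a sketch against Redmine's REST API (the issue id, failure path, and API key are placeholders):

    import requests

    TRACKER = "https://tracker.ceph.com"  # assumed Redmine instance
    API_KEY = "placeholder-api-key"       # would come from the reporter's account

    def link_failure_to_bug(issue_id, failure_path):
        """Append the failure's log path as a note on the bug ticket, so the
        ticket shows whether the bug is still causing test failures."""
        requests.put(
            f"{TRACKER}/issues/{issue_id}.json",
            headers={"X-Redmine-API-Key": API_KEY},
            json={"issue": {"notes": f"Seen again in run: {failure_path}"}},
        )

    # Hypothetical issue number and run path.
    link_failure_to_bug(12345, "/a/teuthology-2018-09-19_run/12345")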

D: Wow, that's something I'm trying to figure out, because the rate of it makes it like a big bug, and that's a blocker now for 3.1. I'm pretty sure we have those tests for ceph-ansible, and yeah, right now, because of ceph-ansible, we have stuff a little bit distributed, and you're missing out on some of the pieces, so I...