From YouTube: Ceph Tech Talk: Making Teuthology a Better Detective
Description
Join us monthly for Ceph Tech Talks: https://ceph.io/en/community/tech-talks/
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
So, hello everyone. I am Vallari, an Outreachy summer intern. For this cohort, I was working on the project "Making Teuthology a Better Detective". I worked with my mentors, Zack, Junior, and Josh.
In this presentation, I will explain the general workflow of teuthology, the problem that I worked on, the solution that we came up with, the benefits of that solution, a brief explanation of the implementation of that solution, how we can adopt that solution further for more tests, and finally any future improvements that we can make.

To start with the general workflow: teuthology is a framework with which we can run a vast number of tests.
We push our branch to ceph-ci, which is what triggers the package building. Jenkins, which is the package builder, builds packages for that specific branch and pushes the information about the packages to Shaman. Shaman is a database and an API which stores information about the status of the packages, which branches they were built against, and all of that information. Now, to schedule a test run, we need the right access to the Sepia lab, which is a pool of machines through which we schedule our tests.
If you have access to that, you run the teuthology-suite command and pass some arguments to that command which specify all the details of the scheduling you're doing: the priority it should have, which suite you're running, what filters you want to apply, and all of that information. The teuthology scheduler then schedules jobs according to the configuration of the jobs. It performs a series of tasks: it queries Shaman and asks if the package is built and ready.
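As a rough sketch of that check (the URL and the `status` field here are assumptions based on the talk, not Shaman's documented API), the scheduler's "are the packages built?" question boils down to a predicate over the build records Shaman returns:

```python
# Hypothetical endpoint the scheduler would poll; the real query
# parameters and response shape may differ.
SHAMAN_SEARCH_URL = "https://shaman.ceph.com/api/search/"


def builds_ready(builds):
    """Given a list of Shaman-like build records (dicts), report whether
    the packages are ready: at least one build exists and every build
    has finished successfully."""
    return bool(builds) and all(b.get("status") == "ready" for b in builds)
```

The scheduler would keep re-asking until this returns true, then move on to the next step.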
If Shaman replies that yes, it's ready, then teuthology goes to the next step, which is building the entire config of all the jobs. After building that, it stores that information in Paddles. Paddles is a database where we save all the information related to our test runs and all the jobs. It stores information like the status of each job (whether it's passing or failing), the results of the jobs, who ran the job, and the exact time that they ran it.
The Beanstalk queue is also divided into different queues: we have the smithi machines and other kinds of machines, and every kind of machine has its own queue. The teuthology dispatcher takes up jobs from that queue, and the dispatcher is responsible for the entire running of the actual jobs: getting the machines ready by reimaging them and then actually running those jobs. When we get the results, they are pushed to Paddles, and viewers can see them on Pulpito.
The problem that I worked on: we have tasks in our jobs, and those tasks can be unit tests. When those unit tests fail, teuthology throws a CommandFailedError, which is a vague error that doesn't specify which unit test is failing. So what happens is that the reviewer has to go through the teuthology log files, which can be really large, and look through the whole log file to find which tests failed, where they failed, and all of that information.
That takes quite a bit of time for the reviewer. The other thing is that the error captured in Paddles, the result which is stored, isn't very specific or meaningful, because the description of the failure is very vague. Here is a screenshot of Pulpito: this is a test run, and this is a job ID. When we see a test failing, we see that there's a command failure and a bunch of text which does not specify which unit test is failing.
It doesn't even tell us which unit test is failing. So in our solution, where before reviewers had to look through the teuthology logs manually, we are automating that: teuthology can look through the errors in the log files and throw a new kind of error, a UnitTestError, instead of that vague error. If you pass a certain key-value pair in the job YAML file, it will enable this feature; we have made this feature opt-in.
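As a pure illustration of the idea (the class and pattern names below are stand-ins, not the actual teuthology identifiers), the difference is between raising a generic command failure and raising an error that names the failing test:

```python
import re


class CommandFailedError(Exception):
    """Stand-in for the generic error raised when a command fails."""


class UnitTestError(Exception):
    """A more specific error naming the failing unit test."""


# Nose-style failures are prefixed with "ERROR:" or "FAIL:" before the
# test name, so a small set of regexes can pick them out of a log.
NOSE_PATTERNS = [re.compile(r"^(ERROR|FAIL): (?P<test>\S+)")]


def classify_failure(log_lines):
    """Return a UnitTestError naming the failing test if a pattern
    matches, otherwise fall back to the generic CommandFailedError."""
    for line in log_lines:
        for pattern in NOSE_PATTERNS:
            match = pattern.match(line)
            if match:
                return UnitTestError(
                    "unit test failed: %s" % match.group("test"))
    return CommandFailedError("command exited with non-zero status")
```

With this, a reviewer sees "unit test failed: test_bucket_create" instead of an unexplained command failure.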
The benefits of this solution: with it, we save the engineer the time of going through the teuthology log files. It saves about five minutes of an engineer's time looking through the logs per run, and it also improves the error tracking: the errors that are stored in Paddles for Sentry purposes, instead of a generic command failure, now record the exact error that occurred, the exact tests that failed.
Another great point is that the time difference between using this feature and not using it isn't great: the scanner takes about 1.5 seconds to scan half a GB of logs. If you want to opt into this feature, you can, but we are not forcing this feature on anyone; if you think your teuthology logs will be too large and you don't want to opt in, you don't have to. Teams can incorporate this according to their needs.
Let's look at how we implemented this. If the feature is enabled, teuthology will look for all the regexes related to that unit test error type. So if you have a nose test running, its failure would start with an ERROR or FAIL keyword before the error message; that's what teuthology scans for to find the error message. Then we have a flag feature which is used to guarantee that our scanner never rereads a line.
It keeps a record of the last line that we read in the teuthology log file and keeps checking against that, so that we never reread lines that we've read before. Lastly, we throw our UnitTestError to Paddles. While implementing this, we discussed a few ideas about how to go about scanning our teuthology log files, and we used the best method we could for scanning these large files, because we know that they can get very large.
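The two ideas above (failure regexes plus a remembered read position so no line is ever scanned twice) can be sketched as follows; this is my illustration, not the actual teuthology scanner:

```python
import re


class LogScanner:
    """Scans a growing log file for unit-test failure patterns,
    remembering the byte offset of the last read so that lines are
    never scanned twice."""

    def __init__(self, path, patterns):
        self.path = path
        self.patterns = [re.compile(p) for p in patterns]
        self.offset = 0  # the "flag": where the previous scan stopped

    def scan_new_lines(self):
        """Return failure messages found only in lines added since the
        previous scan."""
        hits = []
        with open(self.path) as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line:
                    break
                for pattern in self.patterns:
                    if pattern.search(line):
                        hits.append(line.strip())
            self.offset = f.tell()
        return hits
```

Because each call resumes from the stored offset, repeated scans of a half-GB log only ever pay for the new lines.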
I'll go through it briefly in my presentation, then I'll give a quick demo where we can look at the code where we actually have to add this information and look at the files. You can also refer to this information in the PR, which I can add in the chat; in the teuthology PR I have explained it all in the docs as well.
There are two changes that need to be made to add this feature: one on the teuthology side, and one on the ceph QA side. On the teuthology side, we need to add the regexes for the unit test: there is a dictionary in which we have to add the unit test type as a key, and a list of all the regexes that we want to identify related to that unit test. The other change is on the QA side, where we want to make sure that the orchestra run function is passed the scanner.
Let's look at the code. In the teuthology repo, in the orchestra directory, we have run.py, and here we have an error scanner where you can add a new unit test type and all the regexes that you want to add related to it. The workflow is that in the QA section this run function is called, and we want to pass it the unit-test error scanner; that is all that's needed here to enable it.
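The dictionary-plus-run-argument shape described above might look roughly like this; names like `error_patterns`, `run`, and `unit_test_scan` are my placeholders, not the actual identifiers in run.py:

```python
import re

# Maps a unit-test framework name to the regexes that identify its
# failures in a log. The nose and gtest patterns here reflect each
# framework's usual failure prefix.
error_patterns = {
    "nose": [r"^(ERROR|FAIL): (?P<test>\S+)"],
    "gtest": [r"^\[  FAILED  \] (?P<test>\S+)"],
}


def run(output_lines, unit_test_scan=None):
    """Stand-in for orchestra's run(): pretend output_lines is the
    command's log, and when a scan list is passed, report which
    configured patterns matched."""
    matches = []
    for framework in unit_test_scan or []:
        for pattern in error_patterns.get(framework, []):
            for line in output_lines:
                match = re.match(pattern, line)
                if match:
                    matches.append((framework, match.group("test")))
    return matches
```

A caller opting in would pass something like `unit_test_scan=["nose"]`; a caller that omits the argument gets the old behavior, which is what keeps the feature opt-in.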
C
Changes
here
in
the
error
button
dictionary
now
in
the
safe
QA
section,
for
example,
I-
am
giving
an
example
of
S3
test
which
runs
the
news
test.
So
in
this,
when
we
call
this
run
function,
we
want
to
ensure
that
the
error
test
scanner
has
a
list
which
has
no
is
in
it
now
to
make
this
picture
obtained.
C
We,
however,
this
is
a
job
cable
file
and
if
you
have
a,
we
have
this
key
value
pair.
If
you
have
a
scan
logs
to
true
here,
it's
disabled,
and
if
it's
true,
then
in
the
SCS
Wi-Fi,
we
read
for
the
scan
of
it.
If
it's
there
in
the
conf
client
config,
the
client
config
is
this
and
and
if
it's
there,
it
searches
for
its
value.
If
that's
true,
if
it
does,
then
error
scanner
adds
close
to
it
and
that's
fast.
C
C
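That opt-in flow for s3tests might be sketched like this; the `scan_logs` key follows the talk, while the config shape and function name are simplified stand-ins:

```python
# A simplified job config, as the job YAML would look once loaded
# into Python (the task and client names are illustrative).
job_config = {
    "tasks": [
        {"s3tests": {"client.0": {"scan_logs": True}}},
    ],
}


def scanner_enabled(client_config):
    """The task reads the scan_logs key from its client config; the
    feature stays off unless the key is present and true."""
    return bool(client_config.get("scan_logs", False))


client_config = job_config["tasks"][0]["s3tests"]["client.0"]
enabled = scanner_enabled(client_config)
```

Only when `enabled` is true would the task build the error scanner list and hand it to the run function.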
That's it. If you have any questions, you can look through the docs in the PR. As for future feature improvements: currently, as we just saw in the code, we have nose tests and gtests enabled, so we can add coverage for more tests.
A quick question. First of all, great job Vallari, this looks great. How did you check for false negatives or false positives in the detection of errors?
C
So
I
did
a
test
run
and
I
took
that
file
in
the
duplicated
that
a
couple
of
times
manually
locally
in
the
same
file.
So
it
becomes
a
huge
file
and
and
I
changed
the
errors
in
each
purple
line
to
all
the
duplications.
So
I
could
see
that
the
flag
feature
isn't
actually
rereading
and
also
that
in
the
Air
Force
error
is
captured
and
it's
adding
some
old
errors
not
captured
so
just
to
test
that
the
error
is
initiated
at
the
right
place.
Right, yeah. I think it can be really useful, once the feature is enabled, that at least for some transition period the engineers compare, you know, the detections of the errors, to see that the tool is working correctly, and make some improvements if needed.
So we'll have some humans looking at it as well, to verify that it's doing what we expect.
Yeah, it's just that with regexes there are often so many patterns that we can miss, and there are new things all the time. So it's really good that someone is looking, you know, at the output of the tool. That's all I'm saying.
Yeah, to follow up on that, I think Gary made a good point. I think there should be a channel where users of this feature can report their analysis. Let's say it's not doing the right thing: I think they can record it through a tracker, like open a tracker issue and specifically say that they opted into this feature and it's missing some cases. So yeah, I think so.
I have no questions, Vallari, but I just wanted to say it was a great presentation, and I think this work is very valuable: it can save a bunch of time going through lots of failures and helping identify them. Even as a stretch goal, the next step would be to automatically post the run to the relevant tracker, if one exists; that also can save time. But thank you very much for your work.