From YouTube: 2018-08-02 :: Ceph Code walk-through: Intro to Teuthology (not strictly a code walkthrough)
Description
Presented by: Josh Durgin
Every month the Ceph Developer Community meets to discuss one aspect of Ceph code, to spread knowledge of how it works and why it works that way.
This monthly meeting will occur on the last Tuesday of every month via our BlueJeans teleconferencing system. Each month we alternate meeting times to ensure that all time zones have the opportunity to participate.
http://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs
D
B
Currently, though, the first step to getting anything tested is to push your branch of the Ceph repository to github.com/ceph/ceph-ci. Any branch there will automatically get sent to Shaman (shaman.ceph.com), which will go ahead and kick off builds and generate packages for CentOS and Ubuntu in the different combinations we need to run tests on them. You can see the latest packages and builds going on over time.
B
That's the Shaman URL, and it shows what's currently building. If you wanted to download those packages manually, you could actually go and find the URLs for the individual packages, but in practice you don't need to worry about that. You just push your branch to ceph-ci, wait until the packages are built, and then you can start running teuthology jobs, which will go and grab those packages.
B
Before teuthology testing, basically for most changes, before we merge a PR, we'll batch things up into a test branch, push it to ceph-ci for its build, and then kick off a test suite with teuthology-suite.
B
You can see more of this on the teuthology machine in the Sepia lab. You can even take a look at my home directory to see how I set it up there, but basically, on that machine there's a machine-wide /etc/teuthology.yaml file, which you can think of as holding all the settings needed for it to connect to the lab cluster and schedule jobs there.
B
So once you've got teuthology checked out on that machine, your branch is pushed to ceph-ci, and packages have been built from it, you're basically just running a teuthology-suite command and specifying which branch you're running off of. It'll add a bunch of jobs to the queue, which you can then find in the web interface, Pulpito. Let's take a quick look around there.
B
So Pulpito has an overview of both the jobs that are scheduled and the individual suite runs. You can take a look at the current queue and see the various suites that are queued right now. The queue is very simplistic: it's just sorted in order of the priority field, and you can see who scheduled these jobs. All of these were scheduled by teuthology itself, that is, they're automated runs.
B
You can also look at past runs in Pulpito.
B
You can drill down into the results of a given suite. For example, in this one you'll see the results of each job. The results of all the jobs are stored in the Sepia cluster: there's actually a CephFS file system holding all of the job results, but you can also access them through this web interface here. If you go into one of these failed jobs, it gives you links to the teuthology.log, which is basically the output from the job.
C
B
And when you schedule the suite, teuthology will check out a copy of the Ceph repository from the branch you're scheduling against, get the suites from there, generate jobs from them, queue those jobs up, and then run them.
E
B
But I guess we can start by looking at the structure of the teuthology input files. Basically, the suites are composed of a number of YAML fragments that are all concatenated together and merged.
B
So the general format of the input is a bunch of YAML files. One YAML file has a list of tasks; for example, this one uses a task called the workunit task, which basically just downloads some scripts from the Ceph repository and runs those shell scripts against a Ceph cluster that has already been set up by an earlier piece of this suite.
B
These workunit scripts are one example of something that's very easy to run outside of a teuthology environment. You could run them against a vstart cluster or another Ceph cluster that you already have set up. They generally only assume that you have a Ceph cluster with a client key.
B
It's a little more complicated in the thrash subsuite, because it mainly uses this thrashosds task here, which teuthology runs to orchestrate the cluster and inject various kinds of failures in the background while other tasks are going on. So this kind of test is more difficult to run outside of teuthology.
B
But looking back at what the overall configuration looks like, let's take a look at, for example, the orig.config.yaml or the config.yaml file from this job. That'll give you an idea of what's needed to run an individual test, although teuthology also adds a whole bunch of extra settings to these jobs scheduled by the suite.
B
Most of this isn't necessary if you want to run a job manually. The only things necessary for running it manually are a list of roles, which basically describes what's going to run on the different machines. This one means that you have one machine running a monitor, a manager, and three OSDs; a second machine running another monitor, a manager, and three OSDs; and a third machine that just has the client keyring.
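The roles layout just described is a list of lists, one inner list per machine. A minimal Python sketch of that structure, assuming illustrative role names such as `mon.a`, `mgr.x`, `osd.0`, and `client.0` (the exact IDs in the job shown are not given), with two small hypothetical helpers for reading it:

```python
from collections import Counter

# Illustrative roles, one inner list per machine: two machines with a monitor,
# a manager and three OSDs each, plus a machine that only holds a client.
ROLES = [
    ["mon.a", "mgr.x", "osd.0", "osd.1", "osd.2"],
    ["mon.b", "mgr.y", "osd.3", "osd.4", "osd.5"],
    ["client.0"],
]


def count_role_types(roles):
    """Count roles by type, i.e. the prefix before the first '.'."""
    return Counter(role.split(".", 1)[0] for machine in roles for role in machine)


def machine_for(roles, role):
    """Return the index of the machine hosting `role`, or None if absent."""
    for index, machine in enumerate(roles):
        if role in machine:
            return index
    return None


if __name__ == "__main__":
    print(dict(count_role_types(ROLES)))
    print(machine_for(ROLES, "client.0"))
```

Whether the YAML uses nested dashes or puts each machine's roles in square brackets, both spellings load into this same list of lists.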
E
B
This YAML syntax is kind of funky, with a list of lists here. The first indentation level is one list, and there are a few different ways to write this. Instead of using two dashes for the nested list, you could write each machine's roles inline in square brackets, and I'd consider that style a little bit clearer.
B
Then there's the overrides section. It's used pretty extensively within the tests, to add extra configuration to various tasks without having to duplicate that configuration everywhere. You can override one setting in one YAML file, and it'll be combined later with another YAML file without duplicating that configuration.
B
The general rule for overrides is that they always get merged into the configuration of tasks. So everything below the ceph label here gets merged into the configuration of the ceph task. Similarly, a little further down, we have an overrides item for the install task, which tells the install task to add some extra packages, namely rbd-nbd in this case, and which version of Ceph to install. That version can be given by sha1, by branch name, or by tag.
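The merge described above can be sketched as a recursive dictionary merge, where each key under `overrides` is folded into the config of the task with the same name. This is a hedged illustration, not teuthology's own merge code: the list-concatenation rule, the placeholder sha1, the `extra_packages` key name, and the `conf` values are assumptions made for the example.

```python
import copy


def deep_merge(base, extra):
    """Return a copy of `base` with `extra` merged in recursively.

    Semantics chosen for this sketch: nested dicts merge key by key,
    lists are concatenated, and any other value in `extra` wins.
    """
    result = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + copy.deepcopy(value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def apply_overrides(job):
    """Fold job["overrides"][name] into the config of each task called `name`."""
    overrides = job.get("overrides", {})
    merged = []
    for entry in job.get("tasks", []):
        # Each task entry is a one-key mapping: {task_name: task_config}.
        (name, config), = entry.items()
        config = config or {}
        if name in overrides:
            config = deep_merge(config, overrides[name])
        merged.append({name: config})
    return merged


# Hypothetical job: the sha1 is a placeholder and the conf values are examples.
job = {
    "overrides": {
        "ceph": {"conf": {"osd": {"debug osd": 20}}},
        "install": {"ceph": {"sha1": "<placeholder-sha1>", "extra_packages": ["rbd-nbd"]}},
    },
    "tasks": [
        {"install": None},
        {"ceph": {"conf": {"global": {"osd pool default size": 2}}}},
    ],
}

if __name__ == "__main__":
    for task in apply_overrides(job):
        print(task)
```

Because `deep_merge` copies its input, the original job stays untouched, which keeps the same fragment reusable across many generated jobs.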
B
The other task that pretty much every test is going to use is the ceph task, which sets up the Ceph cluster after the packages are installed everywhere. The ceph task is the one that looks at the roles and sets up the monitors, the OSDs, the managers, and the client keys.
B
C
B
That's like running in verbose mode, just for extra debug output. Then you specify the suite to run, say the rados suite, and then the branch you want to run against. That might be, say, a backport branch for something like luminous.
B
Just by specifying that branch, when you run the teuthology-suite command it will clone the Ceph repository from that branch to wherever you're running the command, in order to read the qa suite from that branch and path. It also puts that exact branch name in the jobs it submits, so when they're queued up, the packages from that branch will be installed, and it does some sanity checking as well.
B
C
B
But if it's a foreground task, like one running a bunch of functional unit tests, and it fails, then the job fails immediately. Across the entire suite, though, individual test runs (jobs) are independent of each other, so if one job fails, the rest will still try to run.
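The two behaviours described here, a job failing at its first error while the other jobs in the suite keep running, can be modelled in a few lines. The task functions and result fields are hypothetical stand-ins; note that real teuthology tasks also unwind their cleanup after a failure, which this sketch leaves out.

```python
executed = []


def setup_cluster():
    executed.append("setup_cluster")


def failing_workunit():
    executed.append("failing_workunit")
    raise RuntimeError("command exited with status 1")


def third_task():
    executed.append("third_task")


def run_job(tasks):
    """Run tasks in order; the first exception fails the whole job."""
    for task in tasks:
        try:
            task()
        except Exception as exc:  # the first failure ends this job only
            return {"success": False, "failure_reason": f"{task.__name__}: {exc}"}
    return {"success": True, "failure_reason": None}


def run_suite(jobs):
    """Run every job independently; one failing job does not stop the others."""
    return {name: run_job(tasks) for name, tasks in jobs.items()}


if __name__ == "__main__":
    print(run_suite({
        "job-1": [setup_cluster, failing_workunit, third_task],
        "job-2": [setup_cluster, third_task],
    }))
```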
B
Exactly, and it only really reports the first failure there, the first thing it detects going wrong. For example, one of the things it checks is the cluster log: at the end of the entire test, it looks to see whether there are any errors that aren't supposed to be there. There's a whitelist, so that tests that are supposed to introduce errors can do so without failing.
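The end-of-test cluster log check can be sketched as a scan for error-level lines that match none of the whitelist patterns, stopping at the first unexpected one. Assumptions: the `[ERR]`/`[WRN]`/`[SEC]` severity markers, the sample log lines, and treating whitelist entries as regular expressions are illustrative choices for this sketch.

```python
import re

# Severity markers treated as problems in this sketch (an assumption).
BAD_SEVERITY = re.compile(r"\[(ERR|WRN|SEC)\]")


def first_unexpected(log_lines, whitelist):
    """Return the first problem line not matched by any whitelist regex, else None."""
    allowed = [re.compile(pattern) for pattern in whitelist]
    for line in log_lines:
        if BAD_SEVERITY.search(line) and not any(p.search(line) for p in allowed):
            return line
    return None


# Made-up cluster log excerpt for illustration.
SAMPLE_LOG = [
    "mon.a [INF] osd.3 boot",
    "mon.a [WRN] Health check failed: 1 osds down (OSD_DOWN)",
    "mon.a [ERR] scrub mismatch",
]

if __name__ == "__main__":
    # A thrashing test expects OSDs to go down, so it whitelists OSD_DOWN.
    print(first_unexpected(SAMPLE_LOG, [r"OSD_DOWN"]))
```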
B
One of the things I wanted to mention about scheduling is the way these suites are structured: they're made out of many different YAML fragments that are combined in all possible combinations. So there's a combinatorial explosion in the number of jobs, based on that giant matrix of different settings. What we typically do is sample that matrix and run a subset of it at a time.
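The combinatorial explosion is just a cartesian product: each facet contributes one YAML fragment per job, so the job count is the product of the facet sizes. The facet and fragment names below are hypothetical, and real suite directories have composition rules this sketch does not model. `job_count` is the kind of number a dry run lets you check before scheduling anything.

```python
import itertools
from math import prod

# Hypothetical facets, each listing the YAML fragments it can contribute.
FACETS = {
    "objectstore": ["bluestore.yaml", "filestore-xfs.yaml"],
    "msgr-failures": ["few.yaml", "many.yaml"],
    "thrashers": ["default.yaml", "mapgap.yaml", "pggrow.yaml"],
    "workloads": ["radosbench.yaml", "snaps.yaml", "cache.yaml", "rbd.yaml"],
}


def all_jobs(facets):
    """Yield one tuple of fragment paths per combination (the full matrix)."""
    names = list(facets)
    for combo in itertools.product(*(facets[name] for name in names)):
        yield tuple(f"{name}/{fragment}" for name, fragment in zip(names, combo))


def job_count(facets):
    """Size of the matrix, computed without generating it."""
    return prod(len(fragments) for fragments in facets.values())


if __name__ == "__main__":
    print(job_count(FACETS))
```

Adding one more facet with ten fragments would multiply the count by ten, which is how a large suite reaches hundreds of thousands of jobs.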
B
So, for example, for the rados suite, if I tried to run the entire thing, it's probably five hundred thousand jobs or something like that. Typically, we use the subset parameter: you can specify that it runs one out of a thousand or one out of a hundred, so you're not running a hundred thousand jobs, but a reasonable sample of them. And if you went through 0 through 99 out of 100, you would have run every single possible configuration.
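The `index/outof` idea behind the subset parameter can be illustrated with a round-robin split: job i goes to piece i mod outof, so running pieces 0 through outof-1 covers every job. This is a simplified stand-in; teuthology's own subset selection works differently, so treat the function names and the split rule as assumptions.

```python
def parse_subset(spec):
    """Parse 'index/outof' into two ints, requiring 0 <= index < outof."""
    index, outof = (int(part) for part in spec.split("/"))
    if not 0 <= index < outof:
        raise ValueError(f"bad subset {spec!r}")
    return index, outof


def select_subset(jobs, spec):
    """Return the jobs that fall into piece `spec` under a round-robin split."""
    index, outof = parse_subset(spec)
    return [job for i, job in enumerate(jobs) if i % outof == index]


if __name__ == "__main__":
    jobs = [f"job-{n}" for n in range(1000)]
    print(len(select_subset(jobs, "0/100")))
```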
B
Whenever I schedule a suite, I always like to do a dry run first, just to make sure I'm not going to schedule thousands and thousands of jobs at once, and then you can adjust the subset as appropriate. Some suites don't need a subset, because they're small enough that you can run all of their configurations every time, but the rados suite is one where you can't run everything every time even if you want to, which is something to keep in mind.
C
B
Yeah, so there's teuthology-kill, and there are a few things to know about it. The way these jobs end up running is that they're pulled out of the queue and run by a different worker on this machine, as a different UNIX user. So you would need sudo permissions to be able to kill those processes belonging to that other user.
B
But I guess in general, if you're doing some kind of performance-related test, it's more important to figure out which machine type you want to use: there's smithi, and there's mira, which is older hard-disk hardware. You also want to take the queue into account. You can look at the current state of the queue in Pulpito and see what's already queued up; oftentimes there's a whole bunch of automated runs waiting for smithi machines, but you can schedule against mira machines, and that'll let your tests kick off faster.
B
If you have something very urgent to run, you can set the priority when you schedule your test, say priority 100, which means your tests will run before the automated jobs, which I think are scheduled at priority 1000. So if you have anything urgent, you can do that, and you'll go to the front of the line.
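Lower priority numbers run sooner, so a job submitted at priority 100 jumps ahead of automated jobs at 1000 (the values quoted in the talk). A minimal heap-based sketch of that ordering; the job names and the first-in-first-out tie-break among equal priorities are assumptions, not a description of teuthology's queue internals.

```python
import heapq
import itertools


class JobQueue:
    """Tiny priority queue: the smallest priority value is dequeued first.

    A running counter breaks ties so equal-priority jobs stay first-in-first-out.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, job, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def get(self):
        _priority, _order, job = heapq.heappop(self._heap)
        return job

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = JobQueue()
    queue.put("nightly-rados", 1000)
    queue.put("urgent-fix", 100)
    print(queue.get())
```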
C
It also seems that some tests are flapping. If you look at the branches, the red runs far outnumber the green ones. So is there some agreement on when a run can be considered passed or failed, depending on which tests are failing?
B
But often the failures present with the same kind of backtrace or failure mode, and you can search the tracker for that exact failure and see: okay, this is a known issue, and it's unrelated to my change, because it's in a totally different subsystem that my change doesn't affect.
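The triage step described above, checking whether a failure matches something already known before blaming the change under test, can be sketched as matching failure reasons against known signatures. The signature table and labels are placeholders invented for the example; in practice the lookup is a manual search of the tracker.

```python
import re

# Placeholder signature table: regex -> label. Not real tracker entries.
KNOWN_FAILURES = {
    r"timed out waiting for .+ to be started": "known issue (placeholder tracker reference)",
    r"(?i)mirror.*timed out|timed out.*mirror": "lab infrastructure, not your change",
}


def triage(failure_reason, known=KNOWN_FAILURES):
    """Return the label of the first known signature matching the reason, else None."""
    for pattern, label in known.items():
        if re.search(pattern, failure_reason):
            return label
    return None


def split_failures(reasons, known=KNOWN_FAILURES):
    """Separate failure reasons into known (with label) and still unexplained."""
    matched, unexplained = {}, []
    for reason in reasons:
        label = triage(reason, known)
        if label is None:
            unexplained.append(reason)
        else:
            matched[reason] = label
    return matched, unexplained


if __name__ == "__main__":
    print(split_failures(["timed out waiting for osd.3 to be started", "Command failed"]))
```

Only the `unexplained` list needs a closer look against the change being tested.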
B
And there are also occasional failures caused by, for example, the lab infrastructure or GitHub. If you see something like packages failing to install because the mirrors timed out, that's obviously not the fault of your code. Or if git fails to download a test from GitHub because GitHub is down, or won't let us read from it for some reason, that's obviously not your fault either.
B
So when you're scheduling, you tell it which branch to use, and that's where it installs the packages from. It looks up that branch in Shaman. Shaman has an API that teuthology uses to find out where the repository with the packages for this branch is; it then configures the machines, whether they're CentOS or Ubuntu, to use that repository, and installs packages from there.
C
B
Yeah, so I guess in general, the suites tend to be organized by area. Like I was mentioning before, there are suites for different parts of Ceph: suites for RBD, for CephFS, for RGW, and, for example, within the rados suite there are ones more specific to the monitor or to the manager.
C
B
Whenever there's a failure, it's usually due to something like a command failing, and it generates a traceback in the teuthology.log with the exception explaining why the job failed. In this case we have a CommandFailedError, which is pretty self-explanatory: teuthology was running a command, and that command exited with a non-zero status.
B
In this case, you'll see a bunch of boilerplate from teuthology, and you can ignore this adjust-ulimits and ceph-coverage stuff. The coverage part in particular is a relic of code-coverage support that doesn't really function anymore. Then you have the actual command it's running, which in this case is the Ceph RBD fsx stress test.
B
So basically we get the standard output and standard error from all the commands being run. In this particular case I'm familiar with the RBD fsx test, and this is its output here.
B
D
B
That's an excellent point, but in this case it's just a test binary that doesn't print a backtrace like that. If there were a backtrace from a daemon or something, though, searching for it is a good way to find the failure.
B
Pulpito shows the failure reason for the run, and you can also look at the failure reason in the summary.yaml file in these archive directories; it's the same thing that's displayed in Pulpito. Here it reports that something timed out waiting for osd.3 to come back up after an osd.3 restart. This typically means that osd.3 crashed, and there should be a backtrace either in the teuthology.log or in the OSD logs themselves.
B
In this case, there's one exception while it was reconnecting to the machines, which didn't cause the job to fail at that point; it just retried afterwards. And here's where it got that error waiting for osd.3 to be started, so I'll search backwards from here.
B
Again, this backtrace from the OSD shows which assert we're hitting and where it came from.
B
So if you want to investigate further, you could take a look at the remote directory. First, for smithi145, I'd need to check whether there's a core dump up there.
B
C
B
For jobs that are scheduled through the suites and queued up, there's not a good way to do that, because it would keep those machines around and unavailable for too long. If you ran a suite that had a whole bunch of failures like that, it could lock up a whole bunch of machines for a really long time. Instead, if there's some kind of crazy bug like that, what we usually do is go in and investigate it interactively.
B
We lock a couple of machines manually and run the same YAML file from the failed job on those machines. When you're running it manually, there's a parameter you can add to your YAML file called interactive-on-error. If you set it to true, the teuthology job will pause when it hits an error, and you can go and inspect the machines at that point.
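When re-running a failed job by hand, the job file needs roles, tasks, and the `interactive-on-error` flag described above; the file used in this talk also lists the machines to run on, which is omitted here. This sketch assembles such a config in Python and writes it as JSON, on the assumption that the YAML loader will accept JSON syntax (JSON is essentially a subset of YAML). The role names, task list, and file name are illustrative.

```python
import json
import os
import tempfile


def manual_job(roles, tasks, interactive=True):
    """Build a job config for a manual run (role and task values are illustrative)."""
    return {
        "roles": roles,
        "tasks": tasks,
        # When true, the job pauses at the first error so the machines can be inspected.
        "interactive-on-error": interactive,
    }


def write_job(config, path):
    """Write the config as JSON, which YAML parsers generally accept as YAML."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
    return path


config = manual_job(
    roles=[["mon.a", "mgr.x", "osd.0", "osd.1", "osd.2", "client.0"]],
    tasks=[{"install": None}, {"ceph": None}],
)

if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    print(write_job(config, os.path.join(out_dir, "manual-job.yaml")))
```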
B
So to run something manually, you first want to lock some machines. In this case we'll just run a simple test on one machine. To lock machines you use the teuthology-lock command. This form uses --lock-many: you say how many machines you want locked, say one, and you can specify, I think it's --machine-type, say mira.
B
The teuthology command itself has a fairly small number of options. Typically you just want verbose mode, so you can see what happened if something goes wrong. And if you want to save the log files after the test is complete, as well as the output from the teuthology command itself, you can add an archive directory.
B
The other thing you need is your YAML file. Since all of the information about this test is in one file, both the machines we're going to use, the roles assigned to those machines, and the tasks, that's all we need. If it were split across more than one file, we could specify more files here.
B
One consequence of this: running against master and the stable branches is slightly different from running against test branches, because master and the other stable branches aren't in the ceph-ci repository; they're in the regular github.com/ceph/ceph repository.
B
And if the archive directory already exists, it'll stop, so I just remove it first.
B
As you can see, it's dropped into a Python prompt. This is an example of one of those failures where it's supposed to go into interactive mode. In this case, since it failed to even connect to the machine, there's not much I can do, so I'm just going to Ctrl-D out of that Python prompt and end the test. Let's see what's happening here; I'll try manually SSHing into that machine.
B
If it's not specified, it uses the default, which I think is... I forget what the default is.
C
B
C
A
D
B
It can lock the machine at the beginning here and unlock it when it exits. Maybe the miras are having problems right now; let's try smithi.
A
B
Tasks either have a task function, or they're a class with a setup and teardown, which is their entry point, and the docstring for the task has information about what options it takes. For example, this is the interactive task: it's just a single task function that doesn't take any options.
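The two task shapes mentioned here can be sketched as follows: a function-style task written as a generator-based context manager (teuthology's function tasks conventionally take `ctx` and `config` arguments), and a class-style task with setup and teardown. The task names, the `events` log, and the `run_tasks` runner are hypothetical; they show why both shapes nest the same way (set up in order, clean up in reverse), not how teuthology actually dispatches tasks.

```python
import contextlib

events = []


@contextlib.contextmanager
def example_task(ctx, config):
    """Function-style task: code before `yield` sets up, code after cleans up."""
    events.append("function setup")
    try:
        yield
    finally:
        events.append("function cleanup")


class ExampleClassTask:
    """Class-style task with explicit setup() and teardown() hooks."""

    def __init__(self, ctx, config):
        self.ctx = ctx
        self.config = config or {}

    def setup(self):
        events.append("class setup")

    def teardown(self):
        events.append("class teardown")

    # Context-manager adapter so one runner can drive both task shapes.
    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.teardown()
        return False  # never swallow an exception raised by a later task


def run_tasks(ctx, tasks):
    """Enter each (factory, config) task in order, then unwind in reverse."""
    with contextlib.ExitStack() as stack:
        for factory, config in tasks:
            stack.enter_context(factory(ctx, config))
        events.append("all tasks running")


if __name__ == "__main__":
    run_tasks({}, [(example_task, None), (ExampleClassTask, None)])
    print(events)
```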
B
You can specify which file system to use, and you can add configuration settings that get added to the ceph.conf file, that kind of thing. But in general, if you want to see more about what exactly is going on, you can go and look at what these tasks are doing and what options they take, and go from there. So I think that's about everything I wanted to cover today. I see there are lots of questions, and feel free to catch me after this as well.
C
B
Yeah, so we at least used to gather a bunch of those into Grafana, though I'm not sure that's still working. We do have a performance suite, added just last year, which gathers more of those results using CBT, but in general that's an area where we could improve quite a bit.
B
C
B
I'm not sure. I don't really use Sentry myself that much, since I don't find it that useful, and I'm not sure whether others do. I think the idea is that it collects failures and shows you when you've seen the same failure before, but in the past it hasn't been that useful, to me at least.