From YouTube: CDS Hammer (Day 2) - CephFS: Forward Scrub
Description
http://goo.gl/U4b70r
29 October 2014
Ceph Developer Summit: Hammer
Day 2
CephFS: Forward Scrub
Greg Farnum, Sage Weil
C
So we started with this forward scrub, and there's some code up for review now, which is the first part of that. We just put it up, and I put a link in here somewhere to the actual pull request; it's a wip inode scrub branch. That includes a bunch of mechanisms that sort of promote the MDS internal operations into things that we can work with the same way as client requests, because architecturally we've had a problem.
C
For client requests, the MDS takes the metadata locks it needs to process them, and then, if it runs into something, we put the request on a list of "I'm waiting for this lock"; when it's possible to get that lock, the request gets retried. But internal operations all sort of just assume that you've already set that up and that it's going to succeed, so there's no retry or anything. So we have a bunch of infrastructure that allows you to create first-class internal operations that look to the system basically like client ops do. That way you can have admin sockets or other MDSs or whatever say "I want you to do this thing", and it will gather up all the locks, wait until it can do so, and then run through the operation. So there's that infrastructure, and then the actual code to validate a single inode, and an admin socket interface. That lets you say "I want you to scrub /home/greg", and it'll go and look that up and look at it.
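To make the lock-wait-and-retry idea concrete, here is a minimal Python sketch under toy assumptions: a lock is just a name held by at most one request, and client requests and internal ops share one `Request` shape. `LockTable`, `Request`, `dispatch`, and `release` are hypothetical names for illustration, not the MDS's actual C++ classes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Request:
    """A client request or an internal op (such as a scrub); both look the same."""
    name: str
    locks: List[str]
    body: Callable[[], None]


class LockTable:
    """Toy lock table: each lock is free or held by exactly one request."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}                             # lock -> holder
        self.waiters: Dict[str, Deque[Request]] = defaultdict(deque)  # lock -> parked

    def dispatch(self, req: Request) -> bool:
        """Run req if all its locks are free; otherwise park it on the first busy one."""
        for lock in req.locks:
            holder = self.holders.get(lock)
            if holder is not None and holder != req.name:
                self.waiters[lock].append(req)
                return False
        for lock in req.locks:
            self.holders[lock] = req.name
        req.body()
        return True

    def release(self, req: Request) -> None:
        """Drop req's locks, then retry every request that was parked on them."""
        retry: List[Request] = []
        for lock in req.locks:
            if self.holders.get(lock) == req.name:
                del self.holders[lock]
                retry.extend(self.waiters.pop(lock, ()))
        for parked in retry:
            self.dispatch(parked)


log: List[str] = []
table = LockTable()
client = Request("client.1", ["/home/greg"], lambda: log.append("client write"))
scrub = Request("scrub", ["/home/greg"], lambda: log.append("scrub /home/greg"))
table.dispatch(client)   # lock is free: runs and keeps holding the lock
table.dispatch(scrub)    # lock is busy: parked rather than assumed to succeed
table.release(client)    # lock freed: the scrub is retried and now runs
print(log)               # ['client write', 'scrub /home/greg']
```

The design point is that the scrub is parked and retried exactly like a client request, instead of assuming its locks are already available.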
C
It'll look up that dentry, or that path, and it'll say: oh, this is a directory inode. So it'll go through and validate the backtrace on the directory object, and that the directory's rstats match, or that the directory is self-consistent with respect to its rstats, and things like that. And work has started on what I'm calling the scrub stack, which is how we convert the idea for the algorithm that we've had for a long time, sort of recursively doing this to everything in the system, into actual working code. But that's not ready for a PR yet. I'd been hoping to have it in before the session, but several things were going on over the last week and a half. In the blueprint, though, I have defined sort of the key data structures.
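The scrub stack is not in the PR yet, so the following is only a guess at the general shape: a minimal Python sketch, assuming a toy in-memory tree, that walks everything depth-first with an explicit stack instead of recursion and checks one rstat (`rbytes`) per directory. The `Inode` class and `scrub` function are hypothetical; a real scrub would also validate backtraces and take locks as it goes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Inode:
    """Toy inode: files have a size; directories have children and a recorded rbytes."""
    name: str
    is_dir: bool = False
    size: int = 0
    rbytes: int = 0
    children: List["Inode"] = field(default_factory=list)


def scrub(root: Inode) -> List[str]:
    """Depth-first scrub driven by an explicit stack instead of recursion.

    A directory is pushed twice: first to expand its children, then (beneath
    them) to compare its recorded rbytes with what its children actually sum to.
    """
    errors: List[str] = []
    measured: Dict[int, int] = {}                       # id(inode) -> bytes found
    stack: List[Tuple[Inode, str, bool]] = [(root, "/", False)]
    while stack:
        inode, path, expanded = stack.pop()
        if not inode.is_dir:
            measured[id(inode)] = inode.size
        elif not expanded:
            stack.append((inode, path, True))
            for child in inode.children:
                stack.append((child, f"{path.rstrip('/')}/{child.name}", False))
        else:
            actual = sum(measured[id(c)] for c in inode.children)
            measured[id(inode)] = actual
            if actual != inode.rbytes:
                errors.append(f"{path}: rbytes {inode.rbytes} != measured {actual}")
    return errors


greg = Inode("greg", is_dir=True, rbytes=25,
             children=[Inode("a", size=10), Inode("b", size=15)])
home = Inode("home", is_dir=True, rbytes=20,            # stale: should be 30
             children=[greg, Inode("c", size=5)])
root = Inode("/", is_dir=True, rbytes=30, children=[home])
print(scrub(root))   # ['/home: rbytes 20 != measured 30']
```

Using an explicit stack keeps deep trees from exhausting the call stack, and it gives a natural place to pause and resume, which a scrub that must wait on locks would presumably need.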
C
Not sure how much of that is appropriate to talk about now, versus doing more interesting things like talking about the stuff that we haven't planned as well. Is John here? Hi, John. Hey, hey. Okay, thanks for waking up; we appreciate it, from the United Kingdom. So I guess it was last Thursday that several of us sat in a room together and started talking about how we could set up the reverse scrub, or, actually, not how we can set up the reverse scrub.
C
We talked a little bit about it, but a lot of it was actually just sort of: what are the individual pieces we can build to start recovering from certain cases we identify, which will be useful as pieces for the final backwards scrub as a whole? We talked about that for a while, and I really want to pass this off to John or Sage, because they actually wrote up tickets and I did not.
D
Those recovered paths: I think quite soon after that, we're probably going to want to insert them via a running MDS as well. It kind of sucks to have to take your whole file system down and flush your journals in order to insert the updated metadata. So I think that will probably have similar kinds of infrastructure requirements: the ability to inject that recovered metadata by some kind of back door.
C
So it's actually possible to... I think we need to write a new filter class, or, I mean, it's not a class, but write a new filter function that we can call. It says, you know, decode the backtrace xattr and filter, and we pass along the data to it. Yep. And it will select... yeah, so with that filter, when you do a pgls filter, it'll look at the backtrace on every object, decode it, and compare it to the one we've sent along.
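The filter idea can be sketched as "visit every object, decode its backtrace, keep the matches". This is a minimal Python sketch under loud assumptions: backtraces are encoded as JSON here (the real ones use Ceph's own encoding), they are stored under an xattr named `parent`, and "matches" means the backtrace passes through a given directory inode. `encode_backtrace`, `decode_backtrace`, and `backtrace_filter` are hypothetical helpers, not the pgls filter interface.

```python
import json
from typing import Dict, List, Tuple

Backtrace = List[Tuple[int, str]]   # (parent dir ino, name in that dir), nearest first


def encode_backtrace(bt: Backtrace) -> bytes:
    """Stand-in encoding (JSON); the real backtrace uses Ceph's own encoding."""
    return json.dumps(bt).encode()


def decode_backtrace(raw: bytes) -> Backtrace:
    return [(int(ino), str(name)) for ino, name in json.loads(raw)]


def backtrace_filter(objects: Dict[str, Dict[str, bytes]], ancestor: int) -> List[str]:
    """Visit every object, decode its backtrace xattr, and keep the objects
    whose ancestry passes through the directory inode we were sent."""
    matches = []
    for oid, xattrs in sorted(objects.items()):
        raw = xattrs.get("parent")
        if raw is None:
            continue                  # no backtrace: nothing to compare
        if any(ino == ancestor for ino, _ in decode_backtrace(raw)):
            matches.append(oid)
    return matches


HOME, GREG, TMP = 100, 101, 102      # made-up directory inode numbers
objects = {
    "file-a": {"parent": encode_backtrace([(GREG, "notes"), (HOME, "greg"), (1, "home")])},
    "file-b": {"parent": encode_backtrace([(TMP, "scratch"), (1, "tmp")])},
    "file-c": {},                     # missing backtrace
}
print(backtrace_filter(objects, HOME))   # ['file-a']
```

Running the comparison next to the data means only matching object names come back, rather than every backtrace being shipped to the client for checking.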
B
Let's see, I think the other thing that we talked about was the idea of taking a backup or a snapshot of your metadata pool before you start doing any of this scrubbing and repair stuff. We talked about fixing rados export and import yesterday, and I think everything that we talked about there should apply, but you could actually make a full backup of your metadata pool, including all the omap and all the other metadata stuff, I think, also.
D
If we're choosing between them, I lean strongly toward making a backup, but I don't know; I mean, that's kind of a subjective thing. My feeling is that once you're in disaster-recovery land, you want to be using something as brutish and simple as possible. Yeah.
B
Yeah, well, so rados export is needed for, like, all manner of other things also. It's totally non-specific to this, so we'd get it for free. But the pool snaps also... I mean, the only thing we would need to make pool snaps a viable way to do it is to make rollback work. That's all. Yeah! Well, I mean, it's easy: you list objects and you call rollback on every object.
C
Oh, right, okay, sorry, I was going to say... I actually did have a couple of things that... well, no. There was one important thing that we haven't talked about much, which actually would be good for forward scrub: how we should be surfacing scrub errors to administrators so they can do anything with them. Yes, I mean, we don't want to crash the MDS, but we can't just throw it to the central log. So I think maybe what we need is, like, MDS health checks.
D
So that, I mean, that relies on the ability to have something that is going to keep poking whatever the health check is, to keep populating it in the beacon. I think the most challenging thing about surfacing scrub errors is going to be having them indexed in such a way that when you go and repair an error, you can clear the flag for the defect.
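One way to get that indexing is to key errors by inode number and error kind, so a repair clears exactly the flag it fixed and whatever feeds the beacon can simply recompute a summary each time. A minimal Python sketch; `ScrubErrorTable`, the error-kind strings, and the message format are all hypothetical, not the MDS health-check API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ScrubErrorTable:
    """Scrub errors indexed by inode number and error kind."""

    def __init__(self) -> None:
        self.errors: Dict[int, Set[str]] = defaultdict(set)

    def record(self, ino: int, kind: str) -> None:
        self.errors[ino].add(kind)

    def clear(self, ino: int, kind: str) -> None:
        """Called after a repair: drop just that flag, and the inode once it is clean."""
        kinds = self.errors.get(ino)
        if kinds is None:
            return
        kinds.discard(kind)
        if not kinds:
            del self.errors[ino]

    def health_summary(self) -> List[str]:
        """What a periodic beacon could re-report on every tick."""
        if not self.errors:
            return []
        total = sum(len(k) for k in self.errors.values())
        return [f"{total} scrub error(s) on {len(self.errors)} inode(s)"]


table = ScrubErrorTable()
table.record(0x100, "backtrace_mismatch")
table.record(0x100, "rstat_mismatch")
table.record(0x200, "missing_dir_object")
print(table.health_summary())            # ['3 scrub error(s) on 2 inode(s)']
table.clear(0x100, "rstat_mismatch")     # repaired: clear exactly that flag
table.clear(0x200, "missing_dir_object")
print(table.health_summary())            # ['1 scrub error(s) on 1 inode(s)']
```

Because the summary is derived from the table rather than stored separately, the warning disappears on its own once the last flag is cleared.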
C
Is it sufficient to just, like, have an error pin that pins anything we find a scrub error on in cache, and put them on an xlist or something, so we can just enumerate them on every health warning? Because that'll prevent them from being flushed out, right? So you actually maintain both versions of it.
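The pin proposal can be sketched as an LRU cache whose trim step skips error-pinned entries, with a set of pinned inode numbers doubling as the list to enumerate on each health warning. A minimal Python sketch under those assumptions; `PinningCache` and its methods are hypothetical, not the MDS cache code.

```python
from collections import OrderedDict
from typing import Set


class PinningCache:
    """LRU cache whose trim step skips error-pinned entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lru: "OrderedDict[int, str]" = OrderedDict()   # ino -> payload, oldest first
        self.error_pinned: Set[int] = set()

    def insert(self, ino: int, payload: str) -> None:
        self.lru[ino] = payload
        self.lru.move_to_end(ino)
        self.trim()

    def pin_error(self, ino: int) -> None:
        self.error_pinned.add(ino)

    def unpin_error(self, ino: int) -> None:
        self.error_pinned.discard(ino)
        self.trim()

    def trim(self) -> None:
        # Evict least-recently-used unpinned entries until within capacity;
        # pinned entries can leave the cache over capacity.
        for ino in list(self.lru):
            if len(self.lru) <= self.capacity:
                break
            if ino not in self.error_pinned:
                del self.lru[ino]


cache = PinningCache(capacity=2)
cache.insert(1, "dir with scrub error")
cache.pin_error(1)
cache.insert(2, "ok")
cache.insert(3, "ok")            # over capacity: 2 is evicted, pinned 1 survives
print(list(cache.lru), sorted(cache.error_pinned))   # [1, 3] [1]
```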
B
I mean, you could, but then you have to put an xlist item in every object, and that's yet another thing that you're never gonna use. I would be more inclined to just set a flag to do the preventing part. So the thing that we talked about the other day that I liked some...
C
That, when we actually find a bad scrub, where what we have disagrees with... like, we had something in memory and it totally disagrees with what's on disk because our backtrace versions don't match, or, you know, our versions are the same but the contents diverge, or whatever, we don't want to overwrite what's on disk, because we don't actually know which of them is correct.
D
So I think having the log is important, but once you've generated that log inside of a running MDS, you're also going to have to have an in-memory version of it in order to serve up, like, EIOs when people try and access those paths. You have to have something in your cache; even if the persistence might just be the log, you're still gonna have to have an in-memory thing.
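The "persistent log plus in-memory copy" point can be sketched as a set rebuilt from the log and consulted on every access, so reads of damaged paths fail with `EIO` instead of returning suspect data. A minimal Python sketch; `DamageView` is hypothetical, and the "log" here is just a list of paths standing in for whatever is actually persisted.

```python
import errno
from typing import Dict, List, Set


class DamageView:
    """In-memory index over a persistent damage log, consulted on every access."""

    def __init__(self, damage_log: List[str]) -> None:
        self.log = damage_log                      # persistent record (here just a list)
        self.damaged: Set[str] = set(damage_log)   # rebuilt in memory, e.g. on startup
        self.files: Dict[str, bytes] = {}

    def mark_damaged(self, path: str) -> None:
        self.log.append(path)                      # persist first...
        self.damaged.add(path)                     # ...then make it visible to readers

    def read(self, path: str) -> bytes:
        if path in self.damaged:
            raise OSError(errno.EIO, "metadata damaged (see scrub log)", path)
        return self.files[path]


log: List[str] = []
view = DamageView(log)
view.files["/home/greg/ok"] = b"fine"
view.files["/home/greg/bad"] = b"stale"
view.mark_damaged("/home/greg/bad")
print(view.read("/home/greg/ok"))          # b'fine'
err = None
try:
    view.read("/home/greg/bad")
except OSError as e:
    err = e.errno
print(err == errno.EIO)                    # True
```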
B
If we want... if we pin the things that we find errors on, then we don't need to use an xlist or anything, yeah, because the pin is all we need. Okay, that would be simpler. And then, if it's like an STL set of, you know, MDSCacheObject* or whatever, then the health warning can actually say "there are 37 items that are..."
D
There is an assumption running through this stuff that the set of inodes on which we find forward scrub errors will fit in RAM, which is fine. We just have to bear it in mind, and maybe set some kind of artificial limit on the size of these structures, so that in pathological cases we don't fill up the RAM on the machine. I mean...
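Following the suggestion of an artificial cap, here is a minimal Python sketch that stops tracking new damaged inodes past a limit and only counts the overflow, so memory stays bounded while the health message still reports how much was left untracked. `BoundedErrorSet` and the limit value are hypothetical; a real design would also have to decide where the untracked errors go (for example, only into the persistent log).

```python
from typing import Set


class BoundedErrorSet:
    """Remembers at most `limit` damaged inodes; counts the overflow instead."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.inodes: Set[int] = set()
        self.dropped = 0

    def add(self, ino: int) -> bool:
        """Track ino if there is room; return False if it had to be dropped."""
        if ino in self.inodes:
            return True
        if len(self.inodes) >= self.limit:
            self.dropped += 1        # pathological case: stop growing, just count
            return False
        self.inodes.add(ino)
        return True

    def health_message(self) -> str:
        msg = f"{len(self.inodes)} inode(s) with forward scrub errors"
        if self.dropped:
            msg += f" (+{self.dropped} more not tracked: limit {self.limit} reached)"
        return msg


errors = BoundedErrorSet(limit=3)
for ino in range(5):
    errors.add(ino)
print(errors.health_message())
```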
B
In the first case, where it's just, like, a missing directory object, I don't know that it matters that we have it pinned, but in the second case we certainly would, because we would want to know later if we do happen to come across another primary link to the same file that does exist. Then we would have the first one, perhaps, so I think it might be useful.
D
Okay, I've played with it, but I'm not claiming to have done a comprehensive job, and I think he should look at it too. Yeah.