From YouTube: 2016-FEB-25 -- Ceph Tech Talks: CephFS
Description
A detailed update on the current state of CephFS as we prepare for the Jewel release.
http://ceph.com/ceph-tech-talks
A
Welcome everyone back to the second installment of Ceph Tech Talks of 2016. If you missed last month, we had a great one with the folks doing PostgreSQL on Ceph under Mesos with Aurora and Docker, so that was a pretty awesome conversation. This month we're having another great one: a CephFS update. This will be the last major CephFS update we see here before the next major release, Jewel, which will include CephFS and all the new hotness moving from the nearly-awesome camp into the fully-awesome camp. So we're all very excited about that. So without further ado, I will let John give you a rundown of what's new in CephFS and where we're headed. No pressure.
B
Right, okay, so hello everyone, thanks for coming. I'm a developer at Red Hat and I work primarily on the Ceph file system. I will just share my screen so you can see my slides.
B
While I've got my slides up, I won't be able to see the BlueJeans chat, so if anyone has any questions, go ahead and ask them there, and maybe, Patrick, you can just interrupt me verbally if that comes up ["Sure"]. So today I'm going to give a very brief recap of what CephFS is and what its architecture is, but primarily I want to talk about what's new in the Jewel release of CephFS, which is what we've been working on over the past six months to a year.
B
So CephFS, the file system interface to Ceph, is one of the three applications that you get with a Ceph cluster. Going from left to right: we have the RADOS Gateway, which provides an S3- and Swift-compatible object interface to the cluster; RBD, the RADOS Block Device, which provides disk-image-style access to the cluster; and then CephFS.
B
Although
three
sit
on
top
of
radius,
which
is
the
underlying
resilient
object,
store
that
surfers
built
on
so
surface,
is
a
POSIX
file
system.
What
that
means
is
it
has
a
high
level
of
consistency
than
a
to
the
pool,
NFS
or
file
system?
Would
you
get
all
the
same
semantics
when
it
comes
to
locking
and
concurrency
that
you
would
on
a
local
file
system
like
EXT,
for
the
data
is
stored
directly
in
the
radars
cluster?
B
So data goes directly from wherever you've mounted your file system to RADOS; it doesn't go through any intermediate server. For the file system we have a separate metadata server, the Ceph MDS, which can act in a cluster to spread the load of filesystem metadata across multiple servers so that it isn't a bottleneck, and we do a little bit more than the average POSIX file system.
B
So
in
additional
to
normal
file
system
operations,
we
have
a
couple
of
special
editions
which
are
the
ability
to
take
per
directory
snapshots
of
the
file
system
and
also
maintaining
recursive
statistics.
So,
for
example,
you
can
look
at
the
statistics
on
a
directory
and
your
file
system
and
see
the
total
size
of
funds
within
it
without
having
to
run,
for
example,
DF
to
iterate
through
the
file
system,
which
is
a
comparatively
expensive
thing
to
do
in
a
network
file
system,
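For reference, those recursive statistics are exposed to clients as virtual extended attributes on directories; a minimal sketch, assuming a CephFS mount at /mnt/cephfs and a directory called mydir (both placeholders):

    # Recursive byte count and recursive file count for the whole subtree
    getfattr -n ceph.dir.rbytes /mnt/cephfs/mydir
    getfattr -n ceph.dir.rfiles /mnt/cephfs/mydir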
So, in visual form: the client is at the top of the diagram.
B
It sends the metadata to special metadata servers, which are represented by the three little icons in this diagram, and it sends the data to OSDs directly. The motivation for CephFS, or for distributed file systems in general, is, firstly, that you have a lot of existing workloads that expect a file system, and those workloads aren't going away. You might have some applications that will use an object store, but you will always — well, at least for a long time —
B
You'll still have file system workloads. File systems are useful because they're very familiar to everybody, as well as having applications that already depend on them. You also have system administrators who know how to deal with them, and you have other storage systems that know how to interoperate with a file system, such as backup systems. It's not just legacy systems, either: new systems use file systems too — for example, in emerging container environments like Docker, what they call volumes, their units of persistent storage, are themselves file systems. So why don't we just use a file system for everything?
B
So the reason we don't use a file system for everything is that they're actually harder, in some ways, than other ways of accessing storage. Unlike in an object store, we're not dealing with a flat representation of the data: the pieces of data we're storing — the files and the directories — are not independent. They have relationships to one another; they form a hierarchy, and that means that spreading them out across a cluster of servers is a more challenging problem than doing a similar thing with an object store.
B
It's also challenging to deal with some applications. If you've got an application which is written against a local file system, it has expectations about latency and performance that don't necessarily make sense in a distributed, high-scale environment. The classic example is when people like to run ls -l on a directory: it seems like a very innocuous operation, but on a distributed network file system you are potentially issuing a very large number of metadata reads in order to retrieve all the metadata — for example, to know what colour the files should be in your terminal when you run ls -l.
B
Concurrency in a distributed file system is also more complex than in other forms of storage. When you've got multiple file system mounts coming from different hosts, if one client is opening a file and writing to it, and another client would like to open it and maybe read from it at the same time, then in order to enforce POSIX semantics there's a fair amount of complexity that has to exist within the metadata server and within the client to make that happen.
B
So those are the downsides to a file system. Why is it hard? Why has it taken such a long time for this and other distributed file systems to reach a level of maturity where you would use them alongside your object and block interfaces? To give you a more concrete illustration of what CephFS is and how you use it, here are, paraphrased,
B
the commands that you would use if you have an existing RADOS cluster and you want to start running CephFS. The ceph-deploy tool knows how to set up a metadata server, so you'll need one of those. For a CephFS file system you need a data pool and a metadata pool, so you create two RADOS pools. And finally there is a command called "ceph fs new", which configures the file system and tells the Ceph cluster which pools you would like to use for it.
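Roughly, those steps look like this (the host name, pool names, and placement-group counts here are just examples):

    # Deploy a metadata server on a host
    ceph-deploy mds create mds1

    # Create a data pool and a metadata pool
    ceph osd pool create cephfs_data 64
    ceph osd pool create cephfs_metadata 64

    # Create the file system, telling Ceph which pools to use
    ceph fs new cephfs cephfs_metadata cephfs_data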
B
Once you've created your file system and your metadata server is up and running, you can start using the file system by mounting it from a client. The example at the bottom there is how you mount a Ceph file system using the kernel client, which is part of the upstream kernel. The command line that you would use for the userspace FUSE client is a little different, but it's the same workflow.
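The two mount styles look roughly like this (the monitor address, credentials, and mount point are placeholders):

    # Kernel client: mount via a monitor address
    mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # FUSE client: same workflow, different command
    ceph-fuse /mnt/cephfs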
B
So that was your lightning tour of CephFS and what it is; I'm now going to step through all of the new stuff. The biggest thing, and the thing that I think a lot of people have been waiting for, is the ability to scrub the file system for errors and repair it when something goes wrong. This kind of functionality is critical to moving CephFS into a production-ready state, because it means that not only have we stabilized the software somewhat and removed a lot of bugs, we're
B
Now,
in
a
position
where,
even
if
we
do
encounter
those
or
unexpected
issues,
we
have
the
tools
that
we
need
to
detect
issues
and
correct
them.
On
a
customer
system
in
general,
the
resilience
and
self-repair
or
of
the
data
and
metadata
stored
in
surface
is
actually
the
underlying
radars
clusters
job.
B
So
when
surface
rights,
some
I
nodes
or
some
file
data
into
radars,
it's
getting
replicated
and
when
one
of
those
copies
one
one
of
those
disks
dies,
it's
Renesys
job
to
deal
with
that,
so
the
scrubbing
repair
stuff
and
set
the
fest
isn't
about
the
everyday
data
resilience
because
that's
already
taken
care
of
it's
about
disasters.
It's
about
the
unforeseen,
serious
software
bugs
which
clearly
we
go
to
great
lengths
to
avoid,
but
are
possible
or
scenarios
where
r
anals
can't
do
its
job
anymore.
B
We
now
have
that
capability
for
many
forms
of
metadata
damage,
principally
the
lots
of
objects,
which
is
the
kind
of
scenario
that
you
would
see
if
you
had
a
triple
failure
of
desks
and
radius
and
you've
lost
a
certain
number
of
your
placement
groups,
but
also
for
corruptions,
which
we
don't
generally
expect
to
see
in
a
Reynolds
cluster.
But
that's
really
more
of
a
proxy
for
what
would
happen
if
we
encounter
some
unexpected
software
bug.
If
we
encountered
some
structure
on
disk,
which
just
didn't
make
sense,
so
we
couldn't
decode
it.
B
That
would
be
a
corruption.
These
tools
are
for
using
disasters
and
they
require
expertise.
They're
primarily
intended
for
vendors,
providing
support
for
CFS
to
be
able
to
intervene
in
extreme
situations
and
repair
systems.
They
are
not
something
that
ordinary
users
would
be
using
every
single
day
so
to
go
into
more
detail
about
what
the
new
scrubber
past
up
looks
like
historically,
if
SEF,
if
set
the
first
encountered,
something
it
didn't
like
on
the
radio
stir
a
couple
of
versions
ago.
B
The
metadata
server
would
generally
assert
out,
which
clearly
looks
like
a
crash
to
the
user
and
essentially
is
a
slightly
safer
form
of
crash.
In
the
last
release
of
SEF,
we
added
the
ability
to
mark
MDS
ranks
which
are
the
roles
in
a
cluster,
that's
occupied
by
an
MDS
demon
as
damaged
so
that
when
they
encountered
something
they
couldn't
handle
on
disk
rather
than
crashing,
they
would
report
at
and
go
into
an
official
damaged
state
and
wait
for
intervention
to
fix
them
in
jewel,
we're
more
fine-grained
than
that
again.
B
So,
when
something
bad
is
found
on
it
within
the
radiance
cluster,
the
metadata
server
will
be
able
to
identify
where,
in
the
metadata
tree
hit
an
issue.
For
example,
if
a
particular
directory
has
damaged
metadata
on
disk
and
mark
that
directory
as
damaged
meanwhile,
the
MDS
demon
will
stay
up,
users
will
continue
to
be
able
to
access
the
rest
of
their
data
and
if
they
try
to
access
that
particular
broken
part,
they'll
get
an
appropriate
eio
code
from
their
clients.
B
There
are
some
tools,
server
side
for
dealing
with
this
situation,
so
there's
a
new
damage,
LS
command,
which
ninety-nine
point
nine
percent
of
the
time
will
give
you
zero
entries.
But
if
something
has
gone
seriously
wrong,
it
will
allow
you
to
see
exactly
where,
in
the
metadata
tree,
you've
got
issues
those
corresponding
damage,
RM
command,
which
is
for
removing
entries
from
that
list.
If
you
know
that
you've
fixed
some
fixed
something
using
the
repetitive
and
the
third
come
on,
there
is
for
the
situations
where
we
have
a
non
localized
form
of
damage.
B
If
something
is
wrong,
with
an
entire
MDS
rank.
So,
for
example,
if
some
critical
data
structure
has
been
critical,
global
data
structure
has
been
damaged.
That
would
be
reported
in
the
health
status
and
there
is
this
repaired
command
for
telling
staff
that
you
have
done
some
intervention
in
the
back
in
the
background
and
that
it
should
not
start
using
that
rank
again.
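As a sketch of what those look like on the command line (the MDS addressing and the damage ID are placeholders, and the exact form varies a little between versions):

    # List damage entries known to an MDS
    ceph tell mds.0 damage ls

    # Remove a damage entry once it has been dealt with
    ceph tell mds.0 damage rm <damage_id>

    # Tell the cluster that a rank previously marked damaged is fixed
    ceph mds repaired 0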
B
So, the types of repairs that we can do: if we see inconsistent statistics in the system, we can repair that online.
B
That's the process by which the MDS daemon will traverse a tree of metadata and work out things like what the file count for a directory should be, or how many children a directory should have recursively. Currently, all the other repairs — such as when we have orphaned files — happen offline, which means you need to stop the MDS and go and fix things in the background. So that's the online capability that we have here; I'll come to the offline part in a second.
B
Firstly,
we
can
scrub
a
particular
path
so
that
path
that
the
user
passes
in
will
be
a
file
or
a
directory,
and
the
MDS
will
go
and
check
what's
on
disk
versus
what
it
has
in
cash,
and
if
there
is
nothing
in
cash,
it
will
just
check
that
it
can
load
it
from
desk
and
it'll.
Tell
you
if
everything
is
ok
if
what's
old
desk
seems
to
be
healthy.
Secondly,
there
is
a
recursive
flag
for
that
which
will
do
the
same
thing.
We
will
go
all
the
way
down
into
a
directory.
B
Thirdly,
you
can
pass
them
repair
flag
to
it,
which
will
go
through
this
procedure,
but
in
the
process,
if
it
finds
any
statistics
that
it
doesn't
like
it'll
rewrite
them.
That's
the
online
repair
capability
that
I
mentioned
a
second
ago
and
fourthly,
there
is
c'mon
called
tagged
path
and
what
that
does
is
similar
to
a
scrub,
but
instead
of
going
through
and
just
checking
all
the
metadata.
It
also
goes
an
update
to
the
data
pool
where
the
file
content
is
stored
and
tags.
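Around the Jewel release these were admin-socket commands on the MDS; roughly (the daemon ID, paths, and tag string are placeholders):

    # Check a single path against what is stored in RADOS
    ceph daemon mds.<id> scrub_path /some/directory

    # Recursively scrub a subtree and repair bad recursive statistics
    ceph daemon mds.<id> scrub_path /some/directory recursive repair

    # Tag the data objects under a path (used later to narrow a data scan)
    ceph daemon mds.<id> tag path /some/directory mytag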
B
So
the
scrub
commands
will
give
you
a
message
when
they
complete
they'll,
actually
give
you
a
little
JSON
structure
that
will
tell
you
about
the
what,
if
any
issue
they
found
and
they'll
also
admit,
cluster
love
messages.
So,
if
you're
not
around
to
see
the
result
of
the
command,
if
you
have
a
long
run,
command,
you'll
also
be
able
to
see
in
the
class
to
log
if
they
found
many
issues
so
that
process
of
iterating
over
the
metadata
is
what
we
call
forward
scrub.
B
The
backward
scrub
is
where
we
iterate
over
all
the
data
objects,
so
that
number
of
objects
is
of
the
order.
The
amount
of
data
in
your
system's,
so
your
data
files
are
stored,
chunked
into
four
megabytes
objects
and,
depending
on
how
many
files
and
how
big
they
are,
that
will
influence
how
many
data
objects
you
have
so
you're
iterating
over
all
of
them,
which
is
hopefully
you
can
see
that
that
means
this
is
an
unusual
thing
to
do.
You
don't
want
to
do
this
continuously.
B
It's
not
absolutely
fast
thing
to
do,
and
you
really
would
only
do
it
in
a
disaster
to
find
any
orphaned
files
or
recover
any
invalid
file.
Size
information
by
searchingly
objects
is
an
exhaustive
search
of
the
data
objects
in
order
to
mitigate
the
fact
that
it's
an
exhaustive
search
and
we've
added
the
ability
to
run
the
workers
in
parallel
and
I'll
show
you
what
that
looks
like
on
the
next
slide,
so
the
backwards
grow
up
process
is
done
using
the
seven
test,
data
scan
tool
and
it's
a
two-step
thing.
First,
the
scan
extends
command.
B
In general, we can recover file names because CephFS stores something called a backtrace on data objects. That's not guaranteed to be there, and if it isn't, then we will inject lost files into a lost+found directory — just the same as, if you ran fsck on an ext4 filesystem, you would potentially see things getting linked into the lost+found folder. We have that concept in CephFS as well. To make this more efficient and avoid iterating over absolutely every object, you can run the "tag path" command that I showed on the previous slide.
B
We don't currently have a very user-friendly way of running this: if you want to run a large number of workers in parallel, it's up to you to orchestrate running them across a collection of clients — you would probably want to write a shell script of some kind to do this. Each individual worker, though, is fairly simple to invoke: there's just a --worker_n and a --worker_m argument.
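A rough sketch of the two passes, with the work split across four workers (the pool name and worker counts are examples; scan_inodes is the second pass of the tool):

    # Pass 1: scan the data objects (run one of these per worker, e.g. in parallel shells)
    cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 cephfs_data
    cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 cephfs_data
    # ... workers 2 and 3 likewise

    # Pass 2: scan inodes and re-link recovered files into the metadata
    cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 cephfs_data
    # ... repeat for the remaining workers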
B
Just to reiterate: this is disaster recovery functionality. Don't go running all these repair commands just in case they fix something, because it is possible for them to make things worse as well as better. They are invasive things, designed to operate on an offline cluster. There's also some future work to be done here.
B
The
forward
scrub
functionality
needs
to
be
more
multi,
MDS
aware
so
currently
we're
sending
these
commands
directly
to
a
single
MDS
and
operating
on
whatever
happens
to
be
in
the
caption
pattern
book
were
in
the
metadata
they've
allocated
to
that
mvs
at
one
time
it
doesn't
handle
distributing
this
across
most
or
demons
in
the
multi-layer
situation.
I'm
also
not
currently
running
this
in
the
background,
opportunistically
the
way
that
we
do
with
radar
scrubs.
So
at
some
point
it
would
be
nice
to
extend
this
to
share
your
filesystem
scrubs
to
happen.
B
So
the
next
thing
I
want
to
talk
about
is
improvements
to
authorization
in
service
the
clients
in
a
guinness
ffs
file
system.
The
servers
that
are
mounting
it
need
to
talk
to
the
South
monitors.
The
South
OST
is
to
store
their
data
and
the
ceph
nds
demons
to
do
their
metadata
operations.
We
already
have.
The
ability
to
limit
hell
is
DS
how
clients
could
talk
to
is
DS,
so
we
can
tell
them.
B
They
could
only
talk
to
a
particular
data
pool
and
then
we
could
create
layouts
in
set
of
s
four
files
and
directories
that
puts
the
data
in
particular
directories
and
particular
data
tools,
so
that
you
can
have
some
level
of
separation
to
what
clients
can
see
in
which
clients
could
touch
each
other's
data.
Historically,
you
didn't
have
any
finer
grained
control
over
which
parts
of
the
metadata
the
clients
could
see.
So
that's
been
fixed.
B
What that looks like is shown here in this example. The typical use case for this is that we have a client and we want that client to only be able to see data within a particular pool and only be able to see metadata within a particular directory. So if we have a directory "foo" and we have a pool "foo_pool", and we've linked those two up by setting a CephFS layout — the file system layout on /foo points at foo_pool —
B
We
can
then
craft
one
of
these
authorization
caps
for
a
client
that
tells
it
NBS
allow
audibly
path,
equals
Buddha
to
limit
it
to
that,
and
the
existing
ability
is
still
add
to
restrict,
which
is
DZ
control
to
as
well.
And
then
the
new
part
is
what's
involved
that
once
you
have
a
client
that
has
capabilities
like
this,
it
needs
to
be
started
in
a
certain
way
in
order
to
work
so
because
it
can
no
longer
see
the
root
of
the
file
system.
It
can
only
see
this
directory.
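A sketch of such a cap (the client name, path, and pool name are placeholders; the exact cap syntax is per the CephFS documentation for your release):

    # A client restricted to /foo on the MDS side and to foo_pool on the OSD side
    ceph auth get-or-create client.foo \
        mon 'allow r' \
        mds 'allow rw path=/foo' \
        osd 'allow rw pool=foo_pool'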
B
You need to pass the -r flag, which tells the client which directory to treat as its root, and it needs to have a root that it is actually permitted to read. Once you've gone through this setup, you effectively have a client which can only access one part of your filesystem, and as far as that client is concerned, that is the root of the file system.
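For example, with the FUSE client (the client ID, path, and mount point are placeholders):

    # Mount only the /foo subtree, treating it as this client's root
    ceph-fuse --id foo -r /foo /mnt/foo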
B
Next,
up
on
a
similar
theme,
I
want
to
talk
about
improvements
to
finally
outs
so
that,
in
addition
to
having
a
file
or
directory
layout
that
points
files
to
a
particular
pool,
we're
going
to
be
able
to
point
files
into
a
particular
greatest
namespace.
So
you
may
not
be
familiar
with
rados
namespaces.
They
are
a
cheaper
way
than
pools
to
subdivide
your
cluster,
so
pools
involve
creating
pgs,
pgs
consume,
CPU
and
RAM,
and
what
we
want
is
a
logical
separation
between
one
set
of
objects
and
another
set
of
objects.
B
We
would
like
a
lighter
weight
to
way
of
doing
that
and
that's
something
that's
existed
in
raid
us
for
quite
some
time.
It's
called
the
namespace
and
they're
implemented
effectively
as
just
a
prefix
to
object
names,
so
they
create
a
different
logical
namespace,
as
opposed
to
a
pool
which
is
we're
going
to
physically
store
the
data
and
handle
it
separately.
B
So
there
is
an
existing
ability
to
write,
OSD
or
caps
that
limit
you
to
a
namespace.
So
if
we
could
write
our
files
to
different
namespaces
for
different
set
of
clients,
then
this
would
be
a
good
way
of
providing
security
that
preventive
to
clients
from
seeing
each
other's
files
without
the
overhead
of
creating
whole
different
pools.
B
So
that's
what
has
been
done?
The
existing
layout
fields,
you
have
the
ability
to
center
pool
and
then
the
ability
to
configure
how
the
object
was
going
to
be
striped
with
striking
a
strike.
Count
object
size.
So,
there's
now
just
an
additional
feel:
they're
called
pool
namespace,
there's
a
caveat
here,
which
is
that
on
the
client
side
or
the
information
for
a
file
that
it
needs
to
access,
the
data
itself
is
going
to
go
into
that
namespace.
B
but we do also store these backtraces — which are an implementation detail of CephFS, written from the MDS for each file — and, as was the case with customizing the pool that's used for a file, we will still write the backtraces from the MDS to the default pool and default namespace. I don't really have time to go into why that is, but it's something to be aware of.
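A sketch of setting that field through the layout virtual xattrs (the paths and namespace name are placeholders):

    # New files created under this directory go into RADOS namespace "ns1"
    setfattr -n ceph.dir.layout.pool_namespace -v ns1 /mnt/cephfs/somedir

    # Or set it directly on an empty file
    setfattr -n ceph.file.layout.pool_namespace -v ns1 /mnt/cephfs/somefile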
B
Those two features are actually somewhat motivated by what I'm going to talk about now, which is OpenStack Manila. OpenStack is the open source cloud framework for building private or public clouds. It has a number of different services that make it up — Nova, the compute service; Cinder, the block storage service; Neutron, the networking service — and one of the services that was added a little bit more recently than the better-known ones is Manila, which is a service for provisioning and accessing shared file systems
B
As
part
of
your
cloud,
Manila
provides
Model,
T
users,
where
it
allows
them
to
request
a
piece
of
file
system
storage
that
it
calls
a
share
and
Manila
has
a
plug-in
framework
that
enables
you
to
write
drivers
for
it.
So,
for
example,
there
is
a
HDFS
driver.
There
is
a
cluster,
a
fast
driver
and
there
are
drivers
for
integrating
this
with
various
proprietary
storage
appliances
as
well,
so
we've
gone
ahead
and
written
a
second
that's
driving
for
this
and
in
sep
of
us.
B
Similarly to what I was talking about with the MDS auth caps that limit access by path, the -r flag is needed by the client. So when we've created a share for a user, we give them a path that is something like /manila/ followed by an alphanumeric ID, and they need to pass that into the -r flag of their FUSE client in order to be able to mount that share.
B
We
take
advantage
of
the
recursive
statistics
feature
of
service
in
order
to
get
the
capacity
statistics.
So
when
we're
reporting
to
Manila
that
a
given
share
is
using
a
certain
number
of
lights
because
it
share
is
just
a
directory
when
we're
getting
that
data
from
is
the
our
stats
within
set
of
s
and
using
these
new
or
caps,
we
were
able
to
make
sure
that
clients
only
have
permission
to
access
the
particular
directory,
which
corresponds
to
a
share
that
they
have
been
explicitly
granted
access
to.
B
That's
very
important,
because
an
openstack
cloud
is
a
multi-talented
environment.
So
vanilla
really
represents
a
use
case.
That
brings
together
a
number
of
these
different
features
that
we
been
working
on
recently,
as
well
as
a
number
of
useful
features
that
have
existed
in
south
of
us
for
some
time
and
it
wraps
them
up
in
a
way
that
makes
the
whole
thing
a
lot
more
accessible
to
users.
So
they
don't
have
to
type
those
long
step
or
get
or
create
commands
anymore.
B
The next topic is multiple file systems: historically you could only have one CephFS file system in a Ceph cluster. There is no actual fundamental reason for that — it's sort of a convenience-of-implementation thing — but there are good reasons you might want to have multiple file systems, which means having multiple MDS clusters that are all sharing one RADOS cluster. So if you want separate file systems, rather than having to have two Ceph clusters, you can now back it all onto the same RADOS cluster. You might want to do this — have multiple file systems — if you want to physically isolate some workloads.
B
So
if
you
want
to
make
sure
for
security
or
quality
of
service
reasons
that
two
different
workloads,
maybe
one
is
very
mission
critical
and
what
is
experimental
are
just
going
to
go
through
physically
separate
MDS
service.
You'll
be
able
to
do
that.
There's
also
a
disaster
recovery
use
case,
which
is
quite
remote
avator
for
us
here
that
currently,
if
you
were
going
through
some
of
the
repair
procedures,
that
I
was
talking
about
earlier
you're
kind
of
trying
to
do
a
lot
of
stuff
in
place
and
that's
kind
of
a
scary,
uncomfortable
thing
to
do
so.
B
So
if
you're
worried
about
hitting
issues,
whether
they're
bugs
or
performance
issues
or
stability
issues
in
set
of
s,
then
it's
a
nice
way
of
protecting
yourself.
If
you
can
say
well,
I've
got
a
stable
workload.
That's
working
really!
Well
I'm
going
to
keep
that
running
on
this
MDS
in
this
file
system
and
then
what
I
want
to
try
Ewing
something
different
I'm,
going
to
run
that
on
a
different
MDS
with
a
different
file
system,
so
that
one
thing
doesn't
interfere
with
the
other,
so
the
FS
new
command.
B
That
I
mentioned
at
the
beginning,
that's
something
that
was
added
a
little
while
ago,
the
FS
new
FS,
LS
and
SRM
commands,
which
very
much
suggest
you
should
be
able
to
have
more
than
one
but
his.
We
would
give
you
an
error.
If
you
try
to
create
a
second
one.
Well
now
you
can
run
it
all
the
ones,
so
you
run
FS
new
second
time
you
give
it
a
different
couple
of
coolness
to
use
and
you'll
get
a
second
file
system.
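A sketch of that (pool and filesystem names and PG counts are examples; the opt-in flag is the one described just below):

    # Acknowledge the caveats and allow multiple filesystems
    ceph fs flag set enable_multiple true --yes-i-really-mean-it

    # Create a second filesystem from its own pair of pools
    ceph osd pool create cephfs2_data 64
    ceph osd pool create cephfs2_metadata 64
    ceph fs new cephfs2 cephfs2_metadata cephfs2_data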
B
Hopefully
before
doing
that,
you
have
created
enough
MDS
demons
to
actually
operate
the
new
file
system.
So
if
you
only
had
one
and
the
sd1
and
you
had
one
file
system,
then
you
create
second
file
system.
Well,
the
second
file
systems
not
gonna,
be
able
to
come
up.
Yet
in
this
initial
implementation,
the
MDS
demons
are
all
treated
equally.
B
We
have
a
flag
on
the
cluster
that
you
have
to
explicitly
set
to
indicate
that
you
are
aware
of
the
caveats
and
that
you're
not
going
to
get
angry
with
us
if
something
goes
wrong
for
legacy
clients
which
don't
support,
explicitly
selecting
which
file
system
you'd
like
to
connect
to,
we
have
an
ability
to
configure
which
the
default
file
system
file
system
should
be.
So
if
you
create
three
file
systems-
and
you
want
the
second
of
those
two
either
one
that
old
clients
will
get
when
they
try
and
connect
to
the
system
you
can.
B
You
can
do
that,
whereas,
if
you're
using
the
new
client-
or
I
should
say
the
latest
version
of
the
fuse
client,
you
can
pass
an
option
on
the
command
line
to
say
which
file
system
you
want
to
use.
If
you
omit
that
option
and
then
like
a
legacy
client,
you
will
get
the
default
file
system.
So,
if
you're
not
actively
trying
to
use
this,
if
you
just
have
one
file
system,
everything
will
just
still
work
the
same
on
the
client
side
and
on
the
server
side.
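With the Jewel-era FUSE client, that selection was made with a client option, roughly like this (the filesystem name and mount point are placeholders):

    # Mount a specific filesystem rather than the default one
    ceph-fuse --client_mds_namespace=cephfs2 /mnt/cephfs2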
B
There is a bunch of follow-on work to improve this. Much like with data storage, it would be nice to be able to use RADOS namespaces for metadata storage as well, so that if I'm creating a second file system I don't have to needlessly create a second metadata pool — I could just use a different namespace within my existing metadata pool. There's also a bunch of authorization work needed here to make it suitable for a lot of use cases, especially multi-tenant use cases.
B
So
currently,
there
is
nothing
to
limit
our
client
to
connecting
to
particular
file
systems,
so
any
client
can
connect
to
any
file
system
and
any
MDS
can
act
as
a
server
fatty
filesystem.
The
file
system
functions
stuff
doesn't
exist
in
the
kernel,
client
yeah.
So
currently
the
ability
to
pick
on
the
client
by
client
basis,
which
file
system
you
get
is
limited
user
space,
client
and
as
I
was
mentioning.
The
M
de
esas
are
all
considered
equal
and
you
can't
currently
set
up
any
clever
policies
that
say
this
MDS
is
for
that
file.
B
So I'm going to wrap up with some tips for anyone who is thinking of becoming, or already is, an early adopter of CephFS.
B
This
is
a
stock
slide
that
I
put
in
a
number
of
presentations
so
come
to
the
mailing
list,
come
and
look
at
the
issue.
Tracker
look
at
the
online
reference
for
how
to
configure
lobbying
and
debugging,
and
so
you
can
get
more
information
about
any
issues
you're
having
and
then,
when
you
all,
are
having
an
issue.
B
Please
consider
installing
the
most
recent
development
release
or
if
you
use
in
the
kernel,
play
a
more
recent
Colonel,
because
ffs
is
very
actively
developed
and
there's
a
lot
of
difference
between
the
code
from
six
months
ago
in
the
code
for
today,
if
you're
getting
in
touch
with
us,
please
let
us
not
as
much
detail
as
you
can.
How
are
your
MDS
is
configured?
How
many
of
them
have
you
got?
Which
client
are
you
using?
Is
it
the
colonel?
Is
it
users
base
and
so
on?
B
What
are
you
doing
with
the
file
system
so
there's
a
big
difference
for
a
distributed
file
system
between
a
workload
week
or
just
a
single
client
versus
where
you've
got
multiple
whites,
accessing
the
same
files
and
and
so
on.
So
really
just
as
much
information
as
you
can
gather
using
the
tools
and
that
as
much
information
as
you
can
give
us
when
you're
reporting
issues
is
really
helpful.
B
A
B
Okay, yes, I should have mentioned that. So the Jewel release of Ceph, which is the release happening this spring, is going to be the first one where we're going to start calling the upstream open source release of CephFS stable, and we are going to be encouraging people to start evaluating it and start testing it. As for officially supported releases of products based on this code — whether by Red Hat or any other vendors — that's TBA, but yeah, Jewel is the first stable release of CephFS. You also asked about snapshots, and snapshots are not, I
B
Think
currently
we're
not
including
that
in
all
sort
of
statement,
instability
for
jewel,
but
there
has
been
recent
work
on
stabilizing
slap
shots
so
what
they
they're
coming
along
as
well
so
I'm
being
vague
about
everything
apart
from
jewel
and
this
the
jewel
is
going
to
be
jewel,
is
going
to
be
the
upstream
release
of
CFS.
The
folks
should
start
testing
and
evaluating
and
complaining
to
us
about
once
with.
A
Now
there
are
to
expand
on
that
a
little
bit.
Typically,
what
you
see
when
an
open
source
when
an
open
source
version
of
stuff
is
released
and
is
stable
and
whatnot.
Typically,
it's
about
a
six-month
window
to
when
the
Red
Hat
supported
packages
come
out
that
include
those
things
you
know
modulo
any
no
problems
concerns
things
that
we
want
to
expand
on
before
putting
it
in
the
you
know:
Red
Hat's,
F
storage
product,
that
kind
of
thing,
so
the
commercially
supportable
is
typically
about
six
months
behind.
A
B
B
So there's two questions you might be asking when you ask about the relation of files to objects. You might be asking what's the relationship between files in CephFS and objects in RGW, and the answer is that there is no relation — they're separate. They're both in the same RADOS cluster, but a thing that is a file and a thing that is logically an object cannot see each other.
B
As
for
the,
the
other
interpretation
of
the
question,
which
is
how
are
the
files
and
seth
has
mapped
to
objects
in
rados,
they
are
striped
in
the
same
way
that
they
are
in
IBD
or
a
GW.
So
you
can
configure
that
with
the
settings
on
file
layout,
but
the
default
which
most
people
I
think
go
ahead
and
use
is
to
simply
stripe
chunk
objects
into
four
megabytes
hunks.
So
if
you
write
a
for
Megan
by
objects,
that's
going
to
be
a
sorry.
B
If
you
write
a
four
megabyte
file,
that's
going
to
be
more
object
and
if
you
were
an
a
leg
by
far,
let's
go
be
two
objects.
There
is
a
little
bit
of
our
head.
That
goes
with
that.
In
the
cases
where
you're,
storing
files
in
non-default
pools
or
namespaces,
you
will
get
the
data
objects,
the
file
which
will
follow
those
rules
about
for
matin
by
chunking,
and
then
you
will
get
an
additional,
very
small
object
for
each
file
as
well,
which
tracks
some
some
other
metadata
about
it.
B
You
also
ask
if
active,
active
MTS's
will
be
supported,
so
the
short
answer
is
no
in
the
jewel
release
where
all
of
the
work
we've
done
on
stabilization
and
repair,
and
so
on
has
focused
on
the
single
active
MVS
case.
So
the
case
we're
encouraging
people
to
evaluate
is,
and
one
active,
MDS
and
then
a
standby
or
a
stamp
I
replay,
and
yes,
so
standby
replay
is,
in
general,
pretty
good
idea
to
get
a
fast
failover.
That's
the
mode
where
the
standpipe
continuously
replays,
the
Journal
of
the
active
India's.
B
However,
having
multiple
active
I'm
ds's,
the
code
is
all
in
there.
You
know
you
can
install
it
and
enable
it
and
you
will
in
general,
find
that
it
works.
But
that
is
not
something
we've
focused
on
stabilizing.
So
if
you're,
if
you're
putting
things
in
a
more
production,
ready
and
less
production-ready
bucket,
then
active
active
goes
and
be
less
prepped
and
ready
bucket.
We
have
a
question:
what
about
quota
in
the
kernel
client?
Is
it
ready?
A
Kernel client quotas, yeah.
B
How is active/passive HA among the MDS nodes handled — is the standby MDS hot, or does failover need to be handled externally by something like Pacemaker?
It is hot, so there is no Pacemaker in this. The way this works is that you start as many MDS daemons as you like; they all communicate with the Ceph monitor cluster and essentially register themselves in a list of eligible standbys, and then the Ceph monitor cluster looks at your configuration.
B
So none of that requires any external input, unless you want to preempt it to get past a timeout: by default, if an MDS just falls off the network, we will wait 30 seconds for it before promoting another daemon in its place. You can preempt that using the "ceph mds fail" command, but aside from that, it's an entirely autonomous thing.
B
And
you
can
also
set
a
flag
which
causes
that
and
standby
demon
to
continuously
replay
the
metadata
log.
The
metadata
journal
from
the
guy
who
it
is
the
potential
replacement
for
which
means
that,
at
the
point
that
it
is
asked
to
take
on
that
role
as
the
replacement,
it
already
has
the
metadata
in
cash
and
it
doesn't
have
to
reload
or
the
metadata
Reyes.
That
gets
you
up
faster
failover
at
the
cost
of
some
extra
read
iOS
four,
following
the
journal
and
at
the
cost
of
being
a
slightly
more
handcrafted
configuration.
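A sketch of that flag in ceph.conf for a standby daemon (the section and daemon names are placeholders; option names as in the Jewel-era documentation):

    [mds.b]
        # Continuously follow the active MDS's journal so failover is fast
        mds_standby_replay = true
        # Optionally pin this standby to a specific active MDS by name
        mds_standby_for_name = a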
B
So
in
general,
what
makes
a
good
monster
that
doesn't
necessarily
hardware
wise,
doesn't
necessarily
make
a
good
MDS.
Aside
from
that,
you
just
have
all
the
general
issues
associated
with
putting
mixing
different
demons
on
the
same
server.
B
So
if
we
often
get
asked,
can
we
run
ones
and
oh
SDS
on
same
server
and
so
there's
a
similar
kind
of
set
of
issues
that
you
need
to
work
through
if
you're
thinking
about
doing
that,
but
in
future
we're
looking
at
things
like
cgroups
configurations
that
will
enable
people
to
run
montuno
SDS
in
the
same
place
and
that
same
kind
of
work
will
at
some
point
also
apply
to
willebrand
MDS
demons.
In
the
same
places
month,
so
the
answer
isn't
know
you
can
run
them
on
the
same
servers.
B
It's
not
necessarily
a
bad
idea,
but
you
have
to
really
think
through
that
configuration
Brian
asks.
How
can
you
scale
setup
has
to
meet
demands
of
a
growing
open,
star
cluster
if
you're
using
it
for
Manila
so
essentially
in
the
same
way
as
we
would
for
any
other
large
workload.
So
step
of
us
ten
years
ago,
long
before
I
worked
on
it.
B
The
the
surface
architecture
was
originally
designed
to
deal
with
HPC
clusters,
which
would
have
very
large
numbers
of
clients,
and
so
in
Manila,
where
we
potentially
also
have
very
large
numbers
of
clients
talking
to
lots
of
different
directories
directing
as
manila
shares.
It's
the
it's
a
similar
use
case
to
when
it
comes
to
scaling,
and
so
it's
the
same
answers
it
scales
by
increasing
number
of
MDS
is
in
a
cluster,
although
currently
that
multiple
active
MBS
functionality
isn't
stabilized
the
other
aspect
to
scaling
with
Manila
is.
B
We
might
also
look
at
integrating
the
multi
file
system
functionality
with
Manila
so
that
in
cases
where
people
knew
that
they
had,
for
example,
one
tenant
was
going
to
use
this
set
of
file
systems
and
another
tenant
was
going
to
use
another
set
of
file
systems.
You
might
consider
crafting
a
configuration
where
you
had
an
MDS
for
this
tenants
file
systems
and
MDS
for
this
other
tenants
file
systems
rather
than
having
a
completely
elastic
cluster
mdss.
B
So
it's
a
mixture
of
the
elastic
scaling
with
the
size
of
the
mps
cluster
and
then
maybe
a
little
bit
of
manually
configured
magic
for
deploying
more
individual
demons
that
mine
operators,
paulo
separate
clusters,
but
the
middle
of
stuff
is
really
very
new
at
the
moment.
So
you'll
be
interesting
to
see
how
that
pans
out.
A
Alright, it looks like we don't have any more questions. That was great — thank you, John, very much. Just a reminder to everybody here: this was recorded and will be posted up on the YouTube channel, which is also going to be linked from the Ceph Tech Talks site on ceph.com. So if you want to replay or revisit this, it will be there. Otherwise, stay tuned for our next one, which is on the 24th of March — same time, same bat channel. So we'll hopefully see you all back then.