From YouTube: Ceph Project Update
Description
A full project update on current work, future roadmap, and project contributions.
Host: All right, welcome to the final session of the Ceph day track here at Open Source Days. Go ahead and take your seat as we get started. Just a reminder: if you do have questions, either in the middle or at the end, make sure you use the microphones, so that the recording we have going for posterity will pick you up as well as the rest of us in the room. Our next and final speaker is the creator and current project lead for Ceph.
Sage Weil: Can everyone hear me? Okay? Yes, all right! My name is Sage Weil; I'm the Ceph project lead. I work at Red Hat in the Office of Technology and oversee Ceph development and so on. I'm going to talk a bit about the release that's about to come out, Luminous — what's in it, what's cool — and then I'm going to talk about what we're working on after that, and then a bit about contributor stats and so on.
So, just to level set: Ceph does a regular release cadence; we normally do releases every six months.
Every other one is an LTS, which means we do backports for bug fixes and so on. Luminous is about to come out — it was supposed to be spring; it's kind of, sort of, another month or two before it's out, so it'll be, I guess, early summer; we're a little bit behind. The next release after that is going to be Mimic, which will be a non-LTS, in the fall or maybe winter.
You can go look at Google Images — wonderful. So, lots of good stuff coming in Luminous; it's going to be a really good release, and I'm very excited about it.
The biggest piece — the one I'm most excited about, because I worked on it primarily — is BlueStore. BlueStore is going to be stable in Luminous, and it's going to be the default backend for the OSDs.
So a big, big milestone. BlueStore consumes a raw block device, in contrast to our sort of legacy FileStore, which consumes, you know, XFS.
We use RocksDB internally for metadata, but it's all sort of packaged up into one big thing that we control. It's very fast on hard disks — roughly twice as fast, both for large IOs and small IOs. For regular SSDs it's also faster than FileStore, maybe more like one-and-a-half times, but that varies with your workload. For NVMe it isn't that different from FileStore, because the NVMe isn't actually the slow part.
We have other issues to deal with there, optimizing Ceph itself so it uses less CPU. But BlueStore is the future, and that's sort of where we're trying to get to.
More importantly, it sort of gets rid of all this legacy stuff that we had with FileStore. So all these weird performance anomalies that you wouldn't notice until you had strange workloads ought to go away, because we are less stupid — mostly. BlueStore has full data checksums on everything.
So every time you read any data from the disk, it gets checksum-verified, so you won't get sort of bad-data errors. It also does inline compression, with zlib or snappy, which is nice, and it's going to be the stable thing.
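As a rough sketch of what turning that on looks like: the per-pool compression settings can be driven through the librados Python bindings. The pool name and values here are placeholders, and the equivalent CLI is noted in the comments.

```python
# Minimal sketch, assuming a pool named "mypool": enable BlueStore inline
# compression via the mon command interface of the librados bindings.
# CLI equivalent: ceph osd pool set mypool compression_algorithm snappy
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
for var, val in [('compression_algorithm', 'snappy'),
                 ('compression_mode', 'aggressive')]:
    cmd = json.dumps({'prefix': 'osd pool set', 'pool': 'mypool',
                      'var': var, 'val': val})
    ret, out, errs = cluster.mon_command(cmd, b'')
    assert ret == 0, errs
cluster.shutdown()
```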
Lots of people contributed to it — it was a group effort — and we're very excited to finally have it done and out there. I'm going to show a few quick performance plots.
These are showing large and small write IOs — random writes — with throughput and latency, for a whole bunch of development branches; sort of the top one is the one that all got merged. These are a little bit old, but it's roughly twice as fast, for both large and small.
That's sort of the takeaway. Same thing — a similar picture — when you mix reads and writes; the reads aren't necessarily twice as fast, so it's a bit of a blend there.
But, more importantly, if you look at sort of the aggregate workload — not just a micro-benchmark — things like the RADOS Gateway that are doing index updates on the bucket indices that are stored in RADOS: there were all these weird, annoying things that FileStore had to do in order to make that work properly and be consistent and safe. BlueStore does it much better, and so the performance improvement is more like 3x or 4x, depending on what your workload is, because the disk just isn't doing all the work it used to.
So we're very excited about that. And as a consequence of all that, this also enabled us to do another big feature, which is erasure-code support for the RADOS Block Device — finally. The key missing piece before was that erasure-coded pools didn't support overwrites of existing data. Now they do, so you can put a block device on top of those objects.
It requires BlueStore in order to perform well, because we have to do a two-phase commit with the journal to be able to roll back. It is implemented on FileStore too, but it's horrifically slow. The other thing is that we rely on the checksums in BlueStore in order to do the deep scrubbing, so if you're using FileStore with the EC overwrites, you can't deep scrub — it doesn't actually verify anything; it just goes and reads the data. So BlueStore and EC overwrites go together in Luminous. So it's there — it's good!
The downside is small writes, which are slower on erasure-coded pools. We hope to mitigate that with BlueStore, although we haven't done sort of the final testing that pits FileStore with 3x replication against BlueStore with erasure coding — that's in progress, so we'll know how we net out when the final numbers are in. But on the flip side, large writes are actually faster than replication, because you're actually doing less IO to your devices, which is also good.
B
The
implementation
is
still
doing
sort
of
the
simple
thing
when
you
do
a
small
right:
it's
updating
a
full
stripe,
so
we
we
want
to
do
things
that
are
more
clever,
but
it's
going
to
take
a
little
more
time
before
we
were
able
to
make
those
optimizations,
but
it
works
in
luminous.
It's
there.
It's
ready
to
ready
to
go.
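To make that concrete, here is a hedged sketch of setting up such a pool; the pool name and PG count are placeholders, and it assumes the default erasure-code profile is acceptable.

```python
# Sketch: create an erasure-coded pool and allow overwrites on it, which is
# what RBD (and CephFS) need to sit on top of it.
# CLI equivalent: ceph osd pool create ec_data 64 64 erasure
#                 ceph osd pool set ec_data allow_ec_overwrites true
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def mon(**kw):
    ret, out, errs = cluster.mon_command(json.dumps(kw), b'')
    assert ret == 0, errs

mon(prefix='osd pool create', pool='ec_data', pg_num=64, pool_type='erasure')
mon(prefix='osd pool set', pool='ec_data', var='allow_ec_overwrites',
    val='true')
cluster.shutdown()
```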
The other big piece of work that went into Luminous was the new Ceph Manager daemon. It actually appeared in Kraken, but didn't do anything useful yet; in Luminous it does lots of useful things.
The main thing is that it offloads a whole bunch of work that the monitor used to do into a new daemon. The monitor previously dealt with all the PG stats; it ended up with a whole bunch of data that was just churning through Paxos, slowing down the monitor and limiting our overall scalability.
So it's going to make Ceph scale again. And coincidentally, this morning I just got word that CERN has a 10,000-OSD cluster that we're going to be able to use for two weeks — in a couple of weeks — to do another Ceph scale test. It's perfectly timed to test all this new Luminous stuff and actually see how it does at that size, because we don't usually get to buy that much hardware at Red Hat, unfortunately. So that's going to happen in the next couple of weeks.
Very excited about that. The Ceph Manager also has a new REST API; we sort of took the Calamari API and adapted it. It uses the Pecan framework now; it's written in Python. The manager has this nice Python plug-in framework that you can use, so that's going to be there, and there's also going to be a built-in dashboard. It's super simple — it's basically like 'ceph -s' on the web — but it works. Here's a screenshot; it's pretty simplistic right now.
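For a sense of what that plug-in framework looks like, here is a minimal, hypothetical module sketch, assuming the Luminous-era MgrModule interface; the module name and command are made up for illustration.

```python
# Hypothetical ceph-mgr plugin sketch (Luminous-era interface assumed).
from mgr_module import MgrModule

class Hello(MgrModule):
    COMMANDS = [
        {
            'cmd': 'hello name=who,type=CephString,req=false',
            'desc': 'Say hello from the mgr',
            'perm': 'r',
        },
    ]

    def handle_command(self, command):
        who = command.get('who', 'world')
        # return (retval, stdout, stderr); the CLI prints stdout
        return 0, 'hello, %s' % who, ''
```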
It shows the log — just really basic stuff like that. So there's not much in Luminous yet, but this is going to be a building block moving forward; eventually we're going to add all the metrics in there that the manager already has — it just doesn't present them through the GUI yet — and that sort of thing.
There's a new network messenger implementation in Luminous. It actually was new in Kraken — so it's new since Jewel — a new implementation that doesn't use up lots of threads; it's much more efficient.
It's event-driven, a fresh codebase — it's great; so much better. It also has pluggable backends, so there's an RDMA backend for the messenger that is built by default. It isn't tested very heavily yet — we don't have the gear in our community lab — but it's being used in a few places in production with good results. So it seems stable, but it's not officially supported and tested yet, so your mileage may vary.
Yet
so
your
mileage
may
vary,
there's
also
an
experimental
DB
TK
back-end
that
uses
Intel's
user
space,
acceleration
library,
stuff.
That also looks very promising. It's definitely in a prototype stage — it's not ready to go — but the code's there, and you can build it and play with it if you want. So, very excited there. Mellanox and XSKY have had many people working on the RDMA stuff. And the other sort of nice thing coming in Luminous is that we're finally going to have perfectly balanced OSDs.
Everybody who operates a large cluster is dealing with the variation between the least-utilized OSD and the most-utilized OSD, dealing with reweights and capacity planning, and it's just a headache. We finally have a bunch of new tools to actually make that essentially a perfect balance. The two tools: one is something called choose_args for CRUSH, which is basically a way to feed in alternate, pool-specific parameters that tweak the weights for a particular pool ID.
So you can sort of get it to do exactly what you want. It's sort of a generic capability, but what it allows us to do is run a numeric optimization that just does a gradient descent and fiddles with all the weights, so that the actual output is exactly what you intended when you put the weights in. So it solves that imbalance problem, but it also addresses something that we've been calling the multipick anomaly.
B
And
this
it's
it's
a
it's
annoying
math,
but
we
don't
even
notice
it
for
a
long
time,
but
we
can
act.
We
can
correct
with
that
as
well
by
using
adjusted
probabilities
for
the
second
and
third
replicate
choices
than
crush,
so
the
good
news
is
that
the
imbalance
portion
of
that
optimizing
or
that
is
actually
going
to
be
backwards
compatible
with
older
clients.
If
you
want
to
correct
for
the
multi
pick
part,
then
you
have
to
wait
to
allow
your
clients
or
running
luminous
and
understand
the
new
stuff.
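The actual optimizer ships with the Ceph tooling, but the idea can be caricatured in a few lines. This toy — the names and update rule are illustrative, not Ceph's code — nudges overfull OSDs' weights down and underfull ones up, which is what the gradient descent iterates on.

```python
# Toy illustration of weight optimization (not Ceph's implementation).
def reweight_step(weights, pg_counts, rate=0.2):
    """One descent step. weights: {osd: weight}; pg_counts: {osd: PGs}."""
    total_pgs = float(sum(pg_counts.values()))
    total_w = sum(weights.values())
    new = {}
    for osd, w in weights.items():
        expected = total_pgs * w / total_w       # ideal share for this weight
        error = (pg_counts[osd] - expected) / max(expected, 1e-9)
        new[osd] = max(0.0, w * (1.0 - rate * error))  # overfull -> lighter
    return new

# osd 2 is overfull, so its weight drops; re-run the CRUSH mapping and
# repeat the step to converge.
print(reweight_step({0: 1.0, 1: 1.0, 2: 1.0}, {0: 32, 1: 30, 2: 38}))
```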
The other tool is something called pg-upmap, which is just the ability to put an explicit exception mapping in the OSDMap that says: this PG is stored on these OSDs, period. It just overrides whatever CRUSH says and says "put it here." And there's a really simple optimizer that looks at your distribution and says: this OSD has one PG too many, and this one has one too few, so I'm just going to move it there — and it does that. So both those tools are there, but pg-upmap also requires Luminous clients.
You won't really be able to use it on a production cluster until everybody is upgraded on the client side. A few other odds and ends on the RADOS side: CRUSH has something new called device classes, where you can just tag the OSDs in your system as being a particular class or type. So you can say: these are SSDs, these are hard disks, these are NVMes. And then you can write a really simple CRUSH rule that says: map to OSDs that are SSDs, or map to hard disks.
Previously, if you wanted to do this, you had to manually edit your CRUSH map and create two parallel hierarchies and futz with all the names, and all the automatic CRUSH manipulation stuff just kind of broke — super tedious. Now it works out of the box; it's really simple. So that's nice. There's also a streamlined disk replacement process that's well documented, so you can replace OSDs reusing the same IDs, and it's going to be simple and it's actually going to work. It'll all be nice!
There's also a new cluster-wide client-compatibility setting. You can just say: I want to be compatible with Hammer clients. You tell the cluster that, and it'll just prevent you from doing anything that would break that constraint. And just to make operators' lives a little bit easier, we're annotating and documenting all the config options in the code, so you can just do a dump and see all the config options, what they mean, and whether you should touch them or not. They'll be marked as, like, experimental — developer-only, do not touch — versus something that you should adjust, or expert-only; that sort of thing.
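For example, a sketch of pinning that compatibility floor — "hammer" is just the release named above:

```python
# Sketch: pin the minimum client release the cluster must stay compatible
# with; the monitors then refuse changes that would lock those clients out.
# CLI equivalent: ceph osd set-require-min-compat-client hammer
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ret, out, errs = cluster.mon_command(
    json.dumps({'prefix': 'osd set-require-min-compat-client',
                'version': 'hammer'}), b'')
assert ret == 0, errs
cluster.shutdown()
```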
There's a mechanism now so that if a PG or an object is stuck, there's a backoff, so the clients will stop sending requests — something which, in certain recovery situations, could bite people in the whatever, because they couldn't actually talk to the OSD. So there are some things like that that are fixed, better EIO handling, and recovery speedups — new in Kraken, actually, so new since Jewel. And Ceph now, in most cases, if an OSD fails, immediately notices.
You don't have to wait for a heartbeat timeout, so it's much faster failure detection, and the cluster moves on. So, lots of good stuff. There's an ongoing list of just sort of random little robustness stuff that's improving — good things there. Now, moving out of RADOS into the RADOS Gateway.
We sort of have this high-level view that, in the future, most data is going to be stored in object stores. So while block is obviously very important, particularly for cloud workloads and hosting VMs, that's not actually where most of the data is going to be; most of it is going to end up in objects, behind S3 and Swift APIs. And so there's a whole raft of features that we're looking at here — things like erasure coding, tiering, multi-site federation, and so on.
So, new in Luminous — sort of the biggest, most exciting new thing — is RADOS Gateway metadata search. We already have this mechanism — my slide build is kind of screwed up here — this mechanism where you can take Ceph clusters in multiple data centers, or in the same data center, and have multiple zones: sort of quasi-independent RGW installations that are federated with each other. They share a bucket namespace and users, you can put a bucket in a particular zone, and you can replicate across zones.
So if you set up one of these zones to index your object gateway content, you can have an index of either the default stuff or whatever headers you care about, and then you can go do search queries to find out what you're storing — you know, what file types, what headers are set, whatever you want to do. So that's exciting and totally new. There's a bunch of other stuff with the RADOS Gateway. There's a new NFS gateway — it's actually been present in some versions of Jewel; it got backported from some of the downstream stuff.
I can't remember, actually, if it was upstream before. But this is for the RADOS Gateway: there's a very simple NFS gateway that lets you mount NFSv4 (or v3) and copy in data or copy out data, which is great for migrating existing workloads from sort of file-based storage systems to object as you make that transition. It's not meant to be a full POSIX file system — it doesn't do small writes and renames and truncates and all that random crap — but for just copying data in and out, it works great.
So that's big for a lot of users. The biggest management and operations headache that we're resolving is dynamic bucket index sharding. For our RGW users: if you put too many objects in a bucket, the index would get big. There was a tool that you could run offline that would reshard it, or when you created a bucket you could decide what the sharding was up front — but it was kind of a headache, you had to plan ahead, and it wasn't very friendly.
Finally, in Luminous, that's just going to be automatic. As the bucket gets big, it will reshard on its own, and you don't have to do anything; it'll happen online. It'll just not be something you have to worry about — there's a bit of a theme here of not having to worry about annoying things; we're trying to chip away at these things. So that's good.
There are also a couple of other sort of headline features that came into the gateway — the team at Mirantis did a bunch of great work.
There's inline compression: RGW will compress the data as it comes into the cluster and write that compressed data to RADOS. So that's good, and it sort of happens transparently. There's also a bunch of encryption APIs that were implemented. These follow the S3 encryption spec — oops, I always have trouble with the names of all the categories — but you can set keys on the buckets and on the users; it's a whole big, complicated API that Amazon made up.
We basically implemented it, so it's there. And then there's a whole bunch of stuff with the S3 and Swift APIs that's been improved and added and updated — there's sort of a constant flow of issues there that get resolved. Those are sort of the big, exciting things on the RADOS Gateway side.
On the RADOS Block Device side, there's also lots of stuff going on. The biggest thing, obviously, is the erasure coding, which I already mentioned, but I'm going to mention it again because it's a big deal: you can run RADOS block devices on an erasure-coded pool and buy fewer hard disks and SSDs and everything else. It's pretty simple: you just specify the data pool when you're creating the RBD block device, and it puts just the data blocks in that pool.
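A minimal sketch with the rbd Python bindings — pool and image names are placeholders, and it assumes the Luminous-era bindings, where create() grew a data_pool argument:

```python
# Sketch: the image's header/metadata objects go in the replicated "rbd"
# pool, while its data blocks land in the erasure-coded "ec_data" pool.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
rbd.RBD().create(ioctx, 'myimage', 10 * 1024**3,
                 old_format=False, data_pool='ec_data')
ioctx.close()
cluster.shutdown()
```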
A lot of work also went into the RBD mirroring mechanism. There are now multiple rbd-mirror daemons, and they're sharing the load, and they're HA, and all that stuff — whereas in Jewel it existed, but it was just one daemon. Now it's a bunch of them, and they scale out. So lots of stuff there, mostly around robustness and not so much around new feature capabilities. There's improved Cinder integration.
There's always the ongoing OpenStack stuff. And a lot of work is going into iSCSI. This has sort of been a multi-year journey of various false starts and attempts to use different kernel interfaces that got tanked by upstream kernel review, whatever; but the latest iSCSI approach is based on LIO and tcmu-runner, which is basically a userspace pass-through.
So the iSCSI kernel target passes through to user space, to librbd, which is nice because you get the full librbd feature set — the latest and greatest — and the performance penalty of doing that pass-through is actually very modest. So that's good. It's going to be a full HA solution that does failover, iSCSI reservations, and all that stuff.
So that's coming. And on the kernel side, there have been lots of krbd improvements, keeping up with the CRUSH and OSD and cluster protocol changes that have happened; that's all there. For RBD specifically, the exclusive-locking stuff is in the upstream kernel now, as is support for the object map stuff — both kind of old features, but they're now in the kernel. So if you're using the native kernel block device, you can get that stuff. And finally: CephFS.
If you've seen my talks the last few years — or last year, I guess — you've seen this before: we used to talk about CephFS saying that all these other parts of Ceph were awesome, but CephFS was only nearly awesome, because it wasn't ready yet, yadda yadda. And finally, now, CephFS is production-ready. It's stable.
There we go — okay, there we go. Yes! So, multiple active MDSes are finally supported, and there's a bunch of stuff to go along with that. The multiple MDSes have this load-balancing framework that's all heuristic-based and tries to understand your workload and move things around, but it's hard to understand what client workloads are doing, so there's also a manual mechanism: you can just go in and say, this subtree, this directory — I'm just going to pin it to that MDS.
So, if you want to, you can just manually enforce whatever subtree partition it is that you want, if you don't want to rely on the automatic thing to do its thing — which it might do right, it might do wrong; that's sort of ongoing work. So there's that. Directory fragmentation is also finally on by default. This is the CephFS machinery for dealing with very large directories: it'll break them up into little pieces and put them in separate objects, across multiple MDSes, and all that stuff.
So that's all there, and a lot of work is also going in on the kernel client side, keeping the kernel client up to date with all the changes in user space, fixing bugs, and so on. It's a group effort here, mostly from Red Hat's CephFS developers, but it's been good. So we're really excited about CephFS. Okay.
They're awesome, anyway. So: Mimic. Lots of stuff again. I'd say the main motivation — the main priorities — are, like, you know, make Ceph faster: performance. Mostly because, as far as features go, we feel like we're in a pretty good position. The main challenge I think a lot of users are facing — OpenStack users and everyone else — is around usability: it's just hard to manage.
One of the big pieces there is the OSD refactor. It's going to be painful, but it's really important, because the current structure of the code is hard to maintain — it's gotten so complicated — and it doesn't perform as well as it needs to. So, as our storage devices get faster and faster, we really need to address this sort of elephant in the room in order to make progress. So that's going to happen. There's also ongoing work on BlueFS and RocksDB; there are some sort of tactical items that we're dealing with there, but really we're limited by that OSD piece.
We've done a lot of optimization on the messenger side — you saw the talks earlier with RDMA and so on; that's getting much, much faster — so getting stuff into and out of Ceph is good, and BlueStore is much, much better, I mean at getting stuff on disk and off disk; we've sort of eliminated the main issues there. But it's really everything in between that needs to be fixed up. So that's the big thing that's going to keep at least some piece of our team pretty busy. But there's some other exciting RADOS stuff coming, too.
One of the efforts that has been going on for quite a while now, but sort of as a secondary priority, has been working on quality of service. There's ongoing background development around the dmclock algorithm, which was published several years ago at an academic conference. It's distributed quality of service, and it gives you two things.
We can prioritize pools, so that certain pools will get more — will be faster than other pools on the same OSDs — or we can do it based on client, so that this client gets a minimum reservation and this one gets whatever is left over. The problem is that it's just a complicated problem, especially when you talk about distributed systems and things that replicate, where you're actually signing up to do IO on other people's nodes — it's complicated. But despite that, our initial testing has actually shown pretty good results.
So we're encouraged that, despite not necessarily having sort of a complete solution, it actually seems to be working pretty well. The main thing missing right now is really any kind of management framework. We have a lot of the underlying queuing machinery done, but we don't know how it's going to be configured and what the user experience is going to look like; that's sort of all work TBD. But the initial results are promising.
This is an example of a test run from a month or two back, where you have a couple of clients that have minimum IOPS reservations of 50 and 100 ops, and then a third one that had a very high priority. So everything that was left over beyond those reservations ended up given to the third client and not the first two, and you can see that it actually, you know, does what it says.
It does what it's supposed to do. So it's exciting; we're sort of getting pieces of that merged, and it's coming together, but it'll probably be a couple of releases before it's actually a complete, usable thing.
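The underlying idea is easy to caricature. This toy scheduler is an illustration only — the real dmclock algorithm also handles limits and the distributed delta/rho tags — but it shows the two behaviors: due reservations are served first, and leftovers are shared by weight.

```python
# Toy reservation/weight scheduler in the spirit of dmclock (illustrative).
class Client:
    def __init__(self, reservation, weight):
        self.reservation = reservation   # guaranteed ops/sec
        self.weight = weight             # share of spare capacity
        self.r_tag = 0.0                 # next time a reserved op is due
        self.w_tag = 0.0                 # weighted fair-share progress

def schedule(clients, now):
    """Pick which client's request to service at time `now`."""
    due = [i for i, c in enumerate(clients) if c.r_tag <= now]
    if due:                              # reservations come first
        i = min(due, key=lambda i: clients[i].r_tag)
        clients[i].r_tag += 1.0 / clients[i].reservation
        return i, 'reservation'
    i = min(range(len(clients)), key=lambda i: clients[i].w_tag)
    clients[i].w_tag += 1.0 / clients[i].weight
    return i, 'weight'                   # leftovers shared by weight
```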
The other thing that's going on is more work in the tiering department. Once upon a time we did this thing called cache tiering, and it worked okay, but not great, and so we sort of stopped talking about it and doing much with it. The new tiering stuff that's coming is based on some pretty simple primitives.
The basic idea is the concept of a redirect. You have a RADOS object that's basically a symlink to another RADOS object, but from the client's perspective you don't know: you just talk to the OSD, and it proxies it through to wherever it is. So you'd be able to move from a cache-tiering-type model — where you have sort of a sparse set of objects that may or may not be there in the cache tier, and if you miss, it goes through to the base tier — to the new model.
In the new model, you go straight to the base tier, which is essentially an index: it knows what all the objects are, and they're either there or there's a pointer to where they are — and then they can be in, you know, one slow pool or a different slow pool or wherever. So it's a bit more flexible, and it enables us to do other things.
Deduplication is a project that the folks at SK have been working on for a while, and that we've been helping out with a little bit, and it builds on this basic concept of a redirect. The idea is to generalize a pointer-to-somewhere-else into a manifest that says: this part of the object is over in that piece, and this part of the object is over in that other piece — so you can have fragments that are stored in other pools. So we break objects into chunks.
We can store those chunks in content-addressable pools, where you hash the content — so you're deduping based on the content — and reference-count those chunks; and then you can have these manifests that point to a bunch of different chunks. That's the basic idea of how all the, you know, deduplicating storage systems work, with chunking and so on; but ours is scale-out, in the sense that the base RADOS tier is acting as the index.
It says, you know: this is the name of the object; you look up the name in that pool, and it tells you what the chunks are and where they're stored.
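Conceptually — this is an illustration of the idea, not Ceph's implementation — the chunking and manifest layer looks something like:

```python
# Content-addressable chunking sketch: identical chunks share one object.
import hashlib
import json

def chunk_and_store(data, chunk_pool, manifest_pool, name, size=4 * 1024**2):
    """chunk_pool / manifest_pool: any dict-like object stores."""
    manifest = []
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        key = hashlib.sha256(chunk).hexdigest()   # name derived from content
        chunk_pool[key] = chunk                   # dedup happens here
        manifest.append(key)
    manifest_pool[name] = json.dumps(manifest).encode()

chunks, manifests = {}, {}
chunk_and_store(b'hello world' * 10**6, chunks, manifests, 'obj1')
```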
So that's the basic idea, and that's the direction we're going in. There's a lot that's sort of still to be determined: is this going to be inline chunking and storing, or is it going to be post-processing? Is it going to happen inside the OSD or be driven by an external agent? All of that is still being worked out.
Meanwhile, the manager is collecting all the usage stats, so you can get a little IOPS graph without any additional work; but eventually we want to have that stream off to external platforms. So if you have Prometheus or Zabbix or whatever your big thing is, you can also just turn on the firehose and send it all there.
So that's coming. It's even possible, maybe, that we'll get the Prometheus stuff in for Luminous — but maybe not; we'll see. That's all in the manager.
There's lots to do there, and it's kind of exciting, because — since you can write things in Python that are more policy-based — it brings in a whole new pool of contributors that can write that kind of code. So that's good. There's also stuff on the architecture front.
So there are ARM64 builds — I've mentioned this before, but we're still trying to get enough hardware in the lab so that we can actually do these on a regular basis and get them into the CI/CD pipeline.
We have some of the hardware, and we're still waiting on a few more boxes, but the intent is that, going forward, all the new releases are going to have ARM64 packages for both CentOS and Ubuntu. There have also been a bunch of patches coming in recently for PowerPC, adding support for that as well, and we're talking about getting PowerPC hardware into the community lab to do those builds, too. A while back we did some work with ARM 32-bit builds, because we built this 500-node, 4-petabyte cluster
B
Out
of
these
little
micro
servers,
it's
essentially
a
hard
disk
with
an
arm
server
on
the
hard
disk
speaking
use
a
net,
so
you're
running
those
T's
on
the
hard
disk,
literally
no
boxes
hosting
them.
That
was
pretty
fun.
That
was
with
WD
labs
and
they're,
actually
doing
an
update
to
their
platform.
They're
doing
a
that
was
agenda
to
drive
they're,
doing
a
gen
3
Drive
that
has
a
64-bit
arm,
yay
and
more
RAM
and
better
networking
and
all
kinds
of
stuff,
so
we're
working
with
them.
It's
exciting!
If you're interested in that project — or these things seem interesting to you — you should contact Jim Wilshire at WDC; they're looking for POC people to work with. So, exciting, good stuff. And then, finally: client caching, sort of across the board. On the RADOS Gateway side, there's a project with Boston University — and Intel, I think, worked on it too — where they added a persistent cache for the RADOS Gateway to support their big-data workloads over RGW, and it worked great: they're putting stuff on NVMe and, like, saturating the NVMe.
They're getting really good performance, and it didn't sacrifice consistency, because of the way it was architected: they're doing immutable objects only. So that was great, and a couple of the students who worked on that are now interns for the summer at Red Hat, so we're planning to get all that code cleaned up and merged into the tree. So that's exciting. On the RBD front, we're also very interested in doing client-side caching.
If you saw the talk earlier with Jason and Tushar: we're looking both at immutable caching — so that if you have snapshots that are the basis for clones, that sort of immutable parent can be cached — and also at a write-back cache, so you get sort of low-latency writes that then get streamed back to the cluster. CephFS actually already has a persistent client-side cache.
If you're using the kernel client, it's been there for a while: there's a generic kernel infrastructure called FS-Cache that plugs into CephFS. So on the CephFS side you already have client caches — at least a read-only cache; it doesn't do written data. But yes, client caches are good. And that's sort of it for my whirlwind tour of all the new development stuff; I'm going to talk a little bit about all the people who are helping us do it. So — these graphs are a little bit old.
I didn't get to update them, unfortunately, but lots of people are contributing to Ceph. The number of contributors is increasing, and it's great — we love it. It's a challenge for us to keep up with all the pull requests and reviews, so apologies if you've submitted a pull request and you feel ignored; just keep pinging us. We're busy, but we want you to keep doing it. And the community is broadening and expanding. So these are the top contributors since Jewel.
It's updated a bit since my last talk, but you'll see that there's sort of a broad set of people here. You can see all of the OpenStack vendors on this list, with, you know, the Linux people, and EasyStack and UnitedStack. You see a whole bunch of cloud operators — a bunch of them in APAC and also in Europe — public clouds, private clouds, all across the board. Not all of them are using OpenStack, although I think most of them are.
You also see hardware and solution vendors that are selling software products based on Ceph — or, in some cases, hardware products based on Ceph, which is very exciting — people like Quanta, in fact, that you don't usually see on these lists. And there are a couple of people where, actually, I don't really know what they do — I guess I could have Googled it — but it's exciting to see the breadth, I guess, of contribution. So there are lots of ways to get involved. There's the mailing list. We do a Ceph Developer Monthly.
Every month we have a developer video call; we alternate APAC-friendly and EMEA-friendly times, and we just talk about whatever development issues are pending. It's all virtual — IRC, whatever — so you can join. And then, if you want more events like this, of course there are Ceph Days. This one's awfully convenient because it's at the OpenStack Summit, but we do them all across the world, about once a month.
You can go see the schedule — the next ones are in Asia over the next few months. There are meetups that you can search for in various locales. We also do Ceph Tech Talks — I think it's the last Thursday of every month, or something; there's always one. It's like a YouTube/BlueJeans thing where somebody does a technical presentation on some subject related to Ceph; those tend to be developer-oriented. And all of this stuff is recorded and ends up on the Ceph YouTube channel.
[Audience question]
That is the intention. The way dmclock works: the mclock piece is the actual prioritized queue that does the weighting — it does both minimum reservations and weighting — and the "d" part is the distributed part: there's metadata shared by the clients across the OSDs with each IO, so you get a global reservation and not just a local one. So the intention there is that you can tag things that way.
Just measuring usage is one of the things — I should have mentioned that — that we want to do in the manager: have all the OSDs in the system sample the request streams and send that information back to the manager, so it can build, like, a "top" view, basically, of who's doing all the IO in the system. Okay, thanks.
[Audience question]
Yes, we have many plans to optimize the CPU utilization. That's largely what the OSD refactor is about — it's one of the main things it's looking to address. But also, just across the board, we're doing a lot of profiling and trying to figure out where we're wasting CPU, in data structures and whatever else, that just shouldn't be done the way it is. So if you like optimizing and profiling and whatever, then we'd love to have you be involved.
[Audience question]
Regions — so, in Jewel, the multi-site federation for RGW was almost completely rewritten. There's a whole new way to configure zones and zone groups, and the replication — bidirectional stuff — and there are ongoing improvements to that, bug fixes, but there's no new feature per se in Luminous, except for the metadata indexing. It has changed since Hammer and Firefly, though, so if that was the last time you looked at it, it's newer and better and more robust and all that.
[Audience question]
I mentioned usability a couple of times — we're building a Trello board with all the annoying things, and that was one of the cards I just added the other day. I think it's a simple enough thing that it's going to make it into Luminous: we can gate changing that minimum-required-client setting on whether such clients are connected. So it'll prevent you from saying "require luminous" if you have older clients that are talking to the cluster, or something.
[Audience question]
Right now, the OSDs already report just their sort of average latency metric to the monitor, and there's already a command called "ceph osd perf" that you can pipe through sort or whatever, and you can see it — but it's annoying; you have to go do it. So one of the ideas is that the manager, now that it has all those metrics, will be able to do that automatically — and you can write easier-to-understand code for it in Python.
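As a sketch, those same metrics can already be pulled programmatically — the output field names here are from memory and may differ by release:

```python
# Sketch: fetch what "ceph osd perf" shows, as JSON, for scripted monitoring.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ret, out, errs = cluster.mon_command(
    json.dumps({'prefix': 'osd perf', 'format': 'json'}), b'')
assert ret == 0, errs
for info in json.loads(out)['osd_perf_infos']:
    stats = info['perf_stats']
    print(info['id'], stats['apply_latency_ms'], stats['commit_latency_ms'])
cluster.shutdown()
```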