From YouTube: 2014 #vBrownBag OpenStack Summit Atlanta Ross Turk Introduction to Ceph Project for OpenStack
Description
There has been a lot of buzz surrounding Ceph and OpenStack lately, and a lot of new developments. This talk will provide a quick overview of Ceph's architecture and the integration between Ceph and OpenStack, examine recent use cases, and discuss what is coming up next. We'll take a look at how to keep fans of both platforms happy and discuss how the less-oft-used pieces of the Ceph platform can help augment your OpenStack setup. We'll also talk about how to get started and what the community can do to get involved.
Hi, I'm Ross Turk from Inktank, now a Red Hat subsidiary, and I'm going to give a talk today on an introduction to Ceph. This is a talk I've given a bunch of times in the past. It's an overview of all the different architectural components that make up the Ceph storage system, how they fit together and what they do, but first I'd like to start by talking a little bit about what Ceph is and why it exists in the ecosystem.
If you look on the left-hand side of the slide here at your traditional proprietary storage stack, it's a whole bunch of proprietary hardware: a bunch of computers and a bunch of disks that you can't really do much with, put together in an appliance. On top of that is some proprietary software, and on top of that is a support and maintenance contract. And this is big business; this is billions and billions of dollars of business.
The Ceph approach is something different: it's standard hardware, normal computers with normal disks, everyday hardware; open-source software that you don't pay for; and then enterprise products and services that you can use if you need them, if you really need them. So it's a completely different approach, and in general what Ceph aims to do is to do to storage what Linux did to the operating system, so that people can use open source software and standard hardware to do what used to be a very proprietary activity.
Taking a broad look at Ceph, this is an architectural overview. It starts with RADOS at the bottom; librados is a programming library, and then on top of that there are three interfaces: the RADOS Gateway, the RADOS Block Device, and CephFS. I'm going to walk through each one of these and talk about what makes them interesting, and I'll start with RADOS, the one at the bottom.
RADOS is actually an acronym that stands for Reliable Autonomic Distributed Object Store, and RADOS is the object store that's underneath everything in the Ceph architecture. The RADOS cluster is made up of monitors and OSDs, and an OSD looks sort of like the shape on the left, where you have a server with disks, standard disks, and you put standard file systems on top of those disks. ext4, Btrfs, or XFS are the chosen file systems; we think Btrfs is the future, but ext4 is the one we're choosing right now.
On top of that file system, you run object storage daemons. These are software agents that you just configure with a path, and what they do is take the storage resources that are available at that path and make them part of the object storage cluster, part of the RADOS cluster, and then they become one of many, many servers in the RADOS cluster. The important thing to know here is that underneath an object storage daemon in RADOS is a file system, and of course you don't deal with just one of these.
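As a rough sketch of what that configuration looks like (the host name and paths here are placeholder defaults, not details from the talk), an OSD entry in ceph.conf of that era is little more than a pointer at a mounted file system:

    # Hypothetical ceph.conf fragment: one OSD, told where its file system lives.
    [osd.0]
        host = storage-node-1
        osd data = /var/lib/ceph/osd/ceph-0             # a mounted ext4/Btrfs/XFS file system
        osd journal = /var/lib/ceph/osd/ceph-0/journal  # journal kept alongside the data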
The components of RADOS, just to review them, are the OSDs and the monitors. The OSDs tend to number tens to tens of thousands in a cluster. These are the things that actually serve stored objects to the clients, so when you get data out of the RADOS cluster, you're getting it through the OSDs; the client connects directly to the OSDs. There's generally one OSD per disk. You can also put SSDs under them, and of course you can put RAID groups under them if you want, although you may end up with some extra redundancy, because Ceph does its own replication. Generally, you configure the OSD with the path to store its information in, and it serves those stored objects to clients. These OSDs also intelligently peer with one another for replication tasks and for recovery tasks. I like to tell people that the architecture of Ceph is closer to BitTorrent than it is to NetApp, and I think you'll understand why in a little bit: it's more like a peer-to-peer network than it is a monolithic storage appliance.
The monitors, on the other hand, are fewer in number. You don't need lots of these; you need a small, odd number of them, more than one generally if you want the ability to cope with the failure of one. Three is the right number for most clusters. Some clusters have five, but three is generally the right number.
You want a small, odd number because monitors vote, and their job is to understand the state of the cluster: who's in, who's out, who's up, who's down, which OSDs are part of the cluster and which are not. They also maintain what's called the CRUSH map, which is the mapping of the architecture of the cluster and is used to determine where to put data in the cluster. The monitor does not serve objects to clients; it's not part of the data path.
That's why you can have a small, odd number of these: the data that you get from Ceph is not actually going through the monitors. It's important to have a small number because they vote, like I said, and the more monitors you have, the longer it takes for them to vote. And you want an odd number because if there's ever a tie (it uses Paxos underneath to figure out how to resolve a tie if there is one), a small, odd number is the most effective way. So this is RADOS: it's essentially a collection of software daemons, OSDs and monitors, that manage the underlying storage resources and then make them available.
A thing that's kind of interesting about RADOS is where objects live. When you have a distributed storage cluster and you need to put an object into it, the question is: which host do I connect to? It is a TCP connection, after all. Which host do I connect to, to store this object into the cluster? One way of doing that is, step one, you go to a metadata server and you say, "I'm looking for this object, where is it?", it says, "Oh, it's on server 25," and then you go to that server. The challenge with this approach is that you're writing down where everything is, and if you're writing down where everything is, you have a scalability limitation, because the place where you're writing it down can only be so big, and you can only read it so fast, you can only query it so quickly.
The other approach is to calculate the placement of the object inside the cluster, and a lot of folks, a lot of technologies, will do it just by taking the cluster and splitting it up, you know, like a bookshelf for the World Book Encyclopedia, where you have A through Z, and M is really big, and then they put Q and Z together. A lot of them will work that way, and then, as the client, if my file name starts with F, I know it's on the third book down, or whatever.
So you choose the right server based on the name and a general distribution across the cluster, which works in general, except when the cluster grows and shrinks and there's massive repositioning that has to occur. So that's the second approach. Ceph does something different: Ceph uses an algorithm called CRUSH, which is also calculated on the client, but it's calculated based on a policy. So what happens is you take the objects, they get divided up into a number of placement groups, and then CRUSH is used to determine where those objects live inside the cluster. CRUSH is a function, it's a C function, and you pass it two things. The first is the cluster map, which is a data structure of which nodes are in and out and up and down, and the second thing you pass it is your CRUSH map, which is the hierarchy of your cluster and a series of policies, like: I have this many rows, with this many racks, with this many shelves; never put two copies of the same object in the same failure domain.
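As an illustration of what such a policy can look like, here is a made-up rule in the CRUSH map syntax (not one from the talk) that keeps each replica on a different host, with the host acting as the failure domain:

    # Hypothetical rule from a decompiled CRUSH map.
    rule replicated_hosts {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default                    # start at the 'default' root of the hierarchy
        step chooseleaf firstn 0 type host   # pick one leaf (OSD) under each distinct host
        step emit
    }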
So what ends up happening is the cluster is constantly using CRUSH, this algorithm, to rebalance the data in the cluster, and clients who want to interact with the cluster are using this same function to figure out where to connect in the cluster; as long as the cluster map and the CRUSH map are the same, it will give the same result. And it's a very quick calculation; it takes almost no time at all. When you want to store an object in the cluster, you call the calculation, you give it those two things, and it says: here are the servers that it belongs on. If the cluster changes, the results change too. So, to review, CRUSH is what we call a pseudo-random placement algorithm. It's not random, but it's pseudo-random. It's a fast calculation, there's no lookup, and it's repeatable and deterministic, meaning that if you call it again with the same inputs, it will always give you the same outputs, and only the inputs affect the outputs. That's what we mean by repeatable and deterministic.
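To make "repeatable and deterministic" concrete, here is a tiny Python sketch of calculated placement. It is emphatically not the real CRUSH algorithm, just the general idea that the same object name and the same view of the cluster always hash to the same servers with no lookup table anywhere; the host list and replica count are invented for the example:

    import hashlib

    def place(object_name, cluster_map, replicas=3):
        """Toy calculated placement (not CRUSH): rank hosts by a hash of
        (object name, host) and take the first few.  Only the inputs
        affect the outputs, so the result is repeatable."""
        ranked = sorted(
            cluster_map,
            key=lambda host: hashlib.sha1((object_name + host).encode()).hexdigest())
        return ranked[:replicas]

    hosts = ["osd-host-%02d" % i for i in range(10)]   # stand-in for the cluster map
    print(place("my-object", hosts))                   # same inputs, same three hosts, every time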
It gives you a statistically uniform distribution: if you call CRUSH on a wide variety of data, it's going to distribute it in a relatively uniform fashion. And it's a stable mapping, meaning that when your cluster changes, very little of the mapping actually changes. If you have a hundred nodes in your cluster and you lose one, you're moving data from 99 nodes to 99 nodes, essentially, and it's only 1% of the data that has to move; everything doesn't take a big step back or move one step to the left, like happens in some systems. It's also rule-based, so you set policies based on the infrastructure, on the topology of your infrastructure, and based on weighting and adjustable replication policies. The idea is that you configure CRUSH, and CRUSH handles the placement of the data in the cluster. So that's RADOS.
That barely scratches the surface of what RADOS is. The way you access RADOS is librados, so librados is essentially how you access RADOS if you're an application. It's a library: C, C++, Python, Ruby, there's a bunch of native language bindings. You link it into your application and it speaks a very efficient protocol to talk to the cluster, a raw socket protocol; it's not over HTTP. So that's librados: C, C++, Python, PHP, Java, Erlang, direct access to the storage nodes, very little overhead. If you're writing an application and you're tying it tightly to Ceph, this is the way to do it.
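For instance, with the Python binding, storing and reading back an object is only a few lines; this is a minimal sketch, assuming a default ceph.conf location and a pool named "data":

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # config path is an assumption
    cluster.connect()

    ioctx = cluster.open_ioctx('data')                  # I/O context on a pool
    ioctx.write_full('hello-object', b'Hello, RADOS')   # store an object
    print(ioctx.read('hello-object'))                   # read it back

    ioctx.close()
    cluster.shutdown()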
It's also the library that we've used to create all of the other storage interfaces, and the one I'll start with is the RADOS Gateway. This is built on top of librados. The RADOS Gateway is essentially a REST interface on top of RADOS, so if you want to store objects into the RADOS cluster using REST, using the S3 or Swift interfaces, you run the RADOS Gateway. You can run multiple of them and put them behind load balancers, or do whatever you need to do.
They speak REST out of the top, and at the bottom they speak that efficient wire-line socket protocol I was talking about. This is what, for example, DreamHost used to build DreamObjects, which is their S3-competitive product. So the RADOS Gateway is a REST-based object storage proxy, and everything that it stores, all of its buckets, all of its accounting, all of its management of who has access to what, is also stored in RADOS. So these RADOS Gateways are ephemeral: you can kill one and start up another, and all the same stuff is there.
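Because it speaks the S3 dialect, a stock S3 client can point straight at it. Here is a rough sketch with the Python boto library; the endpoint, access key, and secret key are placeholders for whatever your own gateway hands out:

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',          # your RADOS Gateway endpoint
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('my-bucket')           # bucket metadata lands in RADOS
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('Hello from RGW')     # the object itself lands in RADOS
    print([k.name for k in bucket.list()])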
The second thing that got built on top of Ceph is our block interface, which we call the RADOS Block Device. This is kind of an interesting thing, in that it takes a disk image, splits it up into a bunch of chunks, generally four-megabyte chunks, and stores each chunk as an object inside of RADOS. Then it has librbd, which is a user-space library, or krbd, which is a kernel module, that assembles all those chunks into a disk and presents it to a hypervisor or to a host. This is something that is used by Cinder, for example, to access virtual machine volumes and boot them straight off of the cluster, and it allows things like live migration, where you can have a hypervisor move a virtual machine from one host to another, because you're separating storage from compute. It also has a kernel module. So, to summarize, RBD is storage of disk images in RADOS; it decouples the VM from the host; and it has cool things like snapshots and copy-on-write clones.
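Sticking with the Python bindings, creating an image and writing to it looks roughly like this; the pool and image names are just examples:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')            # pool that will hold the image's chunks

    rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024**3)   # a 10 GB image
    image = rbd.Image(ioctx, 'vm-disk-1')
    image.write(b'\x00' * 4096, 0)               # write the first 4 KB of the virtual disk
    image.close()

    ioctx.close()
    cluster.shutdown()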
It's had support in the mainline Linux kernel since 2.6.39, it's integrated into QEMU and KVM, you can make it work with Xen (native Xen support is coming soon), and it's integrated into most of the cloud stacks, including of course OpenStack. This is how you store disks inside of RADOS.
The third thing is CephFS. CephFS introduces a new type of storage node called the metadata server, which handles hierarchy and all the POSIX semantics that you need for having a distributed file system, and there are actually two paths: when you want to store things in CephFS, you talk to the metadata server, and then you do the data path through the OSDs. This metadata server is not required unless you're running the file system aspect of Ceph. It also stores all of its metadata in RADOS, and it's not part of the data path. So if you're not running the file system, you don't really need to worry about it, but it's another cluster component that you need if you are running CephFS.
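Once a metadata server is running, a client mounts the file system like any other. A typical kernel-client mount looks something like this; the monitor address, user name, and key are placeholders:

    # Mount CephFS with the kernel client (ceph-fuse is the user-space alternative).
    sudo mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=admin,secret=<key>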
This is sort of an overview of how Ceph and OpenStack integrate. The RADOS Gateway is the integration point for Keystone and Swift, allowing object storage inside RADOS in a way that integrates with OpenStack. On the block side, Cinder and Glance and Nova are integrated with the RADOS Block Device to allow storage of volumes, snapshots, and disk images inside of Ceph, and to boot off of them in a way that reads from a whole bunch of storage nodes at once when you're booting a virtual machine.
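As a rough sketch of what that wiring looks like on the OpenStack side (pool names, Ceph user names, and the secret UUID below are placeholders, and exact option names have shifted between releases):

    # cinder.conf: back Cinder volumes with RBD
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret uuid>

    # glance-api.conf: store Glance images in RBD
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance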
So, getting started with Ceph: you can always go to ceph.com/get to download Ceph, you can learn about Ceph at ceph.com/docs, you can read the QuickStart guide to get up and running quickly at ceph.com/qsg, and if you need help you can go to ceph.com/help. There are volunteers waiting in shifts on IRC to answer your questions, because we want to make sure everybody can understand how Ceph works and get it up and running.