Description
Running Backups with Ceph-to-Ceph - Michel Raabe, B1 Systems GmbH
This presentation highlights several different methods for backups inside of Ceph (clusters).
We often receive requests for both local and remote backups. So we would like to introduce backup
methods using external tools as well as some using Ceph's own 'rbd export' or 'rbd-mirror' approaches.
Learn about the pros and cons of each approach, and be warned of possible pitfalls of both native and
OpenStack-based approaches.
About Michel Raabe
Cloud Solution Architect, B1 Systems GmbH
Michel has been working for B1 Systems since 2008. He is...
Okay, okay, can you hear me? Yeah? Okay, perfect. Welcome, everybody. So, let's start with the afternoon sessions. My talk is about running backups with Ceph-to-Ceph, and I am working for B1 Systems, located in Germany, around a hundred and twenty people.
My part is more or less the cloud computing stuff, so all the OpenStack and Ceph installations and workshops and all that, and talking to customers. The idea for this talk came up because in every workshop I did, there is this one minute. Really, come on, there must be more hands. Okay, yeah, that looks better. So, in every workshop I hear this sentence: we want to back up our Ceph cluster.
So there are two sides of the group. One side says: why are we building our Ceph clusters? So that one host can die and we can still access our data. And the other side says: I don't know, we need a backup. Why? So the goal is to find a way to back up some data, or all the data, of a Ceph cluster.
The other problem is that sometimes I have installations with OpenStack, and sometimes it is a plain Ceph installation without OpenStack: a pure Ceph cluster running CephFS, or just an export over S3, or something like that. So we have to find a way that OpenStack is also happy with our backup solution.
Of course, Ceph itself, as you perhaps know, provides the functionality with rbd-mirror. Sometimes it is an off-site backup; I mean, in Germany... On this slide I am citing the BSI, the Bundesamt für Sicherheit in der Informationstechnik, the German Federal Office for Information Security. If someone knows the English name, fine; otherwise, it is just the bureau that says: the distance between two data centers should be at least 200 kilometers.
The first recommendation, years ago, was three or four kilometers, I think, and now they have updated their recommendation and say at least 200. Okay. And last but not least, I think it is clear what I mean, but just to make sure everybody knows what I mean by a file browser.
So, who wants a file browser for backups? No wonder: there was only one small hand, and then no one. Okay, that's good, but my customers always wanted the file browser. I don't know why, but okay. So, Ceph provides some solutions, some options, to run a backup.
The first one, I think, is pretty clear: rbd-mirror. We can replicate an image or a pool off-site to another Ceph cluster on the other side.
Then, of course, you can write your own scripts with rbd export, rbd import and export, rbd export-diff and all that. And there are some tools; I only mention two, but I am pretty sure the internet provides a lot more scripts and tools with the same functionality. For my case, only these two. The first one is Backy2. Is someone using this tool? I see some heads shaking, but okay, yeah.
But for all of these tools and possibilities we have challenges, challenges that we have to solve before we can start. And sometimes, if I start a discussion with a customer, after one day they are saying: okay, maybe we don't want to back up our Ceph cluster, because it is too much; we would need a second Ceph cluster. So the first challenges are consistency, disaster recovery, bandwidth, and running costs. These four points are the challenges we have to solve.
The first one: of course, we only have access to the base layer, so we don't know the workload of the VMs, of the S3, or whatever it is. We only know there is a disk image, an RBD image, and that's it. We have to make sure that the disk image is replicated from one cluster to another without knowing what is inside. Say the VM is a database system: if we run with snapshots, are we getting corrupt file systems?
No one? Okay, that's new! So normally you have the option to configure scripts that run just before you take any snapshot. Or go to the OpenStack dashboard: you have the option to say, let's create a backup, let's create a snapshot, and seconds before, you can instruct the agent inside the guest to freeze the file system, and then you do the snapshot on the storage side. So it prevents corrupt file systems; sometimes not, so you lose the last transactions, depending on the workload.
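The freeze-then-snapshot sequence described above can be sketched as a small script. All names here are hypothetical, and the `run` helper only prints the commands instead of executing them, since `virsh` and `rbd` need a live hypervisor and cluster; it assumes libvirt's `virsh domfsfreeze`/`domfsthaw`, which talk to the qemu-guest-agent inside the guest.

```shell
# Dry-run sketch; 'run' only prints. Hypothetical domain/pool/image names.
DOMAIN=vm01             # libvirt domain with qemu-guest-agent installed
POOL=volumes            # Ceph pool holding the VM disk
IMAGE=vm01-disk         # RBD image backing the VM
SNAP=pre-backup         # name for the consistent snapshot

run() { echo "+ $*"; }  # swap 'echo' for real execution

run virsh domfsfreeze "$DOMAIN"            # agent flushes and freezes guest FS
run rbd snap create "$POOL/$IMAGE@$SNAP"   # snapshot on the storage side
run virsh domfsthaw "$DOMAIN"              # thaw again as soon as possible
```

The freeze window should stay as short as possible; a database inside the guest may still lose in-flight transactions, as the talk notes.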
It is also a point how you can access the data. Of course, `rbd -c` with the cluster configuration, and that's it, it's fine, you can access the data. But what about your customers, what about your applications? OpenStack, for example, has since Queens the option to configure a second Ceph cluster as a replication cluster, so that is one option. What about the rest? S3: depending on the client, normally with S3 you have a problem.
You have to reconfigure the backup zone from, let's say, read-only to a master zone, so that this zone accepts reads and writes. What about the bandwidth and the endpoint? As I already said: can the storage, can the application handle a switchover to a second storage? For example OpenStack: yes, it can.
And the bandwidth itself: I don't just mean a one-gigabit link, I mean the bandwidth compared to your backup volume. So, easy question: can we back up 20 terabytes in 24 hours over an 800 megabit line? Who says yes? Only one hand. And the rest says no? So, no. Okay, and yeah, of course not 20 terabytes.
If you really want to transfer 20 terabytes over an 800 megabit line, it takes around two days and seven hours or so. And with the bandwidth comes network latency. For disaster recovery and backup it is normally not the problem.
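That number is easy to verify with shell arithmetic. A sketch, assuming the figures from the example: 20 decimal terabytes over a fully saturated 800 Mbit/s line, ignoring protocol overhead.

```shell
# 20 TB over 800 Mbit/s, assuming full saturation and zero overhead
TB=20
MBIT=800
bits=$(( TB * 1000 * 1000 * 1000 * 1000 * 8 ))   # payload in bits
seconds=$(( bits / (MBIT * 1000 * 1000) ))       # line rate in bit/s
echo "$(( seconds / 86400 )) days, $(( (seconds % 86400) / 3600 )) hours"
# prints: 2 days, 7 hours
```

Real transfers are slower than this best case, so the 24-hour backup window from the question is clearly out of reach.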
But if you want to access the data instantly, if from the primary side you want instant access to the data on the remote side, it is a problem, of course. And cost: a second Ceph cluster.
When I start the discussion about that first one, the customer sometimes freaks out, because if you really want Ceph to be fast, you cannot use laptops; you need hardware and NVMe drives, depending on the workload. And then it comes to the point of backup: it is a problem, we need a second one. So that's it, the discussion is stopped, I'm going home. Of course you can configure your second Ceph cluster with slower disks, cheaper disks, slower network and all that, but in the end it still costs money, and the customer has one more problem.
If a customer runs a Ceph cluster and the second Ceph cluster runs three or four hundred kilometers away, and he only has this 800 megabit line: is he able to do the backup? It depends on the workload, because the initial backup takes a huge amount of time, and then only the deltas follow. So what is possible, what can we do now with Ceph itself?
Come on, okay. So there are two ways to replicate image data. The first one is: you can configure the replication between two pools, so one pool is replicated to another one on the second site. The second option is: you only configure the replication for one image, and that one image is replicated to a pool on the other side. That's it. It has been available since Jewel, so for a while now, and you need an additional daemon, the rbd-mirror daemon. So if you run config management, no problem: enable it, deploy it, and that's it.
Do you know the message "feature not supported", and how to disable that, and all these long parameters? Yeah, so you know what I mean: it is sometimes tricky to enable the feature, and with journaling you also include the exclusive-lock feature. By default there is a thirty-second trigger, so that means every 30 seconds the deltas are synced.
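Enabling the journaling feature on an existing image typically looks like the sketch below. Pool and image names are made up, and the `run` helper only prints the commands, since the real `rbd` calls need a live cluster; journaling requires exclusive-lock, as mentioned above.

```shell
# Dry-run sketch; 'run' only prints. Hypothetical pool/image names.
POOL=volumes
IMAGE=vm01-disk

run() { echo "+ $*"; }  # swap 'echo' for real execution

run rbd feature enable "$POOL/$IMAGE" exclusive-lock   # prerequisite for journaling
run rbd feature enable "$POOL/$IMAGE" journaling       # per-image journal for rbd-mirror
run rbd mirror image enable "$POOL/$IMAGE"             # needed in per-image mirror mode
```

In pool mirror mode the last step is implicit for images with journaling enabled; in image mode each image has to be enabled individually.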
A
But
that's
all
not
the
problem.
The
problem
is
to
understand
that
you
running
to
safe
clusters,
both
with
the
same
name
who
has
changed
to
self
cluster
default
name
from
self
to
specific
name,
three
hands
three
or
four
hands
yeah.
So
you
know
what
I
mean
it's
it's.
It
is
possible,
of
course,
but
the
problem
is
to
to
write
the
documentation
and
get
rid
of
all
these
names,
linkings
and
all
the
stuff
it's
easier
to
to
named
Jeff
Kloster.
Problems with rbd-mirror: both daemons need a connection to both clusters. The local one needs a connection to the local cluster and to the remote one, and the remote one needs a connection to its remote cluster and to the local one. So, in fact, we cannot use two public networks, because that is not possible.
A
The
only
way
to
do
it
is
to
run
both
clusters
on
the
same
network
or
you
have
some
routing
chrome
food
to
do
or
network
guys
have
that
to
do
Yanis
already
set
the
k.
Led
module
is
a
problem
if
the
journaling
or
if
your
mapping
images
from
a
pool
to
the
host,
you
cannot
enable.
Normally,
you
cannot
enable
the
journaling
feature.
So that's rbd-mirror. RGW multi-site: it has the Simple Storage Service, more or less, let's say, 90% compatible with Amazon's API. It also includes the Swift API, or some parts of it. It is Keystone-compatible: you can integrate Keystone to authenticate your users over S3. There is server-side encryption, and I don't mean the SSL connection over HTTPS, I mean server-side encryption. But there is still no file browser; and I mean, Chrome and Firefox are not file browsers.
It is also possible to run an NFS export over Ganesha. And you need an S3 client, self-written backup software or whatever it is; the browser is still no client. But here is more or less the same problem as with rbd-mirror: the defaults. When you start, you want to get rid of the names; you just want to use the data and provide the S3 server without thinking about the names. So, no problem, until you want to replicate the data. Then you have the same problem.
A
We
have
zones
with
the
name
default
default
is
replicated
with
default
and
default,
which
so
which
side
is
the
master
on
which
side
is
stick
to
Barcelona,
which
side
is
to
stick
to
mini,
for
example.
So
if
you
want
to
stab
try
to
find
the
name,
she
metallic
you
can
use,
for
example,
Germany,
Frankfurt,
Munich
or
so,
and
that's
it.
Yeah, the idea is: you create a snapshot, and after the snapshot you export the difference from one snapshot to another one to the remote side, and that's it. But in the end, someone has to track your backups, someone has to track the snapshots of the cluster, and that is where Backy2 comes in. It has an internal database, it can handle all these snapshots, and you can back up directly to an S3 storage.
For example, it is a bit tricky with Kubernetes, because if you create a persistent volume on Kubernetes, Backy2 creates these snapshots, and if you then try to delete the volume in Kubernetes, you get an error message that says: oh, there is a snapshot, I can't delete the RBD. So Kubernetes is good, but there are some problems with that right now. It is only packaged for Debian, but it is Python, so no problem, and it supports the network block device mount.
Yeah, the third-party tools: there are also some problems, as I already mentioned. Active old snapshots: someone has to track them. Kubernetes and Ceph is a good combination, but with Backy2, for example, you have these problems with the snapshots. And in the end there is no file browser available for the user, so each time you run the backup you get a ticket, a phone call or whatever it is: can you restore my whole instance?
The workflow, for example, is: you create an initial snapshot, then you copy this first snapshot to the remote side, and after hours, days, weeks, whatever you want, you create the second snapshot and only transfer the delta from snapshot one to snapshot two. This one takes, depending on how much data you have, seconds or hours. And then you can delete the old one afterwards; or, for example, we keep the latest two snapshots of a backup and delete all the other snapshots.
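The workflow above can be sketched as a script. Names are hypothetical and the `run` helper only prints the `rbd` commands instead of executing them; the snapshot list stands in for the output of `rbd snap ls`. It ships only the delta between the two newest snapshots and then prunes everything older, keeping the latest two as in the example:

```shell
# Dry-run sketch; 'run' only prints. Hypothetical names throughout.
POOL=volumes
IMAGE=vm01-disk
PREV=backup-001         # last snapshot already present on the remote side
NEXT=backup-002         # snapshot created for this backup run

run() { echo "+ $*"; }  # swap 'echo' for real execution

# incremental step: new snapshot, then transfer only the delta
run rbd snap create "$POOL/$IMAGE@$NEXT"
run "rbd export-diff --from-snap $PREV $POOL/$IMAGE@$NEXT - | ssh backuphost rbd import-diff - $POOL/$IMAGE"

# retention: keep the two newest snapshots, delete the rest
SNAPS="backup-000 backup-001 backup-002"   # oldest..newest
keep=2
set -- $SNAPS
total=$#
while [ $# -gt $keep ]; do
    run rbd snap rm "$POOL/$IMAGE@$1"
    shift
done
```

The initial full copy happens once with `rbd export` / `rbd import`; after that, only `export-diff` / `import-diff` runs are needed.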
So what can we do with rbd export? We only use the initial rbd export, just the one-time export, since if we want to migrate data from an old cluster to a new one, that is perfect. On an untuned, utilized Ceph cluster it takes 20 minutes for 20 gigabytes, so it is not really fast, but it will work.
On the other hand, the export-diff of the same image, between two snapshots, dropped down to 8 minutes. So that is really fast; it depends on how much delta you have. But the good thing is you can schedule that. The rbd-mirror daemon, for example, runs in the background: every time the customer writes data, after 30 seconds it starts synchronizing. That is good for disaster recovery and all that, but for a backup? Not really. You want to schedule something, you want to schedule your backup jobs. So that is my problem with rbd-mirror.
If the customer deletes a file or crashes the instance, 30 seconds later all the changes get to the remote side, and the image is destroyed or deleted or whatever it is there as well. The synchronization really works well, so you only have a time frame of 30 seconds to get a call and restore the data from the remote side.
Afterwards, it is gone. The problem is, as you can see, the speed: there is one peak, I don't know why, maybe the initial creation or so, and then afterwards it takes 30 minutes for the same 20 gigabytes. There are two options to tune that; with those, the same image takes 5 minutes. But the problem is that it consumes memory. The idea from the rbd-mirror developers is that it runs in the background and tries to consume as little memory from the system as it can.
Okay, then there is S3. For S3 you need a capable client. So what is there? s3cmd, Cyberduck for Mac and Windows clients, and that's it. But the good thing is you only have one connection: you only have to provide an HTTPS service for the users to back up something, and it is really, really fast. Yeah. What can we do now, after all the challenges?
After all the problems and all the stuff, we are still using the snapshot feature as the main component to run backups for Ceph, and that is only because we want to schedule something and we want to have the ability to restore data from the last 6 hours. So when we get a ticket from a customer who says, my instance crashed, I deleted my data, can you restore the instance? Yeah, no problem. The other thing that works really well for us is the S3 multi-site replication, for example.
We run this setup with a Backy2 client, and it replicates all the data that comes in from Backy2 to a remote site, and that's it; but in the end those are also RBD snapshots. And rbd-mirror, last but not least: it is a built-in feature, you can back up all the images of a whole pool. But in the end, is that really a backup?
I think no, because if you delete something, it is instantly gone; after 30 seconds it is deleted on the remote side as well. So, what about CephFS right now? There is no plan A or plan B to run a replication from CephFS.
It depends on your script. The question was: can you have different numbers of snapshots on the two sides? Yes, you can. If you are using your own scripts, of course you can. With rbd-mirror: no, it is replicated as it is.
That's a good one. The question was what the feature is that I would like to see. Sorry. Okay, the question is what we would like to see, and I am pretty okay with what we have. I think most of the customers want a file browser or something like that, but that is not a problem of Ceph; that is a problem of the backup strategy the customer runs. So I am pretty good with that, but we are missing, so, nothing.