Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Lars Marowsky-Brée, SUSE Distinguished Engineer, Ceph Advisory Board member
Marc Koderer, SAP OpenStack Evangelist
A: SUSE has been in the Linux business for over 25 years. For four or five years of that time we have journeyed with Ceph, but even before that we had done a lot of work on storage. I personally have built a high-availability product where we focused very much on enterprise availability, and our products are now in use everywhere, including the airport control tower I flew through, which is always a little worrisome to me. Linux and SUSE, and especially Ceph, are really chameleons.
A: You find them everywhere these days, and the mimic octopus is just a perfect match for the show today. But first, let's review some of the challenges that we've already heard about in many of the previous sessions. Why are people looking at software-defined storage in the first place? Primarily because the solutions we have been using in the past, and that I have been working on for the larger part of my career, don't scale. It's impossible to bring a single traditional storage system up to 100, 400, 500 petabytes or even an exabyte.
A: We have data coming in from our mobile devices, the Internet of Things, video surveillance, medical data that has to be not just generated and stored briefly but kept around for the lifetime of the patient plus ten years. Video is growing, resolution is growing, email too: I don't remember when I last deleted an email. I just keep it around forever, because I never know when I might need it again. Nobody deletes data; everyone just wants to add more, and data protection multiplies the volume further.
A: Once you have a lot of data, it becomes hard to manage. It becomes hard to store, and it becomes expensive to store. You have to keep it available. Yes, I have this big email archive, but if I have a really big one, say as a big provider, or I am storing data for other purposes, I don't quite know which part of my data my clients are going to access. I have to make sure all of this data is online.
A: I also have to make sure all of this data survives if one of my data centers has a problem, so I have to keep backup, recovery and redundancy in mind, and that's why I really appreciate Ceph. One of the reasons we chose Ceph over others is that I come from a world of high-availability clustering, where the protection of the data is the most important thing of all.
A: Eventual consistency usually translates to very little consistency, and if you don't have the data, you will not get the service back up, so the data is really at the core of everything. Another important reason why we chose Ceph over proprietary solutions is that SUSE is an open source business, so we only do open source, but also that open source is the only sustainable option. If you have a proprietary solution, you are competing with this huge community of contributors.
A: We now see geo-replicated clusters for mission-critical core business, and I don't quite know what happened in between, but it was kind of inevitable. It's possible to sometimes keep a little bit of an edge over open source when you are really focused, but eventually the community catches up. This is a marathon, not a sprint, so in the end we really believe that contributing to the community is the only way of solving our customers' problems. With that, I would like to take a brief look at the product itself.
A: When we first shipped Ceph, it was a technology preview, so we have a long, long history with this, and eventually we realized that Ceph is really great not just for OpenStack, which is why we called the product SUSE Enterprise Storage. We make it useful beyond OpenStack; that's still the most common use case for sure, but we have a long history with this and we keep building.
A: Sometimes people wonder why they should choose a Linux vendor over just using the community project, and if I were CERN, I would probably go with the community version: I would have a lot of cheap, very well trained labor. That's how Linux started; it started at the universities. But then you realize that just having the software out there is not good enough. It needs to be tested, it needs to be validated, you need certifications, and certifications are really not something community members enjoy.
A
They
do
not
always
have
to
bend
twist
to
interact
with
the
community
members
directly.
They
do
not
have
the
bandwidth
to
interact
with
all
the
software,
vendors
and
hardware,
vendors
and,
of
course,
sometimes
customers
have
problems
and
really
want
somebody
to
fix
them
now,
and
that
also
is
one
of
those
value-adds
that
vendors
provide.
But
besides
business
Suzy
is
very
active
in
the
safe
community
as
well.
A
We
are
strong
contributor
to
the
safe
community,
I'm,
very
happy
that
my
company
was
able
to
sponsor
this
conference,
but
we
have,
in
the
past
sponsored
many
safety
events
as
well.
We
have
hosted
them.
We
have
sponsored
them.
We
are
a
member
of
the
surf
advisory
board.
We
do
everything
in
open
source.
All
our
safe
work
is
open
source.
We
aspire
to
an
upstream
first
policy.
A: We have contributed to iSCSI support: the solution we chose supported multipathing very early on, and we are now adding that to the solution the upstream community has chosen to go with in the next upstream release. That's great, and really useful; iSCSI is really important if you are interoperating with systems that are not quite ready for native Ceph yet. We have also supported Ceph on ARM64.
A: We have supported CephFS, and that CephFS deployment was actually a great scenario, because something being ready for the community is not always the same as something being ready for production use by customers. So we initially put guidelines around CephFS use based on our own testing, so that customers could feel confident that the use cases they had would work stably and reliably with CephFS.
A: We call the deployment project DeepSea, because an octopus swims through the deep sea, and that's also fully open source, of course. It includes tooling for upgrades, and it includes tooling for FileStore-to-BlueStore migration, which takes care that a cluster is migrated one OSD at a time and converted from FileStore to BlueStore while the entire system remains online and continues serving customer data.
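The one-OSD-at-a-time constraint is what keeps the cluster serving data during the conversion. Here is a minimal sketch of that invariant using a toy in-memory model; the class and function names are hypothetical, not DeepSea's actual interface:

```python
# Rolling FileStore -> BlueStore migration: at most one OSD is ever
# offline, so the cluster stays online and keeps serving client data.
# `Cluster` is a toy in-memory model, not a real Ceph client.

class Cluster:
    def __init__(self, osd_ids):
        self.backend = {osd: "filestore" for osd in osd_ids}
        self.offline = set()

    def drain_and_stop(self, osd):
        # Invariant: never take a second OSD down mid-migration.
        assert not self.offline, "another OSD is still migrating"
        self.offline.add(osd)

    def redeploy_as_bluestore(self, osd):
        self.backend[osd] = "bluestore"

    def restart_and_backfill(self, osd):
        self.offline.remove(osd)

def migrate(cluster):
    for osd in sorted(cluster.backend):
        cluster.drain_and_stop(osd)          # mark out, wait for drain
        cluster.redeploy_as_bluestore(osd)   # zap, recreate with BlueStore
        cluster.restart_and_backfill(osd)    # bring back, wait for HEALTH_OK

cluster = Cluster(range(4))
migrate(cluster)
assert all(b == "bluestore" for b in cluster.backend.values())
```

In the real tooling the drain and backfill steps wait on actual cluster health; the sketch only preserves the ordering that keeps data available.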
And of course we have openATTIC, which is great tooling, with monitoring based on Prometheus and Grafana.
A: We try to leverage open source projects as much as possible instead of reinventing them, and this is now being merged into Ceph core. Here is a screenshot of the openATTIC monitoring dashboard; that is actually an embedded Prometheus/Grafana instance. Prometheus is an excellent monitoring time-series tool, and it's also cloud native, so we are already prepared for containerization there. I will talk a little bit more about that later. We also realize how important it is that our software is accessible to everyone.
A: So, together with a partner in China, we have localized the openATTIC dashboard and the management functionality. We have translated our documentation, and we are making sure that the Ceph management dashboard from day one inherits the capability to be localized and translated into all the languages that are needed, because sometimes the local operator may need this functionality, and it's generally just good practice. And yes, openATTIC is now merging; this slide is actually outdated.
A: It is now merged into the Ceph dashboard, though it is not yet in a shipping release, and we are adding functionality to it. This validates our choice both of openATTIC and of upstreaming it: we are already seeing the first contributions from outside our team, and for that we are deeply, deeply grateful. It just highlights that if you are not doing open source, eventually open source will catch up with you, and that's great.
A: Sometimes things are prototyped outside the Ceph core. openATTIC took some time, and frankly, we took inspiration from many open source management platforms that may not be around anymore, but ultimately the community catches up and re-creates it. Strong industry partnerships are also important: we have interactions with many, many large companies. SUSE is part of Micro Focus, so we are a truly global company, and we have ties to the other companies as well. But instead of talking about all the partners that we have, for which again we are very grateful, let me single one out.
A: I would like to talk about a partner who is actually here with us on stage. Before I hand over to Marc, let's take a brief look at the growth of Ceph deployments. Initially, the deployments were very tiny and tentative; customers were just trying out Ceph. These were small environments, development setups, just like the grassroots way Linux started, and then we had customers deploying it further.
A: In the second stage, customers deployed it for traditional workloads, replacing traditional storage arrays that were getting too expensive with Ceph-based solutions. That's great; it's really what helps a product grow, helps us understand the market better, and gets customers familiar with it. But where Ceph really shines is when you go all the way in.
A: Not just dropping Ceph into an environment that you already have, but making a true move to a cloud-native environment; then you can really get the full benefit of scale-out solutions. And with that, I would like to invite Marc Koderer from SAP (I'm sure you may have heard of this company), which has chosen Ceph as a core part of its new architecture. After his presentation, I will be back with a few slides about the future.
B: Hello, everyone. I'm really happy to be here. I want to talk about the architecture that we chose for the cloud. SAP has a long tradition, more than 40 years of history in software and computer science, and we are basically in the middle of an industrial revolution to transform everything to a cloud-native world. As I said, since we have a long history, we have a lot of brownfield applications that need to be transformed into the new architecture.
B: So how can we do that? The answer is a product called SAP Cloud Platform. The SAP Cloud Platform is a platform-as-a-service offering which enables you to build your cloud-native application with a variety of backing services that are also cloud native, like a database, a Cassandra database and so on; you put your microservices on top and can build your application. It is also integrated with the traditional SAP products for ERP and with the SAP HANA system.
B: My team is basically focusing on the layer below: we are running the OpenStack cluster in a DevOps mode, and we are taking care of the bare-metal provisioning and all the bare-metal servers that the SAP Cloud Platform itself runs on. It can run on public clouds and it can also run in SAP data centers with OpenStack, and we have been working on enabling that also for on-premise private-edition workloads, so that customers can build the same stack on their own premises.
B: The primary usage for this kind of stack is IoT workloads; we already have two years of experience with IoT workloads on top of OpenStack. Now we have come up with a new architecture, and since last year we have started to use Ceph. So the question is: why do we need Ceph, and why do we need a special architecture for cloud native? Basically, if you have a look on the left, you see a monolithic payload, and that's the traditional thing.
B: You have an application, you have a database that is active/standby, and you have one central storage. What you do, and that's the usual approach for monolithic payloads, is choose hardware that should never fail, and you never test failovers all that extensively. In a distributed or cloud-native world, you will have microservices that can scale almost without limit, and you have databases that do their job active/active, so they can scale too. So you need software-defined storage underneath that scales with your needs.
B: For instance, we are onboarding now one customer that has 60,000 IoT sensors, and these 60,000 sensors will increase every month by 20,000.
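That growth rate compounds quickly; a quick projection of the sensor count, using straight-line growth and only the figures from the talk:

```python
# 60,000 IoT sensors today, growing by 20,000 per month.
def sensors_after(months, initial=60_000, per_month=20_000):
    return initial + months * per_month

print(sensors_after(12))  # 300000: the fleet quintuples within a year
```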
B: So you see that there will be a huge amount of data coming in, and we need to be prepared and enable scaling in that sense. Let's have a look at our architecture. Within our deployment we chose three availability zones. Why three availability zones? Because we are hosting cloud-native applications, and all cloud-native applications need quorum.
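The quorum argument in one line: a strict majority of zones must survive, which is why two zones are not enough. A sketch:

```python
# Quorum is a strict majority of members. Losing one zone out of three
# still leaves a majority; losing one zone out of two does not.
def quorum(members):
    return members // 2 + 1

def survives_one_zone_loss(zones):
    return zones - 1 >= quorum(zones)

assert quorum(3) == 2
assert survives_one_zone_loss(3)
assert not survives_one_zone_loss(2)
```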
B: For our instance, we have a setup with two data centers that are quite close, about 300 meters apart, and in one of them we have two fire compartments, so that we get three availability zones. For Ceph too, we have distributed the cluster across those availability zones.
B: Let's have a closer look at our installation. What is really important here is that we should not underestimate that software-defined storage needs network; that's a really crucial point. You don't need just storage expertise in your team if you want to run this stack: you also need networking expertise, and compute or Linux expertise.
B: What we have here is a spine-leaf architecture on the networking side, which means it doesn't matter whether data is flowing from one data center to another or within the same data center: it will be the same number of hops and the same bandwidth. Our Ceph cluster in our production landscape has 108 storage nodes, with 24 disks per OSD node and 2 NVMes, and networking-wise we have 2 x 25 gigabit for the front-end network and 2 x 25 gigabit for the back-end replication network.
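The dedicated back-end network matters because every client write is re-sent to the other replicas. A rough sizing check, assuming 3x replication (the replica count is my assumption; the talk does not state it):

```python
# Each byte written by a client produces (replicas - 1) bytes of
# replication traffic on the back-end network.
def backend_gbit(frontend_gbit, replicas=3):
    return frontend_gbit * (replicas - 1)

# A node ingesting at its full 2 x 25 Gbit front end would generate
# twice that on the back end under 3x replication:
assert backend_gbit(50) == 100
```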
B: Primarily we are focusing on RADOS Gateway object store usage, because our IoT workload hits the object store. That's the thing we have actually tuned a lot recently. We measured the performance overall; at the end we had, I think, 10 compute hosts really putting load on it, and we came to something around 50 gigabytes per second, but this was not the end of it: we estimate the maximum performance at about 60 gigabytes per second with 4-megabyte writes.
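Translating those figures into the request rates the gateways must sustain (decimal units, as in the talk):

```python
# Objects per second at a given aggregate throughput and object size.
def requests_per_second(throughput_gb_s, object_mb):
    return throughput_gb_s * 1000 / object_mb

assert requests_per_second(50, 4) == 12_500  # measured: 50 GB/s of 4 MB writes
assert requests_per_second(60, 4) == 15_000  # estimated ceiling
```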
B: As I said, the RADOS Gateway is really important for us; we use the Swift interface here. Just last week we had some problems with scaling, so we did a lot of performance work, and as our customers now start to onboard AI workloads, we saw that the performance was not good enough. So we scaled up: we started with 3 RADOS Gateways and we have scaled up now to 30 RADOS Gateways.
B: These are virtual machines, so we can easily scale them up; I think later on it would be nice to have them on Kubernetes to really scale them on the fly. So what do you see here, if you have a look at the stack? Something quite obvious: Ceph scales quite well, and you can scale each layer individually. You can add more OSDs, you can add more monitors, or you can add more RADOS Gateways, but there is one layer here that does not.
B: The layer that is not really scaling is the load balancer. We had problems with the load balancer doing SSL termination, so we changed where termination happens: it is in the RADOS Gateway right now. We have the first measurements, really fresh from Friday last week, where we basically tried out how much the RADOS Gateway scales.
B: We start with one RADOS Gateway, a virtual machine with 16 vCPUs, and run the getput benchmark, which is a nice tool for benchmarking, by the way. You see from the numbers that the RADOS Gateway itself can scale up quite dynamically, so basically we can reach and saturate the Ceph cluster quite easily by extending the number of RADOS Gateways.
B: What is also important is concurrent connections: there will be more and more IoT sensors in place, so there will be more and more traffic. We came up with a maximum of 512 worker threads per RADOS Gateway, so this also scales up with each individual RADOS Gateway.
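Putting the two knobs together, the 30 gateways mentioned earlier and 512 worker threads each, gives the fleet's concurrent-connection ceiling:

```python
# Each RADOS Gateway handles up to `worker_threads` concurrent
# connections; fleet capacity scales linearly with gateway count.
def max_concurrent(gateways, worker_threads=512):
    return gateways * worker_threads

assert max_concurrent(30) == 15_360
```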
B: That's just one detail that I wanted to share.
B: Basically, it's important to understand that if you want to operate a Ceph cluster, you need to benchmark it and tune it for the way that you want to use it, and I think for IoT we are now well prepared. That's basically what I wanted to present, and I want to thank you for giving me the opportunity to be here.
A: Thank you, Marc. That is very exciting; we are really happy to see that Ceph is being deployed in these use cases. So that's where we were and where we are; let's see where we believe this might go.
A: I already mentioned that we will have a release out based on Ceph Mimic. Our focus is improved interoperability, improved localization and management. The scale-out user experience is also interesting: sometimes you notice that a UI that worked well for a cluster of 100 nodes really needs some changes for a cluster with thousands of nodes. There is eventing, alerting, metric reporting and telemetry too, and I never get tired of seeing those charts; they are so beautiful.
A: We need an abstraction layer that is more agile than installing packages and less heavyweight than virtual machines, and containers and Kubernetes seem to be the way this goes. It would help us address a number of the issues that we see in managing large-scale cluster lifecycles, so that is certainly coming. We've had meetings all this week around this; it's a really exciting technology.
A: We love the openATTIC dashboard and we love Ceph. As part of this management experience, we want to make it better: add features, make complex tasks easier. Yes, initially some of the workflows are going to be manual, but ultimately there will be wizards guiding you through and assisting you along the way. The computer and the human really have to work better together, from a monitoring perspective and from a management perspective, and it also means that we have to make sure that our management interface can address users of different levels.
A: We have to make sure that we can address users that are very experienced, and also address users that just want to provision an additional instance of their workload, or see why their workload isn't performing and what affects it, and we have to expose the more relevant metrics to the administrator. But those metrics are really complicated, and we hear a lot about machine learning and AI: I don't think AI and machine learning are relevant just as workloads running on top of our cluster.
A: How many IOPS do you need? It becomes really complicated, and we need telemetry data from real-life clusters, as was mentioned in the previous presentation as well; there is currently an upstream project on this with Wido den Hollander. We need real-life data on how much memory a Ceph cluster needs and how many disks it needs, and we also need to be able to track global performance: is the new Ceph release faster on average? We know it is on our synthetic benchmarks in our lab, but is it true in the real world?
A: So that is important: analyzing and understanding that telemetry data from a performance perspective, and also from a failure-prediction perspective. When will a drive fail? Do I have to order drives before one dies? Will 10% of my cluster fail in the next week?
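A first cut at the "will part of my cluster fail next week?" question, assuming independent drive failures; the 2% annualized failure rate is an illustrative assumption, not a number from the talk:

```python
# P(at least one of n drives fails in a window) = 1 - (1 - p)^n,
# where p is the per-drive failure probability over that window.
def p_any_failure(n_drives, afr=0.02, days=7):
    p_window = afr * days / 365  # crude linear scaling of the annual rate
    return 1 - (1 - p_window) ** n_drives

# For a fleet the size described in this talk (108 nodes x 24 drives),
# at least one drive failure in any given week is more likely than not.
print(round(p_any_failure(108 * 24), 2))
```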
A: Should I be prepared? And we've heard a lot about all the knobs that Ceph exposes: tuning pg_num, tuning all those quality-of-service parameters. All those options are really, really complicated.
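As one concrete example of those knobs, the long-standing pg_num rule of thumb from the Ceph documentation is roughly 100 placement groups per OSD, divided by the replica count and rounded up to a power of two (3x replication assumed here):

```python
# Suggested total pg_num for a pool: ~100 PGs per OSD / replica count,
# rounded up to the next power of two.
def suggested_pg_num(osds, replicas=3, pgs_per_osd=100):
    target = osds * pgs_per_osd / replicas
    power = 1
    while power < target:
        power *= 2
    return power

assert suggested_pg_num(108 * 24) == 131_072  # the cluster from this talk
assert suggested_pg_num(12) == 512
```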
A: I am not Sage; I'm not smart enough to understand them all. But there is already interesting research out there that treats this as a game problem: you tell the system, your deep Q-network, that you want to optimize for latency and latency stability, or maybe for throughput, and over time it learns and generates data. This is actually really exciting, and it seems so far out, and then you look at it and somebody has already done it.
A: These kinds of technologies have already been documented for other storage solutions, and it would be really great to bring them to Ceph. With that, I would like to conclude: the question is not whether you should be using Ceph; I believe the question really is only when. I hope to speak to more of you about these questions during the rest of the conference, and with that I would like to again thank you for your time. Please come find me and talk to us. Thank you.