Description
We explore the security model exposed by Rook with Ceph, the leading software-defined storage platform of the open-source world. Digging progressively deeper into the stack, we examine options for hardening Ceph storage that are appropriate for a variety of threat profiles.
A: Alrighty, so we're going to try to give you the light subject at the end, since you're already sleeping. Maybe the subject is not light, but it's a relatively short talk at least; I think the record is 17 minutes, and we're not trying to beat it, but we're not going to bore you too much. It's nice to be back in New York. For me it's especially special: the first time I spoke publicly about stuff in my career was eight years ago, in New York.
The last time I came out to see... [inaudible].
A: So it feels nice to be restarting here. I have been fortunate to spend nearly all my career in open source and, like almost everybody, have had terrible marketing managers; but I also had two excellent ones, and one of them gave me a lesson that I will never forget: it is marketing malpractice not to introduce yourself. So that's me in one slide; those are things I worked on.
B: Everyone, I'm Sage McTaggart, I use they/them pronouns, and I work on cybersecurity for Ceph at IBM. I did my undergrad at UMass Amherst and graduate school at UC Santa Cruz. I've done research in a wide variety of areas, ranging from programming languages to file systems and all sorts of other stuff, and I was working on security for Ceph and ODF at Red Hat as well.
A: So we both successfully escaped from academia, except I still have a taste for corduroy, apparently. There are slides introducing Ceph and Rook that we're going to jump over; they're built for a different audience. Those became clearly not applicable when someone asked the question about the AIX client, so let's jump straight into security. The big picture for security is that security practices harden a specific point of the infrastructure.
A: Cherry-picking practices without a model of the threat and of the attacker is just not a viable strategy. The joke usually goes that to get a really secure computer you have to cover it in concrete, shut it down, and throw it to the bottom of the ocean; but then it's not very useful. In other words, in practical terms, absolute security is not usable, and maybe not even attainable, so you have to define a threat model.
A: Are you facing script kiddies, or the GRU, or the dreaded privileged insider? These are very different scenarios. Some of these want to steal your data; others want to cryptolock your data and hold you for ransom; others may be satisfied with deleting things at random and the disruption that causes, like a transient denial-of-service kind of thing. You need to define what threats and personas you're protecting against, and what the priority is, so that you can pick your battles. If everything is a priority, then nothing is.
A: The public security zone is an entirely untrusted area of the cloud. It could be the internet as a whole, or just networks external to your cluster that you have no authority over. Data crossing this zone should make use of encryption. Note that the public zone, as I just defined it, does not include the storage cluster front end: the Ceph public_network, which sounds the same but is not the same, defines the storage front end and properly belongs in the storage access zone.
A: Now, going down the list, the Ceph client zone refers to networks hosting Ceph clients like the object gateway, the Ceph file system, or block storage. Ceph clients are not always excluded from the public security zone: the canonical example would be to expose the object gateway's S3 or Swift APIs in the public security zone, so that data can be retrieved from the outside.
A: Finally, the cluster zone refers to the most internal network, providing storage nodes with connectivity for replication, heartbeat, backfill, recovery and the like. This zone includes the Ceph cluster's back-end network, called the cluster_network in Ceph. Operators often run clear-text traffic in the cluster zone, relying on physical or VLAN separation of that network from all other traffic.
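As a sketch of how those two networks are declared in Ceph (the subnets here are placeholders, not recommendations):

```ini
# /etc/ceph/ceph.conf (or set via `ceph config set global ...`)
[global]
# storage front end: client <-> daemon traffic (storage access zone)
public_network = 192.168.10.0/24
# back end: replication, heartbeat, backfill, recovery (cluster zone)
cluster_network = 192.168.20.0/24
```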
A: That, going back to the previous example, would not be a valid choice if your threat model includes adversarial privileged insiders. These four zones are separately mapped onto networks, or combined, depending on your use case and the threat model you adopt, and so you have diagrams like these, where you can look at what services you have and what networks they span. Now, at the edges, you have components spanning the boundaries of two networks, and by definition, because they're spanning two zones with different levels of privilege, you need to secure them to the requirements of the highest-privilege network.
A: In many cases, the obvious thing to look at is the security controls: are things properly secured? Is there an obvious misconfiguration that was missed? Where possible, exceeding the security requirements at integration points is a good idea, which, given that this is a storage product, is easier than it would be on a generic operating system or a compute product.
A: The complete opposite would be, again, the example of the object gateway: it needs to access all the OSD nodes to get at the data, it needs to access the monitors to know where the cluster map is, and it will likely need to access the outside to serve data out. So we have all varieties in terms of the daemons that we have, but we don't have to apply the most permissive policy to everything.
B: So product security at IBM follows a secure development life cycle, with the goal of reducing risk and improving security for Ceph. We are always suggesting improvements; we're pen testing more regularly now; we're manifesting all of our dependencies, just like at Red Hat; we're reviewing vulnerabilities, tracking weaknesses and exploits, all that good stuff. We'll still be doing security releases and reviewing all new releases, just as before, checking everything for CVEs and vulnerabilities, except now we're at IBM and using their systems.
Previously, at Red Hat, this was split into two different roles, incident response and security architect, which you might have seen in prior versions of this talk; now it's a little bit different.
B: In addition, we're going to continue to expand our process to improve code security preemptively, and eventually start following IBM standards to fix vulnerabilities and ensure compliance; these are oftentimes even more extensive than Red Hat's. So, in addition to us still being devoted to upstream and everything else, all of this will result in a more secure Ceph, because we have even more rigorous requirements, even with three releases going on. So, again, just to reassure everybody.
B: Now, of course, new collaboration produces new challenges, and lots of really fun and clever goals that we're still figuring out; fingers crossed it works out fantastically. Please feel free to reach out with any concerns you have, or any collaborations with IBM that you want to talk about; for all of this stuff, please reach out.
B: So let's now talk a little bit about how Ceph actually implements encryption, not just what happens when we get a vulnerability report. On the server side, operators overwhelmingly choose to encrypt data at rest using LUKS. You don't necessarily have to have encryption, but it's an option that we highly, highly recommend. All the data and metadata of a Ceph storage cluster can, by using LUKS, be secured using a variety of dm-crypt configurations, and almost all of our customers choose to do this; you should choose to do this.
B: We enable a general security best practice by locating our monitors on separate hosts from the OSDs: we ensure anti-affinity between the keys and the data that they encrypt. This means that your drive host is physically separated from your decryption key as much as possible, so that if somebody steals one drive, they don't have both the key and the data; they have one or the other, and that makes it a little bit harder to crack.
B: The object storage gateway also has some additional encryption capabilities. It includes encryption at ingestion time, and we have the use of per-user keys, as opposed to just per-drive keys. So if you want to revoke a user, you don't have to re-encrypt everything and send out a whole new key. We allow key rotation with tools like Vault.
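As an illustration, RGW's server-side encryption can be pointed at a Vault KMS with settings along these lines (the address and auth method are placeholders for your own deployment):

```shell
# configure RGW to fetch per-object keys from HashiCorp Vault
ceph config set client.rgw rgw_crypt_s3_kms_backend vault
ceph config set client.rgw rgw_crypt_vault_auth token
ceph config set client.rgw rgw_crypt_vault_addr https://vault.example.com:8200
ceph config set client.rgw rgw_crypt_vault_secret_engine transit

# an S3 client then requests SSE-KMS per object with its own key ID, e.g.:
# aws s3 cp file.bin s3://bucket/ --sse aws:kms --sse-kms-key-id project-key
```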
B: Now, what about encryption in transit, now that we've covered encryption at rest? Network communication can be secured by turning on the secure mode of the Ceph protocol, with messenger version 2 that was introduced in Nautilus. Now, clear text can be fine in this case; it's not a huge security risk, because typically a network where you're using the Ceph protocol is physically or logically isolated from access, so you aren't just having people monitoring it on the wire and sniffing your packets.
B: It is a little bit of a concern if you're in the cloud on a shared cluster, perhaps with a Kubernetes deployment, or if you run a little more nervous about security, or you want, for whatever reason, to have encryption on the wire and your threat model includes it; thus we implemented encryption here. There are a lot of issues with compatibility and overhead for back-end protocols, so it really depends on how you're setting up your network and how you're setting up your deployment.
B: That being said, in most cases the performance impact is pretty insignificant for a properly designed cluster: your latency is going to be completely overshadowed by network communication, as long as you account for your CPU overhead. And we have some best practices here that go back to the security zones that we were talking about; you have your network hygiene.
B: You generally want to have these in addition. When we think about some more specific protocols, though: the S3 service is usually secured between RGW and the S3 client with TLS on port 443. You can totally use plain HTTP on port 80, depending on the nature of the data being served; if you want to make it public, you can. But it's usually secured, and we recommend TLS. Termination at a proxy is a special case.
B: The link between HAProxy and RGW is clear text, and it needs to be located in, and protected by, the security zones that we were just talking about. And, of course, standard network security practices apply, like firewalling individual nodes to expose only an allowlisted set of ports.
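For RGW itself, TLS on the Beast frontend is configured roughly like this (the certificate path is a placeholder):

```shell
# serve S3/Swift over HTTPS on 443; drop `port=80` to disable plain HTTP
ceph config set client.rgw rgw_frontends \
    "beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/private/rgw.pem"
```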
B: We check that with pen testing; we're making sure that everything is generally pretty safe. But those are all best practices, and the Rook-specific angle isn't quite as relevant here, although Rook can use CRDs to encode many of these settings, such as configuring trusted certificates for the RGW web server. Rook also supports at-rest data encryption, as we discussed earlier, and in-flight encryption of the Ceph protocol since 1.9. The Kubernetes permission system also applies to the persistent volumes, so you get the permissions, quotas, and all of that from Kubernetes; there is nothing Rook needs to do here.
B: Rook supports a key management system in the CSI driver (the container storage interface), and that again allows individual volumes to be encrypted with their own key. Going back to the earlier point, that limits the scope per key; so if you need to revoke a user, you can. And this is all done so we can follow best practices easily: key rotation, revocation, limiting the scope of each key. It limits the scope of our unencrypted traffic and all that good stuff. Best practices are important to follow; we all see them on paper, but we've got to make them easy to implement.
B: We don't necessarily want our dashboard to be exposed to the world, but it does definitely need to be reachable by your operator's workstation to be of use; so the manager, which supports the whole infrastructure, has to be reachable on the storage access zone. So how do we do that with our control zones? SSH control planes. But then, who are we, and who is accessing what? We sort of have to talk about identity and access briefly, and we do use shared secret keys, which protects us from man-in-the-middle attacks by default.
B: Shared secret keys are done using AES, which, fun fact, is thought to be quantum-resistant; but you still need to follow some good practices. You have to grant keyring read and write permissions only to the current user and root. You want your client.admin user restricted to root only; you don't want all users to be root, that's a bad security practice, don't do that. And that brings us to RGW next.
B: RGW supports the key-and-secret model of AWS S3, and the equivalent model for OpenStack Swift as well. Talking a little bit more about RGW: in general, the administrator's key and secret should be treated with appropriate respect. To reiterate the point, you want to use your administrative users sparingly, to reduce your risk profile. The RGW user data is stored in Ceph pools, which should be secured as we discussed previously regarding data at rest.
B: This isn't required, but people generally do use it: we can couple with OIDC providers, such as Keycloak, backed by your organizational IdP, for even more granular role- or attribute-based access, and we'll continue working to make this more and more granular as time goes on, just to make security best practices even easier. And again, we support LDAP and Active Directory users, and we highly recommend using secure LDAP. We support OpenStack Keystone to authenticate object gateway users in OpenStack clouds.
B: What happens when there is a breach? How do we detect it? What happens when we want to just check our logs? What happens if we have a security requirement that mandates audits? Well, we allow operator actions against a cluster to be logged, and they're stored in /var/log/ceph/ceph.audit.log; you can check the information there.
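A sketch of what that looks like in practice (paths follow the default log layout):

```shell
# make sure the monitors persist the cluster and audit logs to file
ceph config set mon mon_cluster_log_to_file true

# administrative commands (who ran what, and from where) land here:
tail -f /var/log/ceph/ceph.audit.log
```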
A: So, one more: once data is deleted from a Ceph cluster, it generally cannot be recovered for practical use, but there are exceptions.
A
Additionally,
individual
data
blocks
that
use
to
constitute
an
object,
file
or
volume
are
often
still
present
on
persistent
storage,
like
with
any
type
of
storage,
really
until
they
are
overwritten
by
that
capacity
being
used.
So
you
can
end
in
the
case
of
ceph.
You
cannot
securely
delete
the
cluster
by
writing
a
ton
of
data
to
it.
It's
not
going
to
work,
or
this
is
not
going
to
work
reliably,
which
is
what
you
care
about.
A: Secure deletion is another common question. The easiest way to solve this is to do the right thing from the beginning, which is, again, encrypting data at rest; then, when you want to sanitize the disk, you throw away the encryption key, and that's it. Plus, there are plenty of storage media these days that provide that functionality in hardware, so you don't even have to manage it when you need to sanitize.
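With LUKS-encrypted OSDs, that "throw away the key" step can be made explicit when a drive is retired; a sketch (the device path is a placeholder, and the operation is irreversible):

```shell
# destroy every LUKS key slot on the retired OSD device; without the
# volume key, the on-disk data is unrecoverable ciphertext
cryptsetup luksErase /dev/sdb

# self-encrypting drives expose the same idea in hardware,
# e.g. via an NVMe cryptographic-erase format:
# nvme format /dev/nvme0n1 --ses=2
```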
A: You can also use a shredder or a degausser, but usually in those cases you cannot return the destroyed drives for warranty unless you have a special contract with the vendor, so that's another reason to use encryption keys instead. And then the other case is when you actually want to prevent deletion, absolutely the opposite scenario; there, Ceph has one facility, which is the use of multi-factor authentication in RGW, so that you can make it harder for someone to go and delete your data in an attack, by requiring a second factor.
A
There
is
one
more
thing
which
is
a
hardening
options,
so
these
are
very
highly
vendor
dependent,
but
they're.
Really,
the
availability
of
them
is
all
the
same
across
all
Linux
distributions.
The
question
is
whether
they
are
compiled
in
the
kernel
or
in
the
case
of
the
self-distribution,
if
the
binaries
are
compiled
with
it.
So
these
are
Red
Hats
choices.
A: As Sage mentioned, we can make use of FIPS 140-2, optionally; you configure the operating system for that, and Red Hat Enterprise Linux and Red Hat Ceph Storage binaries are compiled with those options. We don't want to go into a discussion of GCC flags; I think we're dense enough already. But those are the ones that we're using, and they generally get in the way of exploits that are buffer overruns or heap overruns.
A: On the kernel side, obviously you get the options that the distribution selects, but you can always build your own kernel and play with things like seccomp and PIE and the like, or the ASLR options. Quite a few of those are already in RHEL by default. Generally, what you want to do as the user is consult your vendor's documentation and figure out what is already in there; most of us don't have the time to do a garage experiment and build our own kernel.
A: So that is it, aside from some bookmarks for you covering the things that we didn't go into. Managing Kubernetes secrets is always a favorite; Rani Osnat at Aqua has written a very nice primer on that. Chapter 6 of Hacking Kubernetes, Michael Hausenblas's new book, has a very nice primer on storage security that is primarily Kubernetes-based. And a data security and hardening guide is coming from our product.
A: The Kubernetes documentation has very nice details on how to encrypt Secret data at rest, which kind of goes back to Rani's article at the beginning. And then the last item is for all the stuff that we didn't discuss about hardening the kernel or the binaries; it will explain all the mystery flags of GCC for those of you that are fortunate enough not to know them. And there is one more...
A: That's not on here, but the Ubuntu security team has a very nice table of the hardening options of the kernel, so you can go to their table and, if you're not a kernel security expert, learn what they are, and then you can Google them for your own distribution. And I think that's it; let's see if there are any questions. These are all the people that have contributed to the presentation so far. If you have things that we should be adding here, areas that we didn't address, or...
C: [inaudible question from the audience]
A: Okay, yes, that's a very smart question: you've got a hyper-converged setup.
A: Before you make it more complicated, because this is the limit of what people can take already: the OSD has been encrypted with LUKS; where is the key? The key is in the monitor. That goes back to what Sage was saying earlier, that we want anti-affinity between the monitor and the OSD, so that the key is somewhere else; you have to steal at least two machines, then.
A: The key is on the monitor machine. So how is it secured on the monitor machine? That is the part of the question that I get from the smartest people, like you. The answer there is that you're also encrypting the file system of the monitor, so that that key is encrypted at rest on the monitor, and that's usually okay. What's the key for that key? The key for the key is there on the machine; so if they can boot that machine, then you have lost.
A: It shouldn't matter; you still need to boot the host, okay, and...
C: Until a couple of years ago we didn't think that, you know, predictive power analysis was possible on certain CPUs, but now we know.
D: Yes, I have a similar question: if you use the encryption in the Ceph CSI, the passphrase is stored in the storage class, and the key is stored as metadata next to the image. From my understanding it's like this: if you have one thousand, ten thousand images with keys, and the key is always encrypted with the same passphrase, it should be possible to reverse-engineer this passphrase.
A: You're saying, in terms of doing a large number of known-ciphertext attacks?

D: Yes.
D: The data... and at the moment we have already made the decision to re-implement this part of Ceph CSI, because in the storage class, the passphrase used to encrypt the key of the image should not be stored in a world-readable way, or...
A: Usually the answer there is either (a) the protocol uses temporary keys, and so you don't care, as long as the sessions are not very long-lived; or (b) the protocol doesn't use temporary keys, in which case you need key rotation, like Amazon keeps bothering you, saying "rotate your keys" every, I think, three months. But I don't know what scenario LUKS would fall into here.
E: Hey, so we actually started encrypting with LUKS before Ceph did, I think. So this is maybe a dumb question, but is there a facility to rotate LUKS keys?
A: Built into Ceph? Oh, definitely not built into Ceph. So anything that's available for Linux we could use, but we don't have our own. On the most popular thing in terms of key rotation: usually then I get the question about object stores, and the vast majority of customers that want to do that use HashiCorp Vault, but obviously that's at another level.