From YouTube: Pivotal: Apache Cassandra on Pivotal CloudFoundry
Description
Speaker: Tammer Saleh, Director of Product - Cloud Foundry Services at Pivotal
Pivotal is dedicated to bringing best-of-breed data services to Pivotal CF, and there is no other open source data technology with as much potential as Cassandra. We’ll discuss the strategies and techniques for deploying and managing a multi-user Cassandra installation that integrates with Cloud Foundry.
- Making Cassandra manage itself
- Single-tenant versus Multi-tenant usage
- Deploying Cassandra with BOSH
- Cloud Foundry services architecture
Hi everybody. Today I'm going to talk about Pivotal Cloud Foundry's use of Cassandra and the product that we're building for deploying Cassandra clusters on premise, behind the firewall, in a kind of automated fashion. It's a partnership between Pivotal and DataStax that we're going to be releasing in the next couple of weeks.
So first, my name is Tammer Saleh. I'm the director of product for Pivotal Cloud Foundry. I'm actually in London; I flew in for this event, which is marvelous. I was also an engineer at Pivotal for a while and spent some time pairing on the runtime team. In the past I've run consulting companies, been director of engineering for Engine Yard, various other things.
Today we're going to talk about a bunch of things. It's a lot of content to get through, so I'm going to try and move through it quickly. Let me give you a quick overview of Pivotal Cloud Foundry, what it is and how it works. I'm going to talk about the services API, and I'm also going to talk about why we chose Cassandra as the data store to focus on, how we built Cassandra as a service available on the platform, and how we automated Cassandra operations.
Pivotal CF is an on-premise deployment of Cloud Foundry. It's very similar to Heroku in how you use it, and the idea is that we can give developers agility by removing all of the friction from getting their app from a developed application into production. A phrase that we like to use in Pivotal is "cf push is golden," and what we're really saying is that a single command to push your app into production is the user experience goal that we're focused on.
So real quick, let's talk about the services API and how that works. Let's look again at a kind of simplified version of the internals of Cloud Foundry. You can see here we've got, again, the operations manager, with BOSH behind it, on top of the infrastructure, and we've already deployed the Cloud Foundry runtime: we've got the router, some application instances, and the Cloud Controller, which is the API endpoint that manages the whole thing. Now, the operator has deployed a service next to it.
The service side is all about the service broker, which is a very small API endpoint that communicates with the Cloud Controller, and the concept of service instances that the application developer has requested. So let's go through the workflow here. I like the command line, because I'm a UNIX geek, so most of the stuff I talk about is going to be on the command line, but of course this all works through the dashboard as well. The application developer says: okay, cf marketplace.
Tell me what you have for data services. The Cloud Controller then reaches out to the service broker and says: what do you have? The service broker says: well, I've got Cassandra. And then the Cloud Controller says: here's what we have available, and one of the services is Cassandra. This is a view of the dashboard, but on the command line you just get that listing right there. So the application developer says: sweet, I like Cassandra, I want to get that, right?
So they create that service. When they run cf create-service, the Cloud Controller talks back to the services API on the broker and says: okay, give me an instance of this, and you can see the little blue instance that has now popped up. At that point the service broker has actually made a database for the application developer. Right now it's just sitting out there; there's nothing talking to it.
You can bind it to an application, so the application developer says: bind the service to my app. The blue dot on the left is the application; the blue dot on the right is the service instance. When they say bind it, it provisions a binding, and the binding gets returned as basically an environment variable that is exposed to the application instance. From that point forward, all communication goes directly from the application instance to the service instance. We don't get in the way.
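In cf CLI terms, that whole flow is just a few commands. A hedged sketch, assuming a broker that advertises a cassandra service with a small plan (the service, plan, and app names here are illustrative):

```sh
# List the data services the brokers expose (the marketplace)
cf marketplace

# Provision a service instance from a service and plan
cf create-service cassandra small my-cassandra

# Bind it to an app; the credentials then show up in the app's
# VCAP_SERVICES environment variable after a restage
cf bind-service my-app my-cassandra
cf restage my-app
```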
The reason for that two-phase approach, where you provision the service and then you bind it to the application, is so that you can bind that service to multiple applications. Depending on how you want to do your microservices architecture, you might want to share a database on the back end. But the point of this is that the services API is incredibly simple, incredibly easy to implement. This is literally all of the RESTful API calls for this API, and I'll point out that these are all the verbs we support as well.
There are only five permutations, so you can write a service broker trivially in something like Sinatra via Ruby, or via Java. It's very easy.
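As a sketch of how small that surface is, here are those five routes as a minimal Sinatra broker, assuming the v2 services API; the catalog contents and the empty handler bodies are placeholders, not Pivotal's actual broker logic:

```ruby
require 'sinatra'
require 'json'

# 1. Advertise the catalog of services and plans
get '/v2/catalog' do
  { services: [{ id: 'cassandra-service-id', name: 'cassandra',
                 description: 'Cassandra clusters on demand', bindable: true,
                 plans: [{ id: 'small-plan-id', name: 'small',
                           description: 'Shared-cluster keyspace' }] }] }.to_json
end

# 2. Provision a service instance (e.g. create a keyspace)
put '/v2/service_instances/:id' do
  status 201
  '{}'
end

# 3. Bind an instance to an app: create credentials and return them
put '/v2/service_instances/:id/service_bindings/:binding_id' do
  status 201
  { credentials: { username: 'some-user', password: 'some-pass' } }.to_json
end

# 4. Unbind: revoke the credentials
delete '/v2/service_instances/:id/service_bindings/:binding_id' do
  '{}'
end

# 5. Deprovision: tear the instance down
delete '/v2/service_instances/:id' do
  '{}'
end
```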
The complexity of doing a service is not in the API; it's in the implementation behind it, which we'll talk a lot about. So real quick, I want to talk about what services we're focusing on for Pivotal, and get into why Cassandra is the one that we're taking to production.
First we have Redis, which is a KV store. I'm sure everybody's heard of these databases, so I'm going to move through them quickly. Redis is one of my favorites; somebody referred to it as the AK-47 of data stores. It's very focused in what it can do and it's rock-solid, but it doesn't scale very well in terms of number of nodes. I think you can only do master-slave at the moment.
Clustering is coming out later, but nothing like Cassandra's. Neo4j for a graph database, Elasticsearch for full-text search, MongoDB for a fast and easy NoSQL store, memcache for caching, MariaDB for MySQL. We have our own services, like our Hadoop distribution, Pivotal HD, and Riak CS for an S3-compatible blob store. And then, of course, Cassandra: highly distributed, a heavy KV/column store. These are the focuses of what we're going to be releasing with Pivotal CF. Some of these are already released; some are in progress right now with the London team.
Of course, Netflix has shown that they were able to get up to, I think, a million writes per second with Cassandra, and again, linear scaling is very rare in computer science. To see a system that actually scales this linearly is really impressive, so we believe Cassandra is truly cloud scale. Cassandra supports multi-data-center deployments, and nowadays, given the cloud, a multi-data-center deployment is commonplace. I remember when I used to be a UNIX administrator for Citysearch.
We had two data centers, one in Los Angeles and one in Chicago, and that was cutting edge for us, to have that kind of distributed system. But even then it was just a master-slave failover. Nowadays, with Amazon and with various cloud technologies, it's very easy and commonplace to be deploying your application across multiple data centers, and Cassandra is the best data store to support that.
So let's talk about how we built Cassandra, in terms of the architecture, as a service that integrates with the platform. Again, the services API is actually quite simple. You've got these four main terms. You've got the service itself, which is kind of a meta term: it means, what exactly is the technology that you're deploying? In this situation, of course, it's Cassandra. You can choose among the various plans in that service, which is nice.
It gives the application developer a little bit of tweaking: I want a small Cassandra, or a large Cassandra. Then you provision the Cassandra instance, and then you bind it. Again, the binding is just a user account and some credentials that get passed back to the application.
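Those credentials surface to the application through the VCAP_SERVICES environment variable. A hedged sketch of what a Cassandra binding might look like there; the exact credential fields are up to the broker:

```json
{
  "cassandra": [
    {
      "name": "my-cassandra",
      "plan": "small",
      "credentials": {
        "node_ips": ["10.0.16.11", "10.0.16.12", "10.0.16.13"],
        "keyspace_name": "ks_f47ac10b",
        "username": "u_f47ac10b",
        "password": "s3cr3t"
      }
    }
  ]
}
```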
The real trick is what an instance is. Whenever you're building out a new service, you have to think long and hard about exactly what the user gets when they say: I want an instance of service X. So initially, the easiest way of getting a service out into the world is to do what we call a multi-tenant installation. We have this available right now in beta form for the Cassandra product, and what this means is that it's one Cassandra cluster that is divvied up amongst users. When I ask for an instance of Cassandra,
what I actually get is a single keyspace and a user that has access to that keyspace. This actually works fairly well with Cassandra in terms of scaling; as we've already seen, Cassandra can scale out linearly, and it's good for development, testing, and staging environments.
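As an illustrative sketch (not the broker's actual statements), provisioning one of those multi-tenant instances amounts to a keyspace plus a user scoped to it, with GUID-derived names as described below:

```sql
-- Create a keyspace and a user scoped to it; the GUID-derived
-- names and the replication settings are illustrative
CREATE KEYSPACE "ks_f47ac10b" WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE USER 'u_f47ac10b' WITH PASSWORD 's3cr3t' NOSUPERUSER;

GRANT ALL PERMISSIONS ON KEYSPACE "ks_f47ac10b" TO 'u_f47ac10b';
```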
Unfortunately, there are some limitations in Cassandra which mean that it's not good for heavy production use, and it's not good for strictly untrusted environments. So let's talk about a couple of those limitations. Keyspace visibility: this is the biggest one.
Basically, there's no way in Cassandra to say that user A cannot see the names of all the keyspaces in the system. Now, that's not a huge security concern for us, because in Cloud Foundry the keyspace names are actually just randomized GUIDs. A user can see that there are 20 keyspaces or 200 keyspaces, but that's not really a big deal.
A bigger problem with keyspace visibility is that in Cassandra there's also no way to say that user A cannot see the names of the tables inside a keyspace.
Now, the names of the tables inside the keyspaces are determined by the other users, so there is some information leakage there. That makes this only appropriate for semi-trusted environments. Now, Pivotal CF is designed to be deployed behind the firewall, inside private institutions, so it's still useful for staging and development environments.
There's also the noisy neighbor problem. In Cassandra there's no way to limit, for example, the amount of CPU used by queries against a single keyspace, the amount of memory used for queries against a single keyspace, or the amount of disk space used by a keyspace. We actually did some research to see if we could just make use of the underlying UNIX system quotas to deal with the disk quota, at least the keyspace size, and that would kind of work.
The problem is that the way Cassandra dumps what I think are its SSTables means that it would be easy to overrun that quota and then lose data, and that's absolutely unacceptable in our situation. So noisy neighbors are a problem with a multi-tenant Cassandra, and quotas as well: noisy neighbors for CPU, quotas for things like disk and memory.
So we have the multi-tenant Cassandra, which is good for application development and staging. But when we really want to push it out to production, we need to think bigger. At the production level, when a user requests a Cassandra instance, what they're actually going to get is a set of dedicated VMs that are set up as a Cassandra cluster. In the initial version we're just going to hard-code
that number, maybe three, and probably one number per plan, so it's easier to say: I want the small plan, which is three; I want the medium plan, which is, say, ten; I want the large plan, which is whatever the operator wants to make it. Now, this is truly production grade. It's dedicated VMs as a cluster; there are no noisy neighbors whatsoever, and there are no issues with quotas, because the operator can set the size of the disks on these VMs. The only issue with it is that it is quite expensive.
The operator is clearly going to want to limit who can provision a production-level Cassandra cluster. There's also a middle ground, and by the way, this is what we recently inceptioned, about a week ago: we had a big kickoff meeting where we defined the architecture and defined the stories (Pivotal is an agile organization), and we're actively building this version out for the production-level Cassandra. There's another architecture that we've been looking at closely, and it's something that we're considering. It's not quite clear
that this is something our customers are demanding yet, but it is a possible architecture, which is basically a mix between the two. It's a shared set of VMs, but when a user asks for a Cassandra instance, what they get is a set of Cassandra processes spread out across those VMs, configured as a cluster, wrapped inside containers. We use Warden for our containerization technology (you could also use Docker), and the key is that the VMs would be shared.
So a set of VMs, say three VMs, might be running 20 Cassandra clusters. Now, Linux containerization technology is, well, it's not new; it's actually been around for a long time, but it's constantly improving, and with what you can constrain with containers, it is possible to basically remove the noisy neighbor problem entirely. You can constrain, of course, how much disk space is being used, how much memory is being used, how much CPU; you can even constrain the networking aspect, so the clusters can only talk to each other.
You can do all kinds of things there.
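For a flavor of those constraints, here is a hedged Docker example; the image, names, and limits are illustrative, and the exact flags vary across Docker versions:

```sh
# Put one tenant's Cassandra process in its own network so clusters
# only see their peers, and cap its memory and CPU share
docker network create tenant-a
docker run -d --name cassandra-a1 \
  --memory 4g --cpu-shares 512 \
  --net tenant-a \
  cassandra
```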
The nice thing about this is that a user would get access to the entire keyspace, or rather all of the keyspaces. With a multi-tenant solution, a user just gets the one keyspace. Usually for Cassandra that's the right way to go, because usually you have a keyspace per application, but we have seen use cases where people want to be able to provision keyspaces on the fly in order to segregate data.
Let's see: Cassandra automation. Managing a Cassandra database is actually fairly complex: repairs and timings and such. We have a tool called BOSH which makes all of this much easier. BOSH is our tool for deploying everything Cloud Foundry. I actually used to run product for BOSH, so I was running the BOSH team, and I can say this because of that: BOSH is incredibly powerful. It's also incredibly painful to use. I like to make the analogy that BOSH is like... like this.
So BOSH is a tool in the same space as Puppet or Chef or Salt or Ansible, but it does things in a different way: it's predictable, it's repeatable, it's infrastructure-agnostic, and it's built for large-scale deployments. BOSH takes care of the entire lifecycle of a deployment. It takes care of provisioning the VMs, talking to the infrastructure to get that done. It takes care of configuring what's on the VMs, and it takes care of juggling the persistent storage for the VMs as well, all the networking, everything. BOSH is also very, like I said, predictable and repeatable, because BOSH compiles everything from scratch; it does it once, uploads it to a blob store, and uses that same set of blobs to lay down every VM that it deploys. That's also why BOSH is built for large-scale deployments. In addition, BOSH has features like canaries and rolling deployments, which you can configure.
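To make that concrete, here is an abridged, hedged sketch of a BOSH (v1-era) deployment manifest for a three-node Cassandra cluster; the release, job, and resource pool names are illustrative, not the actual product's:

```yaml
# Abridged deployment manifest; the networks, resource_pools, and
# compilation sections are omitted for brevity
name: cassandra-cluster
director_uuid: REPLACE-WITH-DIRECTOR-UUID

releases:
- name: cassandra
  version: latest

jobs:
- name: cassandra_node
  instances: 3
  templates:
  - name: cassandra
    release: cassandra
  resource_pool: medium
  persistent_disk: 65536   # BOSH tracks and re-attaches this disk
  networks:
  - name: default

update:
  canaries: 1              # update one canary node first
  max_in_flight: 1         # then roll the rest one at a time
  canary_watch_time: 30000-120000
  update_watch_time: 30000-120000
```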
BOSH also ensures that all the processes are running correctly on the VMs, the ones you've told it to watch, and it makes sure that the VMs stay healthy. That's a feature that we added to BOSH because of AWS.
It's the BOSH Resurrector. Kind of a side story here: we used to run our public Cloud Foundry installation on a vSphere installation, a vSphere data center that was absolutely production grade: EMC data storage, and I think EMC servers as well.
I think they have, like, guards with machine guns outside the building. Because of that, we didn't really have to worry about HA, because VMs never went away. And then, when I came onto Pivotal, my first task was to take this production platform, this public cloud, and move it over to AWS, which I like to refer to as the quantum flux of the cloud world: AWS has absolutely the worst API and is the least reliable of any of the infrastructures that you could possibly work with.
Basically, on each one of the VMs that's deployed by BOSH, BOSH has an agent that's constantly heartbeating back to the BOSH director. Now, if at any point one of those agents stops sending heartbeats (it could be a network partition, it could be anything, the agent could be down for some reason; BOSH doesn't really care), at that point BOSH just sends a signal to the infrastructure and wipes the entire VM off the network.
Now, again, BOSH is aware of the persistent storage of the VM, so nothing's lost: it just recreates that VM and then attaches the persistent store again. Now the agent's heartbeating and everything just works fine. This has proven to be one of the more important features of BOSH, especially when you're deploying to something like AWS.
There's a difference between precision and accuracy, which I think most of you probably understand. The interesting thing about Cassandra is that it's more important for the clocks on the various Cassandra VMs to be precise than accurate. I don't care if my Cassandra nodes think that it's 1912, as long as they all think that it's 1912, right?
So the way that you get that done is with ntpd, which is the service that ensures that your VMs have the right time set on them. You can tell it (they call it peering) to home to peers in a tiered fashion. This is a very unusual configuration for VMs in a network, but basically you say: all these nodes, I want them all to home to each other.
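As a hedged sketch of that peering setup (the host names are illustrative), each node's ntp.conf lists the other Cassandra nodes as peers rather than only upstream servers:

```conf
# /etc/ntp.conf on cassandra-node-1
# Peer with the other nodes so the cluster converges on a common time,
# keeping the clocks precise relative to each other even if not accurate
peer cassandra-node-2
peer cassandra-node-3

# Optionally still chase an upstream server for accuracy
server pool.ntp.org
```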
Another aspect of maintaining and managing a Cassandra cluster is the repair lifecycle. I'm sure everybody here has dealt with running repairs on a Cassandra cluster, and all the times you might have to do that by hand. Now, the challenge with Pivotal Cloud Foundry is that we're taking over the operations aspect of Cassandra on behalf of the operator, so we can't send any telemetry back to headquarters, because it's all deployed behind a firewall; most of the point of this is that companies don't want that stuff being sent out.
So there's a set of times when we run repairs. The first one is when we're decommissioning a node and we know that that VM is going away permanently. Now, we don't run repairs when we know that the node is coming right back. So if BOSH is doing a deployment and it has to recreate the VM for some reason, or if the Resurrector ran and had to recreate the VM, we don't run a repair then, because in those situations you could actually end up running a repair across the entire cluster and really degrading performance.
We also allow the operator to configure a time threshold: if the node has been down for longer than this period of time, then we run a repair when the node comes back. So if there was some downtime on the node for, say, twenty minutes, and the operator set 15 minutes as their threshold, when the node comes back we will run a repair on that node.
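A minimal sketch of that repair policy, assuming hypothetical helpers like decommissioning? and recreated_by_bosh? (this is illustrative, not the shipped code):

```ruby
# Decide whether a node needs a repair, per the policy above;
# `node` and `operator_threshold` are hypothetical names
def run_repair?(node, operator_threshold)
  return true  if node.decommissioning?     # node is leaving permanently
  return false if node.recreated_by_bosh?   # deploy or Resurrector: it's coming right back
  node.downtime > operator_threshold        # repair only after a long outage
end
```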
So the whole point of this is, like I said: nowadays, with technologies like Docker and all the new routing technologies that are coming out, it's not that difficult to produce kind of a toy platform. The real key is in producing something that manages itself and maintains itself, and that has highly available stateful data services behind it, and we believe that those are the key differentiators for Pivotal CF.
Our customers believe that as well. Our customers are really excited by the Cassandra product that we're building with the DataStax partnership, and it's one of the most important projects that we're working on right now. So anyway, I want to wrap it up quick. Thank you very much for your time. If this interests you, and if you're interested in living in London, we are hiring on the Cloud Foundry team over there.