From YouTube: BlackRock: Multi Tenancy in Cassandra at BlackRock
Description
Speaker: Randy Fradin, Vice President
At BlackRock, we use Apache Cassandra in a variety of ways to help power our Aladdin investment management platform. In this talk I will give an overview of our use of Cassandra, with an emphasis on how we manage multi-tenancy in our Cassandra infrastructure. Multi-tenancy can mean different things to different people, but it often comes with added requirements related to security, isolation, and administration. I'll talk about how we operate (and make changes to) Cassandra to accommodate these needs in our platform.
A: Hi everyone, my name is Randy Fradin and I work for BlackRock. I'm very excited to be here closing out Cassandra Summit 2015, and I'm thrilled to see that some people had enough stamina to sit through one more session before hitting the road, so thank you for that. I'm going to spend a few minutes giving some background on myself and BlackRock and a little bit of our history with Cassandra, and then the main topic of the presentation is going to be multi-tenancy in Cassandra and how we approach it at BlackRock.
For those of you who aren't familiar with BlackRock, we're the world's largest investment manager. We manage over 4.7 trillion dollars in investments, and we're the world's largest issuer of exchange-traded funds with our iShares platform. So hopefully a few of you own some iShares or some BlackRock mutual funds; if your money's just sitting under a mattress, you should come talk to me after this presentation. We also have an advisory business that works with governments and central banks around the world, and on top of all that, we're a technology provider.
The technology that we provide is called Aladdin, and it's BlackRock's enterprise investment system. It covers all asset classes and all parts of the investment management process: everything from trade and order management to risk management, portfolio administration, accounting, and oversight. It's used by BlackRock, and we also sell it as a service to over 50 other large financial institutions around the world. In aggregate, it's used to manage over 15 trillion dollars in investments.
BlackRock started looking at using Cassandra in Aladdin back in 2010, when Cassandra was on version 0.6. For those of you who weren't using it back then, it was sort of a bumpy ride. Luckily, we stuck with it anyway and got our first production application using Cassandra in 2011, on version 0.8.
By now it's used abundantly across all sorts of parts of Aladdin. You can see from the chart I put on the slide here that the daily read and write operation counts across our clusters have been growing steadily over the last few years and just recently crossed 2 billion in aggregate. We use it for all kinds of things. We use it as an object database, or an object cache, for things like reports and models and projections, and reference data from the market. We use it as a time series database a lot.
Obviously, Cassandra is very good at that, for event logs and workflow updates, and it's the backing store for our durable message bus. We especially like Cassandra's flexible cross-data-center support; that was a big reason why we decided to start using it in the first place. There was a gentleman in this room earlier today from PagerDuty who talked about their unique use case of using it synchronously across the WAN. We actually do that too.
So: what do I mean by multi-tenancy, and what requirements do we have of a multi-tenant storage system? Then I'll go over the different options for doing multi-tenancy in Cassandra, and finally I'll talk about our requirements in Aladdin, what choices we made, and why we made them.

I've drawn here a simple illustration of a multi-tenant storage system. On the left, you have your different applications running independently and serving different subsets of users, and then on the right, you have the data stored by those tenants, which is also independent but stored using shared database infrastructure.
Sharing the infrastructure gives you economies of scale: rather than having the cost and complexity of managing many independent sets of infrastructure, you can just deploy and manage a single pool of capacity.
So what requirements would you have of this kind of multi-tenant database system? One of them is administrative needs. For one, you're going to want to be able to customize the configuration of each tenant's storage. In the case of Cassandra, that might mean things like replication, compaction strategies, or compression settings.
Each tenant might also have different requirements for backup: how often to take backups, what to do with them, how long to keep them. You're also going to want to be able to track metrics and attribute load in your cluster, so you can really understand who's driving utilization, accurately project your future capacity needs, and, if need be, see if someone is misbehaving and have the ability to dial down an individual tenant.

Our next requirement is security, which consists of two parts.
The first is authentication, meaning knowing who's connecting to the system and being able to correlate all the activity that takes place with some verified identity. The second part is authorization, meaning using the information you get from the authentication system to enforce a policy dictating what users are allowed to do once they connect. These two parts together allow you to guarantee to tenants that their data remains private and safe from tampering by other tenants.

Our third requirement is performance isolation.
The goal here is that a spike in utilization from one tenant should ideally have no impact on any other tenant's level of service. The guy in the middle here might have thought he was living in an isolated system, until the day his neighbors really jacked up their utilization; now he probably wishes he'd rented space in someone else's database.
So what are our options for deploying Cassandra in a multi-tenant model? The first option is to just deploy a separate cluster for every one of your tenants. The advantage of this approach is that it should be pretty trivial to meet our requirements of multi-tenant administration, security, and performance isolation, because we've essentially completely separated all the tenants. The disadvantage is the cost, complexity, and overhead of having to administer and manage that many clusters in your organization.
Essentially, you might not be getting many of the benefits of multi-tenancy with this approach, and the degree to which that's true really depends on your exact infrastructure and how streamlined your Cassandra operations are. I've drawn a simple illustration of how this might look in practice. Let's say you have three tenants and they're all interested in storing the same class of data, in this case prices from the market. You'd end up having three independent tables stored in three completely different Cassandra clusters.
Our second option is to put those tenants into shared clusters but separate them by keyspace. One advantage of this approach is that we should still be able to meet most of our multi-tenant administrative needs. Although a lot of configuration in Cassandra is at the cluster level, via the things you'd put into your cassandra.yaml file, a lot can be customized per keyspace or per table, using the options that you specify when you create the keyspace or the table.
Additionally, a lot of the control commands you run via nodetool, like snapshot or repair, can be done per table as well, and a lot of the metrics that Cassandra gives you are per table. So it seems like we might be covered on administration, and additionally, Cassandra's authorization features are per table, so it looks like we can meet our security needs. The tricky part here is going to be meeting our isolation needs.
That's somewhat challenging to do once you start to introduce shared clusters; I'm going to address that concern a little later. I've extended my example prices table to this option, so you can see at the bottom now: you still have independent per-tenant tables, but this time stored in the same cluster, in tenant-specific keyspaces.
The third option is to share your clusters and share your keyspaces, but separate your tenants by table. It turns out there's very little substantive difference between this and option two: almost all of that granular configuration, control, and metrics are actually at the table level, not the keyspace level, anyway. In fact, one of the only things that you do control per keyspace is your replication factor. So again, the example at the bottom shows how this is nearly identical to option two.
Our fourth option is to share clusters, keyspaces, and tables between tenants. Now, why would we do this? It seems like it would make meeting all of our multi-tenancy requirements more difficult, and you'd be right. But one reason you might want to consider doing this is that you can accommodate more tenants this way. You see, it turns out a Cassandra cluster can only support having so many tables in it. We've been told maybe a few thousand will work, perhaps up to ten thousand.
In fact, if you go on the internet, a lot of people will tell you: don't go beyond 100. And that's not a limit that you can expand by adding more nodes to your cluster. Unlike the amount of data you can store, or the number of operations per second you can push through Cassandra, you can't increase the table limit by adding nodes to the cluster. The reason for that is that each table takes up a certain amount of memory in every single node in your Cassandra cluster.
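As a rough sketch of why adding nodes doesn't help, here is a toy back-of-the-envelope calculation (the 1 MB figure is the fixed per-memtable slab size mentioned in the talk; treating it as the whole fixed cost is an illustrative simplification, not a Cassandra guarantee):

```python
# Sketch: the fixed per-table memory cost is paid on EVERY node, so it does
# not shrink as the cluster grows. Numbers are illustrative assumptions.

SLAB_BYTES_PER_TABLE = 1 * 1024 * 1024  # e.g. one fixed 1 MB memtable slab per table

def fixed_table_overhead_mb(num_tables: int) -> float:
    """Fixed heap consumed by table overhead on each node, in MB."""
    return num_tables * SLAB_BYTES_PER_TABLE / (1024 * 1024)

# 10,000 tables would pin roughly 10 GB of heap on every single node,
# regardless of how many nodes you add:
print(fixed_table_overhead_mb(10_000))  # 10000.0 (MB, per node)
```

The point of the arithmetic: data volume scales out with nodes, but this per-table overhead is replicated on every node, so the table count is a cluster-wide ceiling.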
Some of that is variable cost based on the amount of data per node, but some of it is fixed cost as well. Now, this has been getting better for a while. In general, storing things off-heap is cheaper, in terms of the amount of work the node has to do, than on-heap. So as they've moved more data structures in Cassandra off the Java heap (bloom filters in version 1.2, and now in 2.1 you can optionally put your memtables off-heap), that's probably increased the table limit a bit. And there are even advanced options.
If you dig into the bowels, you can change how Cassandra's memory allocation works, for example the fixed one-megabyte slab allocator it has for each memtable. You can dig in and try to work around that and squeeze more tables out of your cluster. But the bottom line is that you can either become an expert in all these extremely advanced options and really stray from the beaten path, or you can first evaluate whether you really need that many tables in your cluster.
The first question to ask is: who are our tenants? There are actually two dimensions to this in Aladdin. One is the different applications that we have. These are different components of Aladdin, things related to trading or risk management or securities and so on, requiring hundreds of different classes of data to be stored. Now, not all of that is in Cassandra (we have other database solutions as well), but a pretty sizable chunk of it is, and it's been growing for a few years now. The second dimension is the different Aladdin clients.
Now, I mentioned that Aladdin is delivered as a service to teams both at BlackRock and at over 50 other companies around the world, and even that number has been growing. Each of those client installations is hosted on BlackRock's internal cloud, so we're hosting all the software infrastructure and the Cassandra infrastructure, everything, for all of those clients. With the simple little picture I put on the slide, you can see how the total number of tenants explodes when each client has its own installation of each Aladdin application.
So let's evaluate our options for Cassandra multi-tenancy in light of those requirements for Aladdin. The first option, just deploying a separate cluster for everything, is technically feasible, but again, this is a potentially very costly solution in terms of complexity and operational overhead. Admittedly, this is probably how a lot of companies do their Cassandra multi-tenancy; it's pretty frequent advice.
But when we run the numbers, between hundreds of different classes of data and hundreds of clients, we easily have over 10,000 tenants by that definition: too many to put them all in one cluster and just give them all their own tables or keyspaces. We could potentially work around that by mixing this with option 1 a bit, maybe sharding things somehow.
A
So
we
have
different
multi
tenant
clusters,
but
then
we
start
to
lose
some
of
the
benefits
of
multi-tenancy
again
and
depending
on
how
we
shard
things,
we
might
create
a
brittle
situation
where
we
still
don't
have
much
room
for
growth
in
dimension
or
the
other,
so
option
for
finding
a
way
to
share
tables
seems
pretty
attractive.
Now,
if
we
can
make
it
work,
this
requires
co-hosting
different
tenants
data
in
the
same
Cassandra
tables,
though,
since
data
modeling
and
Cassandra
is
per
table,
we
would
like
to
still
have
a
distinct
table
per
per
data
class.
Otherwise, we'd have to resort to techniques like generic or overloaded columns that get used for different purposes in different contexts and generally make things very confusing. But even so, this brings our required number of tables down into the hundreds, if we can co-host our clients' data in the same tables.
So let's see how this would look from a data modeling perspective. At the top of the slide here, I've copied this prices table example again, as if we had chosen option three and were going to give every client its own prices table. You see the client name, Client X, is part of the table identifier, and presumably there are a hundred-some-odd more tables just like this in the same cluster, but with different table names.
Contrast that with the bottom of the slide, where I've put the shared table approach. Now client is a field in the table, and additionally I've added it to the primary key, specifically to the partitioning part of the primary key. So now I have different clients' data stored in the same table, but separated by partition.
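To make that contrast concrete, here is a minimal sketch of the idea, with a Python dictionary standing in for Cassandra's partition map (the column names are illustrative, not the actual Aladdin schema): once the client is part of the partition key, one tenant's partitions can never contain another tenant's rows.

```python
from collections import defaultdict

# Sketch: with the client in the partitioning part of the primary key, rows
# for different tenants land in different partitions of ONE shared table.
# A dict keyed by partition key stands in for Cassandra's storage here.

prices = defaultdict(list)  # partition key -> rows in that partition

def insert_price(client, security, ts, price):
    # Partition key = (client, security): the tenant is baked into the key,
    # so partitions are tenant-private by construction.
    prices[(client, security)].append({"ts": ts, "price": price})

insert_price("client_x", "IBM", "2015-09-24T13:00", 151.20)
insert_price("client_x", "IBM", "2015-09-24T14:00", 151.45)
insert_price("client_y", "IBM", "2015-09-24T13:00", 151.20)

# Reads are addressed per partition, so a tenant only ever names its own keys:
print(len(prices[("client_x", "IBM")]))  # 2
```

The same market data stored by two clients sits in two distinct partitions, which is exactly what makes the partition-level authorization described later possible.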
Excuse me. That was pretty easy, right? I guess it's multi-tenant now, everything's working, we can probably wrap up early and just hit the road. But hang on: we had those requirements, right? So maybe we should review them again and see if we're covered. The first one was administrative needs. I mentioned that a lot of the administrative stuff you'd want to do in Cassandra you can do at the table level, but now we've put different tenants into the same table.
So this approach really only works if you're okay with giving some of that per-tenant control up. For us, we're grouping things by application, and it turns out most of the configuration and control we want to do is at the application level anyway. Now, that's not all of administration. We would still like to be able to track metrics and attribute utilization at the per-tenant level, and I'm going to address how we did that in a few slides. But first I'd really like to talk about our second requirement, which was security.
Now, we had a requirement that one client's data should be inaccessible from another client's applications; just because of the way we've architected Aladdin, that's the requirement that we've made for ourselves. So let's see what Cassandra is able to do out of the box with security, and see if it covers our use case.
So it seems like Cassandra has awesome security support, but is it what we need in this case? Well, for authentication, it turns out it is. We didn't care so much about the built-in implementation, but we wrote our own IAuthenticator that plugs in very nicely to the existing Aladdin authentication systems.
So we're able to know at all times who's connecting to our clusters and verify those identities. On the authorization side, not so much. You see, on each request, Cassandra will tell your IAuthorizer who made the request and which table they asked for. Our IAuthorizer needs to know who made the request, which table they asked for, and which partitions in the table they asked for, because that's the level that we're separating our clients' data by. So we changed Cassandra to make it do just that.
I've put a diagram on this slide that illustrates how Cassandra's security works. I won't call it high-level, but it's not exactly low-level either; somewhere in the middle, just explaining how it works and the change that we made. On the left, you have a client application process running in Client X's Aladdin environment, and on the right you have a Cassandra server running in our multi-tenant Cassandra environment. The application over here, in step one, is making the request, which in this case is a select star.
It's talking to our Aladdin Cassandra library, which is really just the DataStax library with some extras that we use tacked on to it, one of them being the authentication. So in step two, our library is connecting to the Cassandra server and logging in with a Client-X-specific credential. In step three, the Cassandra server is taking that credential and passing it through to our authenticator, which in step four is verifying the authenticity of that credential and returning an authenticated identity to Cassandra.
In step five, Cassandra stores that identity in the session context, which is tied to the TCP socket going back to the specific client process, saying: remember, this is user@clientX. Our server tells the client, okay, good to go, you've logged in, so in step six the application is free to actually send the request to the server. In step seven, the Cassandra server is passing that session's authenticated identity, the table that they requested, and, with our change, the partitions that they requested, to our authorizer.
In step eight, our authorizer checks whether this application is an owner of the data they're requesting: so yes, this is authorized, you are good to go. In step nine, Cassandra can actually complete the request. This is where most of the work happens; not much work has happened before now. Then it will get the data and send it back to the application process in step ten. Note that steps two through five only happen the first time the application opens the connection to that Cassandra server; it's steps six through ten that happen every time the client makes a request.
A little bit more about how we actually implemented this change. In the old days, this was done through changes to the Thrift server-side code, using the interfaces and the settings that you see here. Nowadays, it's done through our own custom CQL3 query handler. It's a little-known feature, but using the JVM argument on the slide here, you can actually override the handler that Cassandra uses to process queries.
So our handler will intercept select, insert, update, delete, and batch statements, resolve the partition keys associated with those requests, and pass them to the authorizer. Note that this limits ordinary user access in this cluster to requests where you actually know the partition keys up front. That's not every request you can make to Cassandra.
It is every mutation request you can make to Cassandra, but it's only selects that are based on the primary key of your table. So, for example, if you were doing a token range scan, or just scanning all the data in a table, a regular tenant wouldn't be able to do that with this solution. That would be limited to superusers, since we don't know the partition keys before the request is processed.
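The check itself can be sketched in a few lines. This is a toy Python illustration of the idea, not BlackRock's actual IAuthorizer (which lives inside Cassandra in Java), and the identity format and key layout are invented for the example: the authorizer sees the authenticated identity plus the partition keys the statement touches, and allows the request only if every partition belongs to that tenant.

```python
# Toy sketch of partition-level authorization (hypothetical names, not the
# real Aladdin code): a request is allowed only when every partition key it
# touches belongs to the authenticated tenant.

def authorize(identity: str, table: str, partition_keys: list) -> bool:
    tenant = identity.split("@", 1)[1]   # e.g. "user@client_x" -> "client_x"
    if not partition_keys:
        # A scan's partition keys can't be resolved up front, so an ordinary
        # tenant is denied; in the talk that case is reserved for superusers.
        return False
    # The tenant is the first component of every partition key.
    return all(pk[0] == tenant for pk in partition_keys)
```

For example, `authorize("user@client_x", "prices", [("client_x", "IBM")])` passes, while the same user asking for a `client_y` partition, or issuing a full-table scan with no resolvable keys, is refused.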
Our last requirement was for performance isolation. Remember, this means that a spike in utilization from one tenant should not have any impact on other tenants' level of service. It turns out this is actually pretty challenging to do in a Cassandra cluster. It's definitely possible for a misbehaving client application to monopolize resources in your cluster, and frankly, Cassandra has pretty limited tools to prevent this from happening.
So let's look at how we try to mitigate this despite that. The first thing that we do is keep our tenants accountable. This means understanding at all times, in real time, who's driving utilization in the system, so that we can identify if someone is misbehaving and, if need be, try to dial it down, or in an emergency shut something off until the situation can be remedied. We use two sources of information to do this. The first source of information is the metrics that Cassandra itself publishes via JMX.
Many of them are also available through nodetool commands. These are per-node metrics, and many of them are per-table metrics as well. I've put an example of a visualization that we use on this slide, from one of our in-house monitoring tools, derived from those metrics. What this graph is showing is the average percent utilization of the read thread pool in our Cassandra clusters, broken down by table, in real time.
We collect this by taking the running total read latency published by Cassandra via JMX, which is per table, and just doing some arithmetic on it, asking: how much time is the read thread pool spending, out of all of its available time, on each table in the cluster? And it turns out, just anecdotally, that that metric alone is an awesome way to find out who is monopolizing the resources in your cluster, more so than who's sending the most requests, because this is really weighting that number by how expensive each of those requests is. So I highly recommend, if you're not using that metric, that you go ahead and do that.
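The arithmetic is roughly this (a hedged sketch: it assumes the cumulative per-table total read latency counter is sampled twice, and the thread count and numbers below are invented for illustration):

```python
# Sketch: turn a cumulative per-table total read latency (microseconds, via
# JMX) into "% of the read thread pool's time spent on this table".
# Sample the counter twice, take the delta, divide by the pool's capacity.

def read_pool_pct(prev_us, curr_us, interval_s, read_threads):
    """Percent of read-pool capacity one table consumed over the interval."""
    busy_us = curr_us - prev_us                      # thread-time spent on this table
    available_us = read_threads * interval_s * 1_000_000  # total thread-time available
    return 100.0 * busy_us / available_us

# Illustrative numbers: 32 read threads, samples 60 s apart, and a table
# that accumulated 48 s of read latency in that window.
pct = read_pool_pct(prev_us=0, curr_us=48_000_000, interval_s=60, read_threads=32)
print(round(pct, 2))  # 2.5
```

Because expensive reads accumulate more latency, this weights each table by the actual work it causes, which is the property the talk calls out.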
But this is not enough for us, because this is only broken down per table, and we're putting different tenants in the same table. We want to know, even within a table, which client is driving utilization.
The second source of information is finer-grained metrics: true end-to-end latency, not just the coordinator metric that Cassandra gives you; the number of timeouts; the actual number of bytes being written to and read from Cassandra. That really gets us down to the finest-grained level of what's driving utilization in our clusters. The other thing to do, and it seems kind of obvious, is stay ahead of your capacity needs. Make sure that you have plenty of buffer to absorb bursts in utilization.
Estimate up front what you need, but also be sure that you're looking, on a frequent and regular basis, at what the actual growth in utilization is and how close you think you're getting. Also, if you do observe any performance anomalies in your cluster, be sure to track down the cause of those anomalies, even if they're not big problems yet. In my experience, a small anomaly this week becomes a disaster in like two weeks, so don't let that pass without looking into it.
These techniques have served us pretty well, but I think Cassandra still has some room for improvement in this space. One general area to think about is how Cassandra could directly prevent tenants from monopolizing resources in your cluster. One thing is thinking about this percent utilization of the read thread pools, because that's ultimately kind of a proxy for the actual resources for reading data out of Cassandra.
This is already something that the people working on Cassandra are working to address, but the bottom line is that we don't want the first sign of trouble in Cassandra to be falling off a cliff because of a long garbage collection pause. It should be a slow, steady degradation of service, because that's a big way that one tenant can end up affecting another tenant: if it suddenly causes a 60-second pause in one of the JVMs.
Having seen all the amazing progress in Cassandra since version 0.6, I've no doubt that we're going to continue seeing awesome progress, hopefully in some of these areas, but obviously in lots of other ones too, in 3.0 and beyond. So I have faith there. In the meantime, because this is a concern, we end up taking a hybrid approach to multi-tenancy. We use all the techniques that I've discussed here to try to use our clusters in a multi-tenant fashion, but in some places isolation is a strict requirement.
So for select cases we may still deploy separate clusters. That wraps up how we're doing our Cassandra multi-tenancy at BlackRock. Keep in mind that everyone's situation is a bit different; the things that made sense for us might not make sense for other people. But if you're in a situation similar to ours, hopefully this talk gave you a few things to think about.
B: You mentioned in your requirements the isolation of tenants from a performance perspective, and you were talking about monitoring. When you find a tenant that's doing something odd, is your only tool to increase capacity for all the tenants? Can you move a tenant? What have you found around that?
A: It's a challenge today. We take advantage of the fact that, again, these are all running inside our walls; it's not exposed on the internet or anything, and we have control over the actual client processes. So for us, it's a lot more about identification than direct control in the cluster. The direct control is really about that five-minute window between when the problem starts and identifying and fixing it, if that makes sense.
C: Hi, thanks for a nice presentation. I like the custom security solution that you implemented, but there is one area I'd just like to understand: the creation of the session object when you're connecting from your client libraries. In an earlier talk they said the creation of connection and session objects is very expensive. So what are you doing in your case? Do you share those connection and session objects per partition, per client?
A: Each actual process that's running would just be for one client. Maybe I drew the picture too simply, but that top part, we're not doing that before every time we do the bottom part; that'll happen once, I just wanted to illustrate it. And then, yeah, we'll save the session object.
A: So we'll back up by snapshotting, or by turning on incremental backups, and then we have an in-house solution that'll scan those snapshot files and create a consistent backup file. At the time that we're doing that, we separate it out by client. So although each SSTable itself will have multiple clients' data in it, the backup files themselves will be at the tenant level, and that's why we use an in-house way of doing it, so we can separate it like that. Thanks.
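That splitting step can be sketched simply, under the assumption (from the shared-table design) that each row's client is the first component of its partition key; the in-memory row format here is invented, since the real tool reads SSTable snapshot files:

```python
from collections import defaultdict

# Sketch of the per-tenant backup split: scan rows out of a snapshot (here
# just (partition_key, row) pairs) and bucket them into one stream per client.

def split_backup_by_client(rows):
    per_client = defaultdict(list)
    for partition_key, row in rows:
        client = partition_key[0]   # client is the first partition-key component
        per_client[client].append((partition_key, row))
    return per_client

snapshot = [(("client_x", "IBM"),  {"price": 151.2}),
            (("client_y", "IBM"),  {"price": 151.2}),
            (("client_x", "AAPL"), {"price": 114.3})]
backups = split_backup_by_client(snapshot)
print(sorted(backups))  # ['client_x', 'client_y']
```

Each bucket would then be written out as a tenant-level backup file, even though the source SSTables interleave all tenants' data.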
A: So I was kind of highlighting what I thought was the more interesting case, but in some cases, and prices from the market might be one of those because it might be public vendor data, we might not use that approach. We might just store it once for all the clients in the environment, with one writer publishing it, and then just expose it as read-only: just one copy of the data, instead of a copy for each client.
A: Within a partition you can, but if you're saying, give me all the tokens between two-to-the-whatever and a gajillion, that's not going to work. Or if you're just saying select star from an entire table, that's not going to work. But you can certainly say: get me all of BlackRock's prices for a security between 1 o'clock and 2 o'clock. That works. Okay.