From YouTube: C* Summit 2013: Time is Money
Description
Speakers: Jake Luciani and Carl Yeksigian, Quantitative Strategists at BlueMountain Capital Management
Slides: http://www.slideshare.net/planetcassandra/jake-luciani-and-carl-yeksigian
This session will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.
To get started: my name is Jake Luciani, and this is Carl Yeksigian. We're from BlueMountain Capital; I'm an Apache Cassandra committer, and we're going to talk to you about how we're using Cassandra to build our time series database. Before we begin, though, I came up with a joke about Cassandra and I couldn't not share it. My joke is: what did the one Cassandra node say to the other Cassandra node after the network split?
I'm just going to go quickly over our use cases and architecture, and then I'm going to pass it over to Carl, who is going to take a deep dive into how we've tuned our Cassandra cluster and how we've dealt with certain problems, specifically compaction. So the talk is called "Time is Money," and there are really two aspects to that: time is money in terms of latency, and time is money because we have a time series database.
The main idea we want to start with is to describe our problem. We have a compute cluster of a thousand nodes which are constantly running calculations off of real-time market data, so it basically reads and writes as fast as it possibly can, and all of those reads have to be consistent across all readers. We also have a UI so that our team can run ad hoc queries, and everything has to run quickly. Oh, and we have multiple data centers.
So it's a really hard problem to handle: it's very read/write intensive, and Cassandra is really the only option for us. As a quick time series intro: I'm sure everyone's seen stock prices. That's the simplest example, the change of a price over time. But if you think about it, most things are time series, and databases like Cassandra do really well at dealing with time-ordered data.
You need to always get back the same exact data, so we never update data; sorry, we never overwrite data, we're always appending on the end. Our cross-sectional data is random: when you think about searching across any given fund or any given portfolio, you can't really partition that problem, because you're never going to know all the possible combinations of that data. That's difficult to do, which is why you need something that can scale horizontally.
So you kind of have to fan out to get that data. We also support this thing called bitemporality, which means there are two dimensions of time: the time at which something happened, and the time at which you learned about it. Say, for example, you're fixing a bad data point after two weeks of it being in the system; you want to be able to go back and run something with the old data point and with the new data point.
So our system optimizes for time series, and we're using Cassandra 1.2 and CQL3. With CQL3 it's pretty straightforward to define this type of table. We've simplified things a lot here, but the main table looks something like this: you have an identifier, you have a type (for example, price), and you have the time for the point. You give it a knowledge-time cutoff, asking for the last point before this knowledge time, and you get it with LIMIT 1. And you just run this in parallel, right: you send all of these off in parallel into the cluster.
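The actual table definition was on a slide; as a rough sketch, a simplified bitemporal table and knowledge-time query in CQL3 could look like the following (all names here are illustrative assumptions, not the real schema):

```sql
-- Hypothetical simplified schema: one wide row per (id, type),
-- clustered by observation time, then knowledge time.
CREATE TABLE timeseries (
    id             text,       -- series identifier
    type           text,       -- e.g. 'price'
    time           timestamp,  -- when the observation happened
    knowledge_time timestamp,  -- when we learned about it
    value          blob,       -- serialized payload
    PRIMARY KEY ((id, type), time, knowledge_time)
);

-- Last version of a point before a knowledge-time cutoff:
-- reverse the full clustering order and take LIMIT 1.
SELECT value FROM timeseries
WHERE id = 'IBM' AND type = 'price'
  AND time = '2013-06-11 00:00:00'
  AND knowledge_time < '2013-06-12 00:00:00'
ORDER BY time DESC, knowledge_time DESC
LIMIT 1;
```

Each such query hits a single partition, which is what makes the parallel fan-out across the cluster cheap.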
So what we built: none of our clients talk directly to Cassandra. We've built a Thrift service on top of it. Cassandra has this thing called the fat client, and what it allows you to do is sort of act like a Cassandra node without being responsible for any data; if you look in the examples directory, there's an example of this. What that allows us to do is cut down on the amount of serialization: our Thrift service is running Cassandra internally, but it isn't dealing with any data. It can just use the StorageProxy directly, so it can use the same internal protocol that Cassandra uses to talk node to node.
This makes it simple for us: we don't have to worry about connection pooling, token awareness, the dynamic snitch, any of that stuff. It's all built in, because it's running the same code as what's running in Cassandra, and we're passing the CQL through that layer. So our applications talk to our service, which is called Olympus. The other thing is, we don't store just price data; there's lots of different financial data.
There's reference data, and when you're running models you want to track the metadata about your model over time, so that information needs to be stored too. So we use Thrift structures, similar to how Google uses protocol buffers, to keep things on disk, and by using the Thrift union type we can easily reflect on what type of data we're pulling back. So our users will create a Thrift structure.
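The structure definitions live with the users; as an illustrative sketch (made-up types, not the actual BlueMountain definitions), a Thrift union over a couple of point types might look like:

```thrift
// Each kind of financial data gets its own struct; the union
// tag lets a reader reflect on which type it is pulling back.
struct PricePoint {
  1: double price
}

struct ModelMetadata {
  1: string modelName,
  2: map<string, string> parameters
}

union DataPoint {
  1: PricePoint price,
  2: ModelMetadata model
}
```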
So that's the easy part: that's our data model. Now I'm going to talk about how we actually scale this thing to handle all the reads and writes simultaneously. The first rule about scaling is that you don't just throw everything at it and let it run as fast as it can; you don't just turn it up to 11. That was actually Jake's slide, so I'm just going to throw him under the bus for that. OK, so, scaling: you've got to get fast machines, and you have to avoid GC pauses.
You have to tune Cassandra, and you have to prefetch and cache inside of your application. But before we get there: you can't fix what you can't measure. So we use Riemann, which is a really great stream-based monitoring system; you can easily push application- and system-level metrics, and Cassandra has a plug-in for this. Currently we push about 6,000 metrics per second into a single Riemann instance.
This is the little snippet of code that's required to actually hook Riemann up into Cassandra. There are two components to a Riemann setup.
One is the dashboard, which shows us the live stats that we currently have, and one is Graphite, which shows us historical stats. Using both of them gives us a good picture of what's wrong now versus how it has been acting in the past. And then VisualVM is a tool for the JVM which shows you a bunch of information about it: what the threads have been working on, what's going on with your GC. It also has an MBeans plugin.
So there are all sorts of great tools for it. Again, the right hardware is really important for us. We got SSDs for all of our hot data, and we're going to get spinning disks for all of our cold data. We use a JBOD configuration; that way we can use as many disks as possible, and we don't have to go through a RAID controller or do software RAID.
You want to get as many cores as possible; we have machines with 16 cores, and we're looking to get machines with something like 32 cores in the future. We currently have a 10-gig network, and we actually use bonded network cards, so we get 20 gigs on a single machine. And then, in order to actually take advantage of a 10-gig network, you have to use jumbo frames, so jumbo frames really helped us out.
Actually configuring Cassandra is another big part of making your application go quickly. There are three components that we went through. One was the configuration of Cassandra itself; I'm not going to talk about that at this one, because we only have about five more minutes, but we have a slideshow that we did in New York that has all the configuration that we did.
The next thing is compaction; that's actually a big part of the CPU time that we spend. And then compression: we actually switched to a faster compression algorithm called LZ4. So, we use leveled compaction. The good part about leveled is that it limits the number of SSTables you have to read.
Even though you have all these levels, you know exactly which SSTables will actually contain the data that you're reading, so it puts an upper bound on the number of SSTables you actually have to read. We use wide rows, which is why this is actually an issue for us. But under high write load, as the memtables get full, all of these SSTables get flushed out to disk.
So in your level 0, which is completely random data, you have to read every single SSTable in order to figure out what data you actually need to respond with. Our original attempt at fixing this, which is actually included in Cassandra 2.0, is this thing called hybrid compaction; it was just rolled into regular leveled compaction. What it does is, in the first level, rather than just doing the regular compactions up to level 1,
it takes a bunch of the SSTables and does the old size-tiered compaction. That way, you have fewer SSTables that you have to read when you're under this heavy write load. Now we have a new idea that I'm currently working on; it's called overlapped compaction. Basically, instead of taking your level 0 files, combining them with level 1, and then writing all new files into level 1, it just takes your level 0.
It does the regular compaction that it would, up to level 1, but it doesn't take the level 1 files, and at some point it switches back into the regular mode of taking all of your files at a level and combining them with the level above. But this way you don't have the bottleneck of having to take these files plus the files in the next level to do the compaction.
So this will allow us to get higher throughput, because we'll be able to do more parallel compactions. Another thing that we're working on is a C-optimized library. There's actually a lot of time spent on the read path doing comparisons and doing the CRC check for compression. The composite comparison actually eats a lot of cycles, and the CRC check is implemented on the chip for a lot of architectures.
So right now we have 16 nodes across two data centers. Our replication factor is six, but that's three in each data center, so we can lose a node in either data center, or we can lose an entire data center, and continue operating.
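In CQL terms, a keyspace with that topology could be declared something like this (the keyspace and data-center names are illustrative assumptions):

```sql
-- Replication factor 6 total: 3 replicas in each of two data centers.
-- With 3 replicas per DC, LOCAL_QUORUM needs 2 acks in the local DC,
-- while EACH_QUORUM needs 2 acks in every DC.
CREATE KEYSPACE olympus
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
  };
```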
We get 200,000 writes per second at EACH_QUORUM and 150,000 reads per second at LOCAL_QUORUM. The difference there is that writes have to be acknowledged by both data centers, while reads only have to be acknowledged by the data center that you're in. We currently store over 30 million time series, we have over 15 billion data points, and that takes up about 10 terabytes of data on disk. Read latency for the 50th percentile and the 95th percentile is one millisecond and five milliseconds, respectively.
[Audience question about jumbo frames, inaudible]
We didn't really measure the performance of jumbo frames; that's just our network configuration, in order to really utilize the 10-gig network. Without jumbo frames you can't actually utilize the whole 10-gig network that you get, so you're back to basically having a one-gig network.
We actually had a lot of problems with jumbo frames. Our original structure was that clients would talk to the Cassandra cluster directly, and there were a lot of packets that would get dropped between them. We had some machines that just couldn't talk to the Cassandra cluster at all. I don't actually remember how that got resolved, but the...