Description
Speaker: Dave Gardner, Architect at Hailo
Slides: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-no-whistling-required-cabs-cassandra-and-hailo
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
Right, hi everyone. My name is Dave and I work for Hailo, and I'm going to talk today about how we're using Cassandra at Hailo. I'm going to go through our use case, the data model and some of the operational side of things, and then I'm going to talk a little bit about the perspective on Cassandra from different people at Hailo.
Cassandra used to be something where we would come to work and do battle with it all day, trying to make it do what we wanted, and we would often lose. But a lot of things have changed since then; I went and looked it up, and there are tons of new features. Every time I see one of Jonathan's keynotes he's introducing new features they're bringing in, and it's quite staggering how far it's come. Today we're quite lucky, really, in that Cassandra isn't as difficult, isn't as brittle to use as it used to be. That's pretty good work by the committers.
So I'm going to talk about the perspective on Cassandra at Hailo. When I was thinking about this talk, I went round and spoke to a lot of people in our organization. I spoke to some of the developers, I spoke to the people who have to operate it day to day, and I talked to management, including people at all levels, right up to the very top. It was quite interesting, quite eye-opening in some respects, to talk to these different people as Hailo has grown as a company. You know, we all used to sit together on one boat, and now we're a bigger company, so I don't always get the opportunity to meet these people. So, what's Hailo? Hopefully everyone here's heard of it.
Has anyone not heard of Hailo? A show of hands? Okay, quite a few. Good, so here's the sales pitch. Hailo is an app that means you can get a taxi wherever you want, just by pressing a button on your phone that says "pick me up". A black cab will come and get you, you jump into the cab, and at the end of the journey you just jump out; it'll be charged to your card and you'll get emailed a receipt. It takes the pain out of getting a black cab, especially in some of the more obscure parts of London when it's winter, cold and wet. And for drivers, we make life easier, because they don't have to drive around empty so much: they can stay near a base and pick up these extra jobs, work that people might have used other services for before Hailo came along. It was founded by this motley crew...
...in London about three years ago, and we built most of it working on a boat on the Thames. We launched in November 2011. To give you some idea of where we've come in the two years since launch: Hailo is now the world's highest-rated taxi app, with over 11,000 five-star reviews, and we've got over half a million registered customers. A Hailo hail, that's when someone hits the button and says "come get me".
The original platform handled the basics: account CRUD, that sort of thing. We had some resilience, in that we ran multi-master replication for MySQL, but we were still at the whim of Amazon. If we lost a couple of availability zones, for instance, we would have been in big trouble, and we were only in a single region at that point. But the whole thing was built by quite a small team quite quickly, so that's where we started. Then, before Hailo launched...
...it was all about finishing the features, getting the basics in place so that you could actually book a cab and it would come and get you. Once we launched there was a new set of requirements, really, and it was those new post-launch requirements that led to the adoption of Cassandra. These are the kinds of things that influenced the decision. We wanted Hailo to become a utility: we wanted it to be very reliable, we wanted you to hit the button and have it always work, not let you down on a cold Friday night. That was the plan, and we wanted to make sure the technology was solid. We had plans for international expansion; we knew we wanted to go to the US (we'd had that in mind from the day the company was founded), and we wanted to be able to run data centres in the US so that people using the app there would have locality of service and we didn't have to dial back across the Atlantic.
The path to adoption was kind of unilateral, really. The developers (this was, you know, the rule on the boat) made the decision ourselves. It was a startup; there wasn't an enormous amount of management oversight, and I'll come back to that later. The way we did it was to take some of the core functionality that we needed, mainly customer records to start with, put it into a different service, and back that with Cassandra.
We're moving from web apps based on PHP and MySQL into more of an SOA based on Go and Java, so that's where we're heading. So, the development perspective: this is basically what the developers think about Cassandra at Hailo. When I went and asked them, Dom, one of our senior engineers, said he thought Cassandra "just worked", and that is, to some extent, a rosy picture.
A
There
are
difficulties,
and
obviously
people
don't
always
have
experience
with
Cassandra,
for
they
can't,
but
but
generally
it's
been
pretty
solid
for
us
and
it's
relatively
straight
full
of
people
to
use
it
for
simple
use.
Cases
like
entity
storage
so
with
the
two
use
cases
do
that
we
have
really
in
Halo.
Are
simple
entity,
storage,
so
store
my
customer
get
my
customer
back
and
on
oxy
update
and
then
also
we
have
time
series
data.
I've
got
some
examples.
This is a customer record in Cassandra. The row key is a snowflake-style 64-bit integer, which we can generate globally, guaranteed to be unique, without coordination, which is quite handy. Then the column names are things like created timestamp, email and so on, and the column values are just the values for those fields. This is a relatively straightforward way of using Cassandra.
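As a rough illustration of that ID scheme, here is a minimal sketch of a snowflake-style generator; this is not Hailo's actual implementation, and the epoch and field widths are assumptions. The uniqueness comes from packing a timestamp, a pre-assigned per-node worker ID and a sequence number into one 64-bit integer, so no cross-node coordination is needed at generation time:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Snowflake packs 41 bits of milliseconds, 10 bits of worker ID and
// 12 bits of sequence into a single int64.
type Snowflake struct {
	mu       sync.Mutex
	workerID int64 // unique per node; the only thing that must be pre-assigned
	lastMs   int64
	seq      int64
}

func (s *Snowflake) Next() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	ms := time.Now().UnixNano() / int64(time.Millisecond)
	if ms == s.lastMs {
		s.seq++ // same millisecond: bump the sequence (a real impl wraps at 4096)
	} else {
		s.lastMs, s.seq = ms, 0
	}
	return ms<<22 | s.workerID<<12 | s.seq
}

func main() {
	gen := &Snowflake{workerID: 7}
	fmt.Println(gen.Next())
}
```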
The key thing to remember when storing entities, to avoid race conditions, is: don't read the whole entity, update one field and write it all back. If you just mutate the individual columns that you change, then you avoid race conditions, so we try to follow that practice; there's a sketch of the difference below.
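A minimal sketch of that pattern, using the gocql driver and a hypothetical customers table purely for illustration (the talk itself mentions the gossie library):

```go
package main

import "github.com/gocql/gocql"

// updateEmail mutates only the column that changed. Two concurrent writers
// touching different columns can't clobber each other, which is exactly the
// race that read-whole-entity / modify / write-whole-entity reintroduces.
func updateEmail(session *gocql.Session, id int64, email string) error {
	return session.Query(
		`UPDATE customers SET email = ? WHERE id = ?`, email, id,
	).Exec()
}

func main() {
	// Assumes: CREATE TABLE customers (id bigint PRIMARY KEY, email text, ...)
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "demo"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()
	if err := updateEmail(session, 1234, "new@example.com"); err != nil {
		panic(err)
	}
}
```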
And this is an example of the kind of volumes we're doing. The point here is that we're not doing enormous volumes: this is only one service supporting one use case, and it's a read-heavy workload (the blue line), and it's still only about 20 a second.
For the time-series data, the column name is a unique identifier that contains a time component, and the value is just a blob: here's some stuff, here's what happened, here's the event. And this is immutable; once we write it in, it never changes. This is a common Cassandra use case, like song plays, measurements, that sort of thing, where you've got an immutable stream of data that you just want to store somewhere and then retrieve later. I'll come back to some of the issues with this later.
The interesting thing about Cassandra is that you have to store the data in the way you want to get it back. We want to be able to query the events for a job, which is quite an important use case for us, and if we only store the data by day, the only query you can really do is "get me stuff that happened between these times", and it gives you everything.
Whereas sometimes we need to be able to say "give me all of the things that happened for this job", so we store another index where this time the row key is a job ID, and we store just the events that happened for that job. We're basically duplicating information on ingest, denormalizing on ingest, so that we can satisfy two different query patterns, and that's another common Cassandra pattern; there's a sketch of it below.
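A minimal sketch of denormalizing on ingest, again with gocql (the table names and the day-bucket format are hypothetical): every event is written twice, once keyed by day and once keyed by job ID, so each query pattern becomes a single-row read:

```go
package main

import (
	"time"

	"github.com/gocql/gocql"
)

// Assumed tables:
//   CREATE TABLE events_by_day (day text,      id timeuuid, payload blob, PRIMARY KEY (day, id));
//   CREATE TABLE events_by_job (job_id bigint, id timeuuid, payload blob, PRIMARY KEY (job_id, id));
func storeEvent(session *gocql.Session, jobID int64, payload []byte) error {
	id := gocql.TimeUUID() // a unique ID with a time component, as in the talk
	day := time.Now().UTC().Format("2006-01-02")

	// Write the same immutable event under both row keys at ingest time...
	if err := session.Query(
		`INSERT INTO events_by_day (day, id, payload) VALUES (?, ?, ?)`,
		day, id, payload,
	).Exec(); err != nil {
		return err
	}
	// ...so "all events for this job" needs no scatter-gather at read time.
	return session.Query(
		`INSERT INTO events_by_job (job_id, id, payload) VALUES (?, ?, ?)`,
		jobID, id, payload,
	).Exec()
}

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "demo"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()
	if err := storeEvent(session, 42, []byte(`{"event":"ACCEPTED"}`)); err != nil {
		panic(err)
	}
}
```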
And this is our stats DB load, which is much higher: it's about 4,000 a second, up to five or six thousand writes per second, and the green line is reads. In this instance you can see the reads are really irregular, basically because it's on demand: when people need some data they'll go and run a query, and then at night no one queries, so it just disappears. The main consideration for time series is picking a decent row key, basically partitioning your data into enough buckets.
If you go to some of the fundamental Cassandra talks, or if you've used Cassandra before, you'll know that Cassandra decides which node to put data on based on the row key. So if you put lots and lots and lots of things in one row, they're all going to be on one node. Today we're using gossie, the Go client library.
We also have Java and PHP code; the PHP drivers are still a bit of an issue, and DataStax are trying to do something about that. With the gossie driver, the guy who wrote it basically declared about two months ago that he was not going to take part any more, and that as far as he was concerned it was deprecated, which was quite interesting when you've built your application on it. I guess that's the nature of GitHub and open source, but that's the way it works.
Another thing we do at Hailo is analytics. As we migrated our PHP/MySQL-based web app to Cassandra (and anyone who knows Cassandra will know it has a bunch of limitations, in that you have to store the data in the way you want to get it back), one thing it's really not going to do for you is analytics. You lose the queries you could do against MySQL: sum, count, average, group by, those sorts of things. We needed to come up with a solution for that, because obviously we were taking functionality we already had and migrating it onto this global Cassandra. So we use Acunu Analytics, which is a product by a team in London (they sponsor this event as well), and this is broadly what the pipeline looks like.
We take all our events and put them into Acunu, and it stores the data in Cassandra, but it hides the complexity from us of how it stores it, and it means we can write stuff like this. This is an example where we're basically saying: work out the number of accepts, ignores, declines and withdrawals for a particular driver. Imagine you hit "pick me up" on the app: we're going to ask drivers if they want the job, and every single time we make that offer we record that information. Using Acunu we can construct SQL-like queries against it again, so that's handy, and it hides the detail. They also have a dashboard, so this is the sort of thing you can surface with it: you can draw heat maps and you can count things up. This is an example, on our test cluster, of me counting the number of distinct drivers on shift, grouped by minute of the day. It's a very quick way of getting some pretty good insight.
Then the ops perspective: I got one of our top operations people to tell me about Cassandra and his experience of it, and his view was that Cassandra is allowing a small ops team to do something they wouldn't really be able to do without it. In particular he was referring to running an active-active database on three continents; with Cassandra, that's actually relatively straightforward. This is Hailo, this is where we operate. We started in London.
We're in Dublin, we're in Spain, we're in North America, and we're in Tokyo and Osaka now. You can see that we're very spread out, and Cassandra means we can run clusters in each of those regions on Amazon, and it will replicate the data back and forth between the clusters. The grey dotted lines here are the VPN links that join our different Cassandra clusters together. We typically run six machines per region; this is what we have in production.
Six machines per region, three regions. We've actually got three clusters, though only two are drawn here, and we've got another one now: we're using a dedicated cluster for the Acunu back end. And the stats cluster has only got two regions. Interestingly, we tried to make it three, but we basically failed to stream all the data to Japan; it took about three months...
...and then we gave up. Cross-regional repair is an interesting problem which I don't think is entirely solved, but with the volume of data we've got in our operational cluster it hasn't been a problem yet. We organize it using the EC2 snitch, which knows about EC2 and will arrange the replicas so that they're spread out across different availability zones in different regions. So if you lose an availability zone, you should still have a replica.
We're using m1.large machines and provisioned-IOPS EBS for storage. Both of those things are terrible, really, and you should definitely not do that; we're currently in the process of changing to ephemeral disks, and we're going to use m1.xlarge machine sizes. When we first started using Cassandra, we struck a balance between needing enough replicas to tolerate failure and not spending an enormous amount of money to service not very many requests. That's where we were, but now we're putting more and more onto Cassandra, and it makes sense for us to have larger nodes. Just some quick details about some other operational aspects.
Backups: at the moment we do EBS snapshots, and that's about the only redeeming feature of EBS, that you can snapshot the whole volume instantly and keep it, but it's not really worth the payoff.
This is one of the main things we need to figure out to move to ephemeral disks; we have a solution in place, but I'm not sure how battle-tested it is yet. Actually, when we launched in NYC there were a lot of requirements about data security. The authorities in New York were very particular: roughly the equivalent of the security you have to go through for payments, but with a load of extra stuff on top that the taxi authorities needed done. One of the requirements is that we have to encrypt all data at rest, so we encrypt all of the volumes that Cassandra runs on, and we found that's actually relatively straightforward to implement and doesn't add a lot of overhead: we just use dm-crypt, and that's a pretty good solution. We use a basic OpsCenter, which is a really easy way of getting some pictures of your cluster; this is what a relatively healthy cluster looks like.
Multi-DC is a really important part of the Cassandra story for Hailo; this was one of the killer features for us. We started off just having machines in Ireland, then we brought up the US cluster, and we were able to join them together without any downtime at all. We could basically grow from a single-region cluster, and now we run a cluster that spans three continents, and we did that with no downtime at all. That's quite impressive, and if you go back to the idea of being a utility, this is an important point for Hailo: we don't want to have to say "you can't get taxis for this hour of the day while we do a bunch of data migrations". That would be a terrible thing for our customers.
Compression was another win. It was becoming a problem that we just had too much data per node, and the nodes were under-specced; at the point when it was a real problem, at crunch time, we didn't really want to add any more nodes, for cost reasons. So we added compression, and we managed to halve our data volume, which was pretty nifty. That was quite an easy win.
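For reference, enabling compression on an existing column family is a one-line schema change; the table name here is illustrative, and SnappyCompressor was the usual choice in that era of Cassandra:

```go
package main

import "github.com/gocql/gocql"

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "demo"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// Newly flushed and compacted SSTables are written compressed; existing
	// data is rewritten as compaction churns through it, so the space win
	// arrives gradually, with no downtime.
	if err := session.Query(`ALTER TABLE events_by_day
		WITH compression = {'sstable_compression': 'SnappyCompressor'}`).Exec(); err != nil {
		panic(err)
	}
}
```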
So, management. Their view is that when we had everything in MySQL, they could go and tap anyone in the company on the shoulder and say "can you just go and count these things up and tell me how many people did this between these times", and that person would probably be able to do it, because most people have enough understanding of MySQL. Whereas with Cassandra, I think there are two points. Number one: unless you store the data in the way you want to get it back, you can't really get it back without jumping through some reasonable hoops, like setting up Hadoop integration, for example. And the second point is that I think it gave people an excuse not to do things they didn't want to do anyway: the developers don't really like doing these ad-hoc stats jobs.
If you can just say "well, it's Cassandra, no, I can't do that anyway"... It's interesting that, for management, when we migrated to Cassandra we basically took away functionality from them: we prevented them being able to do something they could do before we migrated. The CRO echoes that: he was saying to me that he feels it's this beautiful solution we've implemented, but he confirmed that they basically have concerns.
They trust that we know what we're doing when we make these decisions, but they don't really understand the trade-offs; we didn't explain them properly enough. The main question they have is that they feel we've ended up in a situation where we've migrated to a database that basically no one can get data out of, except for a few skilled professionals, and even then against some tight constraints. This is a perception, and I'm not entirely convinced it's necessarily true, but it is the perception from management at Hailo. The last point from management was that they were worried that we've got this big database, we can store everything we ever want to (disks are cheap, etc.), and that we're basically creating ourselves a big-data problem.
So here's an example. One of the things that makes Hailo work is that taxi drivers send us their location at regular intervals, so we know where the best taxi is. When you hit the button, we can pick the nearest driver and offer them the job. We collect all that data and put it in Cassandra, and this was taking a subset of that data, about one or two million points, and just plotting it against a black canvas of London. Obviously you end up with a map of London, because taxi drivers drive down roads. Ta-da.
A
The
interesting
thing
is
this:
is
this
is
a
subset.
This
is
like
a
couple
of
million
points,
so
it's
just
done
a
while
ago.
We've
got
you
know
three
or
four:
either
we've
probably
got
10
billion
points.
Now
we've
got
we've
got
in
you
know
all
this
data
and
restoring
Sandra,
and
the
question
is:
why
are
we
storing
any
Sandra?
A
You
know
what
is
the
actual
business
purpose
of
story
and
Sandra
because
obviously
does
cost
money
and
it
there
is
a
developers
will
do
things
because
they
can
do
them
and
Cassandra
means
you
can
do
a
load
of
stuff
that
you
couldn't
previously
do
you
couldn't
previously
store
10
10
billion
points
in
my
school
I?
Wouldn't
that
wouldn't
have
worked,
whereas
you
actually
can
do
it
with
Sandra?
So, lessons learned. There might be a gap in experience: people joining Hailo typically have no experience of Cassandra at all. We have been trying to hire, effectively, a DBA, someone to come and look after our Cassandra install, for about six months, and I think we've had about two applicants, and neither of them had used Cassandra before. I don't think anyone has joined Hailo having used Cassandra before, so it's all new.
So the interesting lesson we've learned there is that you have to hit that head on. It's not necessarily a problem if you address it in the right way, and these are some of the things we think help. The first thing: you need to have an advocate. If people join your company with no experience of Cassandra and are then asked to develop some application to a tight deadline against this thing, they're probably going to run into issues, and they might end up hating Cassandra, because it's preventing them hitting their deadlines and making them feel bad and look bad. So you need to have an advocate. You need to sell the vision: why have we picked this thing? Why are we asking you to use this data store?
Because we're running in multiple regions and we're resilient. So you can sell it, but you need to actually do that. The second point is to go and learn the theory. With Cassandra, I still think it's really, really useful to understand how the data is stored and how the data is distributed around the cluster.
Understand the fundamentals, the Dynamo side for distribution and the Bigtable side for the storage, because it does help inform how you go about building your application. CQL, which seems to be really popular now, is a good thing for reducing the barrier to entry, in that people come to it thinking they have familiarity, but it also causes its own issues, in that it hides some of the details: you could be using CQL and thinking you're using a relational database, and you're not, basically. So it's an interesting problem. And then, finally, make an effort to get everyone on board. When we unilaterally adopted Cassandra at Hailo, we probably didn't put enough effort into getting everyone on board. There were a few key people in the company, kind of grizzled old Java devs who were quite stuck in their ways, and we probably didn't do a good enough job of selling it to them.
A
Secondly,
second
learning
things
can
drift
into
failure.
This
is
a
this
is
this
is
when
things
don't
look
good
from
off
center
by
the
way.
If
you
see
this,
then
something's
not
right.
So
this
is.
We
want
all
the
balls
to
be
the
same
size,
and
we
want
the
streaming
to
be
not
doing
that
much
streaming.
This
basically
tells
us
they're,
like
we've,
got
really
unbalanced.
Cluster
lots
of
things
are
streaming
to
lots
of
other
things,
we're
doing
all
this
extra
io
to
try
and
get
the
cluster
back
into
the
same
state.
While this is going on, the application will be having problems. Interestingly, we were able to continue; we were limping on, but not brilliantly. This was actually our stats cluster, so luckily we'd already foreseen that it was a good idea to partition by use case: we have a stats cluster that isn't required for getting you a taxi. Still, it's easy to shoot yourself in the foot with Cassandra, as with anything, I guess. This is another point about experience.
When people join your company with MySQL experience, they're probably not going to make a mistake like creating terrible indexes, or no indexes, on a relational database; whereas with Cassandra they might do things that, once you've been using it for a while, you'd immediately say "definitely don't do that", but they wouldn't have seen it coming. Just recently, in the last week, we've had someone in to help us look at our Cassandra cluster.
Basically, if you know anything about how Cassandra works under the hood: whenever you write data, it's putting it into SSTables, which are immutable, and it flushes them to disk. So if you're updating things often, you get lots of SSTables, and if you run a query like "select all the columns in this row", Cassandra has to go and look in every single SSTable, basically, and that can slow your performance down. Ideally you want queries not to be looking at that many SSTables. In this instance, for this particular column family, Aaron pointed out that the numbers indicated we had a mixed workload: we had some relatively static customer data that pretty much never changed, and then we had some dynamic bits, and we were storing them both in the same place. What that meant was that every time we updated one field, we kept generating more entries in more SSTables, so to gather the customer row we had to go and look at all this data. Actually, most of the data we were updating we don't really care about when you need to get a taxi; it's the static stuff that matters. So we'd be much better off splitting the use case, because then we'd have column families separating out the data that doesn't change very often from the data that changes all the time, and that's a good win.
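A minimal sketch of that split (the table and column names are hypothetical): the rarely-changing profile lives in one table and the frequently-overwritten fields in another, so reads of the profile touch far fewer SSTables:

```go
package main

import (
	"time"

	"github.com/gocql/gocql"
)

// Before: one table mixing static and hot columns.
//   CREATE TABLE customers (id bigint PRIMARY KEY, name text, email text, last_login timestamp);
// After: split by write rate.
//   CREATE TABLE customer_profile  (id bigint PRIMARY KEY, name text, email text);
//   CREATE TABLE customer_activity (id bigint PRIMARY KEY, last_login timestamp);

// touchLastLogin lands hot writes only in customer_activity's SSTables;
// customer_profile stays compact, so reading a profile stays cheap.
func touchLastLogin(session *gocql.Session, id int64) error {
	return session.Query(
		`UPDATE customer_activity SET last_login = ? WHERE id = ?`,
		time.Now(), id,
	).Exec()
}

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "demo"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()
	if err := touchLastLogin(session, 1234); err != nil {
		panic(err)
	}
}
```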
This is the same output in a different colour. We've lost the headers now, so it's difficult to read, but this is the read-latency column, and what it actually means (we're missing a little bit of the output) is that 1,043 queries took, going right back to the left-hand column, that many microseconds to complete. The interesting point he made about this was that we have a bunch of queries completing in 20 microseconds, a really short amount of time; a fair number of queries complete in almost no time at all. But then down here we still have a reasonable number of queries that are taking orders of magnitude longer, and he was pointing out that maybe this indicates a problem with IO, not enough capacity, or, again, data-modelling problems...
...where we're having to go and look at all these SSTables. So there are some interesting things you can get from your cluster just by running cfhistograms, and the other obvious one to run is cfstats. This is an example where we have a row size that's actually 16 gigabytes: the biggest row in this column family is 16 gigabytes.
Aaron's advice to us was that if you've got rows of around 50 meg, he would probably ask you a question: "are you sure you've got the data model right?". If you had a row of about a hundred meg, he would probably be saying "please change the data model". So he was not impressed with our 16-gig row.
Suffice to say, this is because we're doing time series and we picked a bad granularity: we're storing everything for one day in a single row, and as Hailo grows, that means the rows get bigger and bigger and bigger, and because of the way Cassandra works (a row lives on one node), it just keeps growing. So keep an eye on that.
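A sketch of one common fix (the bucket format and shard count here are assumptions, not what Hailo shipped): make the row key finer-grained, and optionally add a shard component, so no single row grows without bound as volume grows:

```go
package main

import (
	"fmt"
	"time"

	"github.com/gocql/gocql"
)

// rowKey buckets the stream by hour instead of by day, and spreads each
// hour across 16 shards, so a row holds a bounded slice of the stream and
// hot writes within the hour hit several nodes instead of one.
func rowKey(t time.Time, eventID gocql.UUID) string {
	const shards = 16
	shard := int(eventID[0]) % shards
	return fmt.Sprintf("%s:%02d", t.UTC().Format("2006-01-02T15"), shard)
}

func main() {
	fmt.Println(rowKey(time.Now(), gocql.TimeUUID()))
}
```

The trade-off is that a read of a whole hour now has to fan out over the shard rows, so the shard count should stay small.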
So, lessons learned: be proactive. We thought our Cassandra cluster was running really smoothly, and most of the time it was, but it's worth checking these things occasionally: go and look at cfstats, go and look at cfhistograms, peer-review your data models, and spend a bit of time designing data models, especially if you keep in mind that you're going to have people with little or no experience building applications on top of it. It's worth just taking a bit of time to ask: why are you storing everything for one day under one row?
The other interesting point Aaron raised was about compaction strategy. He was saying that if you're doing a lot of overwrites to column values, it's worth thinking about using levelled compaction, because it's more aggressive about doing the compaction and reducing the number of SSTables. I'm not sure I'm selling that entirely right, but I guess his point really was that you don't need to have the same compaction strategy for every column family, so it's worth considering what each column family does.
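Compaction strategy is a per-column-family setting, so a change like this (illustrative table name) affects only the overwrite-heavy one:

```go
package main

import "github.com/gocql/gocql"

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "demo"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// LeveledCompactionStrategy keeps the number of SSTables a row is spread
	// across low, at the cost of more compaction IO: a reasonable trade for
	// overwrite-heavy data, per the advice in the talk.
	if err := session.Query(`ALTER TABLE customer_activity
		WITH compaction = {'class': 'LeveledCompactionStrategy'}`).Exec(); err != nil {
		panic(err)
	}
}
```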
Next learning: EBS is terrible. Whenever Amazon have a problem, it's nearly always EBS. If anyone doesn't know, EBS is Amazon's network-backed magic storage thing. If every Cassandra node is using EBS and EBS goes wrong, then every Cassandra node stops working at once. So it's the kind of thing where you've got this brilliant distributed database with no single points of failure, except they're all using this one thing for storage, so if EBS goes wrong, it all stops working. EBS is generally slow and expensive, and really it's unnecessary; I don't really want to use EBS, except we do, actually, but we're going to change pretty soon. The only caveat with ephemeral disks is that your data is on these ephemeral drives, and if you stop the instance it's all going to go away. You have to bear that in mind.
Next point: management. We should have done a better job of explaining the trade-offs of Cassandra. We should have kept the business informed and let them know what was involved in migrating to Cassandra, instead of just selling the positives: "we'll have all this resilience, all these brilliant features we're bringing in". We should have pointed out to them: actually, one of the trade-offs is that we're not going to have such easy ad-hoc access to the data. Then we could have decided whether that was a problem and maybe implemented a solution, because there are things you can do: you can run DSE and have the Hive integration, or you can bake your own Hadoop integration. There are ways of solving the problem, basically, but we should have done better at thinking about that from the start.
A
She
is
really
just
making
sure
all
you
know
halo.
We
should
have
made
sure
all
the
senior
people
were
on
board
with
this
project.
The
Cassandra
project
I
think
everyone
is
now,
but
just
there
were
some
times
during
the
life
cycle
of
X
and
adoption
where
it
would
have
been
really
good.
If
everyone
had
been,
you
know,
oh
yeah
Sandra's
the
way,
rather
than
any
sign
of
any
trouble
at
all.
You
know
questioning
it
and
then
what
we
could
have
hopefully
ended
up
with.
Is
you
know
to
sort
of
sort
of
olay
management
fears?
A
We
could
have
tried
to
end
up
with
a
situation
where
we
we
put
in
place
solutions
from
day
one.
Basically,
so
we
could
have
looked
at
that
problem
of
you
know.
Maybe
we're
going
to
be
reducing
the
kind
of
feature
set,
making
it
harder,
but
we
can
put
and
place
this
other
solution,
which
means
that
when
people
join,
they
can
pretty
much
do
it.
You
know,
maybe
hive
would
have
answered
this
I'm,
not
sure,
but
we
should
have.
We
should
have
done
better
on
that
all
right,
so
just
to
quickly
wrap
up.
We basically like Cassandra at Hailo, and we like it because we feel it has a solid design. We're building on something that has solid design principles at its core, and it seems to have been well thought out in terms of how it operates and the trade-offs it's making: the kind of masterless design from Dynamo, those sorts of things. We like that. We like the high-availability characteristics, and we like the easy setup for multi-DC. We'll probably end up in five, maybe six data centres as we continue to grow, and we like the simplicity of operation. Although it's not without its issues (e.g. trying to hire people who know about it), it hasn't really caused an enormous number of problems at Hailo. Right now we're doing a little bit of work to get back on top of it, but until now we've pretty much left it alone. In terms of successful adoption...
...I think having an advocate is probably the single most important thing, in my mind. If everyone in the company, all the developers, believes in it, then your life can be a lot easier, because when things go wrong they're still going to believe in it, rather than blaming it for everything. And other stuff too, like investing in the tools.
So, the future for Hailo: we're going to continue with Cassandra, and we're migrating more and more stuff to it. It's plausible that by this time next year we'll be running pretty much solely on Cassandra; I think that's where we're heading. We're looking for people with experience running Cassandra, so if you're interested, you can always grab me. And we're going to try and expand our reporting capability, so that problem of "can people do ad-hoc queries?", we're going to try and address that.
It varies, it really does vary. That was basically one column family out of, I don't know, 30 or something, so I'm not really sure. It does vary: some of them, like the stats one, are really, really write-heavy; we're doing four or five thousand writes a second and virtually no reads. And then some of the stuff, like customer details, we should ideally split.
Like I said, we should split that into two column families, so one of them would have almost no writes and a reasonable number of reads, and the other one would have lots of writes and very few reads. Do we write a lot of updates, as opposed to inserts? Yeah, we do; we do have use cases which update a lot, and actually I'm not sure that's a brilliant pattern. On the customer record, we update things when you use the app: we'll update last-logged-in or whatever, last-did-something, things like that. We are doing a lot of overwrites; I'm not a hundred percent sure that's a good idea, but we are doing it. Yep, from cabs: we basically need their locations so that we know where they are, so that we can offer them jobs.