Description
Speakers: Jimmy Mårdell, Tech Product Owner at Spotify & Patrick McFadin, Chief Apache Cassandra Evangelist at DataStax
A
So welcome to European Cassandra Summit 2014, Jimmy. Thank you very much. So Jimmy, why don't you introduce yourself? Tell us what you do.
B
A
All right, Spotify. Of course I know what it is; I love it. But what about those people who live under a rock and have never heard of Spotify? What are you telling people about it?
B
Spotify is a music streaming service, bringing you the right music for every moment, be it on computer, mobile or tablet. We're essentially your music wardrobe. We have all the music that you could ever want to listen to.
B
A
So what are some of the use cases that you use Cassandra for?
B
Cassandra is really our de facto database when we want to store data that needs to be highly available and is used by many users, and we have a lot of users. For example, all the playlists that you have created are stored in big Cassandra clusters, but we also use Cassandra for social networking. We're using Cassandra for storing music collections, recommendations and everything.
A
So that was actually one of the first presentations I saw you guys do, on playlists. You had a really interesting playlist application. Of course I appreciate, as a Spotify user, that my data is probably going to be there: when I save something to a playlist, it will stay. But what would you say was one of the main reasons you use Cassandra?
B
So when we started at Spotify, we had this problem: how do you store data reliably for millions or hundreds of millions of active users? Because we were building our own data centers, since we want really low latency when you play music, we had to figure out how to host our own databases. We tried out many different kinds of databases. Relational ones didn't scale very well for us, as you know. We tried out some other ones, and then we found the Dynamo paper, which is famous, and we started something on top of that. It was miserable. Okay, we said, let's make a second attempt. But hey, wait: there are some guys who have done something based on this as well, which seems to be everything we want, right? It's based on Dynamo and it uses immutable, log-structured storage. Immutable is fine; it seems very durable. So we decided to give it a shot.
A
B
Yeah, that's a bad use case, because Cassandra is typically used to store structured data. Big data files, like big music files in MP3 or whatever format you're using, you typically store in big storage or a CDN or something like that.
A
B
We use Cassandra for many different use cases. Typically, the way Spotify works, we have a microservice architecture, so we have a lot of small services doing one thing and doing one thing well, and each of these services needs to store data. The data that a service needs to store is usually fairly simple. It can be a simple key-value store, and you can use it for metadata, basically as a key-value store, because it is an excellent key-value store. It brings all this durability and everything. But then, of course, Cassandra is so much more.
B
You can also use it to store time series data; it's excellent for time series data. It's excellent for storing many different kinds of data, and it's really the high availability and multiple data center support that have made us use Cassandra for almost anything. So I think there's almost nothing that you couldn't use Cassandra to store.
A
Recently, and we don't have to talk about exactly what the feature was, you and I were talking about how multi-data center is becoming the problem to solve, right? So would you say that's probably one of the primary reasons?
B
I still think that's a fairly common need, because with data centers you want to do maintenance, and you can have routers or switches going down. We want to be able to shift a user from one data center to another without losing the data, right? Using multiple data center replication in Cassandra gives us that.
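As a rough illustration of the multi-data-center replication mentioned here: in Cassandra, replication across data centers is configured per keyspace with `NetworkTopologyStrategy`. The sketch below only builds the CQL text; the helper name, the keyspace name and the data center names (`dc_a`, `dc_b`) are hypothetical and not taken from the interview.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement replicating to several data centers.

    `replicas_per_dc` maps a data center name to its replica count.
    All names used with this helper are illustrative assumptions.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        options.append(f"'{dc_name}': {int(replica_count)}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{', '.join(options)}}};"
    )


if __name__ == "__main__":
    # Three replicas in each of two hypothetical data centers.
    print(create_keyspace_cql("playlists", {"dc_a": 3, "dc_b": 3}))
```

With every data center holding its own replicas, clients can be pointed at another data center during maintenance while the data stays available there.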
B
Yeah, either because of the provider doing something bad, or something that has gone wrong, or any kind of problem there can be. I mean, you can have network partitions between data centers. We had... There was an example.
B
A
B
Yes, and multiple data centers also give much better latency.
A
True, that's probably true. Now, with your microservices architecture, does that meld well with what Cassandra does for you? I mean, like being distributed and...
B
Yes. What we're doing with Cassandra when it comes to the microservice architecture is that we're actually using many Cassandra clusters. I believe that's something not so many companies are doing. I think the more common scenario is that you have just one or two, or very few, Cassandra clusters with many keyspaces to store a lot of data in there, and then you have one or two DBAs handling all that. Instead, we're going in a different direction.
B
A
B
Once upon a time, until about a year ago, we did indeed have one team that did all the DBA work, managing Cassandra and working with Cassandra upstream. But then there was a shift about a year ago, where the operational responsibility was pushed out from this centralized team to all the teams actually developing the features. This applies not only to Cassandra but to any operations within Spotify. So we're pushing out the responsibility: share the load, spread the pain. And as a developer...
A
That's becoming more of a common theme, and I think that's great. I mean, it makes you move a lot faster. And that's good: as a user, I like to see more features coming, and I have less downtime. It's awesome.
B
So when we made that change, we thought: okay, this will probably not work well initially, but in the long run it will work well. But it actually turned out that uptime went up right away, from the start.
A
Oh, that's great.
B
Yeah, so we were pleasantly surprised by that.
A
So you're doing a talk today?
B
Yes, I will talk today.
B
So I will talk about exactly that: how we operate Cassandra with this decentralized model, spreading the pain, not having a small team operating the Cassandra clusters. I will explain how we came to use Cassandra, and then I'll talk a bit about repairs, which, as Jonathan explained this morning, are a very common pain in Cassandra. And finally I will talk about date-tiered compaction, which is a recent invention of ours at Spotify and which is now available.
A
That one is actually something you and I share a little common ancestry on, because that was something I was very interested in early on. Then I was in Stockholm last spring, and I met Bjorn. So now Bjorn works for Spotify?
B
Yeah, he does.
A
Can you tell us a little bit about how that came about?
B
Yeah, so about a year ago our main Cassandra engineer, Marcus, came up with this idea: for time series data, none of the compaction strategies, size-tiered or leveled, is actually perfect for the use case. So he figured we could do something different about this. And then we had this master's thesis student who wanted to work on Cassandra, and we thought that was a perfect fit.
A
So really it's about maintaining a different compaction strategy built around the fact that you know the data is immutable and that it's going to have a long tail, so you stop recompacting data that has already been compacted.
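To make the idea of "stop recompacting data that has already been compacted" more concrete, here is a deliberately simplified sketch of grouping SSTables by age into time windows that grow with age. It is not the actual date-tiered compaction implementation in Cassandra; the function names, the one-hour base window and the growth factor of 4 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def window_index(age_seconds: int, base_window: int = 3600, growth: int = 4) -> int:
    """Return the index of the age window that data of this age falls into.

    Window 0 covers ages [0, base_window), window 1 covers
    [base_window, base_window * growth), and so on. Older data lands in
    ever larger windows, so it is touched less and less often.
    """
    index, upper = 0, base_window
    while age_seconds >= upper:
        upper *= growth
        index += 1
    return index


def group_by_window(sstables, now: int) -> dict:
    """Group (name, newest_timestamp) pairs by the window of their age."""
    buckets = defaultdict(list)
    for name, newest_timestamp in sstables:
        buckets[window_index(now - newest_timestamp)].append(name)
    return dict(buckets)


def compaction_candidates(buckets: dict, min_threshold: int = 4) -> dict:
    """Only windows holding at least `min_threshold` SSTables get compacted."""
    return {w: names for w, names in buckets.items() if len(names) >= min_threshold}


if __name__ == "__main__":
    tables = [("new-1", 9_900), ("new-2", 9_000), ("old-1", 2_000)]
    buckets = group_by_window(tables, now=10_000)
    print(buckets)
    print(compaction_candidates(buckets, min_threshold=2))
```

In this toy model, recent SSTables are merged with each other, while an old SSTable sitting alone in its window is left untouched instead of being rewritten together with new data.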
B
A
This is in production right now, so what have you seen? What are some of the benefits you've seen?
B
It's only been in production for two or three weeks, so unfortunately we haven't gathered much data on it yet. We have several use cases for it, and right now we're actually using it for a case with TTL data. One other benefit of date-tiered compaction is that if you have a lot of TTL'd data, you can actually drop entire SSTables at a time, because you're not mixing old and new data...
A
So that actually sounds like a great idea: you're just deleting a file.
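The point about dropping whole SSTables can be sketched as follows: if every row in a file carries a TTL and the file holds only data from one time range, the whole file expires at once and can be removed without rewriting anything. This is a minimal model under stated assumptions (a single TTL per file, no grace period, no overlap checks with other files), all of which a real Cassandra node handles differently; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    newest_write: int  # timestamp (seconds) of the newest row in the file
    ttl: int           # TTL in seconds, assumed identical for all rows


def fully_expired(table: SSTable, now: int) -> bool:
    """A file is droppable once even its newest row has outlived its TTL."""
    return table.newest_write + table.ttl <= now


def split_expired(tables, now: int):
    """Return (droppable, remaining) lists of SSTables."""
    droppable = [t for t in tables if fully_expired(t, now)]
    remaining = [t for t in tables if not fully_expired(t, now)]
    return droppable, remaining


if __name__ == "__main__":
    tables = [SSTable("week-01", 100, 50), SSTable("week-02", 200, 50)]
    droppable, remaining = split_expired(tables, now=160)
    print([t.name for t in droppable], [t.name for t in remaining])
```

If old and new rows were mixed in the same file, `newest_write` would stay recent and the file could never be dropped as a whole, which is why keeping data separated by age matters here.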
B
Yes, and that works very well for time series. We're still very early in the testing phase, so I can't give you any numbers, but when it was developed we did run it in survey mode for a long, long time in production, and we saw that the performance was much better.
A
B
B
A typical case, for instance, where we're trying this out now, is that we have this Cassandra cluster that monitors all the services at Spotify, thousands of them, and there are a lot of metrics that come in. It's your standard time series use case, and then we have all the graphs and everything. The most common thing is that you only want to look at the graphs from the latest day or the latest week, but we store the data for an entire year.
B
A
So I'm excited about this, because this has been something I've been wanting for at least a year or two, and I've been anxious to watch and see how it goes. I'll probably be hitting you up again to see how things are going. Maybe the next summit talk will be on how it works in production.
A
B
In a large organization, if you get started with Cassandra today, you should realize that you will get exposed to CQL immediately, but CQL is not SQL. So you really need to understand how Cassandra works. You can't just assume, "Oh, now I can use Cassandra as a relational database," because you really need to understand data modeling. There are a lot of excellent guides out there on how to get started with that, and also on how to set up Cassandra clusters. Make sure you read up on best practices.
B
A
Well, and thank you for being a part of that community too. Thank you. I did a meetup there, and I think that was probably one of my biggest ones.