Description
We'll be covering some aspects of our architecture, highlighting differences between MongoDB and Cassandra. We'll go in depth to explain why Cassandra is a better choice for our general-purpose application platform (SHIFT) as well as our media buying analytics tool (the SHIFT Media Manager). We'll go over common design patterns people may be familiar with from a MongoDB background and highlight how Cassandra can be used as a better alternative. We'll also touch on cqlengine, which is nearing feature completeness as the Cassandra object mapper for Python.
And welcome to this Cassandra community webinar: From MongoDB to Cassandra, Architectural Lessons. Before we get started, I wanted to draw your attention to a new free online resource that we have. If you are new to Cassandra, or want to learn more about it, we have free online training. I will show this link at the end of the webinar as well. The first course is Java Development with Apache Cassandra, and we are releasing new modules every single week.

So now I'd like to introduce Blake Eggleston and Jon Haddad, great friends of the Cassandra community. This is actually the second webinar that Blake and Jon have done about their migration from MongoDB to Cassandra. As always, we will leave questions for the end of the webinar, so if you have a question for Jon and Blake, please use the Q&A tab inside of WebEx, post your question there, and we will reserve time at the end to get through as many of them as we can. So Jon and Blake, welcome, and you have the ball. Take it away.
So yes, I'm Blake Eggleston, and this is Jon Haddad. We're going to be talking about some architectural lessons from MongoDB to Cassandra. Today we're going to start by covering some differences in the architecture between the two databases, and then we'll take a look at some of the internals of our applications and discuss some of the more interesting problems that we're using Cassandra to solve.
Starting out on the MongoDB architecture side of things, there are a few simple but important concepts that we should really go over. At the smallest unit you've got a single mongod server, which I think everybody's aware of. You take a bunch of those servers and put them together, and you can create a replica set, which is just a master-slave configuration.
Once you have those replica sets, you take a bunch of them and put them together to form a cluster. The individual replica sets are considered shards, and each shard holds a portion of your data. To manage the cluster you have config servers, which handle the network topology; basically, they determine which keys go onto which shard and how you connect to the cluster.
Contrast that with Cassandra, where there is only one type of server, which is just the Cassandra server. There is no master or slave. Cassandra sets itself up as a token ring, where each node is responsible for a certain range of tokens in the hash space. So each node owns a certain range, and then it replicates its data forward on the ring to the next two or three nodes, depending on what your replication factor is.
We use a replication factor of three, and that's what we use for all of our examples. The advantage of this is that you have no single point of failure. If a node just disappears or goes down, it's not a big deal, because the data has been replicated, and Cassandra will automatically start routing queries to different nodes, and everything's okay.
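To make the token ring concrete, here is a minimal, hypothetical Python sketch of the routing idea just described: a row key hashes to a token, the node owning that token range takes the data, and the next nodes around the ring hold the replicas. The hash function, ring size, and node names are all invented for illustration; real Cassandra uses its own partitioners and virtual nodes.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32  # illustrative token space, not Cassandra's real one

def token_for(key):
    """Hash a row key onto the ring's token space."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

def replicas_for(key, ring_nodes, replication_factor=3):
    """Find the node owning the key's token, then walk forward on the ring."""
    ring = sorted(ring_nodes.items())               # [(token, node_name), ...]
    tokens = [token for token, _ in ring]
    start = bisect_right(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]

nodes = {0: "node-a", 2**30: "node-b", 2**31: "node-c", 3 * 2**30: "node-d"}
print(replicas_for("user:42", nodes))               # three consecutive nodes on the ring
```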
D
So
when
you're
doing
reason
right,
these
Keys
has
to
a
particular
location
on
the
ring,
and
this
will
limit
the
query
flexibility
a
little
bit
because
you
always
have
to
query
by
the
row
key
so
that
it
knows
we're
on
the
ring
to
route.
The
query
to
finally
each
query
has
a
consistency
level.
So
if
you
want
to
be
kind
of
wild,
just
safa
loose
your
data,
you
can
do
that
in
the
only.
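Setting a per-query consistency level looks roughly like this with the DataStax Python driver. This is a minimal sketch, not code from our applications; the keyspace, table, and column names are invented. With a replication factor of three, QUORUM means two replicas have to respond, which is the usual middle ground between speed and safety.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # contact point(s) for the cluster
session = cluster.connect("demo")         # hypothetical keyspace

# A quorum read: two of three replicas must answer when RF=3.
read = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
rows = session.execute(read, ("user-42",))

# A "fast and loose" write: only one replica has to acknowledge it.
write = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(write, ("user-42", "Blake"))
```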
At the heart of each Cassandra server you have the storage model, which is important to get into and understand; you have to know this in order to build your tables efficiently. One thing that's very important to keep in mind is that SSTables, which are the file-level storage for Cassandra, are immutable: once they're written, they are never changed. There are advantages to this, one of which being that it's harder to get corrupted. Another is that it makes backups super easy, because once an SSTable is written, all you have to do is copy that file somewhere and you're good to go.

With each piece of data that's written, you have a key, you have a column, and you have a value, and you also get, for free, a timestamp of when it was written. Because you can't edit an SSTable in place, you actually end up writing the same key and column combination to multiple SSTables, and that's actually not that big of a problem. Basically what happens is, when a read happens, you end up reading from several SSTables, and the column with the newest timestamp wins. So you're probably wondering: what does that mean, is that efficient, what happens with multiple copies of the same column? Well, there's something called compaction, and what compaction does is take multiple SSTables and merge them into a single one, which basically gets rid of the duplicate column values and keeps only the newest ones. The other thing to keep in mind with a distributed system like this is that there's a concept called a tombstone. A tombstone is basically like a null value; it says this data has been deleted.
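The "newest timestamp wins" reconciliation, and what a compaction merge does to duplicate columns, can be illustrated with a few lines of Python. This is only a toy model of the behavior described above, with made-up keys and timestamps; it is not Cassandra's actual implementation.

```python
# Toy model: each SSTable maps (row_key, column) -> (value, write_timestamp).
sstable_1 = {("user:42", "email"): ("old@example.com", 100)}
sstable_2 = {("user:42", "email"): ("new@example.com", 250),
             ("user:42", "name"):  ("Blake", 250)}

def read(key_column, sstables):
    """Read one cell from every SSTable that has it; the newest timestamp wins."""
    candidates = [t[key_column] for t in sstables if key_column in t]
    return max(candidates, key=lambda cell: cell[1]) if candidates else None

def compact(sstables):
    """Merge several SSTables into one, keeping only the newest cell per column."""
    merged = {}
    for table in sstables:
        for key_column, (value, ts) in table.items():
            if key_column not in merged or ts > merged[key_column][1]:
                merged[key_column] = (value, ts)
    return merged

print(read(("user:42", "email"), [sstable_1, sstable_2]))  # ('new@example.com', 250)
print(compact([sstable_1, sstable_2]))                     # one table, duplicates gone
```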
Okay, so now let's talk about what happens when you execute a write against your Cassandra cluster. When you do a write, you can execute it against any node in the cluster, and that node then becomes the coordinator; it will figure out, based on the key, which nodes the write should go to. For writes, your write is saved into a memtable, which is basically a portion of Cassandra's data set that lives in memory, and it's also written to a commit log, and then memtables are periodically flushed to SSTables.
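Here is a stripped-down sketch of that write path on a single node: append the write to a durable commit log, update the in-memory memtable, and flush to an immutable SSTable once the memtable is big enough. It is purely illustrative; the file name and threshold are invented, and real Cassandra does far more than this.

```python
import json

class ToyNode:
    """Illustrative write path: commit log + memtable + periodic SSTable flush."""

    def __init__(self, flush_threshold=1000):
        self.commit_log = open("commitlog.txt", "a")  # durable, append-only
        self.memtable = {}                            # in memory, sorted at flush time
        self.sstables = []                            # immutable once written
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value, timestamp):
        record = {"key": row_key, "col": column, "val": value, "ts": timestamp}
        self.commit_log.write(json.dumps(record) + "\n")  # replayed after a crash
        self.commit_log.flush()
        self.memtable[(row_key, column)] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort and freeze the memtable into a new immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
```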
If a node goes down, when it turns back on it will basically replay the commit log to bring its understanding of your data set back to where it was when the node turned off.

As far as reads go, the way the cluster behaves on a network level is pretty much the same as writes. You can read from any node, and that node will go out and talk to the nodes that actually own the data and return the answer to your query. It may also perform a read repair if there are inconsistencies in the data it gets back. One thing to point out is that reads are more time consuming than writes, because a read isn't just appending to a commit log; it's actually going out and doing disk seeks, pulling data in, looking at it, and then sending it back over the wire to be reconciled on the coordinator.
Okay, so for those of you familiar with MongoDB, you probably know a lot of its advantages. I think the thing that's touted as its big advantage is that it has super flexible documents. A lot of people, when they're talking about data modeling with Mongo, don't really need to take into consideration how their data is stored, because you can query it so flexibly: you've got a lot of flexibility with your indexes and, as a result, with your queries.
But if a query doesn't use the shard key, it gets broadcast to the entire cluster, so every shard will actually have to execute a portion of the query, and the results come back through the mongos process. The downside here is that it's harder to scale linearly when every shard is being contacted to answer a query. So really what ends up happening is this whole flexible data model ends up becoming a lot more simplified; all those features that are touted as being really cool and useful,
you end up not using them, because once your data hits a certain size it's just impractical. When you write documents, if you're doing updates and your document size is constantly growing, you'll actually end up reallocating memory, which becomes a problem. You need to do a database repair to fix this, or restart your server. Basically, you can end up with a lot of really crazy problems.
With Cassandra, some of the advantages are that it's multi-data-center aware, and it's had that for quite some time, so it's pretty reliable. As we saw with the cluster architecture overview, there are many fewer moving parts: there's one type of server to worry about. You don't have to worry about any sort of locking at the database or table level, and because of the way it stores data on disk, time series data, or any data that's sorted, is super fast.
Also, because of the way the coordinator node talks to the other nodes when looking for a key, you can scale pretty much linearly as you add servers to your cluster. It also has compaction options for your SSTables that are optimized for either traditional spinning disks or SSDs, and, like I mentioned, you have a ton of control over how your data is actually stored on disk.
On the downside, there's a bit more work you have to do, but again, when you're talking about data that's stored sorted on disk, it's much faster. The JVM can be a little tricky to work with. It's fine to start up a cluster and get it working, but once you start getting into performance tuning, there's a lot of understanding of the JVM that you need to have to do that effectively.
Also, and this is a positive and kind of not, laying out your tables and data modeling requires a bit more planning. You need to know up front what kind of queries you're going to be doing before you start laying out a table, and this makes ad hoc queries a little trickier, especially stuff with lots of permutations of parameters.
Media Manager is the ad buying tool that we have; it's a management tool for Facebook and Twitter. We sync about two billion ad stats a month, which we perform constant roll-ups on. As soon as we get our data back from Facebook, we roll it up at several levels, and for each ad that we've published on Facebook, or that we know about, we're syncing all of its data back roughly every five to ten minutes.
So it's always up to date. All of this is happening on a 10-node cluster, and we're running in Amazon on the high I/O instances, so we are taking advantage of SSDs. The highest load our cluster has seen so far: we had 150,000 queries per second executing against the cluster, and it performed admirably, so we're really happy with that. Right now we've got 150 gigs of data, but that's growing at a rate of about ten percent per week.
The ads are stored in a hierarchical structure in Media Manager: you belong to a team, the team has folders, a folder contains campaigns, and a campaign contains ads. The stats are collected at the ad level, and as soon as an ad stat comes in, all of those parent containers basically need to have their stats updated as well.
So we've gone with this method for doing the roll-ups, where we have a set of tables that contain all of the ads and stats for a single campaign and date. When we collect the stats for an ad, we go in and update the stats for that ad, and then, since this is all a single physical row on disk, we read out that entire row of ads for that date,
roll it up into the campaign stats, and then we do that again and roll that back up into the folder and the team. By structuring our data and our queries this way, every time we get an ad stat we can really quickly roll up the campaign, the folder, and the team with something like five to six reads and writes. Okay.
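A rough sketch of what a roll-up table along these lines could look like with cqlengine: the campaign and date form the partition key, so all of a campaign's ads for a day live in one physical row, and the roll-up is a single-partition read. The model, column names, and values are invented for illustration, they are not our actual schema, the exact cqlengine setup calls vary by version, and a connection and keyspace are assumed to be configured elsewhere.

```python
import uuid
from cqlengine import columns
from cqlengine.models import Model

class DailyAdStats(Model):
    """One physical row per (campaign, date); each ad is a clustering column."""
    campaign_id = columns.UUID(partition_key=True)
    date        = columns.Text(partition_key=True)   # e.g. "2013-11-21"
    ad_id       = columns.UUID(primary_key=True)     # clustering column
    impressions = columns.Integer(default=0)
    clicks      = columns.Integer(default=0)
    spend       = columns.Float(default=0.0)

campaign, ad = uuid.uuid4(), uuid.uuid4()

# Write (or overwrite) one ad's stats as a sync comes back from Facebook.
DailyAdStats.create(campaign_id=campaign, date="2013-11-21",
                    ad_id=ad, impressions=1200, clicks=37, spend=5.50)

# Rolling the campaign/day up is a single-partition read: one seek, one stream.
rows = DailyAdStats.objects(campaign_id=campaign, date="2013-11-21")
day_totals = {"impressions": sum(r.impressions for r in rows),
              "clicks": sum(r.clicks for r in rows)}
```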
This avoids the worst case scenario, where our data is not stored in memory and we're hitting tons of different locations on disk, incurring the penalty of multiple seeks. With solid state drives that's not as much of an issue, but streaming data from a single point on disk is still going to be much faster than doing a bunch of random I/O lookups. One thing that made MongoDB impractical for this is that the data we store is inconsistently sized and constantly changing.
The heuristics that Mongo uses to allocate memory for documents are not very helpful for us, because the document sizes, because the rows, actually change unpredictably. So if we were using Mongo and we tried to stuff everything into one document, we would be reallocating memory many, many times.
Okay, so now let's talk a bit about SHIFT. SHIFT is our other product, and it's essentially a real-time messaging platform built for marketers. What it does is allow people to communicate across organizations and departments in a single place, or team. Additionally, there is a set of third-party applications that work on top of SHIFT and use it via its API, and these are also able to communicate with these teams and, in the future, will also be able to communicate with each other.
The interesting component of SHIFT is our messaging platform. A message, when it's sent to a team, is fanned out to everybody on that team, and a team may have hundreds of people on it; we've designed this thing to handle hundreds or thousands of people from a technical standpoint. As a result, each person has their own respective view on the message, so there's a lot of metadata that surrounds the messages themselves: whether or not it's unread, what tags they have on that message, and several other components. So we end up doing a lot of writes. Basically we're doing this in a way similar to, think of it kind of like, a Twitter stream: when someone tweets something, it goes out and everybody that's a follower gets it.
The way we've architected our message inbox is that we're actually utilizing the TimeUUID column type that's available in Cassandra. A TimeUUID uses the UUID version 1 algorithm, which includes an embedded timestamp within the UUID, and this is really useful because of the way Cassandra stores data on disk: if you use the TimeUUID option, it will actually store your data ordered by the timestamp portion of that UUID, which makes it really convenient.

Messages arrive out of order, so it's not enough to just push messages to each person; we actually have to keep track of the timestamp that each person has for that message, because it's possible for a message to get bumped up for one person and not others, and there can be side conversations. Due to the way we utilize the TimeUUIDs, it's really easy and fast to do a query for the first N items in a user's inbox, because it's essentially taking the head of a list. Super fast, really awesome.
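A hedged sketch of an inbox table along those lines: one partition per user, with a TimeUUID clustering column so the newest entries sit at the head of the partition. The model and field names are invented, not SHIFT's actual schema, and it assumes a cqlengine version that supports the clustering_order option.

```python
import uuid
from cqlengine import columns
from cqlengine.models import Model

class InboxEntry(Model):
    """One partition per user; entries clustered by TimeUUID, newest first."""
    user_id    = columns.UUID(partition_key=True)
    message_at = columns.TimeUUID(primary_key=True, clustering_order="DESC")
    message_id = columns.UUID()
    unread     = columns.Boolean(default=True)
    tags       = columns.Set(columns.Text)

# Fan a message out: one small write per recipient; the TimeUUID carries the timestamp.
def fan_out(message_id, recipient_ids):
    for user_id in recipient_ids:
        InboxEntry.create(user_id=user_id,
                          message_at=uuid.uuid1(),   # UUID v1 embeds a timestamp
                          message_id=message_id)

# Reading the first N inbox items is just taking the head of the partition.
def latest(user_id, n=20):
    return list(InboxEntry.objects(user_id=user_id).limit(n))
```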
That brings us to cqlengine, our Python object mapper for Cassandra 1.2. It basically builds CQL queries for you, and right now we support a TTL per query, consistency, blind table updates, batch queries, counters, and all the collections: map, set, and list. In addition to that, it will also manage your schemas for you: it will create the tables that you define as Python classes, and it will also update your schemas.
If you add columns to one of those classes, it will add them for you; not automatically, of course, you have to ask it to, but it will do it for you. You can also define compaction settings per table, and we just added a feature which allows for table polymorphism. What table polymorphism is, is it basically allows you to store multiple types of objects in a single table, which is a big use case for us.
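To give a feel for the model definition and schema management just described, here is a small, hypothetical cqlengine example. The class, keyspace, and host are made up, and the exact function names changed across cqlengine releases (older versions used create_table where newer ones use sync_table), so treat this as a sketch rather than the exact API.

```python
import uuid
from cqlengine import columns, connection, BatchQuery
from cqlengine.management import sync_table
from cqlengine.models import Model

class Event(Model):
    """The table is created and altered from this class definition."""
    event_id = columns.UUID(primary_key=True)
    kind     = columns.Text()
    payload  = columns.Map(columns.Text, columns.Text)

# Point cqlengine at the cluster; setup arguments vary by cqlengine version.
connection.setup(["127.0.0.1"], default_keyspace="demo")
sync_table(Event)   # creates the table, or adds newly defined columns to it

# Per-query TTL: this row expires an hour after the write.
Event.ttl(3600).create(event_id=uuid.uuid4(), kind="login")

# Batch several writes into a single round trip.
with BatchQuery() as batch:
    Event.batch(batch).create(event_id=uuid.uuid4(), kind="click")
    Event.batch(batch).create(event_id=uuid.uuid4(), kind="view")
```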
All right, so coming up in cqlengine we have a bunch of features planned. Right now it's only set up to work with a single Cassandra cluster; we want to change that, so that if you've got three, four, five clusters, whatever you have, you can work with all of them. We want to get the native driver integrated; right now I believe it's in beta, and when it's out, stable, and everyone loves it, we will definitely get it integrated. That's a really big deal for us.
There's a ton of performance improvement that's going to come along with that, with prepared statements and just moving away from using the Thrift execute_cql wrapper; it should be a ton faster. We also want more configuration options at the table level: like I said, we support compaction options, which is really good because you can optimize for different disks, and we want to take that even further with key cache and row cache configuration, plus the features available in 2.0, which has lightweight transactions to back conditional inserts.
So compaction is basically a way of optimizing your reads. You end up paying a little bit of an I/O penalty along the way, and there are throttles to make sure that it doesn't impact performance; if you have a really fast disk you can turn the throttle up, or you can disable it completely. It's very, very configurable. The two types of compaction are meant for different types of workloads.
If you have solid state drives, you can use what's called leveled compaction, which will make sure that your data is stored in as few SSTables as possible for a given key, so it minimizes the number of lookups you have to do; it basically helps your reads be super fast. And again, every aspect of it is configurable: how many compactions occur simultaneously, how fast they occur. You normally don't have to worry about kicking off compaction, because it happens as part of the Cassandra process.
Cassandra will just run them for you; you don't generally need to run them. It is possible to start a major compaction, which will take all of your SSTables and merge them into a single large SSTable, and if you're using size-tiered compaction that will make reads super fast. But the downside is that, as your data grows, it's less likely that one giant table will ever be compacted with any other tables, at least with size-tiered. So, yeah.
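For reference, the compaction strategy is a per-table setting in CQL. Here is a minimal sketch of changing it through the Python driver; the table names and the sstable size value are invented for illustration.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")   # hypothetical keyspace

# Leveled compaction: fewer SSTables per key, good for read-heavy / SSD workloads.
session.execute("""
    ALTER TABLE ad_stats
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 160}
""")

# Size-tiered compaction (the default): cheaper on write-heavy workloads.
session.execute("""
    ALTER TABLE event_log
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
""")
```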
Okay, great. And this is one that I will answer: will your slides be available afterwards? I imagine you guys will probably make them available, either on your blog or through SlideShare, but we are also recording this session, and tomorrow on Planet Cassandra we will post both the link to the video recording and the slides as well.
Does Cassandra support transactions? Not in the sense that a relational database would support them. There are what are considered lightweight transactions. Patrick McFadin actually did a much better job of explaining this than I could; I know his video on this is available through Planet Cassandra's YouTube channel, and I assume on Planet Cassandra as well. He does a super, like, amazing job of talking about the features of 2.0, so I would recommend that.
And to everyone, by the way: we touched on something that's really important to understand in the NoSQL world, which is that terminology can be very similar to the relational world, but the actual meaning is slightly different. So when you hear that Cassandra supports transactions in 2.0, and Patrick does do a good job explaining these, it is not a one-to-one relationship with a relational database, meaning it's not exactly the same as a transaction in Oracle or SQL Server or MySQL. Gosh, lots of questions coming through here.
We're dealing with customers that may be buying tens of thousands of ads on Facebook, so we essentially have a lot of tools to make that a lot easier, because otherwise it's really hard to do manually. Facebook has basically provided APIs, and we're one of the third parties that provides tooling around them, actually making it useful, or manageable; otherwise it's impossible.
In-memory data is pretty cheap, so basically, even if there were 10,000 people on a team, we wouldn't be firing off 10,000 queries; we'd be firing off a handful of queries and doing a ton of writes all at once, and because those writes are all pushed together and they're all sequential writes, it's really fast. Cassandra's strength is its write speed. So for us, we're really not afraid of distributing messages to 10, 100, or 500 people; it's all the same to us.
Sorry, I think I heard the question a little differently. Basically, when you do a query against Mongo, if you're not using the shard key, it does not know which shard the answer is actually going to come from, so it will have to contact all the shards and then piece together the sections of the query.
Temiz, thank you. Yeah, that's kind of what we were talking about with query flexibility. You can do all sorts of wild stuff against Mongo, but ultimately you have to not do that if you have an application of any significant size, because once you start hitting every node on every query, your whole database just drags to a halt. It's extremely inefficient.
The amount of work that's been put into Cassandra to make sure that node outages aren't a problem, and that the multi-DC stuff works, is, to me, phenomenal. From a reliability standpoint it has been absolutely fantastic for us. When we're messing around with things, we take nodes down all the time; it is such a resilient database that maintaining it from an operational standpoint is super straightforward, yeah.
As far as developing against it, there's definitely a little bit of extra work that has to go into modeling your data up front, but I really don't think that's a bad thing. I think having a really good understanding of your data is something that you have to have, and I think, more often than not, whenever you hear people talking about performance problems with their database, it's probably because they had no idea what they were really going to do with their data.
Yeah, so we don't have a lot of HBase experience, really. HBase, as I remember it, still has a few more moving components than Cassandra, I think. But truthfully, we haven't deployed it in production, so it's really hard for me to comment; I'm not an authority on it.
But you know, email is fine too.
A
So
Jonathan
are
it's
your
cql
engine
and
ORM
type
product?
If
so,
how
do
you
map
the
relationships
between
entities
and
how
do
you,
oh
by
the
way?
This
is
the
second
elasticsearch
question,
so
I
think
that's
take
care
of
the
other
one.
How
do
you
use
elasticsearch?
Do
you
use
it
to
do?
Queries
related
to
your
rolled
up
and
the
other
by
the
way,
just
kill
two
birds
with
one
stone,
the
other
elasticsearch
one
was:
how
do
you
compliment
Cassandra
with
elasticsearch
to
duplicate
a
lot
of
data?
All of the, like, industrial-strength queries that we do, the ones we know we're doing as part of our day-to-day operations, those are in Cassandra; but for the more obscure queries we will use Elasticsearch. And yes, we do duplicate our data. Typically we will throw it into Elasticsearch in a way that often doesn't look like the Cassandra table, just because we put it into Elasticsearch to facilitate the way that we're going to be querying it, not necessarily to be a faithful representation of how it's stored.
Take the user information, for example: in addition to the user information itself, we might put in all the teams that they're on; there's basically a lot of metadata about a user that allows us to efficiently query for anything we want. Honestly, the number of permutations that we can do with our Elasticsearch indexes is uncountable; it's insane.
Cool, I've got a couple of wide row questions coming up here. Ramakrishna asks: the case for wide rows makes sense when the data is already compacted to one SSTable, since the data is stored as one row and a read is picking the data up from just one place; but how will it play out when the data is in multiple SSTables?
Our widest ones are probably a few hundred thousand or a million columns right now, but Cassandra technically has support to go up to a billion. I don't know if anyone is really doing that, or if it's practical to go that high, but I do know that some people are above 10 million and it's not a performance problem.
With leveled, it's a little bit more I/O to do the compaction, but you end up with your data in fewer SSTables. We're actually going to be moving everything over to solid state drives, because they're just so infinitely fast; even when you're seeking to random data you've got sub-millisecond lookup times. For our write-heavy workload, I think the recommendation is size-tiered.
That runs while the node is up, but I don't think it's something that happens like compaction does in Cassandra. Under a heavy workload you're always going to have compactions; it's just part of the process, something that's just part of the tool, versus an administrative thing that you would run manually or automate to run once a day or something.
We're at the top of the hour, so thank you so much, everyone. I would like to flag the upcoming webinar: we're going to take a few weeks off, let you guys have your turkey and stuffing and all that good stuff, and we will be back on December twelfth with Data Modeling on Fire. That will be the third in Patrick McFadin's data modeling series, so if you haven't watched the other two, the first two, you can go to planetcassandra.org and view the past webinars there.
A
That
will
provide
a
good
baseline
of
Education
for
that
last
in
the
series
and
then
just
a
reminder:
we
have
our
free
online
Cassandra
training
courses
available,
that's
Java
development
with
Apache
Cassandra,
and
that
is
at
the
link
on
your
screen
so
guys.
Thank
you
very
very
much.
We
really
appreciate
it
and
you
know,
let's
look
forward
to
a
third
part
of
this
in
in
2014,
will
have
to
get
brainstorming
on
a
good
follow-up
topic,
for
you
sure.