Description
Speaker | Eric Lubow (CTO, SimpleReach)
Date | Tuesday October 23 @ 8:30AM PST
Join Eric Lubow, CTO of SimpleReach and DataStax MVP for Apache Cassandra, as he examines the types of applications that are well suited to being built on top of Cassandra. Eric will talk about the key considerations for designing and deploying your application on Apache Cassandra. This is a 101-level webinar.
A
Okay, so without further ado, we will get cracking. In our College Credit webinar series, we're taking you all the way from "what is NoSQL?" through to pretty advanced topics. This is the third in our series, and we're very happy to have Eric on with us today. Eric is an MVP for Apache Cassandra and he is also the CTO at SimpleReach. Having met Eric in person and heard about his mixed martial arts, he's definitely not someone to mess with.
A
So I'm glad we messed up from afar and he's not sitting next to me today. Many apologies, Eric. As for questions and answers: in the previous webinars we've had tons of questions, so please go to the Q&A tab in WebEx and ask your questions there. I will be monitoring them, and at the end of the presentation we will ask Eric those questions and get through as many as we can.
B
Cool. First of all, thank you very much for having me. I will try to get through some of the superficial stuff a little more quickly to give everyone time for questions. So thank you very much.
B
So, where are you? If you're in the planning stages, then this is definitely a good thing for you to be listening to, especially if you need to decide what the goal of your application is going to be. Sometimes that's obvious and sometimes it's not: just because you know what specific problem you're solving doesn't mean you know the best way to get there. I can speak to that problem from experience.
A
Eric, sorry to interrupt: should we still be seeing your title slide or your agenda slide? Because it's still on the title slide.
B
So if you're at the minimum viable product stage, my suggestion is that you're probably going to spend the time deciding what your application is capable of, or needs to be capable of, coming out of the gate. In that case I would go with whatever is easiest for you to use. For some people that might be MySQL or Postgres, and if you're comfortable with Cassandra, then that might be a good place to start.
B
That's especially true if you know you're going to be moving into the big data neighborhood, but again, I'm going to talk a little more about that later. If you're in your iterative steps, meaning you've already built your minimum viable product and you see it heading toward something a little larger than what you're used to handling...
B
Then that may be the time to start exploring whether or not Cassandra is a good fit for you. And if you think you're at the final decision, you should probably step back and go back to iterating, because there's really no end state. You need to make these decisions on a continuous basis, because the needs of your app are going to change.
B
If you're building a learning project, make sure it's apt for the data store you're going to be building it with. Building a blogging platform on Cassandra is not typically the best move. It's certainly doable, but it's just not what Cassandra excels at.
B
If you're building your own Twitter setup, then Cassandra is a really solid example and a great way to learn. If you're curious about whether or not it's going to work, it seems to work pretty well for Twitter, and they've got a fairly large setup going on their end. And if you're really just after building a big data system in general, then Cassandra is certainly something you're going to want to look at.
B
Sometimes your data is bigger than can fit on one server. For some people, one server might be a terabyte of space, and then you're starting to move into big data territory.
B
The term is a very loose one and very much a buzzword, so make sure you define what it means to you before you go forward. Here's the biggest truth about dealing with large amounts of data, even with the right tools: eighty percent of the work of building a big data system is acquiring the raw data and refining it into usable data.
B
That is something most people do not understand going into the process, myself included. The vast majority of the terabytes and terabytes of data we look at day in and day out at SimpleReach is stuff we had to spend a lot of time turning into usable data, so make sure you keep this sort of thing in mind.
B
Now, I'm not going to go through all of these planning questions; they're really just notes and a guide. I went through all of these planning questions for SimpleReach and it took an immense amount of time, but your answers will be very specific to what you are looking for.
B
Some of the basic questions are broken down into categories. Your data: what type of data is it, how are you querying it, and how are you loading it? The type of schema you need to store it in is going to be a factor in that. So if you need to aggregate data on the fly, you're going to want to think about which system might be faster for aggregate counters.
B
Cassandra has distributed counters, which means reconciliation might be a little slower, on the order of a few milliseconds or a few seconds, and depending on your application that might be entirely too slow. So you really need to look at what's important to you in terms of technology. Does it need to be fault tolerant? Does it need to support encryption standards, depending on the type of data you're storing? Does the data need to be distributed?
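To picture why a distributed counter can lag, here is a toy model in Python: each replica keeps its own shard of the count, and a read adds up whichever shards it can currently see. It is loosely in the spirit of how Cassandra's counters track per-replica contributions, but it is a simplified sketch, not Cassandra's actual implementation; the class and node names are hypothetical.

```python
from typing import Dict, Iterable, Optional


class ShardedCounter:
    """Toy distributed counter: each replica tracks only the increments it accepted."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self.shards: Dict[str, int] = {name: 0 for name in replicas}

    def increment(self, replica: str, amount: int = 1) -> None:
        # An increment is accepted by whichever replica the client happens to reach.
        self.shards[replica] += amount

    def value(self, visible: Optional[Iterable[str]] = None) -> int:
        # A read sums the shards it can see. Passing a subset models a read that
        # happens before every replica's increments have been reconciled.
        names = self.shards.keys() if visible is None else visible
        return sum(self.shards[name] for name in names)


if __name__ == "__main__":
    votes = ShardedCounter(["node-a", "node-b", "node-c"])
    votes.increment("node-a")
    votes.increment("node-b", 2)
    votes.increment("node-c", 3)
    print(votes.value())                      # 6: every shard is visible
    print(votes.value(["node-a", "node-b"]))  # 3: this read has not seen node-c yet
```

The gap between the two printed values is the window the speaker describes: whether a few milliseconds or seconds of it matters depends entirely on the application.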
B
I think everybody got a healthy reminder about fault tolerance and distribution yesterday, with Amazon's us-east-1 data centers having trouble. The folks who were able to distribute their data and had a fault-tolerant system experienced less of a headache than those who didn't, one would hope.
B
Another big concern is support for your language. Cassandra, for instance, has support for a great many drivers and languages, but when we got into it, we found out that the Node.js support wasn't great, and we ended up having to roll our own driver. Another company I've spoken to is interested in using Cassandra, but most of their technology is written in Go and there isn't a Go driver. So that's something you need to be aware of.
B
If something goes wrong and you have 20 servers in one location and 20 servers in another, those are locations you may have to send people to if you've got a physical data center. The other thing you need to keep in mind is what legal requirements you might be bound by; folks in the medical field, I know, have HIPAA to deal with.
B
We look at approximately 150 to 200 million events per day, which translates to about 2,000 events per second. So even on a light day, we're looking at about two or three times that in writes, because each event requires more than one write.
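The arithmetic behind those figures is easy to check. A small Python sketch, assuming events are spread evenly over the day (real traffic peaks well above the average) and using the two to three writes per event mentioned above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average event rate, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def writes_per_second(events_per_day: int, writes_per_event: int) -> float:
    """Average write rate when each event fans out into several writes."""
    return events_per_second(events_per_day) * writes_per_event


if __name__ == "__main__":
    for daily in (150_000_000, 200_000_000):
        low = writes_per_second(daily, 2)
        high = writes_per_second(daily, 3)
        print(f"{daily:,} events/day: ~{events_per_second(daily):,.0f} events/s, "
              f"~{low:,.0f} to {high:,.0f} writes/s")
```

That works out to roughly 1,700 to 2,300 events per second and about 3,500 to 6,900 writes per second on average, which is why the write path matters so much in this use case.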
B
So when you look at your query patterns: for us, when we want to query something, we look at things in terms of page views, tweets, Facebook actions and referrer data, and the way we get to that data is by slicing groups of information out of the rows. Let's say we want to see all the page views that happened in a particular hour: we can slice out those page views. If we want to see all the tweets that happened in a particular hour, we slice out those tweets.
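Slicing works because the columns within a row are kept sorted by name. A minimal in-memory sketch in Python, assuming composite column names of the form (event type, timestamp); the class, key layout and sample values are hypothetical, not SimpleReach's actual schema or a Cassandra client API:

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple

ColumnName = Tuple[str, int]  # (event_type, unix_timestamp), like a composite column name


class WideRow:
    """Toy wide row: column names are kept sorted so a range of them can be sliced."""

    def __init__(self) -> None:
        self._names: List[ColumnName] = []
        self._values: Dict[ColumnName, str] = {}

    def insert(self, event_type: str, ts: int, value: str) -> None:
        name = (event_type, ts)
        if name not in self._values:
            insort(self._names, name)  # keep column names in sorted order
        self._values[name] = value     # writing the same name again overwrites

    def slice(self, event_type: str, start_ts: int, end_ts: int) -> List[Tuple[ColumnName, str]]:
        """Columns of one event type with start_ts <= ts < end_ts, in timestamp order."""
        lo = bisect_left(self._names, (event_type, start_ts))
        hi = bisect_left(self._names, (event_type, end_ts))
        return [(name, self._values[name]) for name in self._names[lo:hi]]


if __name__ == "__main__":
    row = WideRow()  # e.g. one row per article URL
    row.insert("pageview", 3600, "reader-1")
    row.insert("pageview", 3700, "reader-2")
    row.insert("pageview", 7200, "reader-3")
    row.insert("tweet", 3650, "tweet-1")
    hour = 3600
    print(row.slice("pageview", hour, 2 * hour))  # page views in the hour starting at t=3600
    print(row.slice("tweet", hour, 2 * hour))     # tweets in that same hour
```

Because the event type comes first in the column name, all page views sit next to each other and a one-hour slice is just two binary searches.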
B
Another really important piece of determining what's good for you is finding things that have toolkits, since mature applications have toolkits built around them. Cassandra has OpsCenter. Monitoring a system that is distributed in this fashion is not very easy, especially when you need to monitor across different data centers.
B
So if you find a particular data store that can do roughly what you're looking for and there are no tools built for it yet, that's typically a sign that you're going to have to do a little more work than you may want to, especially if you're still in the early stages of your product, like the MVP or an early iteration. One of our favorite features, and I like to say this all the time, is TTLing certain columns or rows. TTL stands for time to live, and it allows you to set an expiration date on your pieces of data.
B
Once the TTL runs out, all of October's data goes away, and that frees up a few terabytes of space, at least in our case, without us having to do any additional labor.
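A rough model of what a TTL does, as a Python sketch: each write records an expiry time, and reads simply stop returning the value once that time has passed. This is an illustration only; in Cassandra itself the TTL is set on the write (in CQL 3, with `USING TTL <seconds>`), and the space is reclaimed later, during compaction, rather than by a dictionary delete. The keys and the 31-day TTL below are made-up examples.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy key/value store in which a write can carry a time-to-live in seconds."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: drop it lazily on read
            return None
        return value


if __name__ == "__main__":
    now = [0.0]
    store = TTLStore(clock=lambda: now[0])
    store.put("event:2012-10-01", "pageview", ttl_seconds=31 * 24 * 3600)  # keep about a month
    store.put("account:42", "settings")                                 # no TTL
    now[0] = 32 * 24 * 3600                                             # jump ahead 32 days
    print(store.get("event:2012-10-01"), store.get("account:42"))       # None settings
```

The appeal the speaker describes is that nobody has to run a cleanup job: the expiry is decided at write time.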
B
So that's a very handy feature for us. To give you a more specific idea of what our data looks like: we look at social data from all around the web. Anytime an article gets published, we look at all the social data that comes out around that article in real time. So we take a look at the Facebook actions, we count the page views, we count the tweets, and we go across all the rest of the social networks.
B
So let's talk a little bit about [MongoDB]. One of the things that's really good about it is that it is incredibly fast for doing atomic increments, meaning adding one or two or three, or whatever the number is, to a number that's already in the database.
B
If you're coming from a JSON-based or JavaScript-based language like Node, it works very fast. There's no change of language, and no serialization or deserialization that has to happen, so it lets you do things very quickly. That's a huge advantage for us because of our infrastructure: we happen to be a Node.js-heavy shop.
B
For example, Cassandra hashes the keys, which is how it decides where your data lives in the cluster, and says, "I know roughly where that is," and it can go out and get the data. With [MongoDB], on the other hand, it knows exactly where the data is in each and every case. That means finding out where your data is may be slower, but getting at your data is faster, whereas Cassandra takes the opposite approach.
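The "hash the key to find the node" idea can be sketched as a token ring. This is a simplified Python model, assuming one token per node and an MD5-based hash in the spirit of Cassandra's RandomPartitioner; real partitioners differ in detail (Cassandra 1.2 defaults to Murmur3), and replication would place extra copies on the following nodes. Node names and tokens are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Tuple

RING_SIZE = 2 ** 127  # token space used by this sketch


def token_for(key: str) -> int:
    """Hash a row key onto the ring with MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class TokenRing:
    """Each node owns the range (previous node's token, its own token]."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def node_for_token(self, token: int) -> str:
        idx = bisect_left(self._tokens, token)  # first node whose token >= the key's token
        if idx == len(self._tokens):            # past the last node: wrap around the ring
            idx = 0
        return self._ring[idx][1]

    def node_for_key(self, key: str) -> str:
        return self.node_for_token(token_for(key))


if __name__ == "__main__":
    third = RING_SIZE // 3
    ring = TokenRing({"node-a": 0, "node-b": third, "node-c": 2 * third})
    for url in ("http://example.com/a", "http://example.com/b"):
        print(url, "->", ring.node_for_key(url))
```

No lookup table is consulted: any node can compute where a key lives from the hash alone, which is the "I know roughly where that is" behavior in the talk.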
B
If you're building the system from scratch and you're early on, you're going to want something that has a good tie-in, especially if you have a web app: the ORM, or object-relational mapper. For Rails we use Mongoid, but there are a ton of them out there. So you're going to want something that allows you to build that web layer, and for us there just wasn't anything like that for Cassandra, so we needed to use [MongoDB] as well.
B
The other thing [MongoDB] gives us is a pub/sub system. If you're not familiar with what publish/subscribe does, it gives you the ability to say: anytime something new comes in, pass it back to the client immediately. It also gives us B-tree indexes, which are something we don't have in Cassandra, and a B-tree index essentially says...
B
The document model that [MongoDB] uses is very handy for us, and again it allows us to TTL data, which means we can get rid of it whenever we need to or whenever the clock expires. That's really helpful in terms of keeping free space available. This is what the document looks like for us: we take a look at all of the social data for a particular URL, and in this case we just use the increments.
B
Now, Redis is a whole different animal. The reason I'm talking about Redis is that I want to make the comparison: a lot of people compare [MongoDB] and Cassandra directly, when in fact they have different use cases. The same thing goes for Redis, but it's slightly more obvious.
B
So if we want to talk about what's good and what's bad about Redis: Redis, just like [MongoDB] or Cassandra, can support hundreds of thousands of transactions per second, but they all do them slightly differently.
B
But what's great about Redis is that you can guarantee that every transaction you go after is going to be stored in memory. Everything lives in memory, and that's a speed thing, but it also binds you to the amount of memory on that particular machine. So again, that's a compromise you're going to have to make. If you're on Amazon, that's typically going to be 64 gigs, and that's not a whole lot of data.
B
It also offers some really cool data types, like sets, sorted sets and lists. A lot of those things are coming to Cassandra, I believe in 1.2, but they're not out yet, so you need to keep that in mind when you're deciding where to go.
B
It also acts as an excellent centralized locking system. Part of the problem you could end up running into with Cassandra is that Cassandra is eventually consistent: if you write a lock on one server, it may not have reached the other server by the time you go to check whether that lock exists.
B
So while Cassandra is great for distributing your data and writing things really quickly, the fact that a write may not be there when you immediately go to read it, which is the idea of eventual consistency, is something you need to take into account. With Redis, on the other hand, because everything happens and is stored in memory, you're guaranteed that as soon as you write that lock, it's going to be available on the following read.
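The race described above can be shown with a toy model in Python: two replicas where a write only reaches the other replica when replication runs, and a check-then-write lock that two clients both manage to "acquire". For contrast, a single in-memory store that executes commands one at a time can check and set in one step, which is the idea behind Redis's `SETNX`. The replica names, keys and classes are hypothetical, and the model leaves out the per-request consistency levels a real Cassandra client can choose.

```python
from typing import Dict, List, Optional, Tuple


class LaggyReplicas:
    """Two replicas; a write reaches the other replica only when replicate() runs."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {"east": {}, "west": {}}
        self._pending: List[Tuple[str, str, str]] = []  # (target replica, key, value)

    def write(self, replica: str, key: str, value: str) -> None:
        self.data[replica][key] = value
        for other in self.data:
            if other != replica:
                self._pending.append((other, key, value))  # shipped later, not now

    def read(self, replica: str, key: str) -> Optional[str]:
        return self.data[replica].get(key)

    def replicate(self) -> None:
        for target, key, value in self._pending:
            self.data[target][key] = value
        self._pending.clear()


def try_lock_check_then_write(store: LaggyReplicas, replica: str, lock: str, owner: str) -> bool:
    """Naive lock: only safe if every read sees the latest write."""
    if store.read(replica, lock) is None:
        store.write(replica, lock, owner)
        return True
    return False


class SingleNodeLocks:
    """One in-memory store running commands one at a time, like a single Redis server.
    set_if_absent mirrors the idea of Redis SETNX (plain Python here is not thread-safe)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set_if_absent(self, key: str, value: str) -> bool:
        if key in self._data:
            return False
        self._data[key] = value
        return True


if __name__ == "__main__":
    replicas = LaggyReplicas()
    print(try_lock_check_then_write(replicas, "east", "job:42", "worker-1"))  # True
    print(try_lock_check_then_write(replicas, "west", "job:42", "worker-2"))  # True: both "hold" it
    locks = SingleNodeLocks()
    print(locks.set_if_absent("job:42", "worker-1"), locks.set_if_absent("job:42", "worker-2"))  # True False
```

After `replicate()` runs, the two replicas even disagree about who owns the lock, which is exactly why the talk steers locking toward a single centralized store.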
B
Now, I did talk a little bit about what was negative about Cassandra and a little bit about what was negative about [MongoDB], but you can't just look at the good parts of each system.
B
Redis is fantastic as a caching engine, but its limitation, besides having to keep all the data in memory, is that it can only utilize a single core. So if you have a 64-gig server on Amazon and you're only using one core, you're quickly going to overwhelm that core, and you're not really going to get your best performance.
B
So knowing the internals of your system is also something you need to keep in mind. I talked a little bit about why B-trees are important for us and how they could be important to you. One of the things that's difficult to deal with in [MongoDB], especially if you're in the cloud, is that it forces the ping times to be very, very short. How it checks whether its sister servers are alive is coded directly into the application and is not configurable.
B
So if you have a widely distributed application, say you're in Amazon us-east and Amazon us-west, odds are your servers will not survive in a replica set. So keep in mind: if you need that distributed capability, then [MongoDB] may not be the right answer for you.
B
For time series data, what matters is that Cassandra is a very write-heavy system. As soon as you write something, it stores a timestamp and writes the data sequentially. Then, after a large number of writes, it goes back and does something called compaction, which puts all of that together into an easily accessible format for the system to read, and that format is always in time series. So storing time series data is native to the application itself.
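The write-then-compact cycle described above is the log-structured merge idea. A minimal Python sketch, assuming every entry is keyed only by timestamp to match the time-series case; real Cassandra orders data by row key and column name, and its memtables, SSTables and compaction strategies are far more involved. The class and the flush threshold are hypothetical.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[int, str]  # (timestamp, value)


class TinyLSM:
    """Toy log-structured store: append writes to a memory buffer, flush full buffers
    as immutable sorted runs, and let compaction merge the runs into one."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.memtable: List[Entry] = []
        self.runs: List[List[Entry]] = []

    def write(self, timestamp: int, value: str) -> None:
        self.memtable.append((timestamp, value))  # cheap append: no read before write
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            self.runs.append(sorted(self.memtable))  # each run is written once, in order
            self.memtable = []

    def compact(self) -> None:
        self.flush()
        if self.runs:
            self.runs = [list(heapq.merge(*self.runs))]  # merge sorted runs into one

    def scan(self, start: int, end: int) -> List[Entry]:
        """Entries with start <= timestamp < end, in timestamp order."""
        merged = heapq.merge(*self.runs, sorted(self.memtable))
        return [entry for entry in merged if start <= entry[0] < end]


if __name__ == "__main__":
    lsm = TinyLSM(flush_threshold=2)
    for ts in (5, 1, 4, 2, 3):
        lsm.write(ts, f"event-{ts}")
    print(len(lsm.runs), "runs before compaction")
    lsm.compact()
    print(len(lsm.runs), "run after compaction:", lsm.runs[0])
    print(lsm.scan(2, 4))
```

Writes never have to find and update existing data in place, which is what makes this shape of storage so friendly to high write volumes.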
B
So that's something to keep in mind when you're working with time series data, like sensor readings or events or anything along those lines. Counters and voting systems are really great to do in Cassandra, especially if you have high volume, because ultimately you can make those increments anywhere. Feed-based activity is just like events; you could think of GitHub. I know a lot of people use GitHub.
B
Every time you get an event, you have the ability to just throw it into a column based on the user (say the user is the row key), and you can start pulling that back into the application in chunks. Say, "I want everything that happened on this day for this user," and you get a really good cross-section of what you're after. Also, when you're after large amounts of data, you really need to think about...
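A Python sketch of that access pattern, assuming one row per user per day with events stored as time-ordered columns, and paging by remembering the last column seen. The names, the day-bucket key and the page size are all hypothetical; a real client would issue slice queries against Cassandra instead of reading a dictionary.

```python
from bisect import bisect_right, insort
from typing import Dict, List, Optional, Tuple

Column = Tuple[int, str]  # (timestamp, event description)


class UserFeed:
    """Toy feed: one row per (user, day) bucket, events stored as columns sorted by time."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], List[Column]] = {}

    def add_event(self, user: str, day: str, ts: int, event: str) -> None:
        insort(self.rows.setdefault((user, day), []), (ts, event))

    def page(self, user: str, day: str, after: Optional[Column] = None,
             limit: int = 2) -> Tuple[List[Column], Optional[Column]]:
        """Return up to `limit` columns after the cursor `after`, plus the next cursor
        (the last column returned), or None when the row is exhausted."""
        row = self.rows.get((user, day), [])
        start = 0 if after is None else bisect_right(row, after)
        chunk = row[start:start + limit]
        has_more = start + limit < len(row)
        return chunk, (chunk[-1] if has_more else None)


if __name__ == "__main__":
    feed = UserFeed()
    for ts, what in ((30, "pushed"), (10, "starred"), (20, "forked")):
        feed.add_event("eric", "2012-10-23", ts, what)
    cursor = None
    while True:
        chunk, cursor = feed.page("eric", "2012-10-23", after=cursor)
        print(chunk)
        if cursor is None:
            break
```

Using the last column as the cursor, rather than a numeric offset, means each chunk starts exactly where the previous one stopped even when several events share a timestamp.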
B
The purpose of this slide, however, is to explain that when you want to iterate quickly, you want to test your assumptions. When you're building that MVP or doing your iterative testing, the easiest thing to do, and sometimes the cheapest if you use spot instances, is just to fire up one or two machines in the cloud, build a small cluster, whether it's Cassandra or Riak, and test your assumptions. There are libraries for doing just about everything in the cloud.
B
If you want to use Cassandra, you can use a tool called, I believe, ccm, and spin up a cluster with about four lines of Python to test your assumptions. That's a really good way to get going, and I believe those four lines of Python are even in the README, so it's really not that much more work than copying and pasting.
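A Python sketch of scripting such a throwaway cluster with ccm (the Cassandra Cluster Manager), assuming ccm is installed and on your PATH along with a Java runtime. The subcommands used (`create -v`, `populate -n`, `start`, `status`) are the commonly documented ones, but check `ccm --help` for your version; the cluster name, the example version "1.1.6" and the wrapper functions are hypothetical, and this is an illustration, not the snippet from ccm's README.

```python
import shlex
import subprocess
from typing import List


def ccm_commands(name: str, version: str, nodes: int) -> List[List[str]]:
    """ccm invocations for a throwaway local test cluster."""
    return [
        ["ccm", "create", name, "-v", version],  # fetch that Cassandra version for the cluster
        ["ccm", "populate", "-n", str(nodes)],   # add N local nodes to the cluster
        ["ccm", "start"],                        # start every node
        ["ccm", "status"],                       # show which nodes are up
    ]


def spin_up(name: str = "test", version: str = "1.1.6", nodes: int = 3,
            dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them one by one, stopping on failure."""
    commands = ccm_commands(name, version, nodes)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))           # shlex.join needs Python 3.8+
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    spin_up(dry_run=True)
```

Running it with `dry_run=False` on a machine with ccm installed would create and start the cluster; tearing it down afterwards keeps the experiment cheap.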
B
Is it going to cost you an arm and a leg? And this is not intended to be a pitch for the guys over at DataStax, but they have by far the best customer service I've had from any vendor in the years I've been working in IT, so keep that sort of thing in mind.
B
I'm sure many of you who have dealt with Oracle have horror stories about trying to get Oracle to help you out when you're in need. Maybe you lost some data, or you need the experts. Having that expertise available to you is something you don't think about until you need it, and when you need it, it's probably too late to look for it. So make sure you keep that in mind.
B
So what did we talk about? Well, we talked about planning. We talked about finding good write and read patterns for your data. We talked about toolkits. But the biggest thing I think we talked about, and that everyone really needs to be aware of, is what compromises you're willing to make in order to get your data into a form that's good for you, your application and your use case.
B
So I hope we still have enough time for questions, and thank you very much for listening.
A
Thank you very much indeed, Eric. You can start to submit your questions in the Q&A tab in WebEx, or you can go to Twitter and use the Cassandra QA tag, and we will pick them up there.
A
In the meantime, as I said at the beginning, this is one of a series of webinars. Today we had Eric with "Is My App a Good Fit for Apache Cassandra?" In two weeks' time, Aaron Morton will be back to take a look at data modeling for Apache Cassandra, so we look forward to seeing you there. And also, just to put a little plug in there: I was talking with our recruiter yesterday and he was like, "Hey..."
B
Yeah, there's one question here: which data management system did I move to Cassandra from?
A
Take your own ones, Eric. I am not seeing the questions. Okay.
B
Sure. So the question was which data management system we moved to Cassandra from. The answer is that when we started out, we were using [MongoDB], and that was it.
B
But the fact is, we actually haven't ditched [MongoDB], and we really have no plans to get rid of it anytime soon, because what we found was that there's really no one good answer for our problems. We have terabytes and terabytes of data, and the view of it we need differs across various parts of our system. We provide a social analytics package, and sometimes...
B
The other thing we use it for, as I mentioned, is pub/sub. So those two things continue to serve a purpose, with it as a primary data system for us, and the only other thing it would have been good for, had we decided to use it later on, is probably the ORM: being able to have the user and account management for the front-end application.
B
We moved to Cassandra because we found that we were pretty much bringing down [MongoDB] on a regular basis, based solely on the fact that the data input speed and volume were too much for it to handle. We also couldn't get it distributed the way we wanted, and we couldn't get the fault tolerance we needed. So while we still use it, it's not our authoritative data source any longer.
B
So Cassandra works better for the distributed case, because there's no real concept of active-active with Cassandra.
B
The whole idea of active-active is really about the fact that there's required to be a master, and whether every write or read has to go through that master at some level.
B
The fact is that when you have a distributed system like Cassandra, without getting too much into the nitty-gritty, you can actually query any node for any bit of data, and that node will act as the coordinator for the query itself. So even though all the data is not stored on that particular node, the node knows where it can go to get the data. And beyond that, it doesn't just know where it can go to get the data.
B
It knows where it can go in terms of distance. When I say distance, it might be geography to us, but it's topology to the computers themselves, to the network. So if it has a couple of other nodes in the same data center, say us-east-1a, and a couple of nodes in us-west-1a, there is something called a snitch that will say, "You know what, I know it's closer for me to get to us-east."
B
"So I'm just going to ask these guys, and if they have the answer, I'm going to go here rather than going all the way to us-west, which has a slower ping time to me." [MongoDB], on the other hand, requires the replicas to be very close to each other.
B
That actually puts you in a somewhat limited position, because you can't have things that are too geographically distributed. If I remember correctly, the required replica ping time is something like one second, and by the laws of nature it takes tens of milliseconds just to get from one side of the country to the other, let alone across an ocean. If you get any blip in that, you're going to have a problem.
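The snitch behavior described in this answer boils down to ranking replicas by how far away they are in network terms. A toy Python version, assuming we already have a latency measurement per replica; Cassandra's own snitches are configured per cluster, and its dynamic snitch scores replicas from recently observed latencies, so treat the names and numbers here as hypothetical.

```python
from typing import Dict, List


def order_by_proximity(replicas: List[str], latency_ms: Dict[str, float]) -> List[str]:
    """Closest replicas first; replicas with no latency measurement sort last."""
    return sorted(replicas, key=lambda node: latency_ms.get(node, float("inf")))


def choose_replica(replicas: List[str], latency_ms: Dict[str, float]) -> str:
    ordered = order_by_proximity(replicas, latency_ms)
    if not ordered:
        raise ValueError("no replica holds this key")
    return ordered[0]


if __name__ == "__main__":
    # Hypothetical replicas of one key, as seen from a coordinator in us-east-1a.
    latency = {"us-east-1a-n1": 1.2, "us-east-1a-n2": 0.9, "us-west-1a-n1": 80.0}
    print(order_by_proximity(list(latency), latency))
    print(choose_replica(list(latency), latency))  # us-east-1a-n2
```

The far-away replica is still in the list, so it can serve the request if the nearby ones cannot; it just is not asked first.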
A
Okay, thanks, Eric. We will try to get through as many of these as possible. Tom asks: did you consider HBase? Do you have any perspective there?
B
Yes, we did consider HBase. In fact, when we sat down to do our original evaluation, we basically looked at three systems: Riak, Cassandra and HBase. I know there are more out there; Google has theirs, and Amazon has DynamoDB. We ruled those out because we didn't want to be put in a position where our data is stored somewhere we can't get it out, so you're sort of bound by whatever Amazon or Google gives you as features.
B
For instance, we're still growing, but we have about 25 to 30 terabytes of data. If we wanted to pull that out and still maintain some sort of continuity, that's a pretty challenging feat in and of itself, let alone dealing with 20 to 30 terabytes on your own. So we ruled those out.
B
We looked at Basho's Riak, and we didn't like the fact that there weren't many tools around it and it was still a little immature at the time, although it's gotten quite a bit better in the last six to nine months. So we ruled that out from a maturity and toolkit perspective. When it came to HBase, we looked at Cassandra and HBase side by side and said, "Okay, what are the real differentiators here?"
B
With HBase, we knew we would have to spend some time with ZooKeeper and dealing with all the regions, and some of that stuff just gets a little complex. What was nice about what DataStax does is that they actually have a product that kind of bundles that and takes care of it for you. If you choose to deal with it, you can, but otherwise you can sort of ignore it and leave it as a little black box under the hood.
B
But the biggest thing for us was: how do we get support, and how do we know the product can do everything we're going to need it to do, not just now but a year from now? In terms of support, Cloudera does a great job, and they've been doing a great job of taking ownership of HBase.
B
Going forward, though, we just didn't see them having a whole lot of control in determining the roadmap. In all of our tests, we found that speed and everything else were pretty similar between the two on a lot of our query patterns and write patterns. When it came down to making the decision, we said, "You know what, Cassandra looks pretty good, given that we can help determine the roadmap." We've gotten to do that by speaking to the folks at DataStax pretty regularly and by building out some of the drivers.
B
We've also been able to write some of the features ourselves and hand them up, and a couple of things have been brought into Cassandra as a result. We just didn't see that as something that was possible with HBase.
A
Great, thank you very much. Al Jurgensen asks: is Cassandra a poor data store for an app that requires search, i.e. I want to retrieve by equality on almost every column? Just a little plug there, Al: DataStax Enterprise uses Cassandra as the engine that powers the platform, but it also integrates Solr for search and Hadoop for batch analysis. Eric, I don't know if you have some perspective from your app, if you need search for your app, but maybe you could answer.
B
Sure. I can give you a very small and basic answer, because we are not heavy Solr users; we're actually in the midst of testing Solr out. Solr does not work very well at the moment with what are called wide rows. It works very well for skinny rows, and just about everything we do is wide-row and composite-column based. Once you get a little more into the Cassandra schema, you'll know more about what composite columns are. So, in our specific use case:
B
It's actually not as good as we'd like it to be, though if the support were there, we would almost definitely use it. So what we've ended up doing is loading all of our search data into Redis and querying Redis, but the search data is all loaded from Cassandra. This way, Redis is the sort of ephemeral store that it is.
B
If it goes down, we can bring up a new one, or bring up multiple Redis servers, and just warm the data back into the cache. So I don't have a great answer for you. I would love to know more about Solr in that case, but if Solr happens to be a good fit for you, I do highly recommend looking into Solandra or DSE.
A
B
Well, if you really need that transactional nature, if transactions are really important to you, I probably wouldn't go with Cassandra. You could certainly fake transactions, but I don't think it's a good idea to build transactions on top of a non-transactional system.
B
Building that at a low level on top of Cassandra is one thing, but you could handle it at the longer end instead: store the initial order, and then use a system like MySQL, something that does transactions very well, on the front end.
A
And Eric, I'd like to say that that's what we see a lot of customers doing. If you think of the old OLTP paradigm, the OLP goes into Cassandra and the T stays in a relational database. We see Oracle, and we see MySQL quite a bit, so that's pretty much what you just described.
A
Cool. Trevor asks: it's worrying that Cassandra is at such a low version number. He has an Oracle background, and this suggests to him that it will be buggy, as it's not yet heavily used. Any comments there?
B
As for the number of bugs we've found: we've been using it for probably about nine or ten months at this point. We found a lot of bugs early on. Then we moved to DSE, and we've found very, very few, and the ones we found were edge-case bugs, which you're always going to run into.
B
Consider that at this year's OpenWorld, Oracle just decided to really take on this cloud thing head on, even though the cloud has been around for quite a few years. I think you always have that risk. But again, this is a chicken-and-egg thing: the fewer people adopt it, the fewer people are going to find bugs; the more people adopt it, the more bugs get worked out. So if you want to take the chance, it's a pretty safe system.
B
At least, it has been in our experience. But yeah, that's a personal decision. That would be a great question for the planning slide: are you willing to take that sort of risk?
A
Chowdhury asks: can you share your data modeling experience with Cassandra? And I'd just like to reiterate that our next webinar, in two weeks' time, is focused on data modeling in Cassandra; Aaron Morton will be doing that one. Can you share your experience briefly?
B
Sure: iteration. That is our data modeling experience.
B
I would say that when we first started, we probably did it wrong about 10 or 12 times in a row, but we knew we were going to get it wrong the first few times. We would do something, it would work for a couple of days, and then all of a sudden we'd find out that we couldn't query it like that very efficiently.
B
So we tried something else, and this happened over and over and over, because when you're solving problems that either haven't been solved before or don't get solved very frequently, you're going to run into these data modeling challenges.
B
So the idea is to keep your ability to iterate open; try not to tie yourself down until you've found something that actually works for you. One of the things we did was ask the folks at DataStax if they could send someone out to our office to spend a day with all the engineers and teach us about modeling. We ended up doing that, and we made a lot fewer mistakes after that.
B
But there is going to be a learning curve, just as there is with any system. You're going to find out, through the query profilers, which things may work better depending on the form your data is in. Some query optimizers are going to work better than others; each data store has its own particular issues.
B
So what you think would work well coming from a MySQL background may not work in another data store, and what works in another data store may not work in Cassandra. So, data modeling: there are some basic paradigms to follow, and I'm sure Aaron is going to cover those very well in the next webinar. But the fact is, you have to be flexible and you have to be able to iterate.
A
One more question: aside from needing to handle more volume or increasing velocity, can you talk about some other considerations for moving from relational to a big data model? And just before you give your answer: in your webinar you highlighted time series and multi-data-center replication. Those are obviously two. Are there other things you would bear in mind, other than volume and velocity?
B
Sure. The biggest thing I think many people have discomfort with when moving to big data is that in relational systems, you're always concerned about normal form: third normal form, fourth normal form, whatever it is. When you get to these new data modeling systems that involve big data, you find that you're going to store your data once, twice, three times, four times, in some cases even five times, and you're really showing the same thing, just in a few different ways.
B
There are a couple of particular sets of data that we store six different ways, and I know that sounds completely eccentric, but it's really the best way to access your data. And when I say six ways, I don't mean six ways in Cassandra. I think we actually store it three ways in Cassandra, two ways in Redis and once in MySQL... seven, actually, counting one more copy.
B
The issue then comes from having comfort with the new paradigms. People who come from the old-school RDBMS world tend to have a little bit of resistance to change. That's not everybody, but you'll find that when you say, "I understand that it takes one terabyte to store this data, but in order to really access it the way we want, we may have to store it three or four times," all of a sudden...
B
You have four terabytes that you need to store. It becomes a bit more of a cost challenge, and a paradigm change that some people just aren't very comfortable with. So it's a lot more cultural than one would think when you're moving from relational to big data, and I certainly think that's something to keep in mind.
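What "storing the same thing several ways" looks like in practice: every incoming event is written once per query you need to answer, keyed the way that query reads it. A Python sketch with hypothetical field names and in-memory dictionaries standing in for Cassandra column families; it also shows the one-terabyte-becomes-four arithmetic from the answer above, which comes before any replication factor multiplies it again.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical event shape: (url, user, network, hour, action)
Event = Tuple[str, str, str, int, str]


class DenormalizedViews:
    """Write each event once per query pattern so every read is a single key lookup."""

    def __init__(self) -> None:
        self.by_url_hour: DefaultDict[Tuple[str, int], List[Event]] = defaultdict(list)
        self.by_user: DefaultDict[str, List[Event]] = defaultdict(list)
        self.by_network_hour: DefaultDict[Tuple[str, int], List[Event]] = defaultdict(list)

    def record(self, event: Event) -> int:
        url, user, network, hour, _action = event
        targets = (
            (self.by_url_hour, (url, hour)),          # "what happened to this article this hour?"
            (self.by_user, user),                     # "what did this user do?"
            (self.by_network_hour, (network, hour)),  # "everything from this network this hour"
        )
        for view, key in targets:
            view[key].append(event)
        return len(targets)  # number of copies written


def storage_needed_tb(raw_tb: float, copies: int) -> float:
    """Raw size times the number of denormalized copies (before any replication factor)."""
    return raw_tb * copies


if __name__ == "__main__":
    views = DenormalizedViews()
    e = ("http://example.com/story", "user-1", "twitter", 14, "tweet")
    print(views.record(e), "copies written")
    print(views.by_network_hour[("twitter", 14)])
    print(storage_needed_tb(1.0, 4), "TB for 1 TB of raw data stored four ways")
```

The cost is paid in disk and in extra writes; the payoff is that no read ever has to join or scan across the other layouts.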
A
So Eric, thank you so much, and apologies again for the issues we were having. Please join us in a couple of weeks' time, on the 7th of November, for Data Modeling for Apache Cassandra.