Description
'Tis the season to get all of your urgent and demanding Cassandra questions answered live! Get ready for yet another very special edition of the Cassandra Community Webinar -- Q&A style. Please join us as we close out 2013 with an entire hour devoted to answering as many Cassandra questions as possible. Our panel of experts has a vast knowledge of Cassandra and will be answering questions from a variety of personas: developer, architect, and operations.
A: Hello, everyone, and welcome to this Cassandra community webinar. Thank you for taking the time today to join us and usher in the season. I am delighted to have with me today three fine gentlemen from the Cassandra community: Patrick McFadin, he's the chief evangelist here at DataStax; Al Tobey, an open source mechanic, and I'll explain exactly what that is in just a second; and Matt Stump, Solutions Architect here at DataStax. So thank you to those of you who have already submitted questions for us to answer on this webinar.
A: If you would like to go ahead and submit a question, please use the Q&A tab inside of WebEx and type your question there. We will get through just as many of them as we can in the remaining time. So before we get started: Patrick, why don't we start with you? You can give us a little bit about your background and your areas of focus for Cassandra, and then, if you could, throw it over to Al, and then I'll throw it to Matt.
B: What, you wanted to, okay, good. Well, so, yeah, I'm Patrick McFadin, as Al likes to point out, and I always tell him to stop talking about it, but all those data modeling videos that I do, yeah, so that's me. I'm chief evangelist at DataStax, formerly a solutions architect, and I work with customers on a lot of cool stuff. So I've probably seen most of the big problems, and I'm always surprised that every day I find a new one.
C: I'm Al, and I am an open source mechanic at DataStax. What that means is I mess around with little projects and find little corner cases and lines of investigation around Cassandra and storage, things like why these particular kinds of disks are very slow, and try to share that with the community. I've been doing operations prior to this for about 15 years, so I've seen a heck of a lot of storage and servers and things like that. So those are all fair-game questions, including things to do with Cassandra.
B: Hi, my name is Matt Stump. I'm a solutions architect with DataStax. Basically, what that means is I help really large customers deploy Cassandra successfully. That can be data modeling, performance tuning, advanced troubleshooting, planning applications, all that sort of stuff. Previous to joining DataStax I was a customer for two years; I was one of the first customers on Solr. So I have a lot of experience both within DataStax and actually out in the field using it as a user.
A: So last time we did this, I tried to preempt who would tackle the questions. I learned my lesson, so this time I'll just read the question, and then whoever would like to claim it, you just jump on it. This one is from soul shaker, and he says: Hi. First off, deleted keys are visible if I run a SELECT * from the table until compaction occurs for the tombstones. Is there a way to recover deleted data in Cassandra? It seems tedious.
B: I mean, the tombstone is essentially a system marker, and unfortunately the act of changing a tombstone back into something real is just rewriting data. So you would just need to rewrite your data. I haven't seen anyone create something that would go through, say, an SSTable and remove a tombstone; that seems pretty dangerous, actually, yeah.
C: That's probably the best you can do without having to hack together a bunch of code.

B: Yeah, no, I think every time I've heard the question, like, "can I undelete a tombstone," and then I delve into what the real problem is, what they're looking for is something that snapshots do a lot more efficiently anyway. So I think that undeleting a tombstone is just a really hard way of doing something you should be doing with snapshots in the first place.
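To make the tombstone mechanics concrete, here is a minimal CQL sketch (the table name and values are hypothetical): a DELETE writes a tombstone marker rather than removing data, and both the marker and the data it shadows are only purged by compaction after the table's gc_grace_seconds window.

```cql
-- Hypothetical table; gc_grace_seconds defaults to 864000 (10 days)
CREATE TABLE users (
    user_id text PRIMARY KEY,
    email   text
) WITH gc_grace_seconds = 864000;

INSERT INTO users (user_id, email) VALUES ('alice', 'alice@example.com');

-- Writes a tombstone; the old cell is shadowed, not physically removed
DELETE FROM users WHERE user_id = 'alice';

-- Returns no row, even though the data may still sit in SSTables
-- until compaction runs after gc_grace_seconds has elapsed
SELECT * FROM users WHERE user_id = 'alice';
```

As the panel suggests, the practical way to "recover" deleted data is to restore from a snapshot taken before the delete, not to dig tombstones out of SSTables.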
A: Okay, great, thank you very much for that. Next question; it's a two-parter from Simon. Part one: when can we use triggers in production environments? I think that's more a stability question, maybe a DataStax Enterprise related area there. And part two: can we use triggers across different virtual data centers? For example, the insertion of a new user profile into a user profile table in virtual data center one can trigger the insertion of a new-user event into an event table in virtual data center two.
B: So I'll take the first bit. Triggers became available in Cassandra 2.0. Cassandra 2.0 is shipping; there are people using it. We've gone through three minor passes so far, so currently 2.0.3 is available. However, Cassandra 2.0 isn't available yet in DataStax Enterprise, so that means the largest accounts, the largest users of Cassandra, aren't on 2.0 yet. So take that for what it's worth. As for using triggers how you described: first, you can use a trigger to fire on a mutation, which then executes a Java class.
B: That Java class can do anything. That could be an insert into another column family; that could be a call-out to an external service. It's just Java; as long as you implement the interface, you're good to go. The triggers are executed asynchronously, and they have their own thread pool that's separate from the mutation stage in Cassandra.
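For reference, the Cassandra 2.0 trigger DDL looks like the sketch below; the trigger and class names here are hypothetical. The Java class implements the ITrigger interface and its jar must be available on every node:

```cql
-- Fires the Java class on every mutation to user_profile;
-- com.example.UserAuditTrigger is a hypothetical ITrigger implementation
CREATE TRIGGER user_audit ON user_profile
    USING 'com.example.UserAuditTrigger';

-- Triggers can be removed again with:
DROP TRIGGER user_audit ON user_profile;
```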
B: Now, I think there's a misconception about how data centers work in Cassandra. If you have two data centers, A and B, they both participate in the same ring, and so they both have views of all of the information. Typically, you would have a policy that says: I want two replicas of my information in data center A and two replicas of my information in data center B. Cassandra just handles that for you, and so if you write to one of the data centers, that will automatically be replicated to the second data center.
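The per-data-center replica policy described above is expressed on the keyspace. A minimal sketch, assuming the snitch reports data centers named 'dc1' and 'dc2' (the keyspace name is hypothetical):

```cql
-- Two replicas of every row in each data center; writes arriving
-- in either DC are replicated to the other automatically
CREATE KEYSPACE app
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 2,
    'dc2': 2
};
```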
B: So the answer to that question is no, right. I was thinking that it was about moving data, like having a staging table, or having data propagate from one table to another, which is not a replication thing. If that's the case, I would just replace the word "data center" with the word "keyspace," and there you go.
B: SSDs typically cost twice as much, but if you start to delve into it, it's actually much, much cheaper to run SSDs. With SSDs you can use leveled compaction, which means you can drive your disk utilization up to ninety percent. So right there you're almost getting twice the capacity of an equivalent fast drive or SATA disk, so you're going to recover the cost. Also, the bottleneck in Cassandra is almost always the disk I/O subsystem, as soon as the data per node exceeds the buffer cache of the OS.
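Switching a table to leveled compaction, as recommended for SSDs here, is a one-line schema change. A sketch with a hypothetical table name; the SSTable size shown is a commonly used value, not a requirement:

```cql
ALTER TABLE app.events
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};
```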
B: I just got this question the other day. No, the cells that are inside a partition are what they are; there's no auto-partitioning or anything like that. So that's where you really need to consider your data model and how you partition your data using the primary key. When you have multiple partition key columns, you want to think about how that's going to break down your data and how many cells will be in there.
C: CQL does help you, if you do have wide rows, in a case where your data model isn't the problem, with pagination. You can read through those wide rows without having to load them all into memory at once and pass it all over the wire like it used to do. You can actually page over them, you know, a thousand at a time or something like that.
B: It's an artifact of how counters are implemented. When you mutate a counter column, you just write down the increment. So if I want to increment the counter by one, I just write down plus one; if I was going to increment it by three, I write down plus three. Then the read path actually merges those values when you request it, and so you're merging multiple columns together. So it's difficult to do TTLs for that.
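A minimal counter sketch illustrating the log-of-increments model described above (table name and values are hypothetical). Note that counter updates do not accept a TTL:

```cql
CREATE TABLE page_views (
    page_id text PRIMARY KEY,
    views   counter
);

-- Each statement logs an increment; nothing is read before writing
UPDATE page_views SET views = views + 1 WHERE page_id = 'home';
UPDATE page_views SET views = views + 3 WHERE page_id = 'home';

-- The read path merges the logged increments into a single value
SELECT views FROM page_views WHERE page_id = 'home';
```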
B: Yeah, I think you're misunderstanding how snapshots work. A snapshot is just hard links to SSTables, and so you can essentially have Cassandra create a snapshot, which is just a series of hard links to the SSTables, every time an SSTable is created; you're thereby getting an incremental backup. You're not creating a full copy of your data with every single snapshot; it's not like you're copying a terabyte or two terabytes of information. You're just pointing links at your existing information. That's why they are so cheap in Cassandra.
C: That's a little bit complicated. If it's a purely deleted SSTable, where there's no overlapping data with other SSTables that are still loaded in the system, I'm pretty sure you can just copy those into the data directories and use nodetool refresh, and it will pick them up live. You need to make sure that, ideally, you get it on the right node.
C: Get it in the right part of the token range, but that can be done live. Snapshots, and SSTables in general, can be moved around while the system is hot, since an SSTable on disk is an immutable file. It's not recommended that you mess around too much in those data directories unless you really understand how everything is laid out and what the different files are, but yeah, an SSTable can be loaded online.
B: Worst case scenario, there is sstableloader, which you can use to load SSTables from a different machine into a cluster. You can also use that if you're restoring SSTables into a cluster with a different size, or one using a different partitioner than the previous cluster. There's also a bulk-load operation over JMX, which is one of my favorite tools: you can give it a directory full of your SSTables, and it will read all of them and stream them back into the cluster in order.
A: Okay, thank you very much. So, Patrick, I think you may be pimping your data model talks with this question, but awesome asks: how is Cassandra's data model different from Bigtable and HBase? That's part one. Number two: are super columns removed in Cassandra 2.0, or simply not recommended? Part three: is the following statement precise: "a column family is a map of maps; keys of each map are always sorted"? So let's take part one, because this is probably a long question. How is the Cassandra data model different from Bigtable and HBase?
B: It's subtly different. I mean, they both share the Bigtable roots, which means that you have a row with lots of columns, and that's about where things depart. Maybe some smaller subtleties, like secondary indexes, exist in both systems. The bigger difference, of course, is that HBase is more of a sequentially ordered row system, so it can do things like row scans, whereas with Cassandra the row placement is randomized. And there's, you know, more of an implementation difference at that point, where you have region servers and you can have scanners and filters that are just very different in how you use HBase; it's much more complicated in that regard. Now that we've added CQL, the differences are huge. CQL makes the programming techniques and the DDL of working with Cassandra so much easier, and from a development standpoint it's just a lot easier. If you look at some of the projects around HBase, it's about trying to build schema, or something that resembles schema, whereas with CQL you just have it; it's enforced at the database level. So as we go forward, both of those systems are diverging quite a bit. Maybe two or three years ago you could say they were really, really close, but they're starting to diverge quite a bit at this point.
B
The
no
they're
not
removed
and
they're
still
not
recommended
I
actually
just
went
through
this
exercise
so
that
the
implementation
and
storage
engine
has
changed
quite
a
bit
and
we're
before
super
calm.
The
biggest
issue
is
super
columns
with
their
created.
This
massive
destabilization
problem,
where
you
had
to
read
in
the
entire
super
column
and
into
the
jb
m
I
I,
could
tell
you
that
I've
seen
more
garbage,
collection
issues
or
super
comes
and
anything
else,
and
that
is
usually
the
case
where
you
do
a
lot
of
reads
up
to
the
columns.
B: It truly gets a little bit more complicated in CQL 3, because you can have multiple clustering columns, which are essentially columns that map back to a partition, and the last element of your primary key is going to be the column that's used to determine sort order. So the thrust of the question is generally true; it just gets a little bit more complicated with CQL 3.
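In CQL 3 terms, the "map of maps" reads like this sketch (the table is hypothetical): the first primary-key component is the outer map key, the partition, and the clustering column is the inner, always-sorted key:

```cql
CREATE TABLE events_by_sensor (
    sensor_id  text,        -- partition key: the outer map key
    event_time timestamp,   -- clustering column: the sorted inner key
    reading    double,
    PRIMARY KEY (sensor_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```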
C
No
one,
it's
probably
an
ec2
if
yours
to
what
you
were
using
OneNote
for
your
seed
note
in
that
you
had
to
replace
that
instance,
you
just
go.
You
just
go
through
and
just
read
a
different
note
for
your
seat.
Node
I
mean
you
can
have
as
many
in
there
as
you
want,
but
yeah
just
pick
another
one.
Any
node
in
the
cluster
can
be
seen.
B
That's
probably
one
of
the
bigger
misconceptions
that
good
speed
note
is
special,
nothing
specialized.
If
she
knows
just
a
designation,
if
not
kind
of
thing,
he
turn
it
into
like
a
master,
node
yeah,
it's
some
people
will
actually
use
just
you
know
round
robin
DNS
for
their
seed
node
and
points
to
a
different
note.
Every
time.
It's
essentially
the
first
note
that
it's
going
to
be
contacted
in
order
to
get
cluster
information
when
a
node
joins
the
ring.
B
Architecture,
so
you
all
right
have
to
be
performed
through
a
single
machine
and
then
those
rights
are
some
points
replicated
down
the
slaves.
There's
a
couple
of
problems
with
that.
The
first
is
that
your
white
throughput
is
limited
by
the
right
throughput
of
a
single
machine,
or
you
have
to
shard
that
machine.
B
The
other
is,
you,
can
you're
always
exposed
to
the
possibility
of
data
loss
unless
you
turn
on
synchronous,
replication,
so
I
write,
a
piece
of
information
for
the
master
master
will
acknowledge
that
right
to
the
the
client's
innocence
then
persisted
to
disk
and
everything's
good.
Well,
if
the
master
were
to
go
down
at
that
point
before
that,
piece
of
information
is
replicated
to
the
slave
and
that
piece
of
information
is
gone
forever
and
you
don't
have
a
way
to
determine
how
much
information
lip
gloss.
B
It's
just
gone,
so
you
can't
have
true
durability
if
you're
using
master
slave
complication
and
you're
doing
that
replication
is
incrementally,
which
is
what
most
advanced
solutions
do.
So
Cassandra
took
a
different
tack.
We
use
a
ring
of
machine.
Every
machine
is
the
same.
They
are
all
peers.
They
participate
in
the
ring.
Equally
and
when
you
perform
a
write-in
Cassandra,
you
specify
what
consistency
level
you
want
that
rights
to
have.
B: You can go even further than that: using a quorum consistency level with a multiple data center setup, you can have a guarantee that your data exists not only on two machines, but also within two data centers. For some customers, those that are storing things like cryptographic keys, this is particularly important: that information can never be recreated, and there's a very large real cost in terms of dollars if it's lost.
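In cqlsh, the per-request consistency level described above can be set with the CONSISTENCY command; the keyspace, table, and values below are hypothetical:

```cql
-- Require a majority of replicas to acknowledge each write
CONSISTENCY QUORUM;
INSERT INTO app.users (user_id, email) VALUES ('alice', 'alice@example.com');

-- Or require a quorum of replicas in every data center
CONSISTENCY EACH_QUORUM;
```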
A: Okay, thank you very much; those were fantastic explanations, thanks a lot, man. Okay, Konstantin, and actually we forwarded you this question in your inboxes because there's some level of detail, but I think we can probably tackle it at a high level here. So Konstantin had a four-node cluster running Cassandra 1.2. They upgraded to 2.0 and enabled vnodes, but didn't shuffle them. They added two new nodes and ran a repair and cleanup on each of the six nodes.
B
Well,
there
is
no
rebalance
operational
v
nodes
that
for
one
thing-
and
that
is
opinion,
I-
think
it's
a
pinion
gear
at
this
point,
I
think
the
first
problem
was
enabling
vinos
without
doing
a
shuffle,
because
that
means
that
all
that
data
is
still
parked
in
the
same
place
and
if
somebody
looks
with
I'm
looking
at
the
email
now
you
know
there
you
can
tell
that
there
is
talk
and
those
that
were
added.
It
didn't
get
all
the
data,
and
this
is
an
x1
and
d1
projects.
B
Without
going
into
details
about
what
we're
looking
at
here,
yeah
you
can
have,
you
can
have
imbalances
pretty
easily
with
vinos,
with
not
impossible
at
all.
So
one
thing
you
should
know
about
be
nodes.
You
will
not
get
a
perfect
balance.
You
will
see
variation,
five,
seven
percent
between
the
nodes
and
that's
based
on
the
amount
of
those
you
have
in
the
system.
B
Know
if
you,
if
you
upgrade,
you,
can
take
a
cluster
in
your
navel
vinos.
You
have
to
run
a
shuffle
after
that.
That's
the
step
to
the
way
we
we'd
like
to
do.
This,
though,
now
is
add
a
data
center
with
v
nose
in
it
bring
up
that
data
center.
Let
the
data
stream
into
that
that
cluster
and
then
use
that
as
your
new
cluster.
C: No, not at all. I mean, it depends on what you want to do in your development and testing environment, but you should be able to run just fine with a single node. You'll push your schema and run with a replication factor of one. We don't recommend that in general, but if you want to run it on workstations, or you have small environments, absolutely you can run a single node, with down to a gigabyte of RAM; people have gotten smaller.
C
Your
performance
isn't
going
to
be
seller,
but
for
non-production
uses
that
can
suffice
and
save
you.
A
lot
of
resources.
I
personally,
have
always
preferred
three
to
five
nodes
for
at
least
staging
clusters.
So
I
get
some
real-world
behavior
I
want
to
see
the
replication
system
in
action
in
my
staging
environment,
but
for
development.
It
doesn't
need
to
be
that
big.
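For a single-node development setup like the one described, the keyspace is simply created with a replication factor of one. A sketch with a hypothetical keyspace name; as the panel notes, this is for development, not production:

```cql
CREATE KEYSPACE dev
WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};
```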
B
So
the
caveat
of
courses,
don't
don't
spin
up
a
test
cluster
and
then
expect
it.
Yeah
I've
seen
this
too
many
times
where
well,
it's
not
performing
very
well
and
or
it's
not
what
I
expected.
You
have
to
balance
that
out.
If
you're
going
to
put
suboptimal
machines
or
a
smaller
configuration,
it's
not
a
performance
environment.
It's
more
of
functional!
Well,.
A: So, Ramakrishna, if you go to PlanetCassandra.org, we have a five-minute interview there, and a use case, with BlueMountain Capital Management. They're a hedge fund manager, and they are using Cassandra fairly heavily for their exchange system. So you can go and read up about that, and there are a couple of presentations as well that they've given at meetups.
B: I can say, in general, the types of use cases we see in financial institutions are fraud detection and user-screen analysis; also mobile applications that need that kind of scaling, the mobile front-end information, and things like their customer-facing websites. So these are generalities; the five-minute interviews are always going to have more detail.
A
I'm
not
sure,
but
that
someone
has
an
IM
on
that
sounds
like
a
duck
as
being
stabbed
or
something
okay.
Next
question
from
Vijay:
if
we
have
different
data,
sets
with
the
same
t,
say
flow
underscore
ID
coming
in
at
different
time
intervals.
What
is
the
best
way
to
model
the
table
so
that
these
new
columns
can
be
inserted
and
are
there
any
performance
implications
with
updates.
B: This is straight-up time-series data modeling, and this has been talked about endlessly, too, so that's good; I mean, there's plenty of information about how this works. Essentially, what you're looking at is a date-bucketed storage row, and in CQL it works quite well. You could create a primary key, say, with that flow_id. It looks like, and I've seen this several times, this is modeling network flows, where flows are the TCP stream or something like that, and I did a meetup actually on this very topic.
B
Nasa
uses
it
for
this
very
use
case.
So
might
wanna
look
that
up,
but
the
it
works
quite
well.
If
you
use
the
time
staff
as
as
the
value
to
in
your
primary
key,
that
creates
the
partition
for
the
clustering
key
and
the
performance
is
really
good
and
that's
one
of
the
reasons
Cassandra's
so
good,
a
time
series
how
it
lays
out
that
data,
then
the
storage
engine.
So
if
you're
updating
that
data
say
with
a
flow
ideas
X
and
have
several
event,
it
hit
time
Syria
time
anyways,
it
was
started
memory.
B
It
is
stored
on
disk
and
assorted
for
master,
and
you
go
to
retrieve
it
assisting
the
slices
in
the
seat.
The
thing
I
always
mentioned,
though,
is
it,
is,
if
you're
doing
something
with
a
generality,
actual
ID,
think
about
the
size
of
that
of
that
store
or
even
embed
that
that
cluster
is
going
to
get
much
bigger.
So
the
partitioning
you
want
to
try
to
do
is
make
a
slow
ID
in
a
base
like
a
single
day
or
week
find
a
way
to
break
that
down.
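Putting that advice together, a sketch of the flow table with a day bucket folded into the partition key to bound partition growth; the column names beyond flow_id are hypothetical:

```cql
CREATE TABLE flow_events (
    flow_id    text,
    day        text,          -- e.g. '2013-12-19'; bounds partition growth
    event_time timestamp,
    payload    blob,
    PRIMARY KEY ((flow_id, day), event_time)
);

-- All events for one flow on one day land in one partition,
-- sorted on disk by event_time
SELECT * FROM flow_events
WHERE flow_id = 'flow-42' AND day = '2013-12-19';
```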
C: And with updates, the thing to watch out for is: if you're doing updates on old, or what we call cold, data, you could end up triggering compaction of data that's been cold for a long time, depending on which compaction scheme you're on. That's the one to watch out for: if it's not actually the kind of time series where your data settles down, gets stored on disk, and doesn't change anymore.
B: There's a data modeling title on this, actually. I believe I did time-series data modeling in one of the data modeling videos, yeah. So I cover it; actually, I have a paper up on Planet Cassandra, "Getting Started with Time Series Data Modeling," that's probably exactly what you want. Just replace the weather station ID with flow_id and go forth; you'll love it. And that's not even a video.
C: I'm pretty sure Patrick or Matt have seen that before. The most I've seen in production is a couple dozen, and I believe there are some problems you can run into if you get into the hundreds, but it does happen; it's just not recommended. Right, Patrick or Matt?
B
It's
the
number
of
columns
families,
but
that
there's
a
lipid
tuning
problem,
but
each
each
column
family
has
a
mag
of
overhead
and
splat
allocation,
which
you
can
turn
off.
So
thinking
about
memory
usage,
if
you
had
say
a
thousand
column
families
in
a
in
a
key
space
button,
you
would
have
a
Giga
data
sitting
there
and
that's
just
an
overhead
for
those.
But
you
can
turn
that
off
and
that's
that
is
a
teacher
hasn't
like
one
to
six
or
something
that
but.
B: Typically, when you see that, somebody has taken an existing data model from a relational database, where you have to break out a separate table for every single many-to-one relationship, and they attempt to just shoehorn that into Cassandra. But the types that you can represent in a Cassandra column are so much richer, and you have wide rows and things like that. Typically, what you'll see when you migrate from a relational database to Cassandra is that your overall table count shrinks.
B: So that would be my first inclination: somebody hasn't gone through that process, and they're probably doing more queries than they need to. When I see underscore-laden table names, I get a little freaked out, because it's clear that it was just an export of a relational database. Then the next bad thing you'll see is where people take multiple tables of data and then join them in memory in the application.
B
And
it's
just
that's
a
clear
misfire
on
the
data
model
where
I've
seen
a
lot
of
conf
amelie's
is
when
you
have
a
lot
of
indexing
of
your
building
and
that's
a
cool
use
case
and
doesn't
call
families,
but
yeah
I
really
did
sad
with
Matt
scenario
there,
where
I
see
people
just
dumping,
a
relational
database
and
Cassandra
and
hope
for
magic
unicorns.
It's
just
not
really
works.
A: Pennies on the dollar. Okay, next question. This is from mpojoins; apologies if that's not how you pronounce your name, but I'm going to go with it. And I think, again, this is a little misperception that needs clearing up. So: how can Cassandra work with, let's say, Hadoop? They're actually saying Cloudera or Hortonworks. So how does Cassandra work with Hadoop, and with their examples, Cloudera or Hortonworks, given that Hortonworks uses HBase? What are the best practices?
C: Well, as with earlier questions, there's a lot of "it depends" around that one. One of the use cases I've personally worked on that was very successful with Cassandra and Hadoop together was at Ooyala, where there was a bunch of log files that needed to be processed. They would process those in Hadoop, and then the aggregates would get written up to Cassandra. That's a really sweet use case of where the two come together nicely.
C: That works if you just use the CQL client from your MapReduce job to write your results out in the final stages. The other form, you know, where you want to do things like Pig and Hive over your Cassandra data, it depends; it gets more nuanced, and it depends on what you're trying to do. There's the Hadoop support that ships with Cassandra, but there's also the Hive and Pig side, especially from DataStax, that is improving a lot, especially in future releases. Matt or Patrick, anything to add?
B
This
pretty
much
spot
on
I
mean,
like
you
know,
if
you
need
to
import
data
from
Cassandra
into
your
Hadoop
cluster
there
you
know,
we've
got
a
bunch
of
classes
which,
in
years
that
utilize
the
streaming
protocol
that
usually
and
efficiently
it
did
it
in
and
out-
and
you
know
if
you
need
to
do
MapReduce
or
do
analytics
over
the
data
13
Cassandra.
Well,
we
have
a
product,
we
can
tell
you
and
call
the
Mystics
enterprise
and
it
works
really.
Well
it
does.
B: Whoa, Jim, oh no, don't do it. Make sure you're on the same version on everything. I mean, your DataStax Enterprise version usually corresponds with a version of Cassandra underneath, and, for instance, you're not going to mix a 1.1 and a 1.2 cluster together; it's just not going to happen. So it's best really to keep on the same version to eliminate any kind of unexpected issues that could happen. You just need to be on the same version. I don't know why you would want to mix yours.
C: So if your organization is particularly nervous about upgrades, or maybe you personally are, you can have two data centers in the same cluster, and you can upgrade one data center. Try to only do this over minor releases, but do that minor upgrade on one data center and then leave it to sit for as long as it takes for you to be comfortable that that release is good, and then you can do the other side. But you definitely shouldn't leave the versions mixed for more than that period of time.
B: In an incremental upgrade, sure, you're going to be stuck in that situation for a while, but leaving it that way for any particular reason, I wouldn't recommend. I mean, if you're talking about an incremental upgrade, then yeah, of course, you're going to go from, say, 3.1 to 3.2, or step through a couple of minor versions in a row; that sure makes sense.
A: Hey, you guys remember Vijay's question about the data model, flow ID, time series? You do? Good, because he has a follow-up. We need to generate a lot of top-N reports based on the flow information stored as described above, that is, the flow_id data model question. Should we use CQL or Solr for this purpose, as DataStax Enterprise supports both?
B: It's a question of where you want to do the work; yeah, either can do this. So it's: where do you want to spend the time, essentially performing the work to get your top-N? With Solr, you're indexing the information as it comes into the system, so you're putting an additional cost on all inserts to the column families that are indexed. So you have to price that in. If you do it on the analytics side, then you're doing the work after the data is in the system, and you're going to have slower response times for those queries, whereas with Solr you get immediate response times. So you're doing the same amount of work; it's just whether you do it up front and get immediate responses, or you do it on the back end and get slower responses. So it's up to you and what your use case dictates.
A: This one's from Malay, from my health care provider, Blue Cross Blue Shield, here: can you talk about HBase versus Cassandra? And, you know, obviously we are Cassandra experts, so we have much more insight there, but we can probably talk in some generic terms. Earlier we talked a little bit about the architecture. Who wants to take a crack at HBase versus Cassandra?
C: I have run it in production, and the big things that stand out, from when we evaluated it a couple of times when I was at Ooyala: HBase is incredibly complicated to set up and operate. You have to have ZooKeeper, you have to have HDFS; pretty much all of Hadoop has to be there, and then you have to set up all the components of HBase on top of it, and there are a lot of knobs to turn and a lot of moving pieces.
C
So
the
big
contrast
there
is
that
cassandra
has
eight.
The
nodes
are
all
identical
there
they're
homogeneous
arm,
whereas
HBase
has
a
bunch
of
other
moving
pieces
to
think
about
when
you're
trying
to
figure
out.
What's
going
on
so
from
an
operational
perspective
that
that's
always
been
really
important
to
me,
there's
obviously
a
higher
level
different
design.
What
the
other
guys
go
and
do.
B: I know HBase is more of a Hadoop-first type of implementation, and it works well in those environments, but programmers, I think, would probably prefer to use something like Cassandra, and I think that's where we're headed, and I think that's probably a good reason to consider Cassandra. Yeah, I mean, it's like: HBase is more complicated, it's got single points of failure, the cross-data-center replication isn't as strong, the APIs aren't great, the data model is sort of a pain in the butt, and it's twice as slow.
C: Yeah, the Facebook example comes up a lot; we hear it a lot. What's interesting is, when you talk to the people at Facebook, the reason why they stuck with HBase was that, at the time Cassandra was being considered, they were already deep into HBase; they had years and years of investment in it. And so that's why that went down. But it's interesting to hear the amount of effort that Facebook has invested in making HBase serviceable for their needs.
A: Okay, thank you very much. And yes, with any questions like that, please bear in mind that we are, you know, Cassandra experts first; we're always going to come at it from that standpoint, but I definitely recommend asking the folks who are more familiar with HBase for their views as well. Okay, so here we go, another question; I think we probably have time for two more. Since Pig and Hive have similar capabilities, do you recommend or prefer one over the other to use with Cassandra, and why?
B
So
the
they're
both
going
to
do
the
same
thing
hive,
is
going
to
be
a
little
bit
more
friendly
to
your
e
I,
guys
that
aren't
necessarily
coming
from
programming
background
pig.
It
has
a
DSL,
it's
a
little
bit
more
extensible
and
is
has
a
little
more
in
terms
of
capabilities
and
extensibility
that
a
programmer
or
somebody
with
life
programming
experience
I
can
tap
into
that's.
The
only
real
difference
isn't.
B: Yeah, I haven't heard of a difference in speed. I think it's a question of who wants to be writing it, the operations take: you know, there are a lot of people who know Pig for its semantics, and then Hive, of course, is like SQL, so that may be the easier approach if you're coming from that direction. But the speed question hasn't come up for us.
A: Okay, we've got another comparison question here; let's hit it really quick. You're probably going to talk about the document-oriented approach, but it is Cassandra versus MongoDB. Before you guys answer this one, it's from George: George, if you go out to Planet Cassandra and look at the five-minute interviews, we have dozens of examples of people migrating from MongoDB to Cassandra, and it is always when they hit either a scale issue, where MongoDB cannot keep up, or operational complexity, which can get very severe with MongoDB. But anything else to add there, folks?
C: The big difference is: if your data is important to your business, then you shouldn't be using MongoDB. I mean, yes, I work for DataStax, but if you follow the news online, especially the recent stuff, the writing on the wall is pretty clear: at a certain point it's going to turn on you and lose your data. I mean, that's just its track record in the industry. You know, use anything else, actually, if Cassandra is not a good fit for your application.
C: Postgres has hstore, which a lot of people actually move to from Mongo; that's a different animal altogether.
A: Great, and with that, thank you very much, gentlemen, for joining us today and answering so many questions. We are off for the holidays; well, we're not actually, but we are off from webinars until January 23rd and "Cassandra: Back to Basics." In the meantime, I know you all want to get going with your Cassandra training. We offer free online training, Java Development with Apache Cassandra, at the link on your screen right now, at DataStax Academy.