Description
SlideShare: http://www.slideshare.net/planetcassandra/c-data-modeling-37298974
Shehaaz Saif: Software Development Engineer, Expedia Inc.
A
Hi, thank you for coming to Cassandra data modeling. So, to start, about me, just quickly. My name is [inaudible]. How I got started with Cassandra: DataStax had a big data developer contest last April, and during finals I randomly found this contest and I joined. I built a deal-finding app on Android, and then I was one of the five finalists, and they flew me over to San Francisco. That's where I met Expedia, and I now work on the [unclear] team. It's been five months at Expedia.

At the end of this talk, I just want you to be confident modeling data. It's just like John's talk in the morning, and I'm using examples from Patrick McFadin's YouTube videos, so it's basically a layman's understanding of modeling data, with examples. So you can relax and not stress too much. And this is my Twitter.
So with Cassandra, you have to know the answers you want to get out of the data before you put the data inside the database. So it's a query-driven data model, as John mentioned. Don't worry about redundant data: don't worry about doing a bunch of writes to optimize for the read. And don't have large partitions. This is one of the mistakes I made when I built my deal-finding app. I had the row, as you can see; this is the timestamp, and the data was for the deal that the person would post. This would run out really quickly: we'd reach 2 billion really quickly, and it's wrong. So you shouldn't have large partitions.
Use CQL. Initially I used Virgil, which was the REST API. It didn't support CQL; it was using Thrift. So I had to do everything on my own in the application, which is not good. And this is a funny cartoon; you can read it. But yes, CQL looks like SQL. It's exactly like what John showed this morning, super simple. Then let's go into an example.
So we have this weather station, and they give you this information: a weather station ID, a time, and a temperature. You have to be able to show this on a dashboard for the last five minutes, for example. How would you do it? The table would look like this. For example, you would have the weather station ID and the event time, where the ID would be the partition key and the event time would be the clustering column. So what's wrong with this? Eventually you're going to run out of columns, because you're going to keep on adding event times and it will reach 2 billion. So this is a way: for example, you would insert, very simple, and then do a range query.
A
The key I found was: what's the data I have when I create the table? I have the weather station ID and the date, so that would be the partition key. And the clustering column is what I want to sort by on disk. So, for example, if I want to find the last five minutes, there's just one disk seek, and I can take the data and then put it on a dashboard. And you add this ordering clause, and you would order it by descending.
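As a rough illustration of the layout described above, here is a minimal in-memory sketch in Python. It is not Cassandra code: the dictionary stands in for a table whose partition key is (station ID, date) and whose rows are kept newest first, and the names `partitions`, `insert_reading`, and `last_minutes` are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical stand-in for a table partitioned by (station_id, date),
# with event_time as the clustering column sorted in descending order.
partitions = defaultdict(list)


def insert_reading(station_id, event_time, temperature):
    """Route a reading to its (station, day) partition, kept newest first."""
    key = (station_id, event_time.date())
    rows = partitions[key]
    rows.append((event_time, temperature))
    rows.sort(key=lambda row: row[0], reverse=True)


def last_minutes(station_id, now, minutes=5):
    """Range query inside one partition: readings newer than now - minutes."""
    cutoff = now - timedelta(minutes=minutes)
    rows = partitions[(station_id, now.date())]
    return [row for row in rows if cutoff < row[0] <= now]
```

Because the date is part of the key, each day starts a new partition, so no single partition grows without bound. One simplification to be aware of: this sketch only looks at today's partition, so a five-minute window that crosses midnight would need a second lookup.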
A
Now, let's talk about collections really quickly. John mentioned them this morning, but he didn't really give an example. So let's talk about set, list, and map. What's great about them is that you can have a dynamic item in a row. So a user could have email one and email two, and you could have a list of emails. But the con is that there are serialization costs, because when you're reading it, it has to be deserialized before showing it, so it will slow down your application.

And the list: say it has one and two. You do an insert, you just add three, and it appends it at the end. And when you add zero, it prepends it at the beginning, because it orders by the [unclear] type. When you want to delete something, you use minus that element, and there's no read before write, and it goes away. You decide the order of the elements: to append something at the end, you would put plus three, and if you want to put it at the beginning, you have to put it before. Deleting is the same thing, like I said. Maps are more interesting.
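The list operations described above can be modeled with a few plain Python functions. This is only a sketch of the semantics, not driver code; the column name `emails` in the comments is an assumption, and the CQL fragments are shown for orientation.

```python
def append(items, value):
    # Roughly what CQL's  SET emails = emails + ['value']  does.
    return items + [value]


def prepend(items, value):
    # Roughly what CQL's  SET emails = ['value'] + emails  does.
    return [value] + items


def remove(items, value):
    # Roughly what CQL's  SET emails = emails - ['value']  does:
    # every occurrence of the value is dropped.
    return [item for item in items if item != value]
```

For example, starting from `[1, 2]`, appending `3` gives `[1, 2, 3]` and prepending `0` gives `[0, 1, 2]`, which matches the "plus three at the end, put it before for the beginning" description.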
A
When do you pick a collection? Well, a list is a type of collection; collection is the interface, yeah. So, for example, let's look at a map example, because the map is very simple. If you want to delete something, you give the key. And if you want to modify it, say you want to put it in Spanish, you put it like this.

So, the map example. You have this user-with-location table, and you want to have a map of the user's locations keyed by timeuuid. We want to see when they logged in and in which location. Remember that there was a serialization cost with the collections. I set a TTL of 30 days, in seconds, so this update would go away in 30 days. And then you would call the now() function, and that gives the time in milliseconds.
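To make the "30 days in seconds" arithmetic concrete, here is a small Python sketch that computes the TTL and drops it into a CQL statement string. The table name `user_with_location`, the column `vicinity`, and the sample values are assumptions for illustration; the statement is only built as text and is not sent to a cluster.

```python
from datetime import timedelta


def ttl_seconds(days):
    """Convert a TTL given in days to the whole seconds CQL expects."""
    return int(timedelta(days=days).total_seconds())


def location_update_statement(days=30):
    # Hypothetical table and column names. In CQL, now() produces a
    # timeuuid, which is what lets it serve as a time-ordered map key.
    return (
        f"UPDATE user_with_location USING TTL {ttl_seconds(days)} "
        "SET vicinity = vicinity + { now() : 'Austin, TX' } "
        "WHERE user_id = 'example_user'"
    )
```

With the default of 30 days, the TTL works out to 30 x 24 x 60 x 60 = 2,592,000 seconds.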
A
The actual calling.
C
C
A
So replication factor is how many copies of the data I want in the cluster, and consistency level is the number of acknowledgements you get from the nodes after you do a read or write. Let's go over what quorum is, because John talked about it this morning: it's replication factor divided by two, with the division rounded down, plus one. That's the quorum. So if you set a replication factor of three, your quorum would be two. After you do a write, you wait until you have two acknowledgements.
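The quorum rule above is simple enough to check in a few lines of Python. This is a sketch of the arithmetic only; the function names are made up for this example.

```python
def quorum(replication_factor):
    """Replication factor divided by two (rounded down), plus one."""
    return replication_factor // 2 + 1


def write_succeeds(acks_received, replication_factor):
    """A quorum write succeeds once enough replicas have acknowledged it."""
    return acks_received >= quorum(replication_factor)
```

With a replication factor of three, `quorum(3)` is two, so a write that has collected only one acknowledgement has not yet met quorum.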
There's also something called row-level isolation, which I found interesting. Let's say a person's login is eric21, and he wants to change it to eric22 and set a new password. After Cassandra 1.1, it updates both the login and the password for the same row key, and no concurrent read would get eric21 and the new password, or eric22 and the old password. Both get written or none get written. So, just kind of interesting.
A
It's really interesting. But, for example, if you write at quorum, you've got to wait for two responses to come back. If you've got only one and there's a timeout, the client has to redo the write to make sure it goes in, because you didn't get the second acknowledgement from the cluster.
So, indexing. Secondary indexes, as was mentioned this morning, are evil and you shouldn't use them. So I'll just give you a quick example. You have this user table and you put an index on state, so, for example, we can do a SELECT * FROM users WHERE state equals Texas. This is kind of an okay example, because there are many rows that would contain that indexed value. So it's not really bad, but you should avoid this type of situation.

So when not to use it: when you have something unique, like an email address, a product ID, or a video tag. Say you want to put an index on a video tag, because you can tag it like funny cat videos, you know, Grumpy Cat, a bunch of the tags you could have on videos.
A
So, for example, this is the way you create a tag index for the video, and this is very fast and efficient. Every time the user updates and adds a tag for their video, you would update the tag index with the tag and the video ID. Why is this better? Because there will be many distinct tags, so you create a separate index, and because we don't want many disk seeks for a few results.
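The idea of maintaining your own tag index alongside the video data can be sketched in Python with two dictionaries. This is an in-memory illustration, not Cassandra code; `videos`, `tag_index`, `add_tag`, and `videos_with_tag` are names invented for this example.

```python
from collections import defaultdict

videos = {}                   # video_id -> set of tags on that video
tag_index = defaultdict(set)  # tag -> set of video_ids carrying that tag


def add_tag(video_id, tag):
    """Write the tag twice: once on the video, once in the tag index."""
    videos.setdefault(video_id, set()).add(tag)
    tag_index[tag].add(video_id)


def videos_with_tag(tag):
    """Look up a tag with a single keyed read instead of scanning all videos."""
    return sorted(tag_index.get(tag, set()))
```

The extra write on every tag update is the price paid so that a lookup by tag touches one entry rather than every video.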
A
So the final example I have is this: you have a car lot, and you want to find a car according to the color, model, and make. So you have seven combinations: color, model, make, and the rest. An example entry would be Ford, Mustang, blue, and you want to be able to say, "I want to find all the blue cars" or "I want to find all the Fords." So what would be the partition key in this situation?
A
You could make three separate tables, but a more efficient way is a compound partition key. You could do make, model, and color. The trick of using make, model, and color is that you have this information before creating the table, whereas the vehicle ID could change. There's a trick to thinking about it: how would I choose a partition key? Something that's unique to make, model, and color.
A
So you would write seven inserts for this specific blue Ford Mustang, with all the combinations. For example, you put empty string, empty string, blue, because you want to tag it as blue. So we'd have all these combinations in it, and when you want to do a search, the read is really quick, because you can say, "I want to find all the blue Fords," and you'll quickly get it, sorted by the vehicle ID. Or you want to find all the cars that are blue.
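The seven inserts correspond to every non-empty subset of (make, model, color), with an empty string standing in for each attribute that is left out. A short Python sketch, with the function name `search_keys` made up for this example:

```python
from itertools import product


def search_keys(make, model, color):
    """Return the seven partition keys to write for one vehicle."""
    keys = []
    for keep in product([True, False], repeat=3):
        if not any(keep):
            continue  # skip the all-empty key; it would match every car
        key = tuple(
            value if kept else ""
            for value, kept in zip((make, model, color), keep)
        )
        keys.append(key)
    return keys
```

For a blue Ford Mustang this yields keys such as `("Ford", "Mustang", "blue")`, `("Ford", "", "blue")`, and `("", "", "blue")`; the last one is what a "find all the blue cars" query would read.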
A
When you do a query, you can do this. For example, in the last one at the bottom there, I'm passing empty string, empty string, blue, and that gives me all the blue cars, because that's the way I wrote it in. You have to supply the same values when you read as you did while you were writing; that's why there's a partition.
A
A
B
F
A
D
F
A
D
E
B
D
A
Yeah, there's seven; there's a list.
E
E
B
E
E
A
Just a quick overview, because he went really in depth this morning, and I want to go over it again really quickly. When a client wants to do a write, they give the row key. It goes into the append-only commit log and the memtable, and it quickly sends back the acknowledgement. That's how we can do a bunch of writes: it quickly sends back an acknowledgement to the client, like, "I have the data, it's all cool." And then it gets flushed into the SSTable, which is a sorted string table that's on disk. Now, what happens when an update comes in? When an update comes in, it would be in the memtable; it comes in here, and then there's a new SSTable added.
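The write path just described can be sketched as a toy Python class. This is a simplified model under stated assumptions, not how Cassandra is implemented: the class name `Node`, the `memtable_limit` threshold, and the synchronous flush are all inventions for illustration (a real node flushes in the background).

```python
class Node:
    """Toy model of one node's write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory view, latest value per row key
        self.sstables = []     # flushed tables; never modified afterwards
        self.memtable_limit = memtable_limit

    def write(self, row_key, value):
        self.commit_log.append((row_key, value))
        self.memtable[row_key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ack"

    def flush(self):
        # Persist the memtable as a new table sorted by row key.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
```

Note that an update to an already flushed key does not touch the old SSTable: the new value lands in the memtable and, on the next flush, in a new SSTable.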
A
D
D
Okay, now I'm going to update that vehicle ID, say I painted it red.
F
D
A
Yeah, so you have to... it became blue, yeah, everything that had the color. So you would update this; you have to delete.
A
D
B
D
C
E
E
C
Question: how many items can you have in a primary key in that [unclear]? You have make, model, color, region.
E
A
And, to summarize, I have all my references in this link; I'll send it to you guys. There are four data modeling videos that I based my talk on. Just watch all four of them, and there are links to them; they're very useful. The car make and model example that I used is from Patrick, and he actually had it in one of his talks and talked about it. So yeah, Patrick McFadin, the chief evangelist of DataStax.
C
A
Okay, sure. So a read would actually come into the key cache, and the key cache would check the row key and see whether it maps to an SSTable. If so, it would go directly into the SSTable and get the data, send it to the memtable, and send it back to the client. If it misses, it goes to the bloom filter. A bloom filter is a probabilistic data structure. It basically finds where the data is not. I tried reading about it last night; it was really complicated, so I gave up. And then it goes into it, so it finds it.
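For anyone curious about the bloom filter mentioned above, here is a minimal Python sketch of the general technique. It is not Cassandra's implementation: the bit-array size, the number of hashes, and the use of SHA-256 are assumptions chosen to keep the example short.

```python
import hashlib


class BloomFilter:
    """Answers 'definitely not present' or 'possibly present' for a key."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))
```

This is the sense in which it "finds where the data is not": if `might_contain` returns False for a row key, that SSTable can be skipped without touching disk, while a True answer still has to be confirmed by actually reading.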