From YouTube: C* Summit 2013: The World's Next Top Data Model
Description
Speaker: Patrick McFadin, Principal Solutions Architect at DataStax
Slides: http://www.slideshare.net/planetcassandra/c-summit-2013-the-worlds-next-top-data-model-by-patrick-mcfadin
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game-changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see that by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
This is, like I said, the third part of a series, and so the saga continues. There will be no models on stage; this is a data modeling discussion. So if you do not want to do data modeling on Cassandra, you're in the wrong place, but don't move, because I like the numbers. That's good. I was going to try to beat Adrian on headcount, and I don't think I'm going to do it, but I'm close. I'm really close.

So we started this whole series together. Well, by show of hands... I guess not all of us, but you'll get there. "The data model is dead, long live the data model" was really all about moving from the relational modeling world. My background: I was a relational DBA, a very relational developer, and I think a lot of you have probably been in the same boat. So how do you get from point A to point B? That was really the start: how to get from a relational model, with multiple tables and normalized data forms.
It's like bash scripts, right? There was only one dude who ever wrote a bash script; the rest of it is just copies. Same with Perl. Mike, if you're here, I'm sorry, it's the truth. "Ah, yes: you wrote a Perl driver for Cassandra; I'm going to copy it and then I'm going to make my modifications." So we're just going to go through a few little topics to catch everybody up.
Why does the data model matter? I've said this before, and I just said it this morning to Pankaj, my friend: Cassandra lives closer to your application. This is really where the data model is much more important with Cassandra, because if it's living close to your application, there are no generalizations, and it makes you think a little differently about how you deploy your applications. We always talk about how Cassandra has a right use case. Yes, it does have a right use case. I was an Oracle DBA.
What did I want to do? Pankaj, I'm going to go back to using you as an example. We'd just use Oracle for everything, because it was what we had, but then it didn't work out for a lot of things: I wanted to do multiple data centers, or things like that. I love memcache, I used a lot of memcache, but that doesn't work for everything either. So we've gotten to this point: polyglot persistence is where we're at now, in 2013. We store our data differently for different applications, for different use cases.
So when we do that, we have the opportunity to really screw it up or to make it cool. I've seen this a hundred times. I work with customers all the time, I go out and talk to people, and I see the wrong data model or I see the right data model. The right data model is winning, and with the wrong data model there's a sad panda. So let's not be sad pandas. This is what we're trying to do, right?
So, when to use Cassandra. This is the nutshell version, and I put it in little tiny text here; it's real tiny, about that big. These are orders, not hands. Here are a few bullets, and I think you've probably heard this enough, but I'm just going to reiterate for the rest of the crew here. We need to be in more than one data center: that's a requirement. Okay, well, if you need multi-master, active-active, what other choices are you going to have? That eliminates a lot of options. So how about the scaling problem?
I used to spend tons and tons of time trying to figure out performance and capacity planning, and what did that mean? Well, it meant that somebody had to give me an accurate number: "This is how many users we're going to have." Okay. As an engineer, what did I do? I just multiplied by ten, and then, all right, I'd call HP or Dell and say, "I want the biggest box you've got." That was capacity planning when you really have no clue, or you're really not wanting to put that into the problem domain.
Cassandra's a good choice because it scales so well when you need it. The other thing that I hear is "I need maximum uptime," and what's funny is that I feel like people have been lying for so long about uptime, because I did it too: maintenance time doesn't count against your uptime. "Oh no, that was planned maintenance; of course it's down." Well, what do you mean? You were down; your customers couldn't get to your website. "Well, yeah, but that was planned." Okay. So what did that mean?
You probably had to do it at like three a.m. on Sunday, and that was exciting. So getting out of that world is awesome, and getting closer to a hundred percent uptime, there you go: Cassandra is a very good choice for that. We also have the problem of money, because, let's just face it, we go to the VP of finance and say, "Yeah, we need an unlimited amount of money for this application, because it's going to be cool." What's that answer going to be?
No. So we have to think about dollars, and when we want to do scaling, or we want to do multiple data centers, how is that going to play out money-wise? I bought a lot of Oracle in my day, and when I used GoldenGate, that just quadrupled the price, and sometimes that meant, "Well, I guess we're just going to have to deal with it staying in one data center." And even then it wasn't that good of a solution. So Cassandra just makes sense economically. And then the final thing: that's why we're here, right?
So here we go. What we're going to do today: we're going to go through four real-world examples. Real world, meaning I get to see these all the time, and we're going to try to get through them. It's not going to be a deep dive into each application; let's face it, we've got an hour here, and I'm going to be crunched to fit that much in. I want to leave some time at the end for questions and answers, because I know you have a lot. Just a quick disclosure: I'm going to be over there
after my talk, so you can come up and talk to me. But keep in mind, everybody has questions; I know that. I'll try to get through them as quickly as possible, but I'm not going to be able to dig through your app with you today. There's plenty of time left in the year, though, so we can figure it out. So we're going to go through each use case: basically, here's what they were trying to do, what they were trying to accomplish, and then how we did it.
"Cop! That's me! You're giving away my company secrets!" No, not at all. I see these all the time, so I'm not giving anybody away, and in the cases where I was close, I probably anonymized it some, and I blended a couple together too. So don't worry, you're not going to pop up here. Maybe a little bit, but not a lot. All these examples are in CQL 3; we don't do any Thrift on these.
CQL 3 makes this a lot easier for us to exploit, so I'm going to express the data models in CQL 3. It's going to be a lot easier, because it is a very elegant way to describe a data model: here's exactly how you're going to store your data. There are some caveats with that, which we'll go through quickly.
This is probably the thing that we all need to just get over, if you don't believe it: there are some terminology changes. As you know from the community, this has really been kind of an interesting transition. It's like moving from one house to another, but trust me, we're getting there together, and I've seen a lot of help on this. If you've never heard of Thrift and you've never used it, then this isn't a problem, but I know where we are:
we have some people in this world and some in that world. Cassandra does do wide rows, meaning that you could have lots and lots of columns, and when I tell people that, they're like, "But this is a fixed schema!" No, really. Jonathan did this really great blog post, and I really suggest everyone read it:
it's about rows and columns and what's really happening behind the scenes. It will help you, and we're going to look at some of these models here. So... I can't put my hand in my pocket. My friend Rebecca said I put my hand in my pocket too much, so I'm not going to do it any more. It's a thing with your pocket.
First data model: the shopping cart. It's not yours; it's somebody else's. I put it on here because customers giving you money is a good reason for uptime, right? That's true: you don't want your shopping cart to go offline. So here's the use case. We want to be able to store it reliably, meaning that when someone says "I want to buy something,"
it goes in there and stays there, and we want to eliminate downtime, because if that's not available, we're not making money. Again, let's go back to what the point is here: we're making money on our website. So, multiple data centers. Really, if you're chasing uptime and you're in one data center, you're faking it; you're not doing it right. The other thing is the Cyber Monday problem, and everyone has a type of problem like that.
So what we're trying to avoid here, and what had been the problem in this one, is that for every minute you're offline, you're losing money, and that's really not okay. That was kind of the driver here. And speed: that was the thing that Amazon did a couple of years back. In my history in web performance, that was one of the things we were chasing: the fact that every millisecond, or 10 milliseconds, of latency on your web app costs you something. Amazon calculated it.
Those of you who have gone through my whiteboard discussions before will find this familiar. So here's my whiteboard. My plan is that each customer is going to have one or more shopping carts. That's right, we're going to have something really cool: they can have two or three or four shopping carts, and we're going to denormalize that data so that we can get it fast. Denormalizing means putting all that data into the same row.
So when I ask for a shopping cart, I get all of the information back. One shopping cart equals one partition, or one row on the storage side. That means we're going to get isolation: row-level isolation. I'm not going to go through row-level isolation here, but if you go back a couple of my webinars, I talk about it some, and that's really how you make sure you have consistent data on a single row. Then each new item is going to be a column. And see, I've got a laser pointer.
So as the cart goes in, I'm going to have this partition key, which is a row key underneath in the storage engine, of a username with a cart ID, and then this wide row is just all these items, and it's going to be who knows how big. But that's cool, right? Because what if somebody wants a hundred and three items? Great. Two hundred? Great. Five...? Well, you've got to work with me; I have another data model to get them to six. But we're going to try to make it so that it's flexible,
so whenever someone puts random amounts of data in there: great, let's do it. And I really love this big screen, because last year I was on a smaller screen and everything I put up was micro. I watched the people in the back almost fall out of their chairs, like, "What is that?" This is huge. So what do I do? I create a couple of tables here. I have my user table up here.
Whoops, back, back, back. There: my user table at the top. It's just our normal user entity, where we have a username and a first name and last name, but part of this is that I have a set. Now, in part two of my webinar series we went into collections, and collections are a really useful tool in CQL, because they give you some dynamic portion, but they keep you on the same row.
You can denormalize data really effectively, but also put that random element on top of it, and so using a set, list, or map, you can do things like that. Now, I use them quite a bit just because I like that flexibility, but I think you'll see that they do have some really interesting uses. So what I'm doing here is just storing a set of shopping carts, so one user can have 3, 10, 100... fine.
They can have that many. So whenever someone logs into our website, again thinking about this from an application level, they get the list of their shopping carts as part of the result set. Then, whenever I want to display those, I already have them, and I can go get the cart that is storing that particular shopping cart's items. So what I'm going to do at this point is that I have a shopping cart table, and there are a couple of things I'm doing here that are kind of interesting.
So again, it's going to be keyed by the username (who are they) and the cart name, and then I'm going to start having these item IDs down here. Those first two things are going to make up my partition key; that's the part right here that I created when I created my primary key, and that's basically one row. So: one user, one cart, one row. Awesome. The item ID makes it so that the partition goes into a wider format, and it'll put it all on one row.
So whenever I put in new items, like up here: here's my one storage row, one partition. Here's me, and I have a shopping cart called "gadgets I want," and then here's the actual item that I want to put in there. And if you notice, I've created a map here at the end for dynamic information. Yes, I said I use maps a lot,
but it's effective. I'm going to use that map part down here to put in random stuff, like related information or volume discounts, just things that are applicable to that particular item. So when someone goes to my website and they click on their shopping cart, I'm going to be able to get all the items out. I have all of this information in one pull, and it will be one row, so it's going to be pretty fast.
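A minimal CQL 3 sketch of what those two tables could look like; the table and column names here are my own illustration, not necessarily what was on the slide:

```sql
-- Users, with a set of cart names handed back at login.
CREATE TABLE users (
    username text PRIMARY KEY,
    first_name text,
    last_name text,
    shopping_carts set<text>
);

-- One username + one cart name = one partition (one storage row);
-- each item_id becomes new columns in that wide row.
CREATE TABLE shopping_cart (
    username text,
    cart_name text,
    item_id int,
    item_name text,
    item_detail map<text, text>,   -- dynamic info: discounts, related items...
    PRIMARY KEY ((username, cart_name), item_id)
);

INSERT INTO shopping_cart (username, cart_name, item_id, item_name, item_detail)
VALUES ('pmcfadin', 'gadgets i want', 1, 'Raspberry Pi',
        {'related': 'SD card', 'volume_discount': '10+'});
```

Pulling the whole cart back is then a single-partition read: `SELECT * FROM shopping_cart WHERE username = 'pmcfadin' AND cart_name = 'gadgets i want';`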
So let's go back to why our data access is fast in Cassandra: a row lookup for speed, and then we do that slice, or in this case we're looking at a full partition, for efficiency. This keeps us all on one node, which is very quick; we're talking about milliseconds of access time here. So this is great; we're satisfying a couple of things. It's going to be fast for our users, so they're not going to walk away. The other thing is that this is going to work really well in a multi-data-center model. It's not spread over multiple rows,
it's on one row, and Cassandra just does multi-data-center replication perfectly. So that's my shopping cart data model. See, I'm doing fine with time here. Next: user activity tracking. Now, I'm not going to go into a full-blown data science thing here; we've got people who do that all the time. What I want to do is get into more of the where-the-rubber-meets-the-road part, the actual use case where people are using that data.
If you go to Strata, you'll hear about all these cool things people are doing: "Yeah, we've got this guy with a PhD in statistics who figured out that if this one user clicks on something, and then they click on that, they'll go buy three more of these things," and that was awesome. But that's not what we're trying to accomplish. What we're trying to do is find action in the data that we have.
We can watch our users, and we want to react to that in real time, as users are doing things on our website. We've all seen this before; it's a little creepy when it happens. You go on a website, you're on Amazon or Google or something like that, and you click on something, and all of a sudden things start popping up that seem a little related, and you're like, "I do want that!"
Well, that's part of what's going on behind the scenes, but you need to be able to react to it in real time. If they came back an hour later and said, "Wait, I think you might have wanted some nails": too late, I'm gone, dude. So that's where the real-time component comes in.
So we have all these application pods that need to be supported. A pod, in an application sense, might be in different data centers, or maybe different racks, but they need to be spread out all over the place. And it could be that we have different applications themselves watching activities and cross-talking to each other; there are plenty of companies that have multiple properties underneath one umbrella, so you get the crosstalk between those. And the scale: I think that's always going to be a good reason.
The bad part of this is that the company in question was having a hard time, because they were losing those moments. They knew that they had actionable items going through, and they were missing them, and there's nothing worse than leaving money on the table. So in that case they needed to be quick on this, and Hadoop just takes too long.
Hadoop can create models and do things very well, but in this case they needed to act on it; they needed to be right on it, in milliseconds, and they needed to be ready to go. So here's our whiteboard again; I love whiteboarding. Here's kind of a high-level diagram of what it would be. Here's our dude; he's on our website, and as he's walking through our website we're making decisions, and we have this interaction decision algorithm.
A
That's
algorithm,
not
algebra,
so
that
that's
going
to
be
making
those
decisions,
but
they
have
to
have
input.
That's
the
hadoop
or
data
science
is
really
good
for
feeding
up
on
figuring
out
those
models.
But
then
it
comes
down
to
you
need
to
have
data
to
put
into
the
function
machine.
You
know
input
and
output
and
that's
a
little
harder
sometimes.
So
what
are
you
going
to
do
with
that
data?
And
sometimes
it's
not
just
one
thing.
It
may
be
a
course
of
action,
maybe
four
or
five
things
so
I'm
getting
feedback
from
my
website.
A
All
the
time
now
keep
in
mind
we're
trying
to
do
this
at
speed
velocity.
How
much
are
we
trying
to
do
this?
Okay,
five,
a
lot
of
people
on
cyber
monday,
I'll
click
on
on
my
website
and
I'm
trying
to
get
page
lift
then
we're
probably
talking
thousands
or
hundreds
of
thousands
of
clicks
per
second.
So
we
need
to
be
ready
for
that.
A
So
every
interaction
point
so
as
they
go
through
the
system
is
being
stored
in
a
table,
and
that's
that's
where
all
that
speed
is
going
to
come
from
the
long
term,
interaction
we're
going
to
break
that
out
into
a
separate
table.
Now
there
here's
a
concept
and
I've
been
if
you've
had
me
in
your
in
your
office,
whiteboarding
and
footing.
You've
probably
heard
me
say
this:
a
hundred
times
do
not
be
afraid
to
write
to
multiple
tables
because
Sandra
loves
rights.
So
if
you
got
to
do
five,
ten
hunter
tables,
/
interaction,
awesome
it'll!
A
Do
it
no
problem
got
that
and
what
does
that
mean?
That
means
you're
gonna,
be
ready
for
the
read
or
whenever
you
need
that
data
so
do
that.
So,
in
this
case,
I
have
this
requirement.
I
want
a
really
short
table.
I'm
gonna,
I'm
gonna,
show
you
how
I
do
this,
but
I'm
gonna
have
one
table
for
that,
and
I'm
gonna
have
a
longer
table
for
that.
Longer.
Interaction
like
I
want
to
store
it
out
later,
and
that
gives
me
some
options.
A
Then
I'm
going
to
use
a
dupe
on
that
longer
table.
But
that
means
that
the
old
did,
the
data-
that's
hot,
that's
fast,
I'm,
just
gonna
dump
it
yeah,
that's
right:
I'm
gonna,
get
rid
of
data
and
that's
kind
of
a
cardinal
sin
in
data
science,
but
I'm
gonna
do
it
so
the
other
thing
is
I
want
to
use
a
reverse
series,
and
you
probably
see
me
do
this
a
lot
and
I'm
going
to
show
you
why
this
really
makes
it's
about
speed.
We
want
to
be
as
fast
as
possible
on
the
database.
A
So
here's
my
data
model,
the
data
models
I'm
going
to
have
this.
These
two
user
activity
tables-
one
of
them
is
hot,
and
one
of
them
is
more
of
a
long
tail
table
and
really
the
biggest
difference
is
how
I'm
storing
that
data
in
them.
So
the
first
one
all
right
class.
Remember
this
one!
Don't
we
and
reverse
order
is
kind
of
my
own.
My
secret
weapons,
not
secret.
It's
just
a
great
weapon.
Reverse
ordering
again
means
that
as
I'm
storing
data.
So
I
have
my
user
activity
table.
here's my person, the time that it happened, and then some activity codes with some details. But look at my primary key: the row partition is going to be the username, and then my columns are all going to hang off the interaction time. So that means I'm storing this really dynamic row, data columns going like crazy, all with timestamps on them, until it falls off the stage like I'm about to. But I don't want to look up data way over there;
A
I
want
to
look
at
the
last
thing
that
happened
so
I'm
gonna,
reverse
that
meaning
that
I'm
gonna
have
all
my
time.
Series
data
reversed,
naturally,
for
me
and
I
see
some
people
not
in
their
head.
Cuz
they've
seen
me
do
this
before
it's
cool
right,
but
that
means
when
I
say
I
want
the
last
ten
things
that
happened:
you're
looking
at
the
last
ten
things
on
that
storage
room,
you're,
not
iterating
over
a
hundred
thousand
items
or
10,000
items
to
go
to
the
end.
A
That's
just
not
efficient,
so
we're
just
going
to
store
it
in
a
reverse
format.
On
the
longer
tail
table,
I'm
going
to
put
the
interaction
date.
So
now
I'm
going
to
partition
the
Rose
themselves,
so
every
day
has
all
the
interactions
for
users.
This
is
going
to
make
it
easier
for
me
to
later,
when
I
want
to
run
hadoo,
where
I
can
create
a
Hadoop
job
that
will
iterate
over
all
those
rows
per
day.
If I wanted to do a range of days, it's just an easier query to do; it's formatted in a better way. And I'm not reversing this one; I'm just going to keep it in its natural order, which is fine, because with Pig or Hive it doesn't really matter: I'm going to be iterating over a lot of data anyway. So let's just keep it like this. So what am I doing whenever I'm inserting my data into user activity?
You will notice right here that I'm using this TTL: I'm going to expire that data after 30 days. I just want to have 30 days' worth of data in that one table. The other table doesn't have that; it's just going to keep data forever. That's cool.
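The insert on the hot table might look something like this (column and table names are my own illustration; the TTL is in seconds, so 30 days is 60 × 60 × 24 × 30 = 2592000):

```sql
-- Writes to the hot table expire automatically after 30 days.
INSERT INTO user_activity (username, interaction_time, activity_code, detail)
VALUES ('pmcfadin', now(), '100', 'Logged in')
USING TTL 2592000;
```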
Now, a bonus. It's not up here, but I'm going to tell you another thing you could do: your column family, you can name it with a different name. You could put the month in there, or a quarter, or something like that.
So if you actually did want to keep your column families separated, or just wanted to use that as a way to separate your data, you can do that; that's just an option. But really, this 30-day expiring: with TTLs, you get a delete for free, pretty much. I mean, when we run a delete on Oracle, or even MySQL: would you run a delete in the middle of a production day if you had to, say, delete a million rows? Rhetorical: no.
The first thing you get is a phone call from the DBAs telling you you're crazy, because it's going to create a lot of redo logs. This is a great solution to that. How many batch jobs have we all written called "cleaner"? Yeah, so this is going to really help out, because now your data is going to be gone after 30 days. Now, keep in mind, there's no kill switch; this is going to happen.
It's really hard to undo, so just make sure that's what you want in your data model. So now, how am I going to use my data once I have this? This is where that reverse ordering really helps things out. Up here I have my SELECT from user activity with LIMIT 5; by doing that, I'm only getting the first five items: great, very efficient. And what I got back was this whole story right here: okay, so I logged in, I went into my gadgets cart, I deleted it,
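Against a table clustered in descending time order, that query might look like this (table and column names are my own sketch):

```sql
-- Because the clustering order is DESC, the "first five" rows
-- in the partition are the five most recent interactions.
SELECT interaction_time, activity_code, detail
FROM user_activity
WHERE username = 'pmcfadin'
LIMIT 5;
```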
then I created one, and I went to jewelry. Well, oh, the decision engine's on, man: okay, it's time. Let's put some flowers in front of this guy; I don't know, maybe a trip to Cabo. Where are you, Chad? All right. So you can really use that information effectively. This is kind of a funny example, but you can think of some really creepy things you could do with this, which is cool, and that's actually what's being done in the real world.
All right, how are we doing on time? I want to make sure we have plenty of time for questions. All right, we're good. Log collection. This is where I came to Cassandra: I had a lot of logs, and I think a lot of people find themselves here because of the way that Cassandra stores data. So there are two problems, right? Log collection means that something is spitting out a lot of logs, and it's never just one at a time, in small bits; it's tons, it comes like a torrent.
There are amazing amounts of data out there being created now by machines, so we have to have something that can keep up with it. The high-speed logging part was a requirement in this particular application. We also need to have our Cassandra nodes near where the logs are being generated: if we're in multiple data centers with our applications, we want to make sure the Cassandra nodes are right there too. We don't want the applications writing across a data-center link and racking up a bunch of bandwidth bills
with all this data just getting streamed all over the place. I want Cassandra to manage the replication in a more efficient way, and make sure that the applications just connect to those local Cassandra nodes. That was one of the big requirements, and then I also have this requirement where I need to kind of pre-dice my data.
This is an even better example of what I was talking about earlier, about writing to a lot of tables; I've advocated this so many times, and this really drives it home. That was one of the requirements, because we have dashboards. What do dashboards do, other than make C-level people happy? I mean, if there are a lot of things on there: "Ooh, it's pretty." But they do
have some use. I've been to the Etsy office, and they have this wall of plasma screens, and it's just a bunch of graphs on there, and people look at them; sometimes there's meaning in it. But that kind of stuff requires speed, because a graph from yesterday is pretty boring, and it really isn't going to give you any idea of what's going on today.
The bad side of this is the scale. This actually happened to me: I just couldn't get the scale out of my relational database for some of the logging that needed to be done. If I denormalized my data for one thing, it hurt the ingestion speed; I couldn't index anything; it just turned into a big problem. And, as so many times before, my Oracle box was scaled right up until I ran out of money, and when you're storing logs it's really hard to get a lot of money
for that, because it's hard to say, "Well, this is worth a million dollars of Exadata." No, it isn't. So that was one of the really big problems to get around. And batch analysis is just too late. Batch has a lot of uses, don't get me wrong; I love doing data science on a lot of data. But when you need a dashboard and you fire up a Hive job, it's over. You've lost everyone's attention span, and the people looking at dashboards have a short one.
So that's one use case from a single data point, and then, with the latest successes, I'm going to throw that up into my fancy graphs over here, for eye candy later. Awesome. So I'm really taking that one data point and doing multiple things with it. Really cool; it's not quite recycling. This data model is not that hard; it's just kind of a concept you have to think through. You've got to really put your mind to it, and you have to look at it from where it's going to be consumed and ingested.
So I have my three tables here. The log lookup table is pretty boring, but that's okay, because I'm going to have a lot of other things going on. The log lookup is just: I have my source and my date, and each row is going to be one minute of data, because, let's face it, those logs can come in milliseconds. So we have the date up to the minute.
By using this compound key, we have a source and a date-to-the-minute as the row key, and then just timestamps, and that's going to create however many columns we need and just keep going, and going, and going. The other thing I'm going to do here is store the raw log, and I'm going to gzip it, and that's a really good idea.
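The lookup table might be declared something like this in CQL 3 (names are my own sketch of what's described here):

```sql
-- One partition per source per minute; each log line is one column,
-- keyed by its full timestamp, with the raw (gzipped) log as a blob.
CREATE TABLE log_lookup (
    source text,
    date_to_minute text,       -- e.g. '201306111042'
    log_time timestamp,
    raw_log blob,              -- gzip-compressed log line
    PRIMARY KEY ((source, date_to_minute), log_time)
);
```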
I've seen this have a good effect in a lot of places: if you just have some JSON, or XML (God help you), it takes just a few microseconds in Java to run it through a compressor, and it saves so much on wire speed. If you're putting a 2K block of text over the wire at, you know, 10,000 per second, it's going to add up, but you can probably crunch that down by fifty percent or more, and all you're saving is wire time. That's good!
And let's face it, when you do a lookup like this, you can then reverse it and say, "Give me all the logs from this source, in this time period," then pull it back and deserialize and decompress it at that point. That's an application concern. It works pretty well, but our other two tables here are much cooler.
Now we're actually going to get into some interesting uses of counters. Counters have good uses and bad uses; in this case I'm going to say this is a good use case, because what we're trying to do is just create numbers that will be graphed for people looking for eye candy. So in the login successes table we're going to have this source and date-up-to-the-minute again.
We're going to do this with the same type of key, except in this case we're not compounding it, so we're just going to be marking these sources and the date up to the minute, and then marking each one of them, so we get counts. That means, from a counter standpoint (and I'll show you the code for how to do it), it's going to be incrementing a counter in that minute. So if I have 10 things that happen in that minute, the count will be ten; a hundred, 100.
That sort of thing. That way I can ask how many things happened over these many minutes, and I'll get multiple counts. I'm also, because I love it, going to reverse this, so I can ask, "What are the last 10 minutes of things that happened?" So I've created two tables; you'll notice I've created a login success table and a login failure table. I'm really busting this out; I'm dicing.
So this is what I'm going to do with that: actually creating some data. We have this one simple SELECT command, and this is really the beauty of what we've done: when we ingested that data and diced it up, it made it so that we now have the opportunity to write this really stupid-simple SELECT, and it's not going to take very long to run.
A
That's going to be a few milliseconds to run, and what I get out of it is — I say, give me the last 20 minutes, because I know everything's in there per minute. So I get this nice graph of data, and if I hit refresh I'm going to get up-to-the-minute on everything. Now, if I want to change that to something like milliseconds or whatever, that's fine, and the counters are going to be counting along in the background. So notice that there is this little window here.
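To make that concrete, here's roughly what the increment and the graph query could look like, assuming a hypothetical counter table keyed by source with a date-to-the-minute clustering column (the names and values are mine, not from the slides):

```sql
-- Each login event increments the counter for its minute bucket.
UPDATE login_success
   SET hits = hits + 1
 WHERE source = '10.1.1.5'
   AND date_to_minute = '2013-06-11 10:32:00';

-- The dashboard query: last 20 minutes for one source --
-- a single-partition slice that runs in milliseconds.
SELECT date_to_minute, hits
  FROM login_success
 WHERE source = '10.1.1.5'
   AND date_to_minute > '2013-06-11 10:12:00';
```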
A
I'm going to put this in as an example, because this is where log data sometimes is really important. And I know, Eddie — I know you're in here, and Splunk is awesome — but this is a cool use case too. Okay, you have these two minutes where something bad happened with your logins. Well, if you only knew about that tomorrow, because you ran it in batch, then you're just going to be scratching your heads. So this is where this kind of usage in these cases really makes sense. It's like, whoa, something bad happened.
A
It is really a use case that I've found in a lot of places. So here's the problem that we're dealing with: we want to store the versions of a form indefinitely, and we just want a very efficient way of doing this, so that we have version 1, 2, 3, 4 — however many versions there are. We want to scale to any number of users, and that's, you know, always a requirement.
A
We want unlimited scaling, because we're going to have a million users, just like Facebook. And we want to be able to commit and roll back our data: if I make a mistake on version 2, I'm going to go back to version 1, and if I say I want version 2 to be my gold version, it's going to turn into the right form. Cool.
A
So I've tried to do this before in a relational database, and it's not easy. I mean, it's a lot of tables that have to be joined, so it wasn't a very easy model there — it's just not an easy model anywhere, really. But in this case I think this will work really well with the way that rows and columns work in Cassandra.
A
We also have this need where it needs to be all over the place. We have our local data center, which is where most of our data is, but we also have cloud components — some of it's in Amazon, or some of it's in Rackspace — so it needs to live in both places, and that's really difficult, especially if you're trying to create a homogeneous persistence layer where it's maybe the same technology.
A
So here's how it's going to work. Our whiteboard session says we're going to have this partition key, which is going to be a username and a form ID. And I'm going to store these blocks of form attributes each time they make a version change, so I'm just going to keep growing that out a random number of times. If we have somebody who's a real busybody on a weekend and they create 10 versions —
A
Fine. I'm not going to have to deal with that in any other way; my data model will just maintain it. We also want to separate the tables: we have the data that they're working on, and we want to have some stuff that's live on the production site, so we're going to have these two things going on. So that's an easy requirement. The exclusive lock, now — I hate seeing the word lock. It's not so much an exclusive lock for a computer, which is a different problem.
A
It's more just making sure someone doesn't stomp on somebody else. Now, it's funny, because I've had this discussion with somebody else about: why don't we just teach all our users Git? Okay, yeah — so we're going to get a lot of admins in, like, HR to start using Git? No, ain't gonna happen. So let's think about this a little more in their domain. How about a web page? Whenever you go to look at a form, you can see who's currently editing it. There you go.
A
That's good enough. I mean, can you imagine somebody doing a cherry-pick on a form in HR? No, it's not going to happen. All right, so here's our data model. We're going to have a working version table, which is more or less where all the activity is going to be happening. We have the username and the form ID, which are going to make up the row key. We have a version number, which is going to make all those rows.
A
We're going to have a locked-by column, and then one of my favorite collection items: the map. The map back here is going to have all those different form attributes that I'm going to be using for this particular form ID. So that means that if I change any of this, I'm going to increment the number in my application, and it's going to store the form in its entirety, so that each version can be different — and that gives us a lot of flexibility. And why not?
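A minimal sketch of that working-version table — the table and column names are my own choosing, and the slides may differ:

```sql
-- Working-version table: one partition per (username, form_id),
-- one row per version, the whole form stored as a map.
CREATE TABLE working_version (
    username   text,
    form_id    int,
    version    int,
    locked_by  text,
    form_attrs map<text, text>,
    PRIMARY KEY ((username, form_id), version)
) WITH CLUSTERING ORDER BY (version DESC);
```

The DESC clustering order is the "reverse it" trick: the newest version always sits at the front of the row.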
A
Because I like to do it every time anyway — it's kind of become my boring trick, but it's very effective: reversing that, so that the last form I was using is the first thing on the stack, more or less. So whenever I want to get the latest version, I know that it's right there; I don't have to iterate over a bunch of stuff. That's just, in general, a very efficient pattern with Cassandra. So we have our first version that goes into the system; it's going to look like this.
A
So I have my own little coding scheme here, where I'm going to have a text box called first name; there's a display name, and here's what's going to happen with the HTML. So this is my first version — when I do that, I create that, and that's version 1. So I'm going to lock this and edit it again, which just means I'm going to put a username inside that table. So whenever I go to edit, one insert puts the username in there.
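As a sketch — assuming a working-version table with a locked_by column and (username, form_id) as the partition key; the names and the form ID here are my own illustration:

```sql
-- Taking the "lock": one insert marks the version as being edited.
UPDATE working_version
   SET locked_by = 'pmcfadin'
 WHERE username = 'pmcfadin' AND form_id = 1138 AND version = 1;

-- Any other user checks before editing: a non-null locked_by
-- means someone else is already in there.
SELECT locked_by
  FROM working_version
 WHERE username = 'pmcfadin' AND form_id = 1138 AND version = 1;
```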
A
So if other users come in to use it, you can read that and say: is it blank, or is there somebody there? Oh, somebody's there — okay, I'm going to go back and display to the user: user pmcfadin is using that form right now; can't touch it. That's a lot easier than Git, and I think for our use case it'll be fine. Now, if you want to modify that a little, that's completely possible — I'd love to talk about some of these. All of these models probably have variations.
A
That I would love to talk about. These are meant to be somewhat simplistic, so we can get through it in an hour, but there are lots of variations in here, and I know you probably have some really cool ideas here. So we have this version number that is locked. Whenever I finish it and I say, here, I'm done — you know, I got my form finished up — I'm just going to put a null in there when I increment to the next version, and that pretty much releases the lock.
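Releasing the lock is then just writing the next version and nulling the column on the old one — again a sketch against a hypothetical working_version table (my naming, my sample values):

```sql
-- Done editing: write the next version with no lock held...
INSERT INTO working_version (username, form_id, version, locked_by, form_attrs)
VALUES ('pmcfadin', 1138, 2, null, {'fname': 'text|First Name'});

-- ...and clear the lock on the version we were holding.
UPDATE working_version
   SET locked_by = null
 WHERE username = 'pmcfadin' AND form_id = 1138 AND version = 1;
```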
A
I know this is a need, so I figured it out — and this is a community, so I'm going to try to help you figure this out. You can do this; this is very doable. So that means that yours is next, and I want you to think about what you're doing right now. I know a lot of you are working on projects right now, and things that you're doing currently, because I've talked to you. But, you know, try out a few things — and really, iteration is the key here.
A
There are a lot of different ways to do these, so try different ways to do them and see how they work. And if it doesn't work one way and you really get frustrated, find us — we're out there; the community is out there. Engage an expert. You know, I'm not that hidden — if you've noticed, I'm kind of obvious — so come out and find me. And, you know, Twitter: if you follow me on Twitter, I talk about data modeling and Cassandra topics all the time, and you can use it like an RSS feed.
A
If you want. I've had a lot of people reach out and contact me about data modeling, and as much as I can do in 140 characters, I can help you — but sometimes it turns into a larger discussion. The point I'm trying to make is: engage people. I really hate to hear how people tried a project with Cassandra and failed because they couldn't get their data model to work, when they didn't even ask. So ask — people are here to help. I mean, look at all the green shirts we have in this room.
A
Raise your hand if you've got a green shirt on. Oh yeah — okay, so that's good for you. All right, that is all I had; I think we can do some questions. So here's the rules on questions: I'm not going to rewrite your app, but if you have general questions about why did you do this, or how did this work, you can go ahead.
A
G
So when you do multiple writes, you're replicating your data to multiple locations, and obviously there are bugs in your code and you're going to make mistakes — you're going to write it incorrectly to one location, maybe to another. Are there processes or tools or concepts that have emerged to help make sure that you haven't hosed yourself as you write that data to multiple locations? Because writing bug-free code is not an option.
A
All right. So if you're in a position where you're writing code and deploying new code into your cluster, and you have the potential of writing bad data — essentially that's what you're talking about, right? Yeah, that is probably reason number one why you do a backup on Cassandra: not because you're going to lose your data, but because somebody hosed it for you. And that happens all the time.
A
So in this case, snapshot is your friend. I always tell people: before you do a code push, do a snapshot. And a snapshot is literally that — it's a point in time. So you do a snapshot right before the code push, and then all of a sudden your code deletes a bunch of email addresses — right, Pankaj? You remember that, don't you? All right, so I'm going to point out —
A
Solr question. So Solr brings out a whole new thing, and really, you probably want to think more of your data in terms of your Solr schema than Cassandra, if you're using it for just Solr. If there's a blend, then you have to think about that. But keep in mind that if you're overlaying Solr on top of Cassandra, the Solr schema is fixed — that's what you have. You can put dynamic fields in, but really it is fixing the schema in some way. So there's a little different way of doing that.
A
I knew I was going to get that one — oh, I broke the rules. Now, the read-before-write rule is really about: you don't want to do a lot of those, like if you're doing read-write, read-write, read-write. In this case, I'm doing one read to see if there's anybody there, and it's user-driven. So if I click on a link, I get a read, and then I'm going to make a fork in the action: I'm either going to move to the next form, or I'm going to give them a different form,
A
saying it's locked right now. So in the application, I'm not reading and writing one after the other; I'm doing a read, and then, if there's something going on in the application, that will result in a write. So I guess my cheesy way of saying it is: I'm not doing a read-before-write in the way we wouldn't want you to do it, which is one right after the other, in the same block of code.
H
A
You're talking about secondary indexes — why didn't I use them? All right. So in my second webinar, I think it was, I talked a little bit about secondary indexes. I feel like secondary indexes are a crutch for relational folks, because they think it's speed. Secondary indexes are built for convenience and not for speed, and I try to avoid them in general.
A
Just because there's a lot of confusion. But if there was a case where you did need one, I would call it out; in this case I didn't have any. And I think you'll find that if you do your data model correctly, from an application standpoint, the need for secondary indexes is very minimal. There's always a case for something, right, but I feel that it's a very small case.
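To illustrate the convenience-versus-speed trade-off with a hedged example (table and column names are my own, not from the talk): a secondary index is one line, but a purpose-built query table usually serves the read path better, because the lookup becomes a single-partition read.

```sql
-- A base table, for illustration.
CREATE TABLE users (
    username text PRIMARY KEY,
    city     text
);

-- Convenience: a secondary index on an existing column.
CREATE INDEX ON users (city);

-- Speed: denormalize into a table keyed for the query itself.
CREATE TABLE users_by_city (
    city     text,
    username text,
    PRIMARY KEY (city, username)
);
```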
H
A
That's what I hear too — I was just reading that blog post. It's not like that, and I think there are just a lot of people that have been using Thrift, and I was one of them. It took a while for me to get my head shifted around and understand it. But really, in that blog post — and if you read my comment on it — it's about understanding what's going on in the storage engine. What's really going on here? CQL is an abstraction.
A
It's a way to make it look better for the user and for the application. The storage engine hasn't changed that much — there are some different things going on, but not that much. It's still doing the right thing, but CQL is giving you the correct path to get to that right thing. That's why I advocate it all the time: you're going to have a better time.
A
F
A
No one's going to get that one either. He never put a lock up — that's why I said, oh no, I put a lock up for you. You always think you'll get the lock, guys. You know, wait a minute. So, the first question: collections do have practical limits, and the reason there's a practical limit is because you have to deserialize them, so that can cause a performance hit.
A
I talked about that in one of my other webinars — it's really about where you're at with your performance. If you're looking for the most performance, the serialization is not something to take lightly. So if you put, say, a hundred thousand things in there, or ten things in there, there's going to be a different cost, right? I believe there's actually a hard limit of 65,000 right now — I don't know if there's anybody from Cassandra core who can verify that.
A
I think that's what it was — 65,000 right now. But then again, you probably don't want to put that many in there anyway. It's not a substitute for a good data model; it's an augmentation. All right, second question: how do I manage the lock? I don't have a good answer for that — I mean, there are collisions; it can happen. At that point, if you're really that worried about those collisions, what I would probably say is that you can create a different model where you put a timestamp along with the user.
A
You create a locking table, and then, that way, if you do have two people in there at the same time, you're going to see that there are two. So you just augment that row, or that column value, so that you have a little more information in there. A timestamp is really good for that, so that if you do have two in there, you can flag it as a problem. It's all about the knowledge that there's an issue, yeah.
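A sketch of that variation — a separate locking table where each would-be editor records a timestamp, so concurrent editors become visible instead of silent (all names and values here are my own illustration):

```sql
-- One row per editor attempt; two rows at once = a collision.
CREATE TABLE form_locks (
    form_id   int,
    locked_by text,
    locked_at timestamp,
    PRIMARY KEY (form_id, locked_by)
);

INSERT INTO form_locks (form_id, locked_by, locked_at)
VALUES (1138, 'pmcfadin', '2013-06-11 10:32:00');

-- More than one row back means two editors grabbed it;
-- the timestamps tell you who was first.
SELECT locked_by, locked_at FROM form_locks WHERE form_id = 1138;
```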
A
I
So in your user activity table, you had four or five columns plus a clustering column, so every insert will write essentially five columns to the storage engine. Is there a concern of the table getting far too wide — above the typical ten thousand or twenty thousand columns that's considered, you know, safe for Cassandra, which —
A
Which table was that? Oh, you mean the initial one. Well, that was date-to-the-minute, so it only stored up to a minute of that data — there would only be one minute's worth of data in that one row. Now, if you were collecting at, say, millisecond resolution, that's going to be a lot. Yeah, I mean, in general —
A
What I look at, for the width of the row, is more about the volume of data that's in there and not the count, because what you're trying to avoid in this case is an incremental compaction — so we're really getting into the weeds now. The incremental compaction that I try to avoid is, say, that 64 MB limit on a memtable. Now, you can make that bigger.
A
If you wanted to, to keep that from happening — but in general, that's what I look at. Now, if that's going to be a problem in this case, let's change it to, say, every 100-millisecond bucket, or something like that. You know, you tune that. There are some teams, probably in here, that I've worked with where we've done that — where we've modulated that row key so that we get a different column count, based on what you're trying to accomplish. If you don't mind having an incremental compaction, because it's not really an impact on your system —
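Modulating the row key usually just means folding a time bucket into the partition key — a hedged sketch, with names of my own choosing:

```sql
-- Same events, but the partition key now includes a bucket,
-- so no single row grows without bound. Shrink the bucket
-- (minute -> second -> 100 ms) to trade row width for row count.
CREATE TABLE user_activity (
    username    text,
    time_bucket text,       -- e.g. '2013-06-11 10:32'
    event_time  timeuuid,
    event       text,
    PRIMARY KEY ((username, time_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```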
A
Fine — let's have three hundred thousand, five hundred thousand, two million columns. But if that's more of a concern because of performance later — you don't have the I/O to take it — then, okay, let's figure this out. That's the cycle: try things, then measure, then do it again. This is how we do our development.
D
Again, on that same table: there was a clause in there — I think it was WITH CLUSTERING ORDER BY, right — and you had the timestamp in there, right? I clearly don't understand how that's working, because it seems like you'd have all of the same timestamps together, and if you're looking it up by user and timestamp, you'd want it clustered by user and timestamp.
D
A
Okay, we'll beat it out of you as a community — we do that as a service, right? No, it's fine, because that's where you think about what your application needs. I really wanted that in some sort of laid-out format, and this is one of the great use cases for Cassandra: temporal data. It's probably the go-to move for a lot of people. Unfortunately, I hear a lot of people say, well, it's only good for that — it isn't.
A
But, you know, thinking about it — wow, that's perfect, because now I have this one row that I'm looking up for a source, and I need that data over a certain range, like a time range. It makes perfect sense, because what do you always do with time data? You say: give me the last 10 minutes, or give me three days' worth, or something like that. So yeah, putting those into individual columns does one seek on the disk. That's what we want — very fast.
A
You can, and that's where I always say: compress if you can. I have worked with people that are doing that — large log files or something like that, even images; there are plenty of use cases. The thing you have to consider whenever you put something large in a column value is the wire cost — you know, how long it takes to get something over the wire.
A
Maybe you have a larger SLA, so that's really the biggest consideration — and then again, how many columns you have. You know, you may get to a physically large column, like a lot of megabytes, that will generate a different type of compaction, but it's more tuning, and that may not be a problem either, especially if it's somewhat like cold storage, so to speak. We always look, again, back to our application.
A
What is the important thing here? If your SLA is the most important thing — like, I have to have every read come in at 20 milliseconds, 30 milliseconds — okay, let's tune for that. If yours is: I need to be able to write a hundred thousand per second of this, but I don't care how fast I read it out — completely different story. That's why we're thinking about our application first and not our data. The relational way was: I've got a lot of data; how can I dice this up?
A
Five bucks for answering that — thanks. Yeah, that'll be in a couple weeks. So, DataStax Enterprise, which is where the Solr integration comes from — this has been one thing that our DataStax Enterprise team has been working on very hard. Really, what we're talking about is Cassandra 1.2, which has a lot of great features: virtual nodes, and the CQL native transport.
A
Those are two very good reasons to be on it, but DataStax Enterprise is still on version 1.1. To get to version 1.2, we had to make Solr and Hive be okay with using things like virtual nodes, and that's pretty much done. So that'll be out pretty soon — you'll probably have it before you know it, and I'm happy to say that. So, DSE team, you rock; thank you very much. So, luckily — are there any more questions? Uh-oh, Patrick doesn't get asked any questions? All right, fine.
C
Thanks. So I had a question about size-constrained columns, or size-constrained rows. I know it's not practical to constrain your rows based on a set number of columns, because then you have to do the full read for each row every single time you do it, right? But could you have a use case where you set a counter column for each row, and then check that counter column only, and sort of go through and do a hack constraint that way? I think —
A
You just answered it yourself — you said the hack word. No, I wouldn't do it that way, because — all right, there's your read-before-write: you'd have to read the column value and then do a write, or even do another read with that. So, you know, you're asking: if the column count is, say, 10,000 or a million or something like that, what would be the constraint? What are you worried about in that case — having too many columns? And you're keeping a count just —
A
So it's kind of a more advanced topic. There are more interesting ways to modulate how many columns you have by using the row key, and I would prefer to explore that until you run out of options, before you do anything with the word hack in it. You know, if you have the word hack in your solution, you're probably going to —
A
That's going to come up at some point, because someone's going to say: you can't have a hack. So yeah, I would look at why you're even worried about that, address that problem from the beginning, and just eliminate it — because if you're creating a counter column for every time you put in a new column, then I really feel like that's an anti-pattern. And that's, unfortunately, the answer. Thanks.
C
A
No — creative, but no. Yeah, I don't think you could do that. Wow, I've never done that. No, you cannot, because of the way that the collection is created; it would not work. But that's an interesting idea — why would you want to do that?
B
A
B
A
If you're creating an immutable set, then just create the right data model in the first place. Okay — I mean, I'm not trying to be harsh. I'm just saying that's one of those things where we'd sit there and talk about it, because if you really say this is immutable, or I have fixed fields and you don't need a dynamic data structure — sure, yeah, sure. Okay, all right, I'll be over here.