Description
Speaker: Patrick McFadin, Chief Evangelist of Apache Cassandra at DataStax
A lot has changed since I gave one of these talks, and man, has it been good. 2.0 brought us a lot of new CQL features, and now with 2.1 we get even more! Let me show you some real-life data models and those new features taking developer productivity to an all-new high: User-Defined Types, new counters, paging, static columns. Exciting new ways of making your app truly killer!
Those of you for whom this is your first one — I know, it's kind of small — but this is amazing, because I know so many people in here. I know your use cases. I've probably been on site where you work, I've probably met you at a meetup, and it's just really neat to see everybody show up and talk to each other. I mean, how many of you have been in the Cassandra live room? Yeah, come on — lightsabers, playable Call of Duty in there.
So since the last summit, when I talked last, I've had a couple of projects I've worked on, and it is so cool because they're here today: Sony talking about launching PlayStation 4 with Cassandra on the back end, Tim and Sean over here talking about how they launched Call of Duty. I got to show up in Vancouver, because they said, "Well, we're going to launch this game on Cassandra, you'd better be there in case something happens." Most boring week of my entire life.
So it's great, because now you're coming to my house — I live around here. I've been all over the world this year, and everyone I see out there — I mean, it's just amazing. I went to Warsaw, Poland, for a meetup: 50 people RSVP'd, 75 people showed up. I've never had that happen before — they had people spilling out of the room. The amount of interest in Cassandra worldwide is amazing. I'm heading back to Australia shortly, and it's going to be even bigger.
And what was great was the story behind it — always fun, all the things that happened. Seven RCs: that's a lot, but it just shows you that things are starting to grow up a bit in Cassandra land, and so — huge changes in this one. I mean, the performance changes alone are amazing, but the data model is definitely different, and of course that's why you're here and what I'll talk about. Those amazing changes, and that performance, are really just a little taste of what's to come — and you look at, you know, those of us in this room that have been here since, like, 0.6 or 0.7 of Cassandra.
You know, this is amazing to watch. 2.1 is really an amazingly different tool, and I just love to see the progress — so many contributors on open source. Has anyone in here contributed to Cassandra open source in any way — docs, JIRAs, anything? I see a lot of hands; I'd like more hands. But this is the thing: we own Apache Cassandra.
It's an Apache project. DataStax doesn't own Apache Cassandra — and I'm glad. I love DataStax, because they hire me to come out and talk to you about this kind of stuff, and it really helps grow the community, but it is an Apache project, and it's really cool to see community people working together, talking about what they're doing and making this a better thing — because it's ours. It's ours. So 3.0 is next; that'll be my next year's talk.
Just hang on to your hat — it is going to be amazing. I got to go to the committer conference back in June — I'm not a committer, by the way, I just play one on TV — but the things that are being talked about, not just dreamed about but actually specced out, are amazing. Huge changes coming, and it's more performance, more usability.
We made it bigger, yeah. So at the 2012 summit I talked about this crazy website with this crazy cat, called killervideos.com — come and find out, I could only register killervideo.com. It was supposed to be this complete example, and I wanted to have something that was, you know, fake but had a real feel to it. I think a lot of us have seen Twissandra; I wanted an alternative to Twissandra, because that just didn't appeal to me. Well, it's gone pretty well, and it has good legs on it.
We have really changed things quite a bit — so, whoa, backwards; Jonathan did that. We are now taking this thing to a real website. It's actually there: if you go to killervideo.com, it's a real website, it's hosted on Azure, the code is up on GitHub, and it actually looks real too. I mean, that's better than my little cat picture, right?
We had a contest for the design — that design's not up yet — but here's the point: we want to have a living website that we can use to talk about data modeling, that we can all participate in, where you can download examples we can code against. We can do things with it, we can do crazy stuff, and even highlight new features.
Now you can use 2.1 today with the Java driver, C#, Python, C++, Ruby, Node.js — yeah, those just shipped. So we're going to try to create this — you know, my evangelist team, we're trying to keep this alive as just something that you can use as an example.
So, like, you know — hey, Microsoft Access had Northwind, right? That's the intent. That was really tepid — that was a Monty Python "and the people rejoiced."
So let's go — let's dive into this a bit. I'm going to revisit that data model and I'm going to toss in some 2.1 — Jonathan talked about it this morning in the keynote, and I know that Aaron did some talking about some things that are happening. Let's just go into this now. What I'd like to do is, let's hold off on your questions; I'm going to leave a big gap at the end, because I know how these things roll.
I know how you are — you're all questioning. I'm going to leave a big gap at the end, and I hope that we have microphones. Ask a question and, you know, we're going to answer it. Nothing personal — no, I'm kidding, we want to answer. I want to answer all the questions you have about what I'm talking about, because I know I'm going to leave out something, or there's some burning issue in your mind. I want to hear what it is, because this is being videotaped.
People are going to see this, and they're probably going to have a similar question, so speak up and we'll do that at the end — if I don't go over time, which I just shouldn't. So what are we going to do with these 2.1 data models? We're going to replace — but in a lot of cases, we're going to remove a lot of app code. Now — remove app code? Why would we do that?
I know that a lot of you probably write a lot of app code because a certain feature is not in CQL, or because you need to account for something in the way Cassandra manages its data. That's not really efficient from an application standpoint; it doesn't make an application developer's life better. So when we put it into the standard data model, it's going to make your life better, right? That's one less thing I have to write. That's less code.
You know, up to this point we've been doing this with separate tables, so they're kind of disjoint in a way. Not too bad — it's not something that really required a foreign key constraint, but it is something that requires a second get. So in this case, say videos is in one table and your video metadata is in another, and the video metadata has a UUID that points back to the video.
Now, when I want to grab all of the video metadata for a particular video — to find out which playback speeds or bit rates I have — I'm going to have to do a select from videos first, and then a select from video metadata next, and then what I unfortunately have to do is some sort of in-application join.
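The two-step read described here might look something like this in CQL — the table and column names are my own illustration, not taken from the slides:

```sql
-- Hypothetical pre-2.1 schema: metadata kept in a second table
SELECT * FROM videos
 WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

SELECT * FROM video_metadata
 WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```

The application then has to merge the two result sets itself — that merge is the in-application join being complained about.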
I combine the data, and then I create this data that's on your screen here — like, oh yeah, there. These are what you would expect to see on a playback screen: my playback rate, my title and description, a few other things like comments and what have you. But that required extra code, and that's a bummer, because now we're spreading data around, and we get into these discussions about — well...
What if I have the situation where I'm setting one table and I'm not setting the other? That's a bummer; I don't want to do that. That seems stupid — get out of my house. So, yeah, I don't want that. We want to pull it together. So now, with UDTs, what are we going to do with this? We're going to nest that puppy — we're going to put it right inside the videos table. So now I have a type, video metadata, and that video metadata contains all the fields that I wanted.
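A minimal sketch of the nested version — the field names are my guesses at the sort of thing the slide showed, not the actual KillrVideo schema:

```sql
CREATE TYPE video_metadata (
    height int,
    width int,
    video_bit_rate set<text>,   -- a set nested inside the type
    encoding text
);

CREATE TABLE videos (
    videoid uuid PRIMARY KEY,
    title text,
    description text,
    metadata set<frozen<video_metadata>>   -- the "set of a set": two layers of nesting
);
```

One read against videos now returns the video and its metadata together.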
If I had two different tables, potentially that data is going to be on two different nodes — it's no longer co-located — and if I ask for all of it in one pass, I'm not going to get it. I'm going to have to go do a couple of things, and one of those queries might time out. So now it's all built in, and if you notice, I have a set of a set: I now have nesting going on in two layers, which is pretty cool. So there is one word there: frozen.
Jake, are you in here? You were supposed to sing this. No? All right, he left. He knew — I told him he was going to have to sing this, and he bailed out on me. So the whole idea of frozen is really an attempt for us to stay out of technical debt.
Jonathan mentioned it this morning, but really what we're saying is that the type is going to be fully defined whenever you create it, and you'll have to recreate it to change it. So frozen in 2.1 makes sense, because there's no dynamic part of it — you can't grow it out in any way. In 3.0 you will be able to do that. So we enable the feature in 2.1 without having the dynamic portion of it.
We just put the frozen keyword in front of it to keep it useful, because we don't want to have this Python 2-to-3 fiasco where, oh, you know, we were using 2.1 and when we started using 3.0 everything's wrong again. No, we don't want that. Because if you start using UDTs — and I guarantee you, after this talk, you're going to go out and use them, probably incorrectly, and we're going to try to fix that. Oh, I can't wait for that phone call. Dude!
So yesterday I had a data modeling class, and we were talking about all these different things — eight hours of data modeling, and I'm doing another hour today. So eight hours of talking about data modeling, and what was the favorite topic everybody wanted to do?
"I want to store some JSON, dude." I put this one slide up — how many of you were in my data modeling class yesterday? Oh yeah, this slide made you come to this talk, didn't it? It was pretty bad. It was blatantly bad. I apologize to all the other speakers in every other track — I totally trolled you. I put this slide up and I dragged them in. But this is what I want to do: right here is some JSON.
What do I do with that? There are people in this room that I've had the most intense conversations with — it's like, "Dude, I've got JSON. What do I do?" — like a total Captain Kirk: "What do I do?!" Because it's nested, and it looks like a schema, but it doesn't fit into anything because it's not flat. I can't denormalize this. So what do you wind up doing? Unnatural acts. You start pulling things apart, you start pulling out fields, you flatten them out.
You create hashes of things — it's just a mess, and we don't want that. Now, the new story: what am I going to do with this? This is a product — just a product in a catalog somewhere. It has an ID, some descriptions, it has some dimensions, and then some categories. In those categories there could be multiple ones — like, where does it fit? It fits inside home furnishings, or in kitchen furnishings. Those are different places where it fits. So this is it.
So I'm going to pick this apart. First part: dimensions. All right, I'm going to create a type for that. I create a type called dimensions with just the fields that I want, and I'm giving it a fixed schema. I'm saying: no, no, no — length needs to be a float, width needs to be a float. That's good; don't put in text. All of this is now defined.
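As a sketch — assuming fields along the lines of the ones mentioned, with units added since they show up in the insert — the type might be:

```sql
CREATE TYPE dimensions (
    units text,     -- e.g. 'inches'
    length float,   -- fixed schema: these must be floats, not text
    width float,
    height float
);
```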
The product table now has everything — the entire JSON — in it. All right, that is cool. Now I have my dimensions, which are a fixed UDT, a user-defined type. But then I have my categories, which are a map. So if you look at what I actually have here — category: home furnishings — my map keys are "home furnishings" and "kitchen furnishings", and the object that's contained is the category itself.
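A sketch of the table and the category type — the names are my reconstruction of the slide, not verbatim:

```sql
CREATE TYPE category (
    catalog_page int,
    url text
);

CREATE TABLE products (
    productid uuid PRIMARY KEY,
    name text,
    description text,
    dimensions frozen<dimensions>,          -- the fixed UDT
    categories map<text, frozen<category>>  -- map key is the category name
);
```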
I just saw some people downloading things now. So this is an insert statement. When I insert this into Cassandra — the units go into dimensions — this is actually an insert statement; I busted it out to make it look all JSON-y, but that's what it looks like, and it actually works. This is very usable; you could use it today. When I upload these slides you can go get them. So this is now something you could store in Cassandra.
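An insert along those lines — values made up for illustration; the UDT and map literals are what give it the JSON-ish look:

```sql
INSERT INTO products (productid, name, description, dimensions, categories)
VALUES (
    99051fe9-6a9c-46c2-b949-38ef78858dd0,
    'Kitchen Table',
    'A sturdy table for your kitchen',
    {units: 'inches', length: 50.0, width: 12.0, height: 32.0},
    {'home furnishings':    {catalog_page: 34, url: '/home/furnishings'},
     'kitchen furnishings': {catalog_page: 12, url: '/kitchen/furnishings'}}
);
```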
So what about retrieving? This is where 3.0 will make these things a lot easier, but for now, this is what we have. If I want to get the height, I just say dimensions dot height — awesome, boom, 32. And if I need to grab the categories, because it's a map, map restrictions still hold: I have to grab the entire map in a single pass — the whole entire thing. That will be changing, though. But for now I get a map of all my categories, and that's great.
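Retrieval, per the talk, might look like this — table and column names are my own illustration:

```sql
-- Reach into the UDT for a single field
SELECT dimensions.height FROM products
 WHERE productid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

-- Map restrictions still hold: this returns the entire map in one pass
SELECT categories FROM products
 WHERE productid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```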
Apparently it's all to make counters spark. Well, I'm happy to say: keep the change, because counters have gotten a lot better. Maybe they didn't know — oh, but so we're going to have another summit, because we have extra money now. So: counters have been around since 0.8 — thanks, Twitter — and commit log replays have been really the biggest problem. If you do a replay, as Jonathan said, it changes your counter — oops, that went up a bit — and repairs, because no one does repairs...
Repairs can change counters too. These are the things that made me say: don't use a counter, ever. And the performance — even if you did use them — was always a little inconsistent because of garbage collection: it would yank all the stuff into the heap, put these big fat slabs of memory into the heap, they'd get promoted into old gen, then it wouldn't get CMS'd properly, and it would start doing this.
Well, that was time for a wrecking ball. Boy, I was so close to putting a Miley Cyrus picture up there — I saw it on my slide and I'm like, no, I just can't do it. I can't promote that, so I just put a regular one up. So sorry — you can thank me later. So what happened, what's so good? Jonathan explained a little bit, but really what it is, is just kind of an internals rewrite — taking it and saying, whoops, mistake.
That was a big problem. The good news is now it's stable under load, you can now do a commit log replay — woohoo — and those repair weirdnesses are gone. Yay! I think those two issues right there make it really useful; it just takes away a lot of the problems. And the stability — I mean, look at the stability. The blue line is what I've been used to.
The top one is for uncontended writes to counters and the bottom one is for contended. The uncontended ones — man, that GC was just a mess. Contended — yeah, you kind of had hot sets going in the heap, but you just never got consistent performance, and GC pauses — let's face it, that's going to kill your 95th and 99th percentiles. So you didn't want to use that. We fixed that — that's good — and so you should try them out in 2.1.
You still can't do delete — and I know that's a bummer, but that's something that's being worked on — and it still needs to do a read-before-write internally; the internal implementation still does a read before write. If you are using counters on a really weak-CPU system and you're using compression, I'm telling you right now: turn off compression. Compression will thrash the living hell out of your CPU if you're using something like an m1.large in Amazon — which I know you can't buy too many of anymore; they're hard to get.
Usage: how do we use these new counters? They're so cool, right? They're fast. It's pretty much the same thing — no change to the API at all, which is great, and that's actually a good feature right there. When you work with your counter, you can just do a plus one or plus three to increment, or a minus one or minus three to decrement, and it'll increment or decrement your counters. Just that simple.
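A counter sketch — the table here is hypothetical, but the increment/decrement syntax is the unchanged API being described:

```sql
CREATE TABLE video_views (
    videoid uuid PRIMARY KEY,
    views counter
);

-- Same syntax before and after the 2.1 counter rewrite
UPDATE video_views SET views = views + 1
 WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

UPDATE video_views SET views = views - 3
 WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```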
So if you have counter code out there and you upgrade to 2.1, it's still going to work just fine. That's good!
Next topic: static fields. So statics are new as of 2.0.6. They have a very specific use case, and I'm going to cover that real quick. Thrift people will love this, because what it addresses was a complaint about the Thrift implementation — a way of putting data into a storage row that just was not possible in CQL.
If you look up here: because the primary key had a partition key and a clustering key of id and time, that means weather station name and temperature had to be included in every single row in that partition. That duplicates data — and if the weather station name is always going to be the same, which I guarantee it will be (it doesn't change; it's still the same thing), then that's just kind of a waste of space. It can also create problems.
Now, when I make weather station name a static — we're looking at the storage row representation here, this is the graphic — it's going to have the name one time, and every single CQL row will just have temperature in it. At that point it's much more efficient, much cooler. I'd rather see this; it's going to save a lot of space for just this one use case. I'll bet you have some use cases right now you can think of where this would apply.
Now, there are some things. You have to make sure you put the static at the end of the declaration, and it cannot be a part of the primary key — because that's either a clustering column or your partition key, which would be your row key in the storage engine. But it's pretty simple usage: you just put static at the end and move on.
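The weather-station example might be declared like this — my sketch of the slide, with the static column kept out of the primary key:

```sql
CREATE TABLE weather_station (
    id text,
    time timestamp,
    station_name text static,   -- stored once per partition, not per row
    temperature float,
    PRIMARY KEY (id, time)      -- id = partition key, time = clustering key
);
```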
I expect to see this one quite a bit this year. Now, here's the type that I struggled to come up with a reason to use — I'm going to be honest — but after I talked to a couple of the committers, I realized that what we're showing here, what's actually happening, is that we're bubbling up something that has to be there for a lot of other things, like UDTs and unfrozen UDTs. But there were some people that would ask for it, and it was just easy enough to expose. So let me explain what this is.
This is the tuple type — and yes, you have to freeze these guys too — but a tuple type is really simple. It's just a collection, or group, of attributes: like, I want a three-tuple with three things — an int, a text, and a float — just enumerating it out; or a four-tuple, and so on. You can have — I said 256; that was wrong, I'll put a sign up — it's actually 32,768 elements in there, and it's really simple to use.
You just create a group of things and you can store them all as a group. Not a lot of use cases, but I found one, and it's actually pretty appropriate, because we do have a drone flying around here somewhere. What if we had, you know, a Cartesian plane — we had x, y, z, or "zed" for the Canadians in here; I'm multilingual.
So if you have a 3D Cartesian coordinate, well, that fits pretty well: it's a tuple — x, y, z — and when we have that, we just store it as a group; I'm not breaking it down. So you can see I have my position as a tuple, a three-tuple, and I'm storing that as a time series. So whenever I need to know where my drone is, exactly, in a 3D coordinate, I can just pull that out.
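The drone-position model as a sketch — a three-tuple for (x, y, z) stored as a time series; names and values are illustrative:

```sql
CREATE TABLE drone_position (
    droneid uuid,
    time timestamp,
    position frozen<tuple<float, float, float>>,  -- x, y, z
    PRIMARY KEY (droneid, time)
);

INSERT INTO drone_position (droneid, time, position)
VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0,
        '2014-09-11 10:00:00', (10.5, 4.2, 32.0));
```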
So if you have drone issues, there is your data model — that's free. Final little bit here — and I know I'm whittling down on time, so I'd better hurry up — partition size. Besides "oh my god, eventual consistency is killing me, I don't get it," after we get through that, the next thing is: wait a minute, how big of a partition can I create? That's always the next step for me, and so I spent a little bit of time on it.
A CQL partition is really a projection of a storage row — if you've watched my data modeling talks, you know I talk about that. The storage row underneath is stored on disk, right, and a storage row can have up to two billion cells. That's a lot. Now, I always get people who say, "Well, you don't really store two billion cells in there." Well, I don't know — let's find out. Each cell can also hold up to two gigs of data. That's a lot of data: two gigs times two billion.
Well, it's good to do math, kids — stay in school — but that math is stupid for our discussion, because if anyone ever did it and called me and said "I did it," I'd say "you're stupid" — that's what I'd mean. So how much is too much? How many cells before it degrades? How many bytes before it becomes unmanageable?
What is practical in this case — practical meaning, how would I actually store this? So Aaron Morton, who was on stage here earlier — and if you saw his talk — actually went through this exercise with 0.8 back in 2011, and I've used this, I've done this, I've referenced this like the holy word.
Do not do anything other than what's written in this blog post — because he tested this out: how big of a partition is good, how many cells would you put in a row. It's a great blog post, but it's old, right? Things have changed. His conclusion, which has always been mine, is: keep it to less than 10,000 cells, and keep it below 64 megs of total size, because otherwise you have multi-pass compaction problems and multiple hits to those 64K pages of data. Practical advice.
So I did try to reproduce the intent of those tests, and I have the code — I'm probably, after this, going to be writing a blog post about exactly how I did it, because it's way more than this talk, for those of you who are interested in the nerdy details. But what I did was, I said, all right: I'm going to start with 100 cells in a partition, all the way up to a million, a billion — a billion, a billion, that's right, billions and billions.
So the results were actually pretty surprising: when I got up to a hundred million cells, the performance really didn't degrade that much — it went from 300 microseconds to 600 microseconds. The test did a run of 100 per test, and I was doing things like random reads in the middle, grab the whole row, grab 100 cells from the end, grab 100 cells from the beginning. The only one that was really consistently low was grabbing from the beginning, which is right there in that blue line. It seemed like the sweet spot was down there at five or ten thousand, but even when I got up to 10 million and 100 million it wasn't that bad. Aaron's test had shown that it just went off the chart.
So I've got a problem there, but I'm going to try to solve it: it was actually grabbing all those indexes and putting them into the heap that caused a lot of issues. That's already been addressed in a JIRA, so hopefully that will get fixed as we go along. But what I did find is, if I broke up those SSTables into a lot of chunks, it wasn't that expensive — I was getting pretty usable performance at 1 billion cells. So what's the new answer? The new answer is: hundreds of thousands is no problem.
Hundreds of megs per partition is probably the best thing operationally. I'll tell you, 27 gigs in one partition — we're not quite there yet; your operations people will probably plan your disappearance if you do that. It was just hard to manage and get things moved around. But I think that's going to change pretty quickly, and it's really an operations issue when it comes down to it. If you're trying to follow some guidelines or some limits — these are just guidelines.