Description
Speaker: Jonathan Ellis, Apache Cassandra Project Chair and CTO/Co-Founder of DataStax
SlideShare Presentation: http://www.slideshare.net/planetcassandra/nyc-jonathan-ellis-keynote-cassandra-12-20
A: Welcome to this amazing event. Today we're here to celebrate and really recognize the achievements of everything the Cassandra community has done, especially on the East Coast. The amazing thing about this product, for all the technical achievements of the last few years, is that without the community, none of it would be possible. It's the people behind it that make this thing as great as it is. It's the 20 committers who work on this code every single day.
A: It's the hundreds of users who report bugs and submit lines of code as they go through the process of making their own technologies better, all the way to the thousands of businesses that depend on it to keep their companies up and running on a day-by-day basis. It all starts with the people, and without that community, none of this would be possible. Things like true 100% uptime wouldn't be possible.
A: Things like true multi-data-center capabilities would be a dream, and without those people behind it, companies would not be serving their customers better every single day. Today we have two amazing tracks. We've got many speakers here to give you the highs and lows of the technology, things they'd do better, and how they're using it. We've got speakers from large companies such as eBay; we've got companies such as SimpleReach here to tell you more about how they're using it, all the way through to new development advances on client libraries. We've got schedules all around, so check those out as you go and see what will really help you as you're building around this. And speaking of the community, I'd like to welcome Jonathan Ellis to the stage. Jonathan is my co-founder, and he's going to tell you a little bit about the state of Cassandra and where it's going.
B: Thank you, Matt. So, on the theme that Matt launched us off with: we have about twice as many people here as we did just over a year ago for our last New York conference. That's the kind of growth that's really awesome to see, and it's part of a general trend I've started to see of Cassandra really starting to go mainstream, with people becoming more aware that it's not just the database for social media that it started out as.
B: People are using Cassandra in the advertising space, in the energy sector, in government, in healthcare, in retail. It's really a general-purpose tool, and it solves the big data problem that more and more people are having better than anyone else as a system of record.
B: Cassandra, on this slide, is the line going up and to the right at the top, the solid black line, and then number two is HBase, about a third of the way down. This benchmark that the Toronto guys did is perfect for what I want to tell the world about a high-performance, scalable database.
B: Just because they're both kind of part of that same category, even though we really tackle different markets, DataStax actually hired a company called End Point to basically repeat the VLDB benchmark but add MongoDB to the mix. So on this slide I have Cassandra, HBase, and MongoDB. The Cassandra and HBase results match what the Toronto guys found pretty closely, which makes me feel good that this is a reproducible result, and MongoDB is the green line at the bottom. So what's interesting about this?
B
Is
that
you
know
this
is
this
is
a
logarithmic
scale,
both
on
the
x
and
y
axis?
So
on
the
bottom,
we
have,
you
know
the
number
of
machines
in
the
cluster.
How
is
it
scaling,
as
we
add
machines,
and
then
the
y
axis
is
operations
per
second?
So,
even
though
the
the
MongoDB
line
is
about
halfway
up
the
slide,
it's
it's
really
only
about
one-twentieth
the
throughput
of
Cassandra.
So
now
this
is.
B: Next, I need to show people how Cassandra scales beyond that relatively small number of machines. Fortunately, Netflix did a public study of this about a year ago, and you can see the results here: the scaling is just a nice straight line all the way out to 300 machines, where Netflix said, that's pretty good, that's giving us a million updates per second across these 300 large instances. The point is the shape of that line rather than the raw performance numbers, which obviously change; this was done against Cassandra 0.8.
B: Finally, some quotes from Cassandra users that I saw on Twitter and enjoyed. You can talk in theory about how Cassandra is designed for reliability, but really, the proof is in the pudding: how does it hold up in the real world? I'll call your attention to the lower right; Nathan Milford will be talking later this afternoon.
B: So that's what this is referring to. Let's get into a little more detail, starting with concurrent schema changes. This is a bit of a mulligan for me, because we actually tried to do this in 1.1, and we got almost all of the way there, except for creating new tables, which is arguably the most important part. So we had to go back to the drawing board and fix it right for 1.2.
B: For that second case, that's what this is talking about: it's now safe to let your application do that, with no risk of getting confused about what's going on in the rest of the cluster. A bigger feature, in terms of the impact on cluster management, I think, is virtual nodes. We've always had the paradigm that each Cassandra node was responsible for a single range of data; that's what's shown on the left here.
B
Is
what
we're
talking
about
in
the
you
know
the
11
and
earlier
days,
and
what
we're
doing
in
one
dot
2
is
we're
splitting
that
up,
so
that
we're
where
each
node
is
still
responsible
for
the
same
amount
of
data,
but
it's
split
up
into
smaller
ranges
and
what
that
does
for
us
is.
It
makes
it
so
that
the
machines
that
it
shares
data
with
that
it
has
pieces
of
data
replicated
to
it
also
spreads
out
across
the
cluster.
So
as
an
example
of
the
problems
of
this
solves.
B
If
I
were
rebuilding
node
5
in
this
six
node
cluster,
so
it's
failed.
I'm
brought
a
new
machine
in
and
I
need
to.
Re
replicate
the
data
to
it
that
the
old
node
5
used
to
have
so
I
have
ranges
of
data,
C,
D
and
E
replicated
to
it,
or
that
I
need
to
replicate
to
it,
and
I
can
grab
those
from
node,
3,
node,
1
and
node
4
and
there's
there's
other
choices
I
could
make.
But
fundamentally
I
can
pick
one
node
to
grab
each
of
those
ranges
from
so
in
this
six
node
cluster.
B
I
have
a
50-percent
participation
rate
in
that
rebuilt,
so
that's
not
terrible,
but
as
I
scale
that
cluster
up
the
same
number
of
machines
can
participate.
So
if
I
have
a
hundred
node
cluster,
that's
a
three
percent
participation
rate,
so
I,
my
rebuild
is
going
to
be
not
nearly
as
parallel,
not
nearly
as
quick
as
it
could
be.
So
since,
since
V
nodes
lets
us,
you
know
split
those
ranges
up
and
spread
them
across
the
cluster
in
terms
of
who
I'm
replicating
with
that
lets.
B
Everyone
participate
in
the
rebuild
so
in
in
one
dot
to
the
default.
If
you
enable
V
nodes,
it's
actually
disabled
by
default,
because
we
want,
we
want
to
be
a
little
bit
conservative
and
and
basically
not
surprised
people
who
don't
know
what
they're
signing
up
for,
because
some
of
the
management
strategies
are
a
little
bit
different,
but
by
default
we
use
256
V
nodes.
If,
if
you
have
more
machine,
if
you
have
hundreds
of
machines
in
your
cluster,
you
know
that
might
not
be
enough.
B
You
might
want
to
increase
that,
but
Cassandra
is
capable
of
increasing
that
after
you
deploy.
So
it's
not
something
you're
locked
into
the
other
thing
that
the
vino's
helps
a
lot
with
is
adding
new
machines
to
the
cluster.
So
if
you've
managed
Cassandra
clusters
before
you
know
that
either
you
need
to
double
the
number
of
machines
in
the
cluster
or
after
you
add
some
machines,
you
need
to
rebalance
the
cluster
and
have
it
basically
shift
everyone
around
the
token
ring
in
terms
of
what
they're
responsible
for
and
V
nodes
means.
B: If you're upgrading and you do want to enable vnodes, there's a line you uncomment in the configuration file. What that's going to do is split up my range into smaller virtual nodes, but they're all still going to be right next to each other. So the next step is to spread those across the cluster, and there's a command to do that called shuffle. Once you run it, Cassandra will start spreading those ranges around the cluster.
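For reference, a minimal sketch of what that looks like in 1.2; the num_tokens line and its 256 default are the ones discussed above, and the rest of cassandra.yaml stays as it was:

    # cassandra.yaml: uncommenting this line switches the node from a
    # single token to 256 virtual nodes (the 1.2 default mentioned above)
    num_tokens: 256

After restarting with that set, each node's old contiguous range has been split but not yet spread around, which is what the shuffle step then takes care of.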
B: The first of those is better support for just-a-bunch-of-disks (JBOD) deployments. Historically, the best practice has been to deploy Cassandra on a RAID 10 configuration, because in this scenario, if I have a bunch of hard disks under Cassandra and one of them fails, Cassandra hasn't known how to recognize that the disk is dead and not coming back. So we've encouraged people to deploy on RAID 10, which hides those single-disk failures from you.
B
The
downside,
of
course,
being
that
yeah
I've
I'm
already
letting
Cassandra
replicate
my
data
three
times
across
the
cluster
or
however
many
times
you
choose
and
so
giving
up
an
extra
fifty
percent
of
disk
space
to
have
that
replicated
locally
as
well.
It
feels
like
a
waste.
It's
it's.
A
trade
we'd
rather
not
make
so
one
dot
to
where
we're
emphasizing
support
for
that
jbug
configuration
just
like
Cassandra
managed
the
raw
disks,
and
so
we've
talked
to
sandra
to
recognize
that
you
know
when
it
disk
fails
and
what
to
do
about
it.
B
What
to
do
about.
It
is
a
little
bit.
It's
not
a
one-size-fits-all
answer,
actually
so
by
default.
What
we'll
do
is
if
we
recognize
that
a
disk
is
dead,
will
actually
shut
down
the
cassandra
process
on
that
machine
and
then
we'll,
let
you
you
know
either
replace
reboot,
strap
that
machine
or
you
know,
maybe
you
want
to
run
a
repair,
but
it's
it's
you
know
will
will
shut
it
down
by
default.
B
The
reason
is
that,
if,
if
we
instead
allow
that
machine
to
continue
running
knowing
that
it
has
a
missing
disc,
then
if
a
request
comes
to
me
for
data
that
I'm
supposed
to
be
managing
I'm
supposed
to
have
a
replica
of,
but
that
data
was
on
the
disk-
that's
gone,
you
know,
I,
don't
know
which
rose
I'm
missing.
All
I
know
is
that
I've
lost
a
disk,
but
I
don't
know
exactly
what's
missing.
B
So
that's
why
conservatively
we
stop
the
process
and
let
you
you
know:
if
you
reboot
strap
it,
then
you
won't
be
getting
any
out-of-date
information
served
up,
but
it's
up
to
you.
You
know,
if
you,
if
you,
if
you're
okay
with
with
running
it
with
Cassandra,
serving
up
those
that
obsolete
data,
then
that's
an
option
that
you
can
configure.
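A sketch of how that choice is expressed, assuming the 1.2 cassandra.yaml knob for this, disk_failure_policy:

    # cassandra.yaml
    # stop:        shut the node down when a disk dies (the conservative
    #              default described above)
    # best_effort: keep serving from the remaining disks, accepting that
    #              some obsolete data may be returned
    disk_failure_policy: stop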
B: As you know if you've done garbage collection tuning in anger (and it usually does make you angry), the JVM's garbage collection algorithms can deal with a heap up to about eight gigabytes or so before the pause times start to get worse and fragmentation gets worse.
B
Everything
gets
worse
above
that
you
might
be
able
to
push
it
up
to
12
or
16,
but
that's
really
the
outside
of
of
what
you
can
you
know,
grow
a
java
heat
to.
So
what
we
needed
to
do
was
we
needed
to
move
some
of
some
of
our
memory
usage.
We
needed
to
move
it
to
native
memory,
so
not
need
the
JVM
garbage
collector
to
deal
with
it.
So
that's
what
we
did
in
one
dot
to.
B
It's
good
to
have
that
garbage
collection
on
by
default
makes
a
lot
of
the
concurrent
algorithms
that
we
do
a
lot
more
sane.
But
in
some
of
these
cases
we
do
have
to
kind
of
go
behind
its
back
to
get
the
the
performance
that
we
need
so
switching
gears.
Now
to
what
have
we
added
on
the
client
development
side?
What
new
features
do
we
have
for
you
to
use
in
your
applications?
B
One
of
the
the
first
of
these
is
atomic
batches.
So
as
review
before
I
talk
about
the
atomic
part,
I
just
wanted
to
do
a
quick
review
of
what
regular
batches
are
and
and
what
the
problem
is
that
we're
trying
to
solve
with
atomic
batches.
So
a
batch
is
just
a
group
of
updates
to
different
rows
that
you
want
Cassandra
to
apply
as
a
unit.
So
in
this
slide,
I've
got
a
red,
yellow
and
blue
rose
that
that's
my
batch
and
I
and
those
live
on
different
replicas.
B
So
now
the
Red
Road,
you
know,
is
not
part
of
the
same
token
range
as
the
yellow
row
or
the
blue
row.
So
this
is
what
that
looks.
Like
you
know.
If
everything
goes
well,
that
the
client
says
here's
my
batch,
the
coordinator
says
okay
I'll
figure
out
where
each
of
those
rows
goes
and
send
them
out.
The
problem
is
what,
if
the
coordinator
actually
starts,
sending
out
those
rows
but
then
dies
partway
through
so
now,
I
had
this
group
of
rows
that
I
wanted
to
apply
as
a
unit,
and
you
know
I.
B
Now
there
it's
in
some
unknown
state.
You
know
some
of
the
rows
may
be
applied
and
not
others
I,
don't
know.
So
what
we
do
with
atomic
batches
is
we
actually
basically
create
a
backup
coordinator
by
using
this
concept,
called
a
batch
log,
which
is
basically
just
a
system
table
where
the
coordinator
will
pick
a
couple
other
machines
in
the
stir
and
say
here's
the
batch
that
I'm
about
to
apply.
B
If
you
don't
hear
back
from
me
soon
then
assume
that
I'm
dead
and
you
can
take
over
applying
that
batch.
So
in
that
scenario
the
batch
alot
will
actually
my
diagram
slightly
misleading
because
it
doesn't
know
which
rose
got
applied
either,
so
it
will
actually
replay
all
of
them,
but
that's
safe
in
the
Cassandra
world,
because
rights
are
idempotent,
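In CQL terms a batch is written like the sketch below; the table and values are made up for illustration. In 1.2, BEGIN BATCH gets the logged, atomic behavior just described by default, and a BEGIN UNLOGGED BATCH variant opts out for callers who don't want the extra batch-log write.

    BEGIN BATCH
      INSERT INTO users (id, name) VALUES (1, 'alice');
      UPDATE user_index SET user_id = 1 WHERE name = 'alice';
    APPLY BATCH;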
Maybe the biggest change in 1.2, though, is this thing we've been working on called CQL, the Cassandra Query Language.
B: What I want to do here is show you how Cassandra maps a data schema that was defined under the old Thrift rules to CQL. It's going to get a little bit hairy and brains will explode, but the take-home lesson is that you can express anything in CQL that you could have using the Thrift API, and if you decide you want to start writing new features for your application in CQL, that's a gentle upgrade path.
B
I
don't
have
to
dump
and
recreate
tables,
or
any
of
that
I
can
use
the
same
data
files
that
I've
been
using,
but
Cassandra
will
know
how
to
deal
with
that
with
the
with
the
cassandra
query,
language,
so
I'll
show
you
how
that
works,
I'm
going
to
be
talking
about
a
fairly
simple
data
model
where
I
have
songs,
and
I
have
playlists
that
have
groups
of
songs
and
and
show
how
that
maps
to
c
ql.
So
the
songs
definition
is
is
both
the
simplest
and
the
hairiest
begin.
B
Just
in
terms
of
how
long
it
is
because
the
you
know,
the
thrift
schema
definition
was
not
optimized
for
this,
but
but
what
I'm
doing
is
I've
been
basically
creating
what
we
would
call
a
static
column
family
where
I
have
four
columns
in
this
in
this
table,
and
they
all
know,
every
row
has
the
same
columns.
Every
row
has
a
title
and
all
album
and
artist,
and
then
some
song
data,
you
know
mp3
or
flac
or
whatever.
B
So
this
actually
maps
one-to-one
straightforwardly
with
with
the
cql
definition
where
I
have
create
table
Titus
title
artist,
album
data-
you
know
very
straightforward,
so
you
can
see
how
now
these
rows
here,
where
I
have
these
thrift
or
storage
engine
data
cells?
You
know
those
turn
into
one
to
one
to
the
the
cql
columns.
So
there's
very,
very
straightforward.
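A hedged reconstruction of that CQL definition (the column types are guesses, but the shape matches the four static columns just described):

    CREATE TABLE songs (
      id uuid PRIMARY KEY,
      title text,
      album text,
      artist text,
      data blob        -- the raw MP3/FLAC bytes
    );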
B
Now
things
start
to
get
more
interesting
when
we
want
to
model
what
we,
what
we
called
a
wide
row
column
family,
so
to
illustrate
that
I'm
going
to
use
a
song
tags,
I
wanted
AG
my
my
songs
with
different
categories
and
each
song
can
have
multiple
tags
and,
of
course,
each
tag
can
be
applied
to
multiple
songs.
So
the
way
the
straightforward
way
to
do
this
in
one
dot,
one
would
have
been
with
a
table
like
this.
B
I'll
have
a
song
tags
table
and
what
I'm
going
to
do
is
I'm,
going
to
take
that
column,
name
and
I'm
going
to
treat
it
as
a
piece
of
data.
And
so
what?
What
we're
going
to
see
here
is
that
this
isn't
going
to
map
one
to
one
with
cql
columns
anymore
and
so
we're
I'm
going
to
do
a
little
bit
of
a
retroactive
terminology.
B
Change
so
from
from
here
on
when
I
say,
column,
I'm
going
to
be
talking
about
the
cql
concept
and
instead,
when
I'm
talking
about
the
the
thrift
concept
of
you
know
a
name
and
a
value
tuple,
I'm
going
to
say,
sell
so
I'm
to
just
to
make
it
clear
which
one
I'm
talking
about
so
so
each
of
these
cells
has
a
name
that
you
know
is
determined.
That
is
basically
the
name.
Is
the
the
tag
data
that
I'm
interested
in
so
the
way
we
map
that
to
c
ql?
B
Is
we
introduced
the
concept
of
a
compound
primary
key,
so
I'm
going
to
have
my
partition
key
that
the
song
ID
here
and
then
I'm
going
to
have
the
the
tag
name
and
by
having
that
tag
name
as
part
of
the
primary
key?
That's
telling
Cassandra
that
that
tag
name,
that's
actually
contained
in
the
cell
name,
and
so
each
of
those
cell
names
I've
tried
to
illustrate
here
that
this
orange
this
row
at
the
top
here
that
I've
grouped
in
an
orange
box
that
turns
into
one
row
per
sale
in
the
c
ql
world.
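A sketch of that definition; the types are assumptions, but the key structure is the point:

    CREATE TABLE song_tags (
      id uuid,                       -- partition key: which song
      tag_name text,                 -- stored in the cell name on disk
      PRIMARY KEY (id, tag_name)
    );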
B: The other thing that's important to note here is that this CREATE TABLE up at the top is optional, in the sense that if I don't have it, if I had this definition in 1.1 and then I upgrade to 1.2 and say SELECT * FROM playlists, I'm going to get this result set back, except that Cassandra won't know what name to give each of those components.
B
So
what
it
will
give
me
back
is
instead
of
ID
title
artist.
It'll
give
me
key
column,
one
column
to
column
three
and
value
is
what
it
will
give
me.
So
the
the
sort
of
column
metadata
is
optional
and
you
can
give
it
to
Cassandra
without
recreating
anything.
All
you
need
to
do
is
you
say,
alter
table,
playlists,
rename,
column,
12
title
and
that's
what
you
would
do
yeah.
The
other
thing
that
we've
added
in
1
dot
2
is
kind
of
syntactic
sugar
for
certain
types
of
composite
columns,
and
we
expose
these
as
elections.
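As a sketch of the 1.2 collection types, with a hypothetical users table; under the hood each element is stored as one composite cell, which is why this is sugar rather than a new storage-engine feature:

    CREATE TABLE users (
      id uuid PRIMARY KEY,
      emails set<text>
    );

    UPDATE users SET emails = emails + {'alice@example.com'}
      WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;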
B
Dictionary
and
I
I
won't
go
into
that
in
in
detail,
except
to
say
that
this
replaces
all
the
kind
of
one-off
methods
we've
had
for
thrift
for
described,
schema
for
getting
the
the
token
ring
and
so
forth.
So
we've
got
you
know
all
in
all
of
these
are
in
the
system.
Key
space
you've
got
key
spaces,
you've
got
column,
families
you've
got
local,
which
is
data
that
I
know
about
myself.
B
Note
that
the
tokens
column
and
because
of
the
wrapping
it's
a
little
bit
hard
to
tell,
though
the
tokens
column
is
the
second
to
last
one
here.
So
you
can
see
that
that's
a
set
of
zero
is
what
that
is.
So
I
have
a
single
token,
and
that
and
my
token
is
0
on
this
machine,
and
so
as
if,
if
you
upgraded
to
V
nodes,
then
that
would
be
a
larger
set,
so
we're
already
using
these
these
data
types
internally.
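For example, the single-token layout he is describing can be read back with something like this; system.local is the 1.2 system table being shown:

    SELECT cluster_name, tokens FROM system.local;
    -- tokens comes back as a set, e.g. {'0'} on this machine,
    -- or a 256-element set once vnodes are enabled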
B
So
this
can
be
very
useful
for
figuring
out.
Why
is
Cassandra
not
as
fast
as
I
thought?
It
should
be
so
something
that
a
lot
of
people
like
to
do
when
they're,
starting
out
as
saying
that
hey
since
cassandra
is
going
to
order
everything
within
a
partition.
For
me,
I
can
really
I
can
use
that
to
making
a
persistent
q
really
easily.
So
my
q
definition
might
look
like
this,
where
I
have
a
queue,
ID
and
then
I'll
have
Q
entries
that
are
the
created
at
and
then
the
the
value.
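A sketch of that queue definition (names and types are illustrative):

    CREATE TABLE queues (
      id uuid,              -- which queue
      created_at timeuuid,  -- clusters entries in time order
      value blob,
      PRIMARY KEY (id, created_at)
    );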
B
So
because
so
Cassandra
will
give
me
these
q
entries
by
in
chronological
order.
So
as
I
pull
those
out,
you
know
I'll,
delete
them
and
then
always
get
the
the
next
most
recent
one
from
my
cue.
So
the
problem
is
so
I'll.
Here's
the
query,
I'll
do
at
the
top
here.
You
know
to
get
the
next
item
in
the
queue,
but
after
I've
done,
you
know
thousands
of
these
inserts
and
deletes.
Then
the
tombstones
start
to
be
a
problem,
and
so
this
is
this
doing
a
trace.
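The consume cycle would look something like the sketch below, and it's the SELECT that degrades: every dequeued entry leaves a tombstone that later SELECTs must scan past before finding a live cell.

    -- get the next item in the queue
    SELECT created_at, value FROM queues WHERE id = ? LIMIT 1;

    -- consume it, which writes a tombstone
    DELETE FROM queues WHERE id = ? AND created_at = ?;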
B: Doing a trace illustrates that. Everything is small numbers of microseconds until we get to this highlighted one in green, where all of a sudden this took 35,000 microseconds, or 35 milliseconds, and it says: I read one live cell, which is what you asked me for with LIMIT 1, and a hundred thousand tombstoned cells.
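That trace output comes from the request tracing added in 1.2; in cqlsh it's toggled per session, roughly like so:

    TRACING ON;
    SELECT created_at, value FROM queues WHERE id = ? LIMIT 1;
    -- cqlsh then prints each internal step with its elapsed
    -- microseconds, including the live-vs-tombstone cell counts above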
B
So
you
know
it
the
tombstones,
you
know,
aren't
free,
and
this
is
one
of
the
things
that
you
need
to
keep
in
mind
as
you're
designing
your
Cassandra
data
model
and
it's
always
been
a
little
bit
tough
to
to
see
those,
because
you
know
everything
in
the
client
API,
basically
months,
it's
a
tombstone.
It
doesn't
exist
anymore.
So
it's
very
easy
to
not
see
the
effect
that
they're
having
on
you
and
tracing
exposes
that.
So
briefly,
I
also
wanted
to
give
a
little
bit
of
a
heads
up
today
on
what
we're
working
on
for
20.
B
Like
I,
said
we're
targeting
this
for
july.
The
items
on
this
list
are
ordered
in
kind
of
how
far
along
we
are
in
implementing
them,
so
eager
retries
and
improved
compaction.
These
are
done
and
then
triggers
compare
and
set
and
a
more
efficient
repair.
Those
are
works
in
progress,
so
eager
retries
is
no
for
historical
reasons.
B
But
you
know
you
can
still.
You
can
still
have
hiccups
in
that.
You
know
either
because
I
routed
the
request
to
the
to
the
middle
one
here
and
then
he
died.
You
know
after
I
sent
him
the
request,
or
maybe
he
didn't
die,
but
maybe
maybe
he
had
a
garbage
collection
pause.
You
know
just
briefly,
so
what
what
eager
retries
does?
Is
it's
tunable
by
default?
It
will
use
75th
percentile.
B
So
that's
going
to
be
very
useful
for
your
smoothing
out
the
the
latency
volatility,
we're
doing
a
couple
things
for
improved
compaction,
we're
introducing
specialized
compaction
strategies
for
four
different
workloads,
a
one
that's
pretty
easy
to
take
care
of
is
I'm
only
inserting
new
data
I'm,
never
overriding
existing
rose
and
then
every
so
and
I
want
that
data
to
stay
around
for
30
days
or
or
three
months
or
whatever.
So
we
can.
We
don't
need
to
merge
those
data
files
with
existing
ones
to
take
care
of
your
row
over
rights.
B
All
we
need
to
do
is
expire
those
data
files,
all
at
once,
when
everything
in
that
row
or
in
that
table
in
that
data
file
has
expired.
The
interest
the
more
interesting
question
is:
can
we
do
anything
for
a
general
purpose
workload?
Can
we
do
better
and
I'm
going
to
refer
you
here
to
to
Jake
Lucy
yannis
talk
later
today,
he's
going
to
be
talking
about
his
new
compaction
strategy,
I
found
out
about
this
yesterday,
it's
it's
pretty
clever.
I'd
recommend
checking
that
out.
B: For triggers, we have a proof of concept. It's not finished, but we do have a proof of concept that shows it's possible, and it actually builds on the atomic batches from 1.2, which is why it's a lot more tractable now. The syntax is a little bit up in the air, but it's probably going to be something like this, where you tell Cassandra: I want you to execute this trigger, that's in this jar, against this table.
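The provisional syntax he's sketching looks roughly like this; the trigger name and class here are hypothetical:

    CREATE TRIGGER on_user_write ON users
      USING 'com.example.triggers.AuditTrigger';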
B
You'll
have
a
class
that
implements
the
trigger
interface.
That
basically
takes
a
row
and
you
can
add
or
remove,
updates
from
that
row,
and
since
it
is
your
raw
code,
you
can
also
do
things
like
send
an
email.
You
can
do
things
like
emit
a
storm
tuple
and
other
things
like
that.
So
we're
giving
you
kind
of
maximum
flexibility
as
well
as
maximum,
not
user
friendliness,
because
you
know
it
is,
it
is
raw
code.
B
So
the
classic
example
I
like
to
use
is
what,
if
I
have
you
I
want
to
support
user
registration
so
in
in
Cassandra
by
itself?
There's
no
really
good
way
to
do
that,
because
I
can
have
concurrent
users
as
I've
Illustrated
here
both
say:
does
this
user
exists
Cassandra
says?
No,
so
both
of
them
try
to
create
it
and
remember
that
insert
in
Cassandra
is
really
insert
or
update.
So
one
of
these
is
going
to
scribble
over
the
insert
that
the
other
guy
did
so
we
need
to
be
able
to
separate
those.
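For a concrete feel, here is what the registration example looks like with compare-and-set. The syntax below is the IF NOT EXISTS form this feature eventually shipped with in Cassandra 2.0; at the time of this talk it was still undecided, and the users table is hypothetical.

    INSERT INTO users (username, email)
    VALUES ('alice', 'alice@example.com')
    IF NOT EXISTS;
    -- returns applied = false to whichever concurrent
    -- registration loses the round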
B
And,
of
course,
paxos
is
the
distributed,
consistency
superhero
and
that's
what
that's
what
we
ended
up
with
after
after
a
lot
of
research,
so
Paxos
is
basically
two-phase
commit
on
steroids
that
it
lets
a
new
proposer
kind
of
take
over
and
and
finish
up
a
proposal
that
that
got
have
got
partway
through.
So
it
handles
the
kind
of
failure
conditions
that
two
phase
commit.
Doesn't
so
there's
a
couple
interesting
questions
when
we
apply
this
to
Cassandra,
one
of
which
is
what
do
we
call
it?
B
You
know
I
can
say,
compare
and
set
to
an
audience
like
this
to
a
more
general
audience.
I
think
I
need
something:
a
little
more
descriptive,
a
little
more
user
friendly,
so
kind
of
trying
to
figure
that
out.
Another
is
that
this
is
actually
a
rare
feature
that
it's
actually
easier
to
create
a
thrift
API
poor
than
it
is
a
cql
one,
because
there's
no
real
analog
in
the
relational
world.
The
relational
world
solves
this
problem
with
transactions
which
are
not
a
good
fit
for
Cassandra
for
a
bunch
of
reasons.
B
Finally
we're
looking
at
doing
more
efficient
repair.
So
repair,
of
course,
is
consist
of
two
phases
that
we
call
validation
where
it
builds
a
hash
tree
of
the
data
that
it
has
and
exchanges
it
with
the
other
replicas
and
then,
after
that,
the
the
replicas
dig
down
in
the
hash
tree
to
find
out
where
the
inconsistencies
are,
where
the
replicas
have
different
information,
and
then
they
stream
that
data
to
each
other.
So
most
of
the
most
of
the
time,
there's
a
relatively
small
amount
that
we
need
to
stream.
B
So
the
validation
part
is
the
painful
part.
So
what
I
have
to
do
is
I
have
to
go
over
all
the
data
by
that
I
have
for
the
range
that
I'm
repairing
turn
it
into
a
hash
tree.
If
I
then
add
more
data,
I
have
to
build
a
new
hash
street
from
that
I
have
to
go
through
that
same
process.
What
we'd
like
to
do
is
if
I've
added
new
data
I
want
to
build
a
hash
tree
just
for
that
new
data
and
and
resync
that
so
I
think
we
can
do
this.
B
This
does
mean
that
we're
going
to
need
to
have
kind
of
two
modes
of
repair,
because,
if
going
back
to
my
first
example
of
j
bata,
I've
lost
a
disk
and
I
need
to
rebuild
the
data
that
was
there.
So
in
that
case,
I
can't
just
repair
it.
New
data
I
need
to
rebuild
everything,
so
we'll
need
to
add
an
option
to
repair
that
says
you
know
include
previously
repaired.
You
know,
rebuild
everything
kind
of
mode,
so
I
I'll
take
two
questions.
2
questions.
B
The
question
was
since
we're
adding
a
cql
language
inspired
by
you,
know,
sequel,
have
you
considered
new
sequel
and
and
the
approaches
they're
taking
their
fundamentally
it's
those
systems
are
taking
a
different
approach
where
there
they
are
trying
to
provide
you
full
acid
transactions
and
and
there's
a
bunch
of
limitations
that
come
with
that
that
we're
not
prepared
to
accept
primarily
around
not
being
able
to
do
the
kind
of
multi
Dennis
data
center
replication
that
we
do
as
well
as
now.
There's
a
lot
of
overhead
with
that
and
and
Cassandra's
faster.
B
Are
you
going
to
decommission
the
existing
thrift
API
and
will
will
new
features
that
are
being
added
to
c
ql
become
available
to
existence?
Rift!
That's
a
good
question,
so
I
should
have
should
have
mentioned
that,
because
I
want
to
be
very
clear
that
thrift
isn't
going
anywhere
so
we're
not
going
to
break
working
code,
we're
very,
very
firm
on
that.
B
That's
that
said,
those
things
like
compare
and
set,
and
that's
straightforward
to
add
a
thrift
api,
for,
I
just
add
a
new
cass
thrift
method,
things
like
collections
that
doesn't
really
make
sense
in
the
thrift
world,
because
that's
kind
of
syntactic
sugar
that
I've
done
to
your
thrift
row
to
expose
that
to
c
ql.
So
on
the
thrift
side
you
would
still
be
dealing
with.
You
know
lists
of
byte
buffers
right
and
then
so
you
can
still
access
that
collection
data,
but
you
don't
really
have
anything
that
says
you
know
treat
it
differently.
B
You
just
have
to
go,
buy
that
composite
cell
so
yeah.
We
we
are
going
to
try
to
expose
things
to
thrift
where
that
makes
sense,
but
I
think
that
there's
going
to
be
cases
like
the
collections,
where
it
doesn't
really
make
sense
to
try
to
try
to
do
anything
more
than
thrift
already
does
with
it
so
I.
We
do
need
to
break
now
we're
going
to
the
the
two
tracks
now
and
at
has
one
more
word
of
housekeeping
for
us
just.
A: Just a few very quick things. First of all, at twelve-fifteen today, Edward Capriolo is going to be signing the book he wrote on Cassandra, the Cassandra High Performance Cookbook, right over yonder, so if you want to ask him any questions about the book, feel free to go visit him. As a reminder, there are three total rooms for the tracks; this is the first one.
A
The
second
is
on
the
fourth
floor
and
then
the
meet
the
experts
session,
which
starts
at
launch,
is
on
the
eighth
floor
and
the
elevator
and
stairs
are
right
back
here.
We've
got
free,
Wi-Fi
access
in
here.
The
SSID
is
metro
wireless
and
the
password
is
Metro
2013
all
lowercase
on
that.
Second,
one.
A
Two
Metro
wireless's
and
it's
the
one
without
the
space
we've
got
charging
stations
/
by
our
partner
pavilion,
as
well
as
on
the
fourth
floor,
so
for
everyone
who
needs
to
have
a
laptop
knock
yourself
out
and
speaking
of
the
partners,
a
lot
of
this
wouldn't
be
possible
without
them.
So
please
stop
by
and
say
hi
to
them
and
see
if
there's
anything
of
interest
in
there,
because
they've
been
very,
very
gracious
and
making
this
event
possible
or
than
that
have
a
great
session.
Everyone-
and
thank
you.