From YouTube: Cassandra Summit 2015 Keynote
Description
Featuring Billy Bosworth, CEO of DataStax, Jonathan Ellis, Apache Cassandra™ Project Chair, and Scott Guthrie, EVP of Microsoft.
The Cassandra Summit 2015 Keynote dives into the continued rise of NoSQL databases, Cassandra 3.0, and live demos featuring the leading distributed database technology, Apache Cassandra.
Wait a minute, how come I'm up here alone? Where's Rachel? She just left me hanging. Okay, all right! Well, all right, we'll just keep going, alright! So the first thing we're going to do this morning is a riveting live demo, right? Let's bring up the demo here. So I have a couple of data centers here; this is going to be really awesome. Those are Raspberry Pis running Cassandra. They're really, really running Cassandra. So I have data center two over here, data center one over here: KillrVideo. Anybody know about KillrVideo?
Oh yeah, we're going to dig into it. We're going to do a code dive; we're going to see how it works. It's amazing, and we have it running right now. I have a status page up here that shows all the nodes running, you know, something you might see on a plasma screen in your data center or something like that, but it's just showing how the thing is running, right? But I'm doing this alone. All right, where the hell is Rachel? I mean, she should be here. Well, I'll just keep going. Okay!
Hold on a minute: no, no, no, no. Hey, Bad Monkey! What are you doing? Oh no, bad, bad! Oh no! Oh no! Put that down! Put that down! What are you doing? He cut the network cable.
And despite a tornado, that actually happened. But I mean, this is what we needed, because it was mission critical. And when I think of mission critical, it could be anything from monitoring a heartbeat to something as trivial as a cat video. But it doesn't matter: your application is the critical part, and if you're in the 21st century, down is dead. You cannot survive with downtime and you can't survive with slow. And you know, because you're here, that Cassandra is the database that can deliver this.
You were going to be the authors of a new way of life; the expansion process was going to occur, and you were going to hold the power of authoring this new future. I predicted that the expansion would start to happen pretty quickly after last year's event, but I was actually a little wrong on that one, because the expansion has happened so much quicker than I think even we would have imagined. It has been a spectacular thing to watch us go beyond Planck time (for the nerdy ones in the audience) and into the inflationary period. We're watching this stuff grow at an incredible rate. A few statistics from last year's conference: over 2,600 people decided to join us on site, to register, to come to the event. We had more than a thousand people who watched selected portions of the conference via streaming video, and we had sixty sessions that were created for you and by you.
That was remarkable, because it wasn't too long before that when we were having conferences with 20 to 25 sessions, and we were calling you the week before saying, hey, can you present? We need a few more sessions to get filled. Well, that's all changing very radically. In fact, I want you to do something, nothing too weird, I won't ask you to get up and hug anybody or anything like that, but stretch out a little bit, because I do want to ask for a show of hands on something.
Let's put those numbers next to the statistics for this year's conference and think about where we are today. More than 6,100 people tried to get here today by registering. For the first time in our history, we had to stop the registrations: we're out of space. The real fire marshal, not Patrick with the hat, but the real fire marshal, said we can't have any more people in here, so we had to actually stop the registration process. More than 5,000 people are going to be watching this event online, and I think it's going to be far more than that.
137 sessions over the next two days. About five months ago, our community team, who is responsible for shepherding the process (DataStax does not decide which sessions get presented; that's a very democratic process that you all vote on, but we do shepherd that procedure), walked into my office with a binder about that thick, and I said, what's that? And they said, these are the abstract submissions.
I can't even imagine what these numbers are going to be as we go forward, but this is incredible in one year's time. Most companies don't even make it this far, in the size of a conference, throughout their entire life, and to watch this happening and unfolding right in front of our eyes, and having you be a part of it, is just remarkable. So in case it's not obvious to you yet: you have chosen wisely in investing your career. Thank you for the free applause, right?
You have chosen wisely. Making these choices with your energy and talent is a non-trivial exercise, so you are making a bet, and you're making it with the most important thing you have in life: your time. That's something we all have in common. You are choosing a technology that is special, but what makes it so special? It does some amazing things, but a lot of software does amazing things. I think what's different about Cassandra is that it is foundational; it's bedrock. And when you have a technology like that, not an ancillary technology but a foundational technology, all of a sudden you can build upon it in amazing and incredible ways: you can build applications, you can build your career, and you can build entire companies on this technology. At DataStax, we are very privileged to be so integrated and intertwined with this technology.
We have our offering called DataStax Enterprise, through which we are very delighted to bring people world-class support, enterprise functionality and features, and very easy-to-use tools. That's great and exciting, and it's a privilege to do that for our customers. But for us, that's just the beginning; that's just the foundation. And for you, in your world, in your applications and your companies, it's probably much the same.
You see this technology as a bedrock foundation, and then you can start to free your mind to think very differently about how you build applications, serve customers, and bring context to transactions, whether it's with technologies like Spark or Solr or in-memory or a variety of other things. How can you take those ancillary technologies, plug them into this bedrock foundation, and then do things in new and interesting ways?
I think the best way to really understand that is to hear from people who live this every day. At DataStax we have a community team; they're called evangelists, and you are their audience. They live to be passionate about you, about how to bring you the wisdom and knowledge and education and examples to help this community grow, and to learn from you and bring it all together.
Two of our best are John Haddad and Luke Tillman. They've done lots of really good, creative work to make examples for people to learn from and understand. How do you think about building a modern application? How is it different from the way things used to be? Is it more difficult? Is it easier? Is it just different? Hearing from them directly, I think, will help us all gain a very tactile understanding of what some of this future can be. So would you please join me in welcoming to the stage John and Luke.
Thank you very much, Billy. As Billy mentioned, I am Luke Tillman. And I'm John Haddad. And we are technical evangelists for DataStax. So before John and I joined DataStax, believe it or not, we were actually responsible for putting things in production. Before we started living this sort of glamorous life of evangelists, they actually let us write code and put things into production.
Right, and it's important to us that we continue to understand what that's like. We're talking about provisioning servers in the cloud, we're talking about reading documentation and understanding how things work, and we're also talking about writing code. There's only one way that we can continue to understand this: we have to keep building things.
So you saw it in the earlier demo: we decided to build something called killrvideo.com, and KillrVideo is a video sharing web application powered by DataStax Enterprise and Microsoft Azure. If you go to killrvideo.com, that's "killer video" without an e, you will find not only the live demo but also links to things like the source code and the CQL schema, as well as a whole bunch of other resources to help you get started building your own applications on top of Apache Cassandra.
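The schema itself isn't shown in the talk, but as a purely hypothetical sketch of what a CQL table for videos in an app like this might look like (the real schema is linked from killrvideo.com):

```sql
-- Hypothetical sketch only; the real KillrVideo schema is linked
-- from killrvideo.com.
CREATE TABLE videos (
    videoid uuid PRIMARY KEY,  -- unique id for each video
    userid uuid,               -- who uploaded it
    name text,                 -- title
    description text,
    location text,             -- URL of the video asset
    tags set<text>,
    added_date timestamp
);
```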
Yep. Now what if we want to build something a little bit more complex? What if we want to build something like a personalized recommendation engine? You've probably seen recommendation engines before. Maybe you bought something at an online store and you've gotten suggestions for other things you might be interested in buying, or maybe you rated a movie highly on Netflix and you got recommendations to watch other movies that are very similar.
A
You
haven't
heard
of
apache
spark
before
you're
going
to
hear
a
lot
about
it
over
the
next
couple
of
days,
and
so
apache
spark
is
just
a
framework
for
doing
distributed
computing
and
we
spend
a
lot
of
time
at
datastax,
making
sure
that
spark
works
well,
not
only
with
open
source
Cassandra,
but
we've
also
taken
and
integrated
it
in
to
datastax
enterprise
and
so
with
our
recommendation
engine.
Instead
of
building
it
from
scratch,
we're
going
to
leverage
one
of
sparks
machine
learning,
algorithms
called
alternating
least
squares
and.
F
F
The model lets us get recommendations based on the movies that other people, people who have traits in common with me, have already seen and liked. And this is really cool: we're going to take the predictions that we get out of this algorithm and we're going to store them in Cassandra, so that we can show them on the homepage. So this is really, really cool, but that's not the coolest thing about this. The thing that's amazing is that we managed to build this in less than a day and in under a hundred lines of code.
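The demo's Spark code isn't shown in the talk, but the core idea of alternating least squares can be sketched in plain Python on a toy ratings matrix: hold the item factors fixed and solve for each user factor in closed form, then swap, and repeat. This is a hypothetical rank-1 illustration, not Spark MLlib's implementation.

```python
# Toy alternating least squares on a tiny ratings matrix.
# Rows are users, columns are movies, 0.0 means "not rated".
RATINGS = [
    [5.0, 4.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 5.0],
]

def als(ratings, iterations=20):
    n_users, n_items = len(ratings), len(ratings[0])
    users = [1.0] * n_users  # one latent factor per user
    items = [1.0] * n_items  # one latent factor per movie
    for _ in range(iterations):
        # Fix item factors; each user factor has a closed-form solution.
        for u in range(n_users):
            num = sum(ratings[u][i] * items[i] for i in range(n_items) if ratings[u][i])
            den = sum(items[i] ** 2 for i in range(n_items) if ratings[u][i])
            users[u] = num / den
        # Fix user factors; solve each item factor the same way.
        for i in range(n_items):
            num = sum(ratings[u][i] * users[u] for u in range(n_users) if ratings[u][i])
            den = sum(users[u] ** 2 for u in range(n_users) if ratings[u][i])
            items[i] = num / den
    # A predicted rating is just the product of the two factors.
    return [[users[u] * items[i] for i in range(n_items)] for u in range(n_users)]

preds = als(RATINGS)
```

In the real system, those predictions would then be written back to a Cassandra table keyed by user, so the homepage can fetch them with a single query.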
Now, that's absolutely unbelievable. This is something that would have been ludicrously hard just a few years ago, and now anyone in this room can do it. Okay, you don't need a team of PhDs in machine learning to be able to do this. Hey Luke, do you have a PhD? I do not have a PhD. Do you have a PhD? I don't have a PhD. So what's really cool about this, as Luke mentioned, is that this is an open source project, Apache Spark, but it's also baked right into DSE, and it's really great and easy to use.
It's really not enough to just ask people to tag everything and only be able to search on those tags. Users expect that they can do things like search on the title, search on the description, and search on other metadata that's available. And so to do that, we've leveraged the search feature built into DSE as well.
If you've never seen this before, this is what it looks like to turn on search on our videos table in KillrVideo. It's just a single line from the command line, and once we have this turned on, we can start sending Solr queries to Cassandra. If you've never seen Solr query syntax before, this is what it looks like to send a simple query for "cassandra" in the description of a video. And one of the cool things about DSE Search is that Solr querying is baked right into CQL, the query language for Cassandra, so when I want to send this search to Cassandra, I can actually do a CQL query that looks something like this. I'm also not just limited to simple queries like this one, where we look for "cassandra" in the description: because we're based on Apache Solr, we have all of the power and the flexibility of the Solr project behind us. So when somebody actually uses the search box on KillrVideo, we really send a query over.
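The slides aren't reproduced in the transcript, but the pieces being described would look roughly like this: a one-line command to enable search on the table, and then Solr syntax embedded in an ordinary CQL query. Treat this as a hedged reconstruction; the exact core-creation syntax varies by DSE version.

```sql
-- Enable search on the table (run from a shell, not cqlsh):
--   dsetool create_core killrvideo.videos generateResources=true
--
-- Raw Solr syntax for "the word cassandra in the description field":
--   description:cassandra
--
-- The same search expressed directly in CQL via solr_query:
SELECT * FROM killrvideo.videos
WHERE solr_query = 'description:cassandra';
```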
One of the things that's really amazing about this, similar to the Spark work that we've already looked at, is that this wasn't particularly difficult. In fact, this took only a couple of hours and maybe about 20 lines of code to get the search functionality. That's pretty cool: you don't need a team of search experts in order to add search to your site that's already backed by Cassandra. Hey Luke, are you a search expert?
Yeah, I'm a search expert, yeah. So what have we learned here? Hopefully we've all taken away that adding these features, which used to be really complex and really hard, can now be done in a trivial amount of time. This is absolutely amazing. We no longer have to choose between a scalable operational data store and one that's easy to use.
Hopefully, in an audience of this size, some of you at least are as old as I am, and you remember the heady days of something we used to call internet development. When we were doing our internet development, we were looking at all kinds of new technologies and trying to figure out how this new world was going to sort out, and you probably remember the day when you first took a look at something called ASP.NET. Anybody remember those days? Come on, am I the only one that old? All right, we've got some old-timers in here.
It's a bit simple, and it's easy that way. So, Scott, you still work for Microsoft, I assume. This audience is largely going to be very open source oriented, very Linux oriented; there are a lot of them. Are you in the right place? Are you sure you meant to be at this conference? I hope so! Okay, so we've been talking quite a bit about the fact that there's this traditional sort of disconnect between the technologies that many of you are very well versed in and the traditional technologies of Microsoft.
That's a good point. It goes back to things that happened in July: Billy joined me at a Microsoft event, our big worldwide partner conference, and we kind of talked about the same question, which is, gosh, why is he here? And I think part of it is hopefully representative of the kind of new type of Microsoft that we're really looking to build, which is one where we put customers really at the center of everything that we do and then work back and figure out: how do we best meet their needs, and how do we enable them to transform their businesses, build great applications, and really leverage the cloud to transform everything that they do? And that means being able to use every technology. So we love Windows, we love Linux, and that means we love open source. Not only are we looking to embrace open source and enable it to work great with our platform; we're also taking even our core platform, things like ASP.NET and the .NET Framework, and open sourcing that as well, contributing our own code into the ecosystem. So yeah, that's a big change for us, and we're still early in the journey, but hopefully it opens up a whole bunch of possibilities and enables some great partnerships, like the one we have with DataStax today.
I mean, I think every customer right now, every organization, is really looking at how to embrace the cloud in a pretty substantial way, and that journey is going to look different for different organizations. Some have large existing on-premise investments that are going to take years to depreciate or years to migrate, and there are others that are startups, don't have any existing legacy, and can move even faster.
One of the things that we often talk about, and it's kind of core to our strategy, is how we enable a world where you can take advantage of hyperscale: where, when you have a runaway hit application, you can scale up to any amount of capacity, and where you can basically run your solution all over the world, close to your customers, so you have the optimum performance that can really both please them and, frankly, drive your revenue. Then you marry that hyperscale with an enterprise-grade platform, and then the hybrid capability to maintain maximum flexibility. I think that combination of hyperscale, hybrid, and enterprise grade ends up being super powerful, and it basically means you can go tackle any type of scenario.
When you're having these very practical discussions with somebody, and you go in and they have these six data centers and they're trying to go through a data center rationalization project and trying to optimize for these things: are these some of the core topics that you find yourself engaging with at a strategic level, with the CIO or the VP of IT or the people who are trying to figure this out? Yeah.
I mean, I think right now we're at the stage in the industry around cloud adoption, certainly at the enterprise level, where I'm finding pretty much every enterprise is grappling with these problems, because they do require a lot of things to work out and figure out: what does that journey look like?
One of the things that you and I were talking about offline, because it is a surreal joy to get to talk to Scott about the development world, is how application architectures have changed. I think when people talk about cloud, they automatically start to assume things like operational cost reduction: you don't need as much power, as much footprint, and you don't need the cost of administering all those machines. I think that's important, but again, when you have a database like Cassandra, it starts to enable you to distribute the database in an active state.
There's another really important element that I know many of you are really trying to optimize for, and that's low local latency. Great alliteration there, but low local latency is the key, because with these apps being at this scale, the throughput becomes so important when that app is successful. Sometimes the worst cause of failure, unfortunately, is that the app succeeds: it needs to scale, and then it falls over.
I think, you know, there have been a lot of studies on this, on what the cost is of, say, 10 or 15 milliseconds of latency in your application, for a mobile or web-based solution, to any customer, whether a consumer or an enterprise. Basically, every millisecond you add has a cost, and when you can keep latency low, you really have a winning solution. And so one of the things that we've focused on, for example at Azure, is that we now have 19 of what we call data center regions around the world. To put that in perspective, that's actually more than AWS and Google combined, and they're literally all over the world: North America, South America, Europe, Asia, Australia, Japan. We even have two regions in mainland China; we're the only Western cloud provider that operates there.
They're up and running in Beijing and Shanghai today. Wow. And so you can basically take a DataStax or Cassandra based solution and deploy it into one or all of those different data centers, and you get to choose where you want to run your code. The beauty is, if you want to reach the Australian market, you can make sure your app is running in Australia.
Great, yeah. I do think that with availability, we always think of availability as this kind of dramatic outage, but in today's world, if an app is slow, it might as well be unavailable: not just for the CSAT and the bounce rate, but because it will literally start to queue up to a state where it can't catch up anymore. So the throughput has to be there, and that low local latency is really important. Yep. So, some closing thoughts.
We're all here and we get to benefit from some of your industry experience; you talk to global customers on a regular basis. One thing I love about Scott as an executive is that he really, truly is a person of the people: he is out a lot with customers. So if you were going to pass on just a couple of nuggets of wisdom, so that we don't bang our heads on the same walls that others have before us, what would you leave us with? Well,
I think if you're building apps in this modern era, there are many lessons. If I had to pick three, I'd say, first: really design for agility. Think hard, as you start building and scaling solutions, about how you have the flexibility and the agility to react, because different things are going to happen, whether it's a natural disaster or someone coming into your office and saying, look, I need this feature done quickly.
Yeah, and I need to be able to scale around the world. So having an engineering system and a set of platform choices that give you that agility is probably the most critical thing in the cloud space. The second one I'd say is: have good monitoring, because it's one thing to be able to react to a tornado; it's another thing to know that the tornado actually hit you. And I'd say the number one thing we typically run into for any online system is, gosh, I wish I had more monitoring data in place to understand what the problem was. I think the third one is really: just don't throw any data away. It's amazing how much value you can extract from the data that you're already storing inside your applications. You saw the demo a little bit earlier in terms of how you can apply Spark and Solr and others to your data, and I'd say in general you can really find and extract huge amounts of value from that data.
A very different mindset from how we thought even five years ago, and certainly ten years ago, about how to build these apps. So thank you very much for that wisdom, Scott. Now, we did do something with the KillrVideo demo that was a big jump: we just assumed that the infrastructure was in place when we began. We've just announced some more about our strategic partnership together as DataStax and Microsoft, and part of that is the experience that we want you to have when you're working with an Azure environment and working with a DataStax cluster. So would you be willing today to actually show us what that will feel like, in real time? Sure, absolutely. Come on over and let's get started.
I enter a password here, and we'll call it "Scott demo," like the first time I did it. Then basically I can choose where I want to run this around the world. As I mentioned, we've got 19 regions open for business today; you can basically just click any of them from this list, and I'm choosing the West US.
Then I'm just going to hit OK, and you'll notice what we've done is integrate the DataStax specific settings into the experience as well. So, as I mentioned, you could do a single VM, but to be more impressive, let's say a 90-node Cassandra cluster, which we will basically install and coordinate for you. It's as simple as picking it from the drop-down list.
I can choose the size of machines I want to run, and our largest VMs can do half a terabyte of RAM and 7 terabytes of local SSD storage, which is pretty screaming fast if you've got 90 of those deployed around the world. Then basically I can just go ahead and enter my DataStax username and password, which is the hardest part of the demo, since I have to remember what it is.
You'll then see the summary; we click OK and basically just confirm it. If you don't have a license, you can actually buy it and transact it directly as part of this experience. Click OK, and we're now deploying a 90-node Cassandra DataStax cluster in the western half of the US. This will take a few minutes, at which point I've got a fully working system up and running. Here's one I deployed in East Asia yesterday, and you can see basically all the nodes that are now running.
If I want to drill into any of these directly in the Azure management console, you can see I can actually do that: I can see all the settings, the network settings; it's pulling CPU percentage directly from Azure. And, going back to how much we love Linux, we've even integrated the Linux serial console output directly into the management portal, as you can see here.
It's all in one place, with full role-based access control; everything's set up. Beyond this 90-node cluster we're deploying, we're even working on a template, which actually went out yesterday, that will also allow you to deploy multiple regions simultaneously and set up a virtual private network automatically, with the same sort of two-minutes-to-wow factor you saw here. And then basically the beauty is that you can use all the DataStax tools against this; so here's that 90-node cluster up and running.
If you're interested in more of that KillrVideo application and you want to dive deep into the data model, how the code was constructed, or the queries that were used: in the partner pavilion area you can find our Meet the Experts centers, and they will help you go very deep under the hood of that application if you'd like. So thank you very much, and thanks also to our engineering teams. One of the things that has made this a great partnership is that we've got executive alignment between Scott and me.
Next up, I would like to introduce Mr. Jonathan Ellis. I have had the privilege of working with Jonathan now for over four years at DataStax. Jonathan was one of the founders, along with Matt Pfeil, and Jonathan is a pretty special person. I continue to be impressed more each year as I get to know him, and not just for the benefits that he brings to the technical world on the Cassandra side, where he and his team do amazing things.
He and his team do amazing things, but they are also so passionate about every one of you. They really want the community experience to be phenomenal, and they've done so much work to that end. So, without further ado, I would like to have you help me welcome on stage the Apache Cassandra chairman, Mr. Jonathan Ellis.
Thank you, Billy. I'd also like to thank Scott and the evangelists for those fantastic demos. I would like to be the third person to get on stage here and tell you to check out that KillrVideo application and see how easy it is to build a modern application with Apache Cassandra and DataStax Enterprise. We've put a lot of effort at DataStax over the last couple of years into bringing you a first-class experience in building those applications with our Cassandra drivers.
This is the kind of challenge that modern applications need to be able to cope with, and they need a new generation of infrastructure that's designed to handle this kind of event. Gartner recognized this in 2013 and replaced their OLTP database report with one covering a broader category of all operational databases, including next-generation technology like Cassandra: the category you're looking at if you're eBay or Instagram or Salesforce and you're looking to build a new application or extend an existing one.
Now for me, when I'm thinking about a category, it helps to have specific examples to wrap my mind around it, and so in the database industry I like to think of databases along two axes. On the top here we have the operational category, which is where you run your business, as opposed to the analytical category on the bottom, which is where you run reports against what happened in your business. In the upper left we have the Oracle and SQL Server operational systems, and in the upper right you have the next-generation operational systems like Cassandra. And people ask me: well, what's the difference between that technology from the 90s, Oracle and SQL Server and so forth, and Apache Cassandra? What makes it different and better suited to modern web, mobile, and IoT applications? There are three key properties of a modern operational database that Cassandra does better than anyone else in the industry.
Those are the ability to be always on, the ability to scale, and the capacity to deliver high performance, so I want to take a couple of minutes and talk about each of these in turn. When Web 1.0 hit in the late 90s, that was the first beginning of a change in customers' expectations. Google launched, or started their company, in 1998, and a lot of people in this room have been using Google almost as long as they've been alive. Every time you go to google.com, you expect to get that search box back.
You know, it would be unthinkable to go to Google and get back a page that says "we're down for planned maintenance, come back at 6 a.m." It's just unthinkable. Ten years after Google was started, Steve Jobs introduced the iPhone, and this same expectation of availability started propagating to the mobile world; today mobile is the dominant player, bigger than desktop in a lot of markets. But there's a problem with this new world of needing to design for scale and availability.
Along with it comes something which is perhaps even more scary, which is that the architecture is brittle. So, for instance, suppose that instead of one machine going down, all my machines are alive, but I have a network partition: a switch fails, and now some of the machines, the machines on the lower left here, can talk to each other, but they can't talk to the machines in the upper right, and vice versa.
And so what will happen, if you're not careful, is that each of those halves of your network will elect masters and start accepting requests. This is called a split-brain scenario, and when you have multiple masters accepting updates for a given partition, you're going to introduce corruption. And this isn't just a theoretical problem: Arman gave a talk a couple of years ago about how this exact problem happened to his MongoDB cluster.
The architecture that I've shown you is basically a simplified version of how MongoDB works, and Arman described how there was a network partition, MongoDB got confused, multiple masters were elected, and he had corruption in his database. This is how MongoDB has achieved the industry-leading reputation that it has today.
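The failure mode being described can be sketched in a few lines. This is a toy illustration of the election logic only, not how MongoDB or any particular database implements it: if each side of a partition may elect its own master, you get two masters; if a strict majority is required, at most one side can.

```python
# Toy split-brain illustration: a 5-node cluster splits into two sides.
CLUSTER = {"n1", "n2", "n3", "n4", "n5"}
side_a = {"n1", "n2", "n3"}  # one half of the network partition
side_b = {"n4", "n5"}        # the other half

def naive_masters(sides):
    # Any non-empty side elects its own master: split brain.
    return [min(side) for side in sides if side]

def quorum_masters(sides, cluster_size):
    # Only a side holding a strict majority may elect a master.
    return [min(side) for side in sides if len(side) > cluster_size // 2]

split_brain = naive_masters([side_a, side_b])          # two masters accept writes
safe = quorum_masters([side_a, side_b], len(CLUSTER))  # at most one master
```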
By contrast, Cassandra manages your data without any master-slave replication setup. Each replica in a Cassandra cluster can handle your updates independently of the others, so even if two of those replicas are down, it's no problem: I don't need to take any heroic failover actions. Cassandra keeps on working, because it's designed to tolerate that kind of failure.
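Concretely, replication in Cassandra is configured per keyspace, and the tolerance for downed replicas follows from the consistency level you query at. An illustrative sketch with made-up datacenter names and values, not the demo's actual configuration:

```sql
-- Three replicas per datacenter; every replica is a peer, no master.
CREATE KEYSPACE killrvideo
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,   -- hypothetical datacenter names
    'DC2': 3
  };

-- In cqlsh, at consistency level ONE a write succeeds as long as any
-- one of the three local replicas is up, so two replicas being down
-- is tolerated, as described above.
CONSISTENCY ONE;
INSERT INTO killrvideo.videos (videoid, name) VALUES (uuid(), 'demo');
```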
Now it's true that in an extreme situation, like Patrick and Rachel destroying every node in your cluster, Cassandra can't deal with that. But in a more realistic scenario, this kind of design can mitigate real-world failures. Maybe the best petri dish for infrastructure failure is Amazon Web Services, and I say that not to throw stones at Amazon, because they're the best in the business at this. But even though they have so much experience and so much expertise,
You
still
have
a
roughly
one
major
outage
a
year.
It's
almost
it's
almost
uncanny,
how
that
happens
like
clockwork,
and
so
what
I
take
away
from
this
is
that
now,
even
if
you
are
the
best
in
the
business,
then
you
do
need
to
plan
for
that
kind
of
outage,
because
it's
going
to
happen
so
in
2011,
EBS
took
down
us
East,
EBS,
again,
elb
bad
network
hardware,
reboot
apocalypse
and
then
most
recently,
dynamodb
metadata
service.
G
So Cassandra can help you deal with these. Christos from Netflix described how, during last year's reboot scenario, where ten percent of all the VMs in Amazon were rebooted, they lost over 200 Cassandra machines; over 20 of those didn't come back at all, and there was zero downtime. By the way, Christos is here at the conference; he'll be here tomorrow to give his talk, and one of the things he'll discuss is how Cassandra helped mitigate the most recent downtime.
G
The next thing I want to talk about is scale. It's absolutely critical to be able to scale to the largest workloads in the industry; Apple was here last year talking about their 75,000 Cassandra nodes. But it's arguably even more important to be able to scale as your business grows, and ProtectWise is a good example of that. They started two years ago with a three-node Cassandra cluster. Today they're at three hundred Cassandra nodes, so Cassandra has grown with their business by two orders of magnitude, and made that happen smoothly and seamlessly.
G
So when everything is going well, you have Cassandra replicating to multiple regions, and you can geolocate your users and send them to the closest region for the fastest possible response time. That's nice to have. But this is really critical when things don't go well and you lose one of the regions: again, I don't need to do any failover events inside Cassandra.
G
Finally, I want to talk about performance. Again, there's a right way and a wrong way to build a database to deliver maximum performance, and the wrong way is to build it on top of an abstraction layer that keeps you from taking advantage of what your modern hardware has to offer. This is a diagram of the Hadoop file system, HDFS. The details aren't as important as the fact that HDFS is designed for moving large replicas of data at once: the block size in HDFS is 64 megabytes. HDFS is kind of the eighteen-wheeler of big data: you put a lot of boxes in it and it ships those all to the same place. It doesn't accelerate particularly quickly, but it's very cost-effective and efficient at moving large amounts of data from one place to another. That's what it's designed for. The problem is when you take this file system, which is designed for an analytical workload, and you try to build an operational database on top of it.
G
By contrast, Cassandra manages its storage locally and manages its replication natively, to get the best possible performance out of your hardware. We can build on the primitives that the operating system gives us, like mmap and fadvise, to pull data into memory when we need it most and give you the best possible performance.
G
DataStax contracted with a company called End Point earlier this year to benchmark the top NoSQL systems: Cassandra, Couchbase, HBase, and MongoDB. If you asked someone who was familiar with the industry what workload Cassandra would be most appropriate for, he'd probably tell you a write-heavy workload. That's what we've got in this first graph, where I have operations per second on the y-axis for a ninety percent writes workload. Across the x-axis, you see a two-node cluster doubling up to an eight-node cluster, with Cassandra, in blue, doing very well against its industry peers. But you might be surprised, then, to look at the read-dominated workload: Cassandra is actually doing even better, relatively speaking, on this workload. So Cassandra is really a general-purpose system that can handle a wide variety of workloads. This is the balanced workload, where we're doing fifty percent reads, fifty percent writes, and you can see that some of Cassandra's competition has trouble with the contention between those reads and those writes; they conflict with each other and reduce performance. And then, finally, we did want to recognize that even operational databases will need to do some light analytics as part of your application workflow, such as the machine learning that Luke and John showed us with Spark against the KillrVideo demo.
G
In fact, most of them didn't, so we split it up into two releases. We said we're going to try to get people the new features as fast as possible, so we split out the features that don't depend on the new storage engine into 2.2, and the features that do depend on the new engine are going to be in 3.0, a little bit later. 3.0 is in release candidate now; we released that on Monday, and we expect it to be generally available in October.
G
Now, you know that Cassandra thinks about the world in terms of rows and columns: I can have a table that looks like this, and I can insert data into it with a CQL statement like this. Starting with 2.2, I can do all of this in JSON as well. The syntax for that says INSERT INTO table name JSON, and then I give it a JSON document literal, and Cassandra will parse that and transform it into its native representation.
G
This is designed to allow Cassandra to integrate seamlessly into a world of JSON-based microservices. There's actually a subtle difference in the two statements that I gave here: you'll notice that in the CQL statement I generate my user ID with the built-in function now(), which creates a time-based UUID, whereas in the JSON example I'm giving it a UUID literal. That's deliberate, because in the JSON world the idea is that I'm consuming data from another service, rather than generating it myself.
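As a sketch of what those two statements likely looked like on the slides (the table and column names here are assumptions, not from the talk; the syntax is Cassandra 2.2's):

```sql
-- Hypothetical users table
CREATE TABLE users (
    user_id timeuuid PRIMARY KEY,
    name text,
    email text
);

-- Plain CQL insert: now() generates a time-based UUID server-side
INSERT INTO users (user_id, name, email)
VALUES (now(), 'Jim', 'jim@example.com');

-- JSON insert: the UUID arrives as a literal from another service
INSERT INTO users JSON
  '{"user_id": "99051fe9-6a9c-11e5-b949-38ef78858dd0",
    "name": "Jim", "email": "jim@example.com"}';
```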
G
CQL allows you to nest data structures in your Cassandra rows, and we can expose those to JSON as well. We introduced collections back in Cassandra 1.2; here's a table that has tuple, set, and list column types, and I can insert into that table with CQL like this. Note that my tuple is represented with parentheses around the values, the set has curly braces, and the list has square brackets, so I have different representations for each of those literals in CQL.
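A minimal sketch of the three literal forms he describes (the table and columns are invented for illustration):

```sql
-- Hypothetical table mixing a tuple, a set, and a list
CREATE TABLE users_collections (
    user_id uuid PRIMARY KEY,
    location frozen<tuple<float, float>>,  -- tuple literal: parentheses
    emails set<text>,                      -- set literal: curly braces
    top_songs list<text>                   -- list literal: square brackets
);

INSERT INTO users_collections (user_id, location, emails, top_songs)
VALUES (
    uuid(),
    (37.77, -122.42),
    {'jim@example.com', 'jim@work.com'},
    ['Song A', 'Song B']
);
```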
G
The other way to nest data inside Cassandra is with user-defined types. A simple user-defined type looks like this, where I have an address with a street number and a street name. Then I can use that in my table definition and say I have a street-address column that is an address value, and I can insert into that with CQL, and with JSON as well. So a user-defined type becomes a subdocument in the JSON world, and we can take this further.
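A sketch of that address type and its two insert forms (field names are assumptions based on the description above):

```sql
-- The simple address type described above
CREATE TYPE address (
    street_number int,
    street_name text
);

CREATE TABLE users_udt (
    user_id uuid PRIMARY KEY,
    street_address frozen<address>
);

-- The CQL literal for a UDT uses curly braces with named fields
INSERT INTO users_udt (user_id, street_address)
VALUES (uuid(), {street_number: 123, street_name: 'Main St'});

-- In JSON form, the UDT becomes a subdocument
INSERT INTO users_udt JSON
  '{"user_id": "99051fe9-6a9c-11e5-b949-38ef78858dd0",
    "street_address": {"street_number": 123, "street_name": "Main St"}}';
```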
G
We can nest things arbitrarily deeply in a Cassandra table and expose those to JSON. The reason you'd want to do this is that you want to denormalize the data your application needs for a request into a single Cassandra row, so you're not having to join data from different machines together in a given request. Nesting data into a Cassandra table makes that easier.
G
So let's take a slightly more complicated address definition, where now I have a set of phone numbers associated with each address, and then I'm going to let users have multiple addresses. Now my users can have a home address and a work address, and we'll just let you supply any kind of address you want.
G
We've also added support for role-based authorization. What this is intended for is larger enterprises that are managing multiple members of a team, where we'd want to avoid the problem where, through human error, some members of the team have different permissions than the others. So you can create an accounting role.
G
You can assign permissions to that role, and you can assign users to that role, and Cassandra makes sure that all stays in sync. As you add users to and remove users from your accounting team, all you need to do is assign and unassign that role, and the possibility of mismatched permissions goes away.
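A sketch of that accounting example in Cassandra 2.2's role syntax (the keyspace and user names are invented):

```sql
-- Create a role and grant it permissions once
CREATE ROLE accounting;
GRANT SELECT ON KEYSPACE finance TO accounting;
GRANT MODIFY ON KEYSPACE finance TO accounting;

-- Team members get the role, not individual permissions
CREATE ROLE alice WITH PASSWORD = 's3cret' AND LOGIN = true;
GRANT accounting TO alice;

-- When alice leaves the team, one statement removes everything
REVOKE accounting FROM alice;
```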
G
We've also added user-defined functions, and the idea here is that we want you to be able to push logic closer to the data in the Cassandra cluster. As a very simple example, this is how you would define a function that computes the mathematical sine, in Java, in Cassandra: you say CREATE FUNCTION, you give it the parameter list, you give it the language name, and then you give it the function body.
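A sketch of what that slide's sine function likely looked like in 2.2 syntax (the table and column in the SELECT are assumptions taken from his later description):

```sql
-- A sine function in Java: parameter list, language, then the body
CREATE FUNCTION sin (input double)
    RETURNS NULL ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'return Math.sin(input);';

-- Invoked like any built-in function, here on a column called value
SELECT sin(value) FROM measurements;
```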
G
That's all there is to it. Out of the box we support Java and we support JavaScript, but we support integrating with any JSR 223-compatible language. So if you want to write functions in Ruby, you just drop the JRuby jar in your classpath, and you can create functions in Ruby in Cassandra.
G
I can invoke this function like this in my SELECT statement, where I'm invoking the function on a column called value. You may be thinking that this actually doesn't look terribly useful, because whether I compute the sine in the Cassandra cluster or in my application code doesn't really matter a great deal: I'm pulling the same amount of data back to my application server either way. And you're right.
G
The more important thing that these user-defined functions enable is the ability to compute aggregates on the server side. An aggregate is a little more complicated, because I need to define an intermediate state function that accumulates the values being processed, and then I need to define a final function that gives me the final value from that intermediate state.
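A sketch of a server-side average built from those two pieces, closely following the Cassandra 2.2 documentation's example (the table and column names are assumptions):

```sql
-- State function: accumulates a running (count, sum)
CREATE FUNCTION avg_state (state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS '
        if (val != null) {
            state.setInt(0, state.getInt(0) + 1);
            state.setLong(1, state.getLong(1) + val.intValue());
        }
        return state;';

-- Final function: turns the intermediate state into the answer
CREATE FUNCTION avg_final (state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS '
        if (state.getInt(0) == 0) return null;
        return Double.valueOf((double) state.getLong(1) / state.getInt(0));';

-- The aggregate ties them together with an initial condition
CREATE AGGREGATE average (int)
    SFUNC avg_state
    STYPE tuple<int, bigint>
    FINALFUNC avg_final
    INITCOND (0, 0);

SELECT average(value) FROM measurements;
```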
G
This is a lot more useful, because I can combine thousands of values at the server and compute the average without the data having to leave the server, and then I can just send the average that the client wanted back to it. That's a much more efficient use of network resources.
G
Finally, for 2.2 we added commit log compression. You'll remember that for 2.1 we put a lot of effort into performance for CQL reads and writes, but we were starting to be bottlenecked by commit log performance and by the amount of data we could push to a single disk, and so we've alleviated that bottleneck in 2.2. You can see Cassandra 2.2 in green: it's not only faster overall, but it gives you more consistent performance. Commit log compression is kind of experimental in 2.2; it's off by default, but it's going to be turned on by default in 3.0, and I encourage you to play with it. You can turn it on on a single machine, make sure everything's still stable, and then roll it out to the rest of your cluster.
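For reference, turning it on is a single setting in cassandra.yaml (this is the 2.2-era option; a sketch, so check the yaml shipped with your version for the exact defaults):

```yaml
# cassandra.yaml: compress commit log segments as they are written
commitlog_compression:
  - class_name: LZ4Compressor
```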
G
Finally, the date-tiered compaction strategy is a new compaction strategy that we created during the 2.2 time frame, but it's not actually limited to 2.2, because we designed compaction strategies to be pluggable. We were actually able to take this back to 2.1 and even 2.0 without risking the stability of the system. So I'm going to show you two graphs that illustrate date-tiered compaction performance and what it's designed for.
G
It's designed to handle dense time-series workloads and allow you to pile a lot of data, even cold data, onto a single box. What I've done in this workload is actually push out all the way to 18 terabytes of data on a single machine. This graph here is the read performance; I've put leveled compaction at the bottom, size-tiered compaction in the middle, and date-tiered compaction at the top, and you can see how each of those compaction strategies compares across this data set. Leveled compaction, as you're probably not too surprised to find out, falls off a cliff at about two and a half terabytes of data, and you're probably not surprised because leveled compaction is designed for read-mostly workloads, and this one is ninety percent writes.
G
So that's what's coming in 2.2: we have a lot of new features across the board. As mentioned earlier, we pulled the new storage engine out into 3.0, and that touches everything in the system and enables new features that we couldn't do with our old engine. Our old engine has served us well, but fundamentally, at the atomic level, it thinks of data in key-value pairs, and so to deliver features like materialized views we needed the new engine.
G
We don't need to panic; it's designed to handle that. One of the ways it's designed to handle that is with something called hinted handoff: if any replica doesn't acknowledge an update, for whatever reason, the coordinator will write what's called a hint to itself that says, once that node comes back online, or I'm able to talk to it again, we'll send that missed update over. Delivering that hint is called handoff, so that's where the feature name comes from. The problem with hinted handoff, historically, is that hints were stored like ordinary data and deleted once delivered.
G
That
deletion
creates
a
tombstone
in
the
commit
log
in
the
mem
table
and
on
disk
again,
and
we're
still
not
done
because
right
now
in
this,
in
this
picture,
I
have
the
original
hint
in
one
data
file
and
the
tombstone
that
says
it's
deleted
in
another.
So
I
actually
need
to
compact
those
together
to
reclaim
that
disk
space,
and
so
this
is
there's
a
lot
of
overhead
in
this
design
that
doesn't
actually
help
us
in
in
hint
delivery,
because
we
don't
care
about
indexing
hints
by
their
idea
or
random.
G
Access
to
two
different
hints
that
Cassandra's
storage
engine
is
designed
to
provide
all
we
care
about
is
store
a
hint
safely
and
then
bulk
deliver
them.
So
430
we
created
a
very
simple
custom
storage
engine
for
hints,
and
it
just
looks
like
this,
where
I'm
going
to
create
a
flat
file
for
when,
when
a
replica
doesn't
ignore
my
update
and
I'll
just
append
hints
to
that
file
and
I'll
do
that
for
every
replica
and
the
cluster
that
I
need
to
store
hints
for
and
then,
when
the
replica
comes
back
online
and
I
deliver.
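The flat-file design he describes can be sketched in a few lines of Python. This is an illustration of the idea only, not Cassandra's actual implementation: one append-only file per unreachable replica, written sequentially, replayed in bulk, and removed with a single unlink instead of per-hint deletes and tombstones.

```python
import json
import os

class HintLog:
    """Append-only hint storage: one flat file per dead replica."""

    def __init__(self, directory, replica_id):
        self.path = os.path.join(directory, f"hints-{replica_id}.log")

    def append(self, mutation):
        # Sequential append: no index, no random access, no tombstones.
        with open(self.path, "a") as f:
            f.write(json.dumps(mutation) + "\n")

    def replay(self):
        # Bulk delivery: read every hint in order, then drop the whole file.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            hints = [json.loads(line) for line in f]
        os.remove(self.path)  # one unlink reclaims all the space at once
        return hints
```

The point of the sketch is the contrast: delivery cost is a sequential scan plus one file deletion, with none of the compaction overhead described above.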
G
It varies widely based on your cluster size, so I have here the most extreme possible example, where I have a two-node cluster and one of the nodes is down, and I'm writing hints for it to the other. This is extreme because, as my cluster size grows, the responsibility for storing hints for a dead node gets spread across all the nodes in the cluster. So in a ten-node cluster, you'd expect the difference to be about, you know...
G
I can create the view and say SELECT * FROM songs, and my primary key now is going to be not just the ID but first the album and then the ID. Doing that tells Cassandra to partition the data by the album, and so now I can go and say SELECT * FROM songs_by_album for a given album, and Cassandra can get me that song list. You're probably thinking this sounds a lot like indexes, and functionally they're very similar: I can take my songs table, create an index on the album, and select from the table using that index. The difference is that indexes are managed locally; each node in the cluster indexes the data that it owns. What that means is that when I go and ask an index what songs are in this album, Cassandra has to scatter that query across the entire cluster, and then each node looks it up in its local index.
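A minimal sketch of the view he describes, in 3.0 syntax, next to the functionally similar index (the songs schema is an assumption based on the columns he names):

```sql
-- Base table, partitioned by song id
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text,
    album text
);

-- Materialized view: Cassandra repartitions the data by album
CREATE MATERIALIZED VIEW songs_by_album AS
    SELECT * FROM songs
    WHERE album IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (album, id);

SELECT * FROM songs_by_album WHERE album = 'Some Album';

-- The locally managed alternative: a secondary index
CREATE INDEX songs_album_idx ON songs (album);
SELECT * FROM songs WHERE album = 'Some Album';
```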
G
So
as
a
consequence
of
this,
if
I
go
from
my
six
node
cluster
here
and
I'm
doing
10,000
requests
per
second
and
I
go
up
to
a
60
node
cluster,
then
I'm
still
going
to
be
pushing
around
10,000
notes
per
second
because
of
the
bottleneck
has
become
the
scatter
and
the
gather
parts
of
the
operation,
like
contrast,
a
materialized
view,
since
cassandra
is
repartitioning
that
data.
In
your
view,
I
only
have
to
go
to
a
single
replica.
G
Now, the read performance of a materialized view is absolutely identical to the read performance of a normal table: it's the exact same code path, managed the exact same way. But I want to give you some idea of what to expect from the write performance in your cluster as you introduce materialized views, so we're going to look at that a couple of ways. First, let's take writing to a table without any materialized views, at the top here in purple; then, in the red line, I have a single materialized view that I've added; and then in the purple at the bottom I've got five materialized views. The rule of thumb is that adding a materialized view to your table will cost about ten percent of your performance, and this is done just with the standard cassandra-stress tool. But we also wanted to look at this a slightly different way and say: given that I need to denormalize this data for my application, I either need to do that myself, in application code, or I can let materialized views do it.
G
In my base table of playlists here, as I'm inserting more and more data into each of those playlists, I'm increasing the time I spend in lock contention against the base table, because materialized view maintenance needs to take out exclusive locks against the base table to keep the views consistent with the base table data. As a rule of thumb, that contention starts to become meaningful at about 200 rows per partition.
G
So we looked to Intel's tick-tock development process for inspiration. What Intel does is split their microprocessor development into a "tick" of smaller transistors and a new manufacturing process, and a "tock" of a new microarchitecture. By splitting those up, they reduce the possibilities for error and for conflicts, and you get a more reliable schedule. The way we're applying that to Cassandra is that every other monthly release will be a pause in feature development and just include bug fixes.
G
Now, if you've been paying attention to the industry recently, Intel has actually had some hiccups in their tick-tock process, and we're realistic enough to acknowledge that we'll probably have some hiccups as we move to this development process as well. So, in parallel with the tick-tock process, we're going to continue to deliver traditional stabilization releases of the 3.0 line: we'll release 3.0.1, 3.0.2, and so forth, and each of those will not include any new features at all, but strictly contain bug fixes.
G
And for that entire 3.0.x series, it's going to be a hundred percent compatible. I hope you're as excited as I am about the new features in Cassandra 2.2, what's coming in 3.0, and the new development process that will allow us to continue delivering new features at a regular cadence and keep Cassandra the best operational database in the industry for years to come.
E
Jonathan, before you head off, I want to get some help from you here. Let me get this right: I'm gonna roll this out, you grab that end, and let's see what we're looking at here. Don't burn my fingers; you're cutting me here. Walk back so I don't fall off the stage. All right: what we have here is signatures from our inaugural certification process for Apache Cassandra, sponsored by O'Reilly. So big congratulations to everybody on the list, and we will make that available for everybody down in the partner pavilion area. Jonathan, thank you.
E
So, a lot of deep stuff there, right? Jonathan is nothing if not thorough, and we will always get a good look at what's going on with the technology and the inner bowels of the systems and how they work; that's one great thing about being in the open source community. But that does lead to a challenge that is vitally important for us to help you solve, and that is the ecosystem itself.
E
How fast can we get the ecosystem up to speed on how to use and take advantage of these great new features? Our training program that we ran yesterday was far, far more successful than we had anticipated. We had planned for 600 people; that was our capacity. We started to try to push it to 650, because we had a lot of inbound demand; then we tried seven hundred; and we ended up, I don't know how, putting 737 people through the training yesterday. That was done in partnership with O'Reilly for the certification.
E
That is the kind of thing that is going to take this technology and once again put another accelerant on it, so that you can take this knowledge and start to apply it in very meaningful ways. Not everybody in here is going to need to know how hints are handled at that level, but what you will need to know is how to build a fantastic data model to support your application.
E
You will need to know the basics of how you get your data in and out of the system, so our training initiatives are going to be centered around helping this world move much, much faster. Please take advantage of them. We have a lot of free offerings available online where you can come and get yourself trained at your own pace. But do it, and do it right, because when you do it right, you're helping yourself, you're helping your company, and you're helping everybody see the value that you bring to the market by getting things like a certification.
E
So let's learn how to do this stuff right out of the gate; it'll make everything go much faster. Next, I want to thank a very special group of people. We got the opportunity to talk to Microsoft, with Scott, and see what they're doing with us, but the partner pavilion, as I said earlier, is just off the charts.
E
This year we've got, I know, well beyond 35 partners back there, and I know many of you were there this morning, when the room was already getting packed. Just looking at our gold partnerships for a second is very representative of what's happening in the industry: what you're seeing is market leaders, people that have had market-dominant positions for decades, realizing that they have to get in this game too.
E
That is fantastic, when you see that kind of endorsement from a traditional ecosystem come to an event like this. And then, finally, there's a third classification of partners that you'll see, which is the startup crowd, building the next couple of decades of these giant companies that are going to define our industry. We really want to thank them for all their effort, and I know you won't be disappointed when you go back and spend some time talking with them and seeing what they're all up to. But finally, and most importantly, it is really all about you.
E
This is where we want to thank you for everything that you've done: for the time, the dedication, the passion, and the energy that you're applying to learning this new market and this new technology. Everything I said last year about you being the authors of this new world is a hundred percent true, but I believe it has risen by an order of magnitude when I see what's happening now in the market. When you go to these 137 sessions and you listen to what's going on, you realize this is in your control.
E
This is your future. You get to write an entire industry, and that doesn't happen very often, guys. It just doesn't happen very often; it can't. The ecosystem can't sustain that in anything other than a couple of decades at a time. So it is a real privilege to be able to stand here and thank you for your participation, and to help you get accelerated, get creative, and get passionate about what you're doing. So thank you all very much, and I hope you have a wonderful conference. Enjoy yourselves.