Description
Link to blog referenced in video: http://www.planetcassandra.org/blog/this-week-in-cassandra-contemplating-compaction-482016/
A
Oh, here we go, we're up. This is This Week in Cassandra on Planet Cassandra. Today we've got Patrick McFadin and Jeff Jirsa from CrowdStrike. We're going to be talking about the news, and then we're going to be talking about some pretty interesting stuff that Jeff has been working on over the last few months. Before we get going, though, Patrick, we've got some big news, don't we?
A
B
B
It's a nice change of pace. So Cassandra Summit is live today at cassandrasummit.org, and you can register now. Watch Twitter for discount codes. I haven't given any out yet, but watch Twitter: John and I, and a few others who might be in your Twitter feed, will be giving out discount codes. But it is open today, and I will tell you this: you need to register early, because we're looking at the sizes for this year, and I think the community has spoken loud and clear, including myself.
B
We do not want a massive 10,000-person summit; they're just not useful. We're going to keep it about the same size as last year, but that doesn't mean the demand is lower. It means that now it's going to be a little tighter. So if you're waiting until the last second, you may be unhappy.
A
You know, I think we've been to some conferences where people are just kind of wandering around without any real enthusiasm for being there. It's kind of just a way to spend a day that's not in your office. That's definitely not the point of this summit, right? If you're a Cassandra user and you want to learn, you want to take this stuff seriously and meet some really passionate people, that's the way to go, like I've mentioned on here a couple of times.
A
It's like, "Who is this guy?" And, you know, it's awesome: you get a lot of people and you get a lot of enthusiasm. You get a ton of energy around this database and the related ecosystem, and there's a lot of cool stuff, and you end up having a lot of side conversations. I mean, I've had conversations where all of a sudden we pull out a whiteboard and start talking about algorithms and data modeling, and all of a sudden we're like...
A
B
C
You know, that's one of my favorite events. We go and we spend a lot of time there. We talked to people at the Weather Channel, and we got a lot of value out of just actually meeting people and talking about some use cases. We talked to Olivia about some secondary index use cases that we hadn't thought about, and we talked to Christos at Netflix about some...
C
You know, disk performance things. Our talk last year was about using EBS, and they obviously have thousands of Cassandra nodes at AWS. So we demonstrated some cost savings, and they were really interested in it. So there's a lot of bidirectional knowledge flow that happens, which you just don't have an opportunity to talk about in any other place.
B
So one of the things we're planning on doing this year, and we're in the middle of the planning, and I'm trying to take that feedback and turn it into something really useful, is to encourage more collaboration instead of just making it a slog of sessions, one after the other, where you're just running from one thing to the other. We want to create spaces for people to have those conversations, because you're...
A
A
A
This is pretty cool. There's a common data model where people have, let's say, leaderboards. Effectively, what you have is a partition key, which is the game, and then a bunch of scores and people. Frequently you may want to say, "I want to get the top 10 players in each game." Well, CASSANDRA-7017 is about adding a PER PARTITION LIMIT clause, so you're effectively doing a full table scan, but only taking the head of each partition. So this is pretty cool.
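To make the "head of each partition" idea concrete, here is a minimal Python sketch of what a per-partition limit returns. It is only an in-memory illustration, not Cassandra code: the `rows` data, the `top_n_per_partition` helper, and the assumption that rows within a game are ordered by score descending are all made up for this example.

```python
from collections import defaultdict

def top_n_per_partition(rows, n):
    """Return the first n rows of every partition, highest score first.

    rows is an iterable of (game, score, player) tuples, where game plays
    the role of the partition key (hypothetical leaderboard data).
    """
    partitions = defaultdict(list)
    for game, score, player in rows:
        partitions[game].append((score, player))

    result = {}
    for game, entries in partitions.items():
        # Emulate a clustering order of "score descending" inside the partition.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        # Keep only the head of the partition, like a per-partition limit.
        result[game] = entries[:n]
    return result

rows = [
    ("chess", 1200, "ann"),
    ("chess", 1500, "bob"),
    ("chess", 900, "cy"),
    ("go", 300, "dee"),
    ("go", 700, "eve"),
]
top = top_n_per_partition(rows, 2)
```

Every partition is visited, which is the "full table scan" part, but only the first `n` rows of each one are kept.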
A
B
And that points out one of the things, I think, as 3.0 and 3.x are moving forward. This is such a big deal: usability. The changes that are happening in CQL are making it more usable. If you have three corners for a good database, usability is one of those three, then performance, and then node density is another. But usability has to be critical, and it's so hard because there are just a million things to do.
A
I mean, one of the interesting parts about this particular issue is that before this, you would have to keep track of every game in some table, right? You'd have to have a list of all your partition keys, and then you'd have to go out and query them all. We're using a leaderboard case, where there's usually not a ton of data associated with that, but...
A
I think, with any of these types of queries, like top N in a partition, it's widely hated, right? If I have a billion partition keys, going out and finding all of them first and then doing a billion queries is a ton of overhead, and it's kind of pointless. So this will definitely be really, really cool. I think we'll see some fun data models come out of this.
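The overhead being described can be sketched by counting client round trips. The `FakeStore` class below is purely hypothetical (it is not a Cassandra driver API); it only counts how many requests each approach would issue, and it treats the scan as a single request from the client's point of view even though a real scan would be paged.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a leaderboard table."""

    def __init__(self, partitions):
        # partitions maps game -> list of (score, player), already sorted descending.
        self.partitions = partitions
        self.queries = 0

    def list_partition_keys(self):
        self.queries += 1
        return list(self.partitions)

    def top_n(self, game, n):
        self.queries += 1
        return self.partitions[game][:n]

    def scan_with_per_partition_limit(self, n):
        self.queries += 1
        return {game: rows[:n] for game, rows in self.partitions.items()}

data = {
    "chess": [(1500, "bob"), (1200, "ann"), (900, "cy")],
    "go": [(700, "eve"), (300, "dee")],
    "poker": [(50, "fay")],
}

# Old approach: fetch the list of partition keys, then query each one.
old_store = FakeStore(data)
old_result = {game: old_store.top_n(game, 2) for game in old_store.list_partition_keys()}

# New approach: one scan that keeps only the head of each partition.
new_store = FakeStore(data)
new_result = new_store.scan_with_per_partition_limit(2)
```

With three games the old approach issues four requests and the new one issues one; with a billion partition keys the gap grows accordingly.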
A
A
B
A
I had no clue this was coming. Docussandra is effectively a REST-based document database built on top of Cassandra. So for the things that people like about document databases, Pearson's like, "Whatever, we'll just use Cassandra instead," because that query flexibility and having documents is cool, but they don't really seem to like the baggage, I would...
B
C
B
Absolutely. I mean, the fact that they decided to, well, like I said, there's probably a story. I've worked with a lot of large companies that have struggled with putting things into open source, and I do a talk at OSCON on this. I hate to see when people put something in open source and it's just like they put a box of junk on the curb and wrote "free" on it. You know, it's like they're just throwing stuff over the wall. But this seems like they're...
A
And it's not their business. I think a lot of people are afraid to open source things, but that thing isn't what's making them money, right? Pearson's not a database company. There's no reason for them to say, "We're not going to open source this; our secret sauce is our database." It's like, well, probably not. You know, I think it's pretty cool, and they're...
A
Actually, they're pretty cutting-edge. I mean, it's not just this; they were Titan users before anybody had even heard of Titan. I think they were the first one. I think I may have been the second one. But yeah, those guys are willing to try new things, and I think it's pretty cool. Oh yeah, it's definitely one of those projects that, hey, everybody...
B
A
A
He did some work a while ago when he was at Apigee, and there's a video out there somewhere where he's kind of showing off, like, "Oh, we wanted to see if we could make the wire protocol work with Cassandra," and there he goes. So this is a pretty cool evolution of that. I definitely want to see how far they take this, because I think it'd be pretty cool to just be like a...
A
You have a super flexible document database on top of Cassandra. So, all right, I've been ranting for a little while about document databases. Now the meat of the matter, why we have Jeff on here. Jeff, you wrote something called time window compaction, which is pretty cool. I've taken a look at this, and it's really well-written code, and I know that you guys are using this in production at pretty big scale, right? Can you give us a little about your cluster size and how you're using it?
C
We have a very large time series cluster. It's on the order of a couple hundred nodes, and it's on the order of about seven hundred terabytes of actual data. The available disk is obviously higher than that, but it's about seven hundred terabytes of live data, and it's all TTL'd, so it all expires. So one of our ongoing concerns is being able to expire that efficiently and make sure that we don't keep extra data on disk that we don't need.
C
So if we use something like size-tiered, or if we use leveled, there's no guarantee when data will be expired or when tombstones will actually be cleaned up. We wanted something that was very easily understandable and easy to reason about. So we explicitly made a new compaction strategy that, instead of grouping SSTables by size or by partition keys, groups them by the easiest thing we could think of, which was basically to say, "I want one SSTable per day."
C
Or, "I want an SSTable per week," or, "I want one SSTable for every two minutes of data," and that's all it does. It's the simplest possible thing to think about. Basically, as you feed data in, it does size-tiered compaction, and as soon as it hits the end of that window, it joins them all, I believe, into the smallest possible set. It gives us a compaction strategy that's very easy to think about and reason about, and that gives us a little bit better performance.
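The windowing idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual strategy code: SSTables are represented as hypothetical `(name, newest_timestamp_seconds)` pairs, each one is assigned to a window by its newest timestamp, and any closed window holding more than one SSTable is treated as a candidate to be merged into a single SSTable. The size-tiered behavior inside the still-open window is left out.

```python
from collections import defaultdict

DAY_SECONDS = 86400

def window_start(timestamp_s, window_size_s):
    """Round a timestamp down to the start of its time window."""
    return timestamp_s - (timestamp_s % window_size_s)

def group_by_window(sstables, window_size_s):
    """Bucket (name, newest_timestamp_s) pairs by the window they fall in."""
    buckets = defaultdict(list)
    for name, newest_ts in sstables:
        buckets[window_start(newest_ts, window_size_s)].append(name)
    return dict(buckets)

def closed_window_candidates(buckets, now_s, window_size_s):
    """Closed windows that still hold more than one SSTable."""
    current = window_start(now_s, window_size_s)
    return {
        start: names
        for start, names in buckets.items()
        if start < current and len(names) > 1
    }

# Hypothetical SSTables with the newest write timestamp each one contains.
sstables = [("a", 10), ("b", 86399), ("c", 86400), ("d", 90000), ("e", 200000)]
buckets = group_by_window(sstables, DAY_SECONDS)
candidates = closed_window_candidates(buckets, now_s=200500, window_size_s=DAY_SECONDS)
```

Changing `window_size_s` to a week or to two minutes gives the other groupings mentioned above. Because everything in a closed window ends up together, all of the data in one file reaches its TTL at roughly the same time, which is what makes expiry easy to reason about.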
A
Very cool. So what are you guys doing with that over there at CrowdStrike?
B
C
It's a fascinating company. What they do is make a low-level sensor that tracks things like file handles, process creation, thread injection, and network connections, and we use it to track malicious actors. You can think of it almost like antivirus, but it's more than that. Someone opens an email and saves an attachment; that attachment creates a file, it's executed, and it makes a connection to some foreign country.
C
Any of those individual steps may not be a big deal, but when you look at it as a whole, we can understand that the whole situation is malicious. So we keep instant access to this massive amount of computing data. Sensor telemetry is what it effectively is, so that we can help large organizations fight very advanced tactics, up to the nation-state level.
A
C
You know, when we built it, and we talked a little bit about this at a past summit, we actually tried things like Neo4j and Titan, and we actually thought about doing it as sharded MySQL with the relational joins. Then we realized that with all these different things that we kept trying to do, except for Titan, we were basically building Cassandra. We were looking at sharding Neo4j and sharding MySQL and sharding all these other...
B
You know, this is something I rail against often: in 2016, not-invented-here on a database is kind of outmoded. You can move on to other problems. There are plenty of other interesting problems to solve, and when I hear about an organization building a database technology in this day and age, it just doesn't make a whole lot of sense. If you don't put your thoughts and efforts behind something like Cassandra, pick another open source database. There are plenty of them out there.
B
A
So it's one of those things where people don't really realize how much work is involved. Facebook built out MySQL at scale, and then that's the reference: "Well, Facebook did it." It's like, well, they've also invested millions of dollars in making MySQL scale, with lots of people, so...
C
B
I would go back to the point that when Facebook started, there was no other technology. And so now, and I've said this before, they're mired in technical debt when it comes to the sharding technology that they've got. I think somebody senior there said it's like a death of a thousand cuts. It's the worst kind of thing they can deal with, because that's a high-interest credit card they're never going to pay off.
A
C
Coming back to it, choosing open source always lets you go in and fix the things you have to, and put that out to the community. We found that people like weather.com, and some other people, are using time window compaction. There's a user that's running it at a volume in the five million range, which is even higher than what we run it at. Had either of us chosen a commercial DB, that wouldn't have been an option, or had we invented something ourselves.
A
And this actually comes back to what we were talking about before with Docussandra. They released it because that's not how they're making their money. You wrote your own compaction strategy, time window, and you put it out there. I remember when we talked about this a little while ago, you'd been getting pull requests and enhancements from other people, and that's just really cool. That's amazing. Well...
B
B
A
Yeah, thanks, the world needed that. But instead, what they did is they just layered on top of something that was already established. They don't have to worry about the distribution layer. There's a bunch of people worrying about that all the time, who can fix bugs as they come up and deal with that, and they can deal with the top layer. And that's another interesting way to think about open source: as an ecosystem.
A
Yeah, I think it's very, very smart. So, okay, cool. I think we hit all the topics that we wanted to touch on. Jeff, I really appreciate you coming on and talking about CrowdStrike and what you guys are doing. I didn't even realize we'd be talking about graph databases, but we touched on graph and Cassandra. We may have even mentioned MySQL, sharding, and sanity.
A
So I think we hit a lot of things. Jeff, is there anything you wanted to talk about before we sign off here? I think you may be hiring?
C
We have openings in all sorts of departments, everything from my infrastructure team to security openings, and we've got dev openings. Just about anything interesting at the company has openings in it. So if you're interested in CrowdStrike, we've definitely got some openings.