Description
Speaker: Matt Stump, Senior Backend Engineer at KISSMetrics
Slides: http://www.slideshare.net/planetcassandra/1-matt-stump
The ability to manipulate and query very large datasets in real time is a pressing need for most large data enterprises. Recently, we've seen an explosion of tools such as Impala or Druid, but all of these tools suffer from single points of failure or can't deliver the sub-one-second query times necessary for real-time results. Together we'll explore how to break down these seemingly intractable problems. We'll learn how to build horizontally scalable query engines with Cassandra, capable of sub-second query times across multi-billion-row datasets.
I'm Matt Stump. I'm the backend architect at KISSmetrics. I'm also a DataStax MVP and the author of libcql, the C++ native protocol driver for Cassandra. Just a little bit about what we do, because that'll help inform why we went the path that we went.
We're an analytics company similar to Google Analytics: we put JavaScript on your website, or you can feed us events from your back-end system, and Google Analytics will tell you what happened. You know, there's a page view from North America, and ten percent come from IE users, something like that. But we take a different approach.
We take a person-centric approach, so the stuff that our customers are interested in all has to do with people. Google Analytics will tell you that ten percent of your traffic comes from IE users. We can tell you that that ten percent actually represents ninety percent of your revenue, so you can't just deprecate that browser. Or we can tell you that Twitter ads may convert at a lower rate, but those users actually sign up for higher monthly recurring plans, and so you want to spend more of your money there.
We do all the standard A/B stuff, but really we want to help our customers dive into the data and find the difficult-to-uncover conclusions that can really transform your business: help you run more optimized, more lean, and essentially build a better product.
So we deal with hundreds of millions of events per day. Some of those are pageviews; some of those are "this person bought this product, and that represents ten dollars worth of revenue." We're using a custom-built MapReduce framework written in Ruby to do that, and as bad as that sounds, it's actually pretty performant: we're able to return results and render pages in real time, and we're doing it across hundreds of terabytes of data.
But we've gotten to the point where it's really starting to hold back the business, and so they brought me on to figure out how we can switch to a real-time interface where we can do really complex queries and return results in less than 500 milliseconds, so that I can render a page. Basically, everything was designed with the goal of being able to consume the data from multiple sources, run a regex query across the hundreds of terabytes of data, and then have a response returned in 500 milliseconds.
Cassandra's built-in secondary indexes don't deal with high-cardinality data very well. The query planner is really limited, because these indexes are actually distributed across the cluster, so you can't do things like unions of different indexes very efficiently, and it only works for some data types; it doesn't work for counters, things like that. Range queries involve broadcasting the query out to the entire cluster, and you have to gather results from every single cluster node and mash that back into a really big candidate list just to return 20 results to render a page. So it's just not really feasible to use with a query language like SQL, where I want multiple indexes, multiple WHERE clauses, regex searches and range searches; all that stuff just doesn't work.
So what I want is to handle high-cardinality data. Every single zip code in the U.S. is one example, but I want to be able to index every possible counter value for a 128-bit counter, or index every single tweet that's ever happened and be able to search by a regular expression. And like I said, all results have to return in less than 500 milliseconds, and that's for billions of rows, and I want subfield searching with regular expressions.
So if I index a tweet, I want to be able to do the regex on the tweet. I want range queries, and range queries for us typically mean things like: give me all users in the past 30 days that have generated more than ten thousand dollars of revenue and came from ad campaign A, things like that. That way I can do things like target UI experiments just towards those users, or just towards Twitter users, and all sorts of other interesting stuff that you can't do with existing systems.
So I did a lot of research and tried to figure out what other databases do; I mean, this has to be a solved problem. Really it comes down to bitmap and bit-sliced indexes, and I'll go through and try to teach you, as best as I can, how they all work.
Okay, so this is your standard inverted index, where I have an index name and I store which users actually hit it: which users have 94110 as their zip code, in this instance just C*. In order to make this a more tractable, single-machine problem, what you want to do is encode that as a bit array, so it's basically as small as I can get. Now I'll get into how this works.
Basically, as long as I have unique row IDs that can be run through some hash function and reduced to an integer, that means I can use bit slices to represent any yes/no clause. Did they hit this event? Are they in zip code 94110? All of that sort of stuff can be answered through a single-dimension bitmap, or a bit slice, or a bit vector, whatever you want to call it.
If I want to do an intersection, like an AND clause, so I want to find all users that have triggered event 1 and event 2, then I do the intersection: the ones have to be lined up in both of those bit vectors. So right there I've got the basis of boolean algebra: I've got my AND, I've got my OR, I've got my XOR. Really I just have to figure out more creative ways of encoding the same problems into these bit vectors, bit slices, whatever you want to call them.
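A minimal sketch of that idea in Python (the actual implementation described in the talk is C++; this is purely illustrative, with made-up row IDs): each bit vector is just an integer whose bit i says whether row i matches the clause, and the boolean operators fall out of the machine's bitwise operators.

```python
# Hypothetical bit vectors: bit i set means "row i matches this clause".
# Rows 0, 2, 5 triggered event 1; rows 2, 3, 5 triggered event 2.
event1 = (1 << 0) | (1 << 2) | (1 << 5)
event2 = (1 << 2) | (1 << 3) | (1 << 5)

def rows(bits):
    """Expand a bit vector back into the list of set row positions."""
    out, pos = [], 0
    while bits:
        if bits & 1:
            out.append(pos)
        bits >>= 1
        pos += 1
    return out

both = rows(event1 & event2)      # AND: rows set in both vectors
either = rows(event1 | event2)    # OR: rows set in either vector
only_one = rows(event1 ^ event2)  # XOR: rows set in exactly one
```

Every higher-level query in the talk eventually reduces to chains of these three operations over bit vectors.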
So how do I encode more interesting values into a bitmap index? Well, you add more bitmaps, of course. What you really end up doing is creating these two-dimensional arrays: I'll have multiple bit slices, all for the same field, one for each possible value. Let's say I'm indexing counters: then I'll have one array for the value 1 and one array for the value 2.
So if you're present in the array for value 1, that means your counter value is set to 1. If you're present in the bit array for counter value 2, that means your counter value is 2. And that way I can do range queries: give me all users that have triggered this event less than five times, so I'll just take the first five bitmap indexes, do a union across them, and that gives me all users that have triggered this event zero through five times.
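Here's a sketch of that one-bitmap-per-value scheme (illustrative Python, not the talk's C++; the row numbers and counts are made up): the range query "count less than n" is just a union over the bitmaps for values 0 through n-1.

```python
from collections import defaultdict

# One bit vector per counter value: value -> bitmap of rows with that count.
value_bitmaps = defaultdict(int)

def set_count(row, count):
    """Record that `row` has triggered the event `count` times."""
    value_bitmaps[count] |= 1 << row

# Hypothetical data: row 0 triggered it 2 times, row 1: 7 times, row 2: 4.
set_count(0, 2)
set_count(1, 7)
set_count(2, 4)

def count_less_than(n):
    """Union the bitmaps for counter values 0..n-1 (a range query)."""
    bits = 0
    for v in range(n):
        bits |= value_bitmaps[v]
    return bits

matched = count_less_than(5)  # rows whose count is below five
```

The union is cheap because it is just repeated bitwise OR over the value slices.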
The problem is that these arrays can be ginormous: for a 32-bit space it could be, you know, four gigs, and if you have a bunch of four-gig arrays sitting around with one bit of data and a bunch of zeros, that's not very efficient. I haven't solved my problem; I haven't made this a single-machine problem. And if you read through the papers, pretty much every single paper is dealing with that exact same problem, and there are a number of different approaches. I'll tell you the one that I took.
So how do we do indexing on stuff like text? Because that's sort of the Holy Grail: my regex expression across my hundred and ten terabytes of data. You break it into trigrams. If you've used Solr, or if you've used regex search in Postgres, you'll be familiar with this. Sometimes it's referred to as an n-gram: basically, you take a string and you break it into three-character chunks, and depending on whether you want it indexed case-insensitive or case-sensitive, you'll run the appropriate transformation first.
This is my first trigram. Really I've just taken the Unicode byte values and stored them in a 128-bit int, and the reason I'm using a 128-bit int is that Unicode characters have a maximum length of 4 bytes, which means I can stuff four of them into a 128-bit integer, but I can't do that with a 64-bit int. And we're a global company, like I assume all of you are, so we want to support full Unicode.
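A sketch of that packing step (illustrative Python; the talk's implementation is C++, and the padding convention here is an assumption): three characters of up to four UTF-8 bytes each is at most twelve bytes, which fits comfortably in a 128-bit integer.

```python
def pack_trigram(trigram):
    """Pack a 3-character trigram's UTF-8 bytes into one 128-bit int."""
    assert len(trigram) == 3
    data = trigram.encode("utf-8")          # at most 3 * 4 = 12 bytes
    return int.from_bytes(data.rjust(16, b"\x00"), "big")

ascii_packed = pack_trigram("thi")   # ASCII bytes land in the low end
cjk_packed = pack_trigram("日本語")   # 9 UTF-8 bytes, still fits easily
```

Because the packed value is an ordinary integer, trigrams can be compared and range-scanned like any other numeric index key.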
I'll get into that, but that's really tightly integrated with the performance solution that was asked about earlier. These slices can be really large, and so you can't stick one into an individual column, because individual columns have a limit of, I think, 256 megs, and each of these slices could potentially be a bit bigger than that, depending on the distribution of your data.
So if I want to do a regex search, what I do is take that regular expression and break it into a series of boolean clauses that need to be matched in order for the regex to be satisfied. For this example, there are two trigrams that need to match in order for the regex to be satisfied, "thi" and "ing", and so what I do is convert those to their hex representation.
Yep, so if you look back at this trigram, it's really a range. Sorry, that's not actually what I wanted to show. So I only need to come down to the candidate list, and the candidate list gives me a list of documents that have a probability of matching the regular expression.
From that, I need to actually run the regular expression. But the nice thing is that, for the most part, we're not ever asking "give me all documents that potentially match this regex"; we're returning 20 results at a time. So what we do is collapse the indexes down to my final candidate list, then look at the first 20 records, pull the actual text for those 20 records, and run it through an actual regex engine to verify that it matches. If all 20 match, then I just return those 20 results.
If one of those doesn't match, then I discard it, get the next item out of the bitmap array, and return that to the customer. So it's taking a problem that seems intractable and collapsing it down, giving me a very manageable set of documents to look at, and from there I can run the regex on a tiny, tiny fraction of the overall results I need to return. Last year there was a paper by the guy who wrote Google Code Search.
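The whole pipeline can be sketched like this (toy Python, not the talk's C++; a plain dict of posting sets stands in for the bitmap index, and the documents are made up): required trigrams shrink the search to a candidate list, and only the candidates are handed to a real regex engine.

```python
import re

# Hypothetical corpus: doc id -> text.
docs = {0: "nothing", 1: "thinking", 2: "thin ice", 3: "singing"}

def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Build the trigram index: trigram -> set of doc ids containing it.
index = {}
for doc_id, text in docs.items():
    for t in trigrams(text):
        index.setdefault(t, set()).add(doc_id)

# For the regex /thi.*ing/ both "thi" and "ing" must appear somewhere,
# so the candidate list is the intersection of those two posting sets.
candidates = index.get("thi", set()) & index.get("ing", set())

# Only the candidates get run through the actual regex engine.
matches = sorted(d for d in candidates if re.search(r"thi.*ing", docs[d]))
```

Note that "nothing" makes it into the candidate list (it contains both trigrams) but is rejected by the real regex pass, which is exactly why the verification step exists.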
This is exactly how Google Code Search works, and one of the examples they give is: we want to run a grep across the Linux kernel source code. If you just ran the raw regex, you'd be checking 32,000 different files, but if I use the trigram index, I'm checking like 20 files. So it's more than a 100x performance improvement over the naive "let's just do a linear scan over all the data," and that's what we're trying to avoid.
If we wanted to do linear scans over all the data, we would just go with Hadoop or Dremel or any of those other ones, and there's just no way you can make that fast enough, because you can't scan through that much data quickly enough to return a web result. The trick is to take these seemingly intractable problems and make them the size of a single machine, and from that you get raw CPU performance. The implementation that I'm working with is only limited by the memory bandwidth of the machine performing the query.
On my laptop, I get 16 gigabytes per second of raw query throughput; on server hardware, I get 40 gigabytes per second of raw throughput. And that's just answering for one user on a given shard or product, and I have multiple replicas sitting around, so I can answer three of those for any given consumer, and then this whole solution scales linearly. So you just take this very large problem.
You break it into smaller and smaller parts, so you get the smallest thing that you can actually work on. So that was great for regexes that easily break down into trigrams. How does it work for this case, where I've got "th" and "ing"? The "th" isn't a complete trigram, and really could match, in the case of English, 26 possible trigrams. So what I do is break it into boolean clauses again.
I know it's not a complete trigram, so I break it into a range query: give me all trigrams that match "th" plus byte 0x00, all the way to "th" plus 0xff. It involves reading more bitmap indexes, but again, bitmap indexes are very fast. It's running at the speed of hardware; it's not a big deal compared to your network latency for returning results and stuff like that.
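A sketch of that partial-trigram expansion (illustrative Python using the same pack-into-an-int idea as before; the padding convention is an assumption): the incomplete prefix "th" becomes a numeric range over packed trigram values, and any complete trigram starting with "th" falls inside it.

```python
def pack(raw):
    """Pack raw trigram bytes into a 128-bit integer key."""
    return int.from_bytes(raw.rjust(16, b"\x00"), "big")

prefix = "th".encode("utf-8")
lo = pack(prefix + b"\x00")   # "th" plus byte 0x00: bottom of the range
hi = pack(prefix + b"\xff")   # "th" plus byte 0xff: top of the range

# Any complete trigram beginning with "th" lands inside [lo, hi]...
inside = lo <= pack(b"thi") <= hi
# ...while trigrams with a different prefix fall outside it.
outside = lo <= pack(b"tin") <= hi
```

So the range query over the trigram keyspace replaces enumerating every possible third byte by hand.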
So yes, it really is tacos all the way down, and the reason it's tacos all the way down is that if you Google "turtles all the way down," this is what you get. Okay, I like the tacos. So, the implementation: really scary wireframe diagram, but basically events come into the system. We've got nginx sitting at the edge, which writes out to a log file just for redundancy purposes, and it goes off to an Erlang process called glutton.
Glutton has a local queue, which is LevelDB, and its only real purpose is to spool events off into a distributed queue. We were originally targeting Kafka, but it looks like we're probably going to end up going with NSQ; I talked to my ops guy this morning and he's in love with it already. Glutton is a very lightweight Erlang process; it's actually faster than nginx. A single machine is capable of handling in excess of 200,000 events per second.
Actually, I'm not sure how fast it'll go, because all of my load-testing tools broke before they were able to stress it far enough, so I'm happy with that. It goes into the distributed queue, and then we've got another Erlang process that pulls items out of the queue to be indexed. The way that works: we consume events from multiple different sources, not only the ones that we generate as part of our JavaScript or our app integration.
Through our APIs we've got third-party plugins for things like Stripe, MailChimp, Recurly; really, that list can be n length, so I didn't want to have to go in and muck with C++ every single time somebody wants to add a new customer integration. So all events have types, and those get dispatched to a Lua script; we've embedded Lua.
So if you need to add another event type, you just add another Lua script and register it with the system. The Lua script will take the event, convert it to the canonical format, and tell the system what fields it wants indexed; the system will just index them, and then it writes through to Cassandra, and Cassandra is the eventual data store. Everything's pluggable here, so if you wanted to swap out the scripting engine with, say, V8, that's doable.
Anything else there... oh, this isn't done yet, but we also allow Lua scripts for filtering the columns that we return in results. So you can say "only return columns that have a value greater than 10," or something like that. You can actually write scripts that filter the columns that get returned, so we're not just doing "give me this list of columns"; you can really dive deep into the data and get as detailed as you want.
So how do I store the data? Basically, the way it works in Cassandra is: I take the field that I'm indexing, then an underscore, then the value for the bit-slice index. So the bit slice that represents value 0x00 has its own row, the bit slice that represents value 0x01 has its own row, and then within each bit slice I only want to store the values that are set.
I don't want to store massive amounts of zeros, so the columns are actually offsets: I store the index in chunks of n bytes, and right now n is 256. So if I have five bits set at the beginning, then five gigabytes of zeros, and then another bit set, I only have to store 512 bytes.
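The segment scheme can be sketched like this (illustrative Python; the real store keys Cassandra columns by offset, and a dict stands in for that here): the bitmap is split into fixed-size chunks keyed by offset, and all-zero chunks are simply never stored.

```python
SEGMENT_BYTES = 256                 # n bytes per chunk, as in the talk
segments = {}                       # chunk offset -> bytearray of the chunk

def set_bit(pos):
    """Set bit `pos`, materializing only the chunk that contains it."""
    seg_id, bit = divmod(pos, SEGMENT_BYTES * 8)
    seg = segments.setdefault(seg_id, bytearray(SEGMENT_BYTES))
    seg[bit // 8] |= 1 << (bit % 8)

def get_bit(pos):
    """A missing chunk means a run of zeros; no storage was spent on it."""
    seg_id, bit = divmod(pos, SEGMENT_BYTES * 8)
    seg = segments.get(seg_id)
    return bool(seg and seg[bit // 8] & (1 << (bit % 8)))

set_bit(3)                # a bit near the start of the index...
set_bit(5 * 10**9)        # ...and one billions of positions later
stored = len(segments)    # only two 256-byte chunks actually exist
```

The gigabytes of zeros between the two set bits cost nothing, because their chunks were never created.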
That's the approach that I took. The other approach, which I can get into because I spent way too much time reading papers, is compression. The Druid project, for example, another database that uses bitmap indexes and is capable of doing lots of really cool things, uses compression, and the way bitmap compression works is that it's just run-length compression.
Let's say the first five bits are set and then I've got five gigs of zeros: they'll store the first five bits, then a little header that essentially says "the next five gigabytes are zeros," and then the next however many bits are actual literal stored values.
The problem with that is that you can't seek directly to the position that you want, so I always have to read through the entire bit slice in order to figure out which values are actually set. Or, heaven forbid, I want to change something: I have to seek to that position in the array, set the bit, and then re-encode the bits that came before it, and that involves lots of memory copies and things like that. There are other things, like range encoding, there.
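A toy sketch of why run-length bitmaps trade random access for space (illustrative Python only; real schemes like the word-aligned ones Druid uses are more elaborate): runs of zeros collapse into counters, so reading any one bit means walking the runs from the start.

```python
def encode(bits):
    """Run-length encode a list of 0/1 bits into (kind, value) runs."""
    runs, i = [], 0
    while i < len(bits):
        if bits[i] == 0:
            j = i
            while j < len(bits) and bits[j] == 0:
                j += 1
            runs.append(("zeros", j - i))   # a whole run costs one entry
            i = j
        else:
            runs.append(("literal", bits[i]))
            i += 1
    return runs

def get_bit(runs, pos):
    """No random access: every lookup scans the runs from the beginning."""
    at = 0
    for kind, val in runs:
        if kind == "zeros":
            if pos < at + val:
                return 0
            at += val
        else:
            if pos == at:
                return val
            at += 1
    return 0

runs = encode([1, 1, 0, 0, 0, 0, 0, 1])
```

The five zeros collapse into one run, but `get_bit` has to replay every run before the target, which is the seek problem the talk describes.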
You can sort of store this stuff as a giant skip list, so that I can skip to different segments and possibly do compression, but that's too complicated for a first version, so I just decided to go with what's simple and works, and I think this will solve most of the problems; if it doesn't, well, we can do other things. Also, because it's in Cassandra, we're using compression: I'm using my libcql driver, so I'm using snappy compression there, and LevelDB is using snappy compression.
This is the query language. Giant wall of text, but I felt I should give you an example of what the query language looks like. It's Lispy, s-expressions, because I like Lisp, but it's just a DSL implemented in C++ using the Spirit parser from Boost. The core, most basic operation in the query language is a slice: basically, I want to compress a two-dimensional bitmap index into a single one-dimensional slice, which gives me which rows are hit.
So if I just want all users that have ever triggered the visit event, then I compress the entire two-dimensional index into one bit slice and just name my index. If I want all users that have visited in 2013, I'm doing aggregation on datetime buckets, like I assume a lot of you are, so I'll have "visit" for the total visit counts.
Then there are the aggregate buckets. If I want to invert that, give me all users that haven't triggered an event in this particular index, then I've got an operator for that. If I just want people that have performed the visit event more than six times for that particular aggregation bucket, I have a range operator. And then I've got full boolean support, so I've got OR, XOR and AND (there's one other operator in there), and those are nestable.
So you can have a tree of n depth. You could say: alright, I want users that have bought something in the past 30 days, that came from this ad campaign, and they should like the color blue, or possibly periwinkle. You can do that; you can go crazy, crazy town.
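That nestable query tree can be sketched like this (toy Python evaluator; the slice names and tuple syntax are made up for illustration, since the real DSL is s-expressions parsed in C++ with Boost Spirit): leaves name pre-computed slices, and inner nodes combine them with bitwise operators.

```python
# Hypothetical named slices: index name -> bit vector of matching rows.
slices = {
    "bought:30d": 0b01101,
    "campaign:a": 0b01100,
    "likes:blue": 0b00100,
    "likes:periwinkle": 0b01000,
}

def evaluate(node):
    """Recursively evaluate an (op, child, child, ...) query tree."""
    if isinstance(node, str):          # a leaf is a named slice
        return slices[node]
    op, *args = node
    vals = [evaluate(a) for a in args]
    result = vals[0]
    for v in vals[1:]:
        if op == "and":
            result &= v
        elif op == "or":
            result |= v
        elif op == "xor":
            result ^= v
        else:
            raise ValueError(op)
    return result

# "bought in the past 30 days AND came from campaign A AND
#  (likes blue OR likes periwinkle)"
query = ("and", "bought:30d", "campaign:a",
         ("or", "likes:blue", "likes:periwinkle"))
result = evaluate(query)
```

Because every node reduces to a bit vector, nesting to arbitrary depth is just more of the same bitwise work.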
I don't really put limitations on you. You just have to realize that there is a cost to query complexity, and the flatter, the shallower you make the tree, the cheaper it's going to be. And then at the very bottom I've got my regex example with a regex operator. Basically, it's just a convenience method.
It converts that string into a series of boolean clauses, which get executed just like any other OR or AND boolean clause would be in the tree. Right now the query planner is very naive, so I just do what the tree tells me to do, but there are some optimizations available. For AND clauses, order matters: you can get some optimizations depending on the size of the individual indexes.
You want to AND the two smallest indexes first, because a bit can only come through an AND expression if it's set in all the different indexes: if you AND the smallest ones first and get those results, I don't have to look at the giant tail of the larger indexes, because I know that none of those other bits are set, so they can't make it through. Anyway, I'm rambling.
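The AND-ordering heuristic can be sketched like this (illustrative Python, not the talk's planner): sort the operands by how many bits they have set, intersect the smallest first, and bail out early if the intermediate result ever goes empty.

```python
def popcount(bits):
    """Number of set bits in an integer bit vector."""
    return bin(bits).count("1")

def and_all(bitmaps):
    """AND bitmaps together, smallest cardinality first."""
    ordered = sorted(bitmaps, key=popcount)
    result = ordered[0]
    for b in ordered[1:]:
        result &= b
        if result == 0:       # nothing can survive; stop early
            break
    return result

small = 0b0011                # 2 rows set
medium = 0b0111               # 3 rows set
huge = (1 << 60) - 1          # a very dense index
combined = and_all([huge, medium, small])
```

After the two small operands are intersected, only their few surviving bits need checking against the huge index; its long tail never matters.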
Results so far: this is just on my laptop, and an 8-clause query across 4 billion rows is less than two seconds. And that's single-threaded, by the way, so it gets better multi-threaded, because all of this is inherently parallel.
And I haven't even started to use the fancy CPU instructions yet; right now I'm just using 64-bit ints. There's hardware support for vectors in SSE3, so I can start doing parallel loads and bitwise operations on chunks of 512 bytes, so this will get much faster. You can compare a chunk of index in like seven CPU cycles, and I've got billions of CPU cycles per second per core, and I've got 32 cores on my box.
You guys do the math. So I've got full regular expression support, full support for range queries, and the ability to index any numeric value, and that includes any value that can be hashed. Let's say you don't care about regular expressions for a particular string value; say it was browser agent, that's a good one. I don't need to perform regex on my browser agent, I just need a hash, and so that would just be run through a hash.
What isn't finished yet, and this is going to be done in version one (version one is due to go limited-live by the end of this month, so this stuff will be done within the next couple of weeks): support for atomic counters, so I'll be able to index whatever counter fields you're storing in Cassandra, and GROUP BY query aggregations, so give me cohort analysis or funnel reports.
For, you know, revenue based off of ad campaign hits. We're still working on some of the event processing and distribution; before this I came from the Riak Core people, and we were hashing out how we're going to do some of this stuff, but this is all in active development, and all of this is going to be in place for version 1. Like I said, it's all open source, so I've created its own organization for the project. I'm not very good at thinking up names.
Project X was already taken. Within the org, it's being broken out into several sub-projects: mutton is the core library, and that's the only thing that's public right now, but all the event-processing stuff is called sourdough, and that's also going to be made public, and some of the Lua stuff is called lechuga, which is Spanish for lettuce. If you haven't noticed already, we have a sandwich theme going. Alright, so that's my spam trap and my Twitter handle.
[Q&A] Right now we're doing locking on each individual index, but I'm probably going to move that to each individual slice, and it could actually be moved down to each segment if I wanted to. I'm also doing tiered locking, so I've got a read lock and a write lock: multiple reads can happen, but only one write can happen. Really, the problem is that each segment is its own little chunk, and you have to make sure that you don't have multiple writers overwriting it.
The thing is, it's really dependent on your distribution of data. If you have one bucket or shard, or whatever you want to call it, or we have one really hot customer, then you could have some write contention if they're all writing to the same counter a lot and all the users happen to be in a very tight distribution.
There are places where I know there could possibly be issues: locking being one, different CPU instructions being another, the way some of the data is laid out in memory, where optimizations could fit in if the need arises. So you're not locked into anything. And also, because you're processing the events asynchronously, you can always allow events to back up into the queue a little bit, and you'll have a soft real-time system.
A
It
won't
necessarily
be
true
real
time,
but
it
will
be
soft
real
time
of.
Like
you
know,
events
coming
into
the
system
might
live
in
the
queue
for
a
couple
hundred
milliseconds
to
wait
for
their
they're
actually
committed.
Then
we
also
answer
queries
preferentially
to
committing
new
data,
because
the
queries
are
representing
actual
user
computer
interaction.
So
you
want
that
to
be
snappy,
whereas
you
know
I've
been
coming
to
the
system
if
it
waits
a
hundred,
an
extra
hundred
milliseconds,
no
big
deal
so
yeah.
Yeah, the problem is that we've got too much data: we're dealing with hundreds and hundreds of terabytes of data. I mean, you could use Lucene, you could do some of the same ideas, and Lucene does the bitmap stuff, but I've got higher-performance stuff in the back end already than Lucene is capable of doing.
A terabyte is a rounding error for me. I mean, we're literally in the hundreds of millions of events per day, and that's going up very, very quickly, because we've sort of reached that nice inflection, the hockey-stick curve that every company hopes to get to, and it's just a lot of data. If we were talking about tens of terabytes, then yeah, I would have done it, because I've actually run it at that scale.
No, because the data is actually stored in... so mutton, the query tier, actually stores its own copy of the data, and it writes through to Cassandra so that the data can be accessed from Hadoop and all the other tools, and also so that if all the clustering stuff that we wrote catastrophically fails, we can reconstruct it from Cassandra.
None of the live queries are served from Cassandra. We're actually using LevelDB on the individual query nodes so that we have a local copy, and then we keep all the indexes resident in memory. That's another reason why we use C++ over Java: it's not a big deal in C++ for me to consume 200 gigabytes of memory. You just load up a box with 256 gigs, dedicate it to this one process, no big deal, but in Java you can't do that.
It can be all over the place. We've had some bad customers that want to shove entire stack traces through as an event (we have a tendency to get angry with those individuals), but it could be something as simple as "this page was viewed," or it could be "this video name was viewed, and this is the user's ID and some other stuff." So really, it could go from a couple of bytes up to, let's say, the average larger event, which is probably two or three K.
As long as you can grab the lock, it's going to update, and it's so much quicker; there's a lot less complexity. And as long as there are free cycles and we're not answering queries, which for the most part you're not: a new customer isn't hitting you with thousands of queries per second, and an individual user won't be hitting it with thousands of queries per second, so most of the time the locks are going to be free.
Because I just store in regular offsets of 256 bytes or 512 bytes, from that I can just do the math and know: alright, if this segment were to exist, this is the column it would exist in, and so I'll ask for that column. If it exists, then I get it back, fine; if it doesn't, then I assume it doesn't exist, and I'll just create an empty segment, set the bit, and then persist it to the data store.
Oh, so the field. Let's say, alright, I've got a bucket called foo and a field called bar: that'll be concatenated as the first chunk of my row key. And then let's say I'm indexing value one: then it'll be an underscore, then the byte value that's being indexed, which is a 128-bit int. And then within that I'll have the individual columns, which represent the segments for the actual rows. It's sort of hard to do with my hands.
Yes, one bit always maps back to a single row; not necessarily a user, but a row. Oh, so how do you invert that? Yes, okay, totally get it: I just have another index that maps the hash value, the row position, back to the actual row key to reverse it. I didn't show that. It is possible to have a collision, but with a 128-bit hash, collisions are very rare, and so what I'll do is just store both of those values for that hash value.
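That reverse mapping can be sketched like this (illustrative Python; MD5 stands in here only because it is a convenient 128-bit hash, and the key format is made up): a second index maps each hash value back to its row key, and a rare collision just means two keys stored under the same hash.

```python
import hashlib

def row_hash(key):
    """128-bit hash of a row key (MD5 used purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

reverse = {}  # hash value -> list of row keys sharing that hash

def register(key):
    """Record the hash -> key mapping; collisions append to the list."""
    h = row_hash(key)
    keys = reverse.setdefault(h, [])
    if key not in keys:
        keys.append(key)
    return h

h = register("user:42")
recovered = reverse[h]   # map a bit position's hash back to its key(s)
```

With 128 bits of hash the collision list almost always holds exactly one key, so the reverse lookup stays effectively constant-time.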