From YouTube: This Week in Cassandra: Analytics without ETL 3/25/2016
Description
Link to blog referenced in video: http://www.planetcassandra.org/blog/this-week-in-cassandra-analytics-without-etl-3252016/
Jon: Hello, human beings, how are you? I'm Jon Haddad. This is This Week in Cassandra on Planet Cassandra, with me: Luke Tillman, also a technical evangelist for DataStax. And of course we have Evan Chan from Tuplejump. He's pretty much one of the most intense analytics people in the Cassandra community that I know of, so this is pretty exciting, I think.
Jon: We've got a couple of blog posts that we're going to be looking over, and we're going to be talking to Evan about analytics on Cassandra without ETL, which is kind of a niche thing that not a lot of people are aware of, I think, though over the last few years it's definitely been getting more popular. So let's take a look at our blog posts. The first thing that we're going to dig into is this Sysdig post about analyzing what's going on in production. This one I thought was awesome; having spent a lot of time in the ops world, I love this Sysdig post. What do you think, Luke?
Luke: You know, I'm not an ops guy, so any time there's an ops post... I also happen to be the token Windows guy on the team, so my Linux foo is pretty weak when it comes to stuff like this. So I'd ask you, because you were really excited about this. I was definitely familiar with some of the tools that they were talking about, from what it looks like.
Jon: I was going to try not to rant about this for the next ten minutes, but since the ball's in my court, I don't mind doing it. OK, let's just dig in. Reading this article was really fun for me. There are a couple of things I really appreciated about it. One was the attention to detail: the detail in this post, to me, is absolutely amazing. I love this thing.
Luke: You know, another good thing about this article was that he walks you through the steps of how to debug a Cassandra problem using this tool, too. So from a practical perspective, it's not just talking about how great the tool is; it's very nice to see the thought process that went into it. Yeah.
Jon: Huge fan of this one. The next thing we had on our list was kind of cool: an anti-patterns post, and I like these. I love a little bit of sarcasm: how not to start with Cassandra. It's really great, because I think it's how a lot of people try to start with Cassandra, and unfortunately they get it wrong and shoot themselves in the foot. Yeah.
Luke: I was going to say, I kind of feel like... you know, we go around and do these Cassandra Days events, and you and I are often in the beginner track, presenting to people that are brand-new, that have relational database experience (which is almost a hundred percent of the rooms we present to) but don't really have any Cassandra experience. And I felt like this blog post hits a lot of the things that we tend to say during those presentations.
Jon: So one of the points that this blog post had in here was about iterating over your data model, right? Like, things that you have to solve up front. One of the things that's interesting about this is that we always tell people: you want to know your queries up front, so you can optimize to query against a single partition.
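A minimal sketch of what that query-first modeling looks like in CQL; the table and column names here are hypothetical, not from the blog post. If the known query is "all readings for one sensor on one day, newest first," the table is laid out so that query reads exactly one partition:

```sql
-- Hypothetical schema: the table is shaped around one known query,
-- "give me the readings for a sensor on a given day, newest first."
CREATE TABLE readings_by_sensor_day (
    sensor_id  uuid,
    day        date,
    reading_ts timestamp,
    value      double,
    PRIMARY KEY ((sensor_id, day), reading_ts)  -- one partition per sensor per day
) WITH CLUSTERING ORDER BY (reading_ts DESC);

-- The query the table was designed for: a single-partition read.
SELECT reading_ts, value
FROM readings_by_sensor_day
WHERE sensor_id = 123e4567-e89b-12d3-a456-426655440000
  AND day = '2016-03-25';
```

Any query this table was not designed for (say, "all sensors above a threshold") would need a different table, which is exactly the up-front commitment being discussed.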
Jon: I have not encountered a magical database where you just throw random stuff at it and then all of a sudden it's like, hey, I've got your data, and it's fast, and I didn't need to know anything. I don't know if you guys have touched this mythical unicorn database that just works for you automatically. No.
Luke: And I guess I'd just make the point, and I like to make this at Cassandra Days too: you touched on third normal form, and it's so ingrained in those of us with relational backgrounds. Yeah, you do data modeling up front, but a lot of it is on autopilot, because there is this sort of prescribed way that you start your data model. So you're probably not thinking about it as much, and you're definitely not thinking about it in the same way that you do with Cassandra data modeling, where, at least on the transactional side, you're starting with your queries up front to try to make them fast. And that's because we don't have joins, we don't have secondary indexes in the same way; we can't just, you know...
B
Do-
and
we
have
better
ones
now,
yeah
but
that's
a
whole
other,
but
the
whole
their
topic,
but
we
don't
have
those
kind
of
things
to
just.
You
know
the
queries.
We
don't
think
about
up
front.
We
don't
have
those
things
to
kind
of
fix
the
problem
for
us
later.
You
know
in
it
like
that.
We
do
in
a
relational
database,
so
yeah.
Jon: Well, maybe... wait, maybe we actually do have those tools, which is kind of an interesting segue. The other thing that we wanted to talk about was this whole analytics world, right, and fixing your data model. Luke, I completely agree with you that, you know, if you have a totally broken data model and you roll into production like, hey, I'm going to use Cassandra, and, let's say, you're using a bunch of secondary indexes...
Jon: How many times have you looked at the lambda architecture and seen all this chaos and this ETL you have to do? It's just a lot of cognitive overhead. So I think one of the reasons why I'm really excited to have you talk with us today, Evan, is that you're one of the first people I've heard of that had been working with Spark and Cassandra together. I actually saw your talk at the Fort Mason summit, where you presented on this, right?
Evan: I think we're definitely seeing more and more people that want to try out using Spark and Cassandra to do queries, especially those coming from the relational world, or people having to work with traditional BI stacks. And I think the two things you bring up are really interesting. One is the need for data modeling, but two is that, coming into the Cassandra world, things are, you know, a bit different, right?
Evan: For one, you don't have joins. But I think that when you look at the power of Spark, one thing it gives you is that you can do these more complex things. You can do joins. That doesn't make them fast, necessarily, but the fact that you can do them, and can do other things, opens up a lot of possibilities. For example, you see more people trying to marry Cassandra with machine learning by using Spark: they can store the raw data, the time series data...
Evan: Then they can pull it out and build models with it, which is pretty powerful.
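As a sketch of the kind of query this opens up (all table and column names here are hypothetical, and the two tables are assumed to be exposed to Spark SQL through the Spark Cassandra connector), here is a join plus aggregation that plain CQL cannot express:

```sql
-- Hypothetical Spark SQL: join raw time-series events against a
-- device-metadata table and aggregate, neither of which CQL supports.
SELECT d.model,
       count(*)     AS events,
       avg(e.value) AS avg_value
FROM events e
JOIN devices d
  ON e.device_id = d.device_id
WHERE e.day = '2016-03-25'
GROUP BY d.model;
```

As Evan notes, Spark makes this possible, not necessarily fast: the join may scan whole tables across the cluster.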
Evan: And like what you mentioned about the lambda architecture, that's something that I've spoken about, and my colleague Helena Edelson has talked about; we're doing a joint talk at Strata next week. The idea is that, I think, for a lot of people, they want the benefits of Cassandra.
Evan: They want to be able to write to a solid database, one with idempotent writes, for their IoT or time series stuff, but at the same time they need to run a lot of analytics on it. So what a lot of people do is ingest into Cassandra, but at the same time do ETL into, say, HDFS files, and then suddenly you've got these two systems. You need to maintain two systems, and you need to figure out how to merge results.
Evan: So this is pretty complex, right? If you can avoid it and do everything in one system, why not do that, right?
Evan: I think that's a really good question. I think a challenge that people have is that Cassandra is designed for you to read and write small amounts of data with massive concurrency, and it works extremely well for that.
Evan: But if you want to use it to read a huge amount of data for bulk analytics, you need to be a bit more creative. If you just use normal CQL tables, you might find that the data sizes and query speeds are not what you might be used to from the HDFS or Hadoop worlds. So you need to be a bit more creative, and there are a couple of different strategies you can take. I think the traditional strategy would be: let me model many different tables, one per query.
Evan: So let me have a job that can summarize. One thing we used to do at Ooyala was to have massive Hadoop jobs: we would crunch every conceivable slice-and-dice kind of query you could run, and write the results into Cassandra. That way you can read them out as very small reads.
Evan: But that has some limits and becomes inflexible, because any time you need to change something, I would need to edit my massive Hadoop job, and that would not be easy. And for certain kinds of queries, you don't have enough space to write out every single kind of query, right?
Evan: It becomes impractical. The other thing is that right now I'm working with an enterprise that is actually looking to move a data warehouse into Spark and Cassandra. So for them it's kind of a trade-off, right? There are a lot of reports they need to run, and there's some flexibility that's required at the extreme.
Evan: At one extreme, we can try to aggregate everything into tables; then you can do small reads, but that increases the ETL complexity. And in, say, a traditional star schema, certain dimension tables might be slowly updated, and when they do get updated, having to update all of your query tables becomes more complex. So it's kind of a trade-off, like a scale: do you want to go all the way to that end, where you write out everything and try to update everything?
Evan: Or do you try to do something more in between? I think Spark gives you that flexibility, where you don't have to write out everything; you don't have to carry data modeling to the extreme. Data modeling is still very important: you still have to have partition keys that facilitate really fast lookups. But you don't have to take it quite to that extreme, and you can do more things in Spark. Yeah.
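A sketch of that middle ground: keep one sensibly partitioned raw table and let Spark SQL compute a rollup on demand, instead of maintaining a precomputed Cassandra table per report (all names here are hypothetical):

```sql
-- Hypothetical Spark SQL rollup computed at query time, replacing
-- one of the many pre-aggregated query tables a Hadoop job would maintain.
SELECT sensor_id,
       hour(reading_ts) AS hr,
       min(value) AS lo,
       max(value) AS hi,
       avg(value) AS mean
FROM readings_by_sensor_day
WHERE day >= '2016-03-01'
GROUP BY sensor_id, hour(reading_ts);
```

The partition key still matters, as Evan says: it keeps the scan bounded, while Spark absorbs the aggregation work that would otherwise be baked into the schema.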
Jon: No, but if you have a big analytics job that takes, say, ten minutes to run, and then you go, cool, I made an optimization to this key and now it takes nine minutes, you didn't really solve a major problem, right? Unless I happen to have an SLA of, like, nine minutes 45 seconds, then you're fine, I guess. But for the most part it's not going to make that big of a difference. Yeah.
Jon: I really, really like the additional flexibility that you get with Spark, not having to remodel all your data. I completely agree with you, and that's really what the whole data science world is all about, right? We don't know what we want to get, and that's what people struggle with a lot of the time. They look at Cassandra and they're like, what do I do? What happens when I don't know all the queries that are coming up front? Right.
Jon: I love that you get that with Spark. I also wanted to talk to you a little bit about the project that you've been working on for a while now: FiloDB. We had a meetup together in Amsterdam, and it was pretty cool to talk to you for a while about that. Can you talk a little bit about it?
Evan: About FiloDB? Yeah, definitely, I would love to dive into that. So sometimes, as Jon mentioned, your jobs are machine learning jobs that could take a long time, half an hour to an hour, and you don't care. But sometimes you do actually care about response time; for example, for BI, your enterprise may be used to sub-second response times, right? So you don't want something that takes half an hour.
Evan: Typically you only care about a few columns out of the many columns in your fact table, or whatever it is. So FiloDB stores this data as regular Cassandra tables, and the benefit of this is that you can manage FiloDB tables just like regular Cassandra tables from an operations point of view: you can back them up, and we store them exactly the same way as regular Cassandra data. But you get the benefit of very fast query speeds from Spark. Because the data gets stored very compactly, I think for some fact tables we've seen up to a 40x reduction in size, and the speed gains are also quite big. So, basically, we did a blog post.
Evan: I think there's definitely an interest there. In fact, the current customers we're working with, I think, looked at some of the work we've done before; that's one reason why they're interested in using Cassandra and Spark, and they chose DataStax Enterprise to replace their data warehouse, because they saw this possibility.
Evan: I mean, I think for some folks it's definitely that. Some folks are like: well, we don't really want to run a giant Hadoop system; we like what Cassandra promises. So yeah, I think that's definitely a big part of it. And I've seen one company that was considering stuff like Redshift, but for them, between the economics and the simplicity of running a Cassandra stack in the cloud with something like FiloDB, it actually makes more economic sense.
Evan: Yeah, a couple of things to highlight. Helena Edelson and I have a talk at Strata titled "NoLambda," on simplifying your analytics with Spark Streaming and Cassandra, so come and check it out. I think it's on Wednesday the 30th of March, in the San Jose Convention Center annex, next week. I will also be doing a developer showcase there on FiloDB, where I'll be showing some demos of working with it.
Evan: I'll demo it with Spark and some interesting data sets, so come by and check it out. And finally, about Tuplejump: today we provide development services for enterprises wishing to integrate the latest and best in open source big data, including Spark and Cassandra. So if you're interested in partnering with us, also let me know. Cool. And I think we're also hiring.