From YouTube: This Week in Cassandra 2/26/2016
Description
Link to blog discussed in video: http://www.planetcassandra.org/blog/this-week-in-cassandra-2262016/
A: Oh, human beings, hello! This Week in Cassandra, Planet Cassandra, good times. This is our fourth week. Today we have, I believe it's pronounced, Eric Lubow, from the bayous of Louisiana, our special guest. Thanks for coming, buddy. Long time no see.
A: Look at you, alright! So let's look at what we have for this week. First up in the blog posts: how to write a distributed test for Cassandra. Hey, you know what, I'm pumped about this. Luke, what do you think?
C: Well, so, you know, I like reading through JIRAs and stuff. I see dtests mentioned all the time, and I mean, I knew it stood for distributed test, but I wasn't really, you know, sure. And I've noticed, you know, that pretty much anytime somebody submits a patch, one of the contributors will almost always ask for a dtest covering it.
C: You know, like when we did the recommendation engine for KillrVideo, Jon and I were writing some Python code, mostly Jon, and then I was, you know, trying to fix some stuff after the fact, so I had to get my toes in there, and you know, I don't really do Python, yeah.
A: The Unicode wars, and there's a lot of pain there. Well, there were just some things that were horribly broken in Python 2, like strings, for instance. In Python 3 it's all Unicode strings, so the thing that they really screwed up was they actually changed how the language behaved, so like, none of our stuff worked, and it's been years and nobody's using Python 3. So it's very...
B: Good, and it's not going to change, because enterprises, especially like us, who have tens of thousands of lines of Python code in production, can't just drop in Python 3 to test anything, and the drivers don't work, libraries don't work, the programming paradigm is completely different. It's very problematic. Yeah.
A
We
I
I
did
the
upgrade
for
when
I
was
still
managing
cql
engine
I
made
that
compatible
with
Python
3,
and
it's
like
it
wasn't
that
it
was
hard.
It's
that
it
was
annoying
like
you
have
to
use
like
the
six
library
which
is
like
compatibility
with
like
both.
So
instead
of
using
like
Python
strings,
you
got
to
use
like
the
six
strings
and
it
like
just
does
it
for
you
and
you're
just
like.
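The kind of shim six provides can be sketched in a few lines. This is a minimal illustration, not six's actual API (six exposes similar names such as `six.text_type` and `six.PY2`):

```python
import sys

# Minimal sketch of the kind of string shim the "six" library provides
# for code that has to run on both Python 2 and Python 3.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode      # noqa: F821 -- only defined on Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes

def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text (Unicode) string on either Python version."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

So `ensure_text(b"hello")` and `ensure_text("hello")` both give back the same text string, whichever interpreter you're on.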
B: Fine. And where it's really worth it is when you deal with other things that people would normally use Python for. Like, you know, one of the things that we're getting into is a bit of the graph database work, and I know DataStax is getting into that as well, and all the graph database drivers are written using Python 3, which makes it, you know, nearly impossible for anybody to release.
A: Right, we can move on, but we digress. All right, we'll talk about Cassandra, you're right. How about this next post: removing a disk mapping from Cassandra, over at The Last Pickle. Unfortunately... I believe this dude's name is pronounced Alain. I see him on the mailing list all the time, he posts every day, but I've never actually heard his name pronounced out loud. So if I'm butchering it, it's kind of the opposite of how I pronounce Eric's last name correctly.
B: Actually, if anybody's heard any of my recent talks, I explain what it's like to upgrade individual nodes, and those nodes in some cases, in my case, have had upwards of a terabyte of data on them. And when you want to do an in-place sort of swap of the nodes, you use rsync, and not just once or twice: over and over and over. You know, rsyncing the data over, shutting down the node, doing the final rsync so the commit log's written and the disk can get rsynced over. And then, when you spin up the new node, you just double check that everything's there by doing, say, hash comparisons, CRC checks is another way of saying it. And then, yeah, it's a headache, but with rsync it actually becomes a lot easier to deal with. So the guide that Mick put together, that's Mick from The Last Pickle, is quite helpful. Yep.
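The CRC verification step described here can be sketched without any Cassandra tooling. This is a hedged, hypothetical example of the kind of sanity check you might run on a pair of files after the final rsync pass (function names are made up for illustration):

```python
import zlib

def crc32_of(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its CRC-32, so a
    multi-gigabyte SSTable never has to fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF  # normalize to an unsigned 32-bit value

def files_match(src, dst):
    """True if both files carry the same CRC-32 -- the kind of
    double check described above after copying data to a new node."""
    return crc32_of(src) == crc32_of(dst)
```

In practice you'd run this (or `rsync -c`, or a tool like `cksum`) over each copied file and only decommission the old node once everything matches.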
A
What
else
we
got
here,
Patrick
McFadden,
we
work
with
him.
The
most
important
thing
is,
you
know,
and
Cassandra
data
modeling
the
primary
key.
This
is
kind
of
like
a
little
refresher.
Almost
it's
like
first
thing
like
just
seriously
like
kulit
primary
keys
are
important
kind
of
matter,
a
lot
nick
sandra
controlling
how
data
is
laid
out
in
the
cluster.
B: I just think one of the important things that he covers here is the basics: hey, you can select by a primary key, and that's a lot of times what you're going to be doing in, you know, your standard applications: you're going to be selecting data by your primary key. So knowing how to do it and knowing how to create that system is pretty important, all right, because your primary key selections are your fastest operation, so that should be your benchmark.
A: We talk about that. Like, Luke, we've given roughly one trillion talks on core Cassandra concepts, and that's what we talked about. We're just like, hey, you know, you need to get at your data by your primary key. People are like, well, how come I can't just, like, dude, joins? And it's like, well, if you have a trillion row table and you want to join another trillion row table, then you'd want to do random scatter-gather across all that, and it just doesn't...
B: You could see that by looking at CPU utilization, and that's when I started to ask: well, how did you choose your primary key? How did you choose your clustering key? Yeah. So, from whatever side you sit on, if you're working with Cassandra, this is an important not-just-beginner thing; it's a good reminder thing. Yeah.
A
Like,
for
instance
like
if
aging
your
example,
if
people
like
they
just
have
like
one
partition
for
like
they're
like
oh,
this
is
my
leaderboard
and
it's
like
well,
you
have
like
one
leader
board,
which
means
you're
always
going
to
go
to
the
same
server.
So
even
if
you
keep
expanding
your
cluster,
like
you're,
your
leader
board
is
pull
up,
getting
pulled
up
the
same
amounts
river
every
time
it's.
B
Right
it
was
spreading.
I
can.
I
can
give
you
a
very
specific
example,
because
we
did
something
similar.
You
know
where
that
simple
retort
analytics
company
and
one
of
the
things
we
do
is
collect
web
traffic.
B
You
know
we,
we
collect
web
traffic
data
and
we
decided
initially
that
we
were
going
to
store
all
raw
events
in
one
table
but
segmented
by
our,
and
that
was
a
really
early
mistake,
because
that
meant
that
for
one
hour,
every
single
piece
of
traffic
that
we
saw
got
written
to
the
same
node
and
then
it
would
move
and
we
were
like.
Why
is
this
happening?
Well,
we
didn't
really
think
through
that
it
should.
There
should
be
some.
It
should
be
like
a
compound
key.
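The effect described here can be simulated without a cluster. This is a hedged sketch, with made-up data and md5 standing in for Cassandra's Murmur3 partitioner: when the partition key is the hour alone, a whole hour of traffic lands on one node; a compound partition key like (hour, site) spreads it.

```python
import hashlib

NODES = 6  # hypothetical cluster size

def node_for(partition_key):
    """Map a partition key to a node. md5 is a stand-in here for
    Cassandra's Murmur3 token ring -- the distribution idea is the same."""
    digest = hashlib.md5(str(partition_key).encode()).hexdigest()
    return int(digest, 16) % NODES

# One hour of traffic across three sites (made-up events).
events = [("2016-02-26 14:00", "site-%d" % (i % 3), "event-%d" % i)
          for i in range(1000)]

# Partition key = hour only: every event in the hour hits one node.
hour_only = {node_for(hour) for hour, site, _ in events}

# Compound partition key = (hour, site): the same traffic spreads out.
compound = {node_for((hour, site)) for hour, site, _ in events}

print(len(hour_only))  # 1 -- a single hot node for the whole hour
print(len(compound))   # more than one node shares the load
```

In CQL terms this is the difference between `PRIMARY KEY (hour, event_id)` and `PRIMARY KEY ((hour, site), event_id)` — the inner parentheses are exactly the ones discussed next.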
A: That's an easy mistake to make, right? Because, in your case, you're talking about how a set of parentheses can change the key and completely change the behavior of the entire cluster. A set of parentheses can literally mean either you did it right and everything is awesome and you can use a hundred node cluster, or you did it totally wrong and you're just failing big time.
B
That's
exactly
what
happened
is
we
were
working
off
of
a
twelve
node
cluster
and
we
couldn't
figure
out
what
what
happened?
What
was
going
on
so
we
bumped
it
up
to
thirty
nodes
and
we
made
the
node
sizes
larger
and
we're
like.
We
can't
continue
to
operate
like
this,
and
we
called
in
one
of
your
fellow
datastax
folks
at
the
time
touchin
Harper,
very
smart
fellow,
and
he
just
showed
us
that
we
were
missing
parenthesis
and
this
would
be
a
lot
better
and
we
turned
our
30
node
cluster
back
down
to
six.
B
It
has
since
grown
to
70
plus
nodes,
as
the
company
has
grown,
but
just
that
one's
like
/,
a
parenthetical
mistake,
and
you
know
more
than
double
the
size
of
our
cluster,
so
understanding
those
keys
is
it's
a
beginner
thing,
but
it's
also
incredibly
important
for
growth,
because
it's
very
difficult
to
change
migrate.
Your
data
later,
you
know,
if
it's
basically
needing
to
know
beforehand
what
your
schema
should
look
like
and
trying
to
change
it
post
facto.
A: You know, this is actually kind of interesting. This is a case where doing stress testing with a real cluster makes a huge difference. People sometimes only benchmark against a single node, like, on my laptop I'm just running one Cassandra node, and that's an easy mistake to make, because you wouldn't necessarily see the effects of how your key affects the distributed environment if you're only working on your laptop.
B
So,
to
actually
tell
that,
I
believe,
on
your
side,
Jake
luciani
wrote
a
an
update
to
cast
endure
a
stress
that
has
schema
base
to
it
now,
so
you
can
actually
put
in
your
schema
spin
a
cluster
whether
that's
on
CCM,
the
Cassandra
cluster
manager
or
putting
it
somewhere
in
you
know:
AWS
whatever
it
is,
you
can
spin
up
a
cluster
with
your
schema
and
Hammer
it
and
see
what
happens
even
if
it's
just
a
three
node
cluster.
So
it's
there
are
tools
out
there
to
test
your
assumptions,
even
in
a
distributed
fashion.
Mm-Hmm.
A
And
I
love
breaking
stuff
in
a
distributed
fashion.
So
so,
let's,
let's
take
a
look
at
these
JIRA's
that
have
been
updated
because
there's
a
couple
more
that
I
definitely
want
to
look
at
here.
Allow
custom
tracing
implementation.
This
just
got
merged
into
trunk
and
I
am
ludicrously
excited
about
this,
like
maybe
overly
excited,
I,
have
first
quiet
fresh
first
glance,
you
probably
like:
why
does
he
care
so
much
about
this
I?
Think
it's
huge
yeah.
B: Yeah, I mean, I have an infrastructure of hundreds of nodes, and it works with message queues, it works with, you know, message processing that gets passed in between different applications, consumers and producers, and we had to write our own tracing system, because there wasn't really something like this at the time. The tracing system we wrote involved injecting changes into the messages and then reading them at different points, and we wrote our own receiver, which is effectively just ingesting data, turning it into JSON, and then we had to build a front end for it. Yeah, we did it because it was just easier to write.
A: Let me add a little background to this, yeah. The custom tracing implementation: the inspiration here is adding Zipkin support into Cassandra. So Zipkin comes from Twitter, it's Apache licensed, and it's an open source implementation of a paper that Google put out about their infrastructure tracing system, called Dapper. And effectively, in a distributed system, as Eric mentioned, you have hundreds of nodes, and it's really hard to figure out where performance problems are in a request. If you hit, you know, 50 nodes over the course of that request, it's really hard to find the exact point at which something is broken, and what Zipkin does is give you the ability to trace a request throughout the entire system, and now we'll be able to trace it through Cassandra. So that's going to go into Cassandra 3.4, and this is really, really cool, in my opinion. If, you know, Zipkin is integrated into your application, it's integrated into Cassandra, and it's all there: your databases and the tools that you use.
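The core Dapper/Zipkin idea can be sketched in plain Python. This is not Zipkin's actual API, just a hypothetical illustration: one trace id follows the request across every hop, each unit of work records a timed span pointing at its parent, and the spans are shipped to a collector that can reassemble the call tree.

```python
import time
import uuid

# Hypothetical sketch of Dapper/Zipkin-style trace propagation.
spans = []  # in a real system spans are reported to a collector service

def start_span(name, trace_id=None, parent_id=None):
    """Open a span; a new trace id is minted only at the edge of the system."""
    return {
        "name": name,
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "start": time.time(),
    }

def finish_span(span):
    """Close the span, record its duration, report it."""
    span["duration"] = time.time() - span["start"]
    spans.append(span)

# One request that makes a (pretend) database call downstream.
root = start_span("http-request")
db = start_span("cassandra-query",
                trace_id=root["trace_id"],   # propagated, not re-minted
                parent_id=root["span_id"])
time.sleep(0.01)  # stand-in for real work
finish_span(db)
finish_span(root)
```

Because every span shares the root's trace id, a UI can show exactly which hop, the application or the Cassandra query, ate the time, instead of leaving you to instrument each piece by hand.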
B
The
message
starts
at
your
infrastructure
to
the
time
it
ends
takes,
say,
seven
seconds
and
just
making
up
numbers
and
if
all
of
a
sudden
it
starts
taking
10
you're
not
going
to
know
why
or
where
that
happened
in
the
near
have
to
trace
every
single
piece
of
the
application
by
hand
and
when
Zipkin
allows
you
to
do
is
basically
say:
hey!
Guess
what
I
think?
B: Yeah, as an ops guy, that is one of the rules, and if you talk to most ops guys, they will say a service or application does not exist unless it's monitored. This is a way of monitoring and instrumenting an infrastructure that is very tightly coupled to the way information flows through your infrastructure, which is a very difficult thing for most developers to conceptualize. But having a system that sort of understands it, just because it gets plugged in along the way, is incredibly valuable. Yep.
C
From
a
developer's
perspective,
you
know,
I
can
just
even
thinking
back
to
like
one
of
my
previous
jobs
like
having
that
information
available.
You
know
yeah,
it
could
be
an
ops
problem,
that's
causing
your
you
know
your
latency
or
your
message,
processing
time
to
go
from
600
milliseconds
to
49
milliseconds
or
it
could
be
a
code
change
right,
and
you
know
if
you
can
point
a
developer
to
hey
this.
It's
this
particular.
You
know
part
of
the
system
or
it's
this.
You
know
in
this
day
and
age,
everybody's
doing
micro
servers.
A: Yeah, having the metric behind you, it's a lot easier to have a conversation where you're like, hey, I have a metric, and it shows that this thing is slow, versus, hey, your microservice is slow, and then it becomes like a personal battle. People are like, what are you talking about? It's not me, obviously.
B
I
do
I'm.
The
cool
thing
is
that
it
might
and
to
your
point
Luke.
It
may
have
nothing
to
do
with
the
code
itself,
but
the
code
may
help
diagnose
a
problem
and
say
the
database
related
to
that
service
right
and
like
that,
just
having
the
you
know,
just
understanding
the
structural
integrity
of
your
system
through
something
like
tracing
is
it's
it's
mind-blowing,
Lee,
easier
to
diagnose
problems.
Yeah.
A
And
and
being
able
to
see
like
even
within
the
micro
service,
that
a
particular
query
took
a
long
time
and
it's
like
you
know,
you
can
like
look
at
a
high
level
or
you
can
like
really
drill
down
and
see
it
a
much
at
a
really
detailed
level.
So,
like
the
amount
of
information
you
can
get
onto,
this
is
incredible.
The
time
savings
huge,
the
fun
off
the
charts
would
all
right.
Let's,
let's
go.
We
got
support
for
group
by
in
the
select
statements
boom.
You
know
what
drop
in
the
hammer
Cassandra
open
source
I.
B: I mean, I think, for me, this is a game changer, but it really just depends on speed. You know, I definitely would like to see some benchmarks when it's done. Not like I'm trying to be a hater or anything, I just think it's important to understand what potential trade-offs you're making when you get something powerful like GROUP BY. You know, wide rows are one thing, but what do you do beyond that, you know?
A: Well, so here's the thing: I'm not a hundred percent sure, honestly, but I think you're limited to single partitions, so you're not going to be like, give me the whole database and let's just group by whatever. You'd say: okay, I can either take back, you know, 10,000 rows out of this partition and aggregate them in my application, or I can let the database do it and not transfer everything over the wire.
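The trade-off described here can be illustrated with plain Python and made-up data, no driver or cluster involved: the client-side version hauls every row over the wire just to collapse them into a handful of totals that a server-side GROUP BY could have returned directly.

```python
from collections import defaultdict

# Pretend these are the 10,000 rows of one partition, fetched over
# the wire: (clustering_key, value) pairs for five sensors.
rows = [("sensor-%d" % (i % 5), i) for i in range(10000)]

# Client-side aggregation: all 10,000 rows crossed the network
# just so we could sum them in the application.
totals = defaultdict(int)
for sensor, value in rows:
    totals[sensor] += value

# With server-side GROUP BY on the partition, the coordinator would
# return only the aggregated rows instead.
print(len(rows))    # 10000 -- values transferred without GROUP BY
print(len(totals))  # 5 -- rows that would have sufficed
```

Same answer either way; the difference is 10,000 rows versus 5 on the network, which is exactly why it matters whether the aggregation runs next to the data.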
A: That'll be a good one to watch, and that's CASSANDRA-10707. It's linked in the blog post, so definitely read the blog post, follow the link, don't listen to me. Wow. And the return of diagnostic tools: sstabledump replacing sstable2json, which I know your coworker is a huge fan of, right? Yeah.
B
Ruff
ruff
Bradbury
is
a
big
fan
of
this
SS
table
to
JSON,
see
one
of
the
things
that
I
think
it's
important
to
understand
when
you
build
systems
that
are
larger
than
just
one
or
two
machines
is
having
a
standard
communication
media.
So
whether
that's
tanner
communication
medium
is,
you
know,
XML,
which
I
guess
would
be
great
if
you
don't
like
yourself,
very
much
or
or
JSON.
If
you
move
to.
B
Want
you
to
process
this
data
in
whichever
form
it's
easiest
for
you
and,
let's
just
double
check
what
we
have,
let's,
whatever
create
a
materialized
view
for
ourselves,
let's
whatever
it
is,
you're
working
in
it
with
a
standard
set
of
tools
and
SS
table
to
JSON
is
a
thing
that
actually
allows
that
to
happen,
whereas
the
other
sorry
SS
table
the
original
version.
When
you
had
JSON
in
Cassandra
and
then
tried
to
dump
it,
it
made
it
very
difficult
to
process
I.
Think.
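As a toy example of why a JSON dump is the convenient "standard medium" here, this sketch filters rows out of sstable2json-style output with nothing but the standard library. The JSON shape below is made up for illustration; the real tools' output format differs between Cassandra versions:

```python
import json

# Made-up dump in the spirit of sstable2json / sstabledump output;
# the actual tools' JSON layout varies by Cassandra version.
dump = json.loads("""
[
  {"partition_key": "user-1",
   "rows": [{"name": "email", "value": "a@example.com"},
            {"name": "city",  "value": "NYC"}]},
  {"partition_key": "user-2",
   "rows": [{"name": "email", "value": "b@example.com"}]}
]
""")

# Once it's JSON, standard tooling applies: filter, reshape, build a
# little "materialized view" of just the email column per partition.
emails = {p["partition_key"]: r["value"]
          for p in dump
          for r in p["rows"] if r["name"] == "email"}
```

`emails` ends up mapping each partition key to its email value — the same double-check-what-we-have workflow described above, done with `json` and a comprehension rather than anything Cassandra-specific.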
B: Yes, no, sorry, we're not analyzing the raw SSTable. We're rewriting some of the older style data, all right, as we need it, into a form that's much more conducive to Spark queries. OK, gotcha. So it's very custom, and otherwise I would say we'd happily, you know, get it out there, but it wouldn't work. Yeah.
A: Feel free to write your own. I will. So it looks like we've got our JIRAs out of the way. Hey, Eric, are you guys hiring, by any chance? Maybe you're hiring?
B
All
you
should
mention
that,
actually,
we
are
hiring
here
at
simple
reach,
so
I've
already
told
you
guys
a
little
bit
about
the
size
of
our
infrastructure,
but
what
we're
looking
for
is
just
a
back
end
engineer,
somebody
who
liked
writing
Python
who
likes
right
and
go
that's
Python
to
know.
We
talked
about
Python
to
deuces
and
you
know
we're
we're
looking
for
somebody.
We
have
a
distributed
team.
B
So,
wherever
you
are,
if
you
you
like,
learn
working
with
large
amounts
of
data
and
amber
light
like
working
with
me
and
Russ,
the
other
author
of
the
book,
practical
Cassandra
then
reach
out
the
the
link
is
in
the
blog
post,
mmhm
lyst,
plug-in.
A
Alright
cool
we
there's
more
stuff
in
the
the
blog
post,
I
think
we're
going
to
wrap
it
up.
For
today,
though,
we've
got
some
CFPs
that
are
open
for
analytics
and
python
conferences
to
meet
up
information.
That's
in
here
so
definitely
check
that
out
and
go
to
your
meetups.
It
will
make
Lena
happy
if
you
don't
know
Lena.
She
is
our
community
wrangler
cat
Wrangler.
If
you
will
she's
very
good
at
getting
this
stuff
making
this
stuff
happen
so
huge
thumbs
up,
let's
see
how
anything
else,
you
guys
have
anything
nada.