Description
Speaker: Andy Cobley, Lecturer at University of Dundee
Slides: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-hardware-agnostic-cassandra-on-raspberry-pi-28053540
Abstract: The Raspberry Pi is a credit-card-sized, $25, ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700 MHz ARM processor and 512 MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on Earth would you want to do this!?
A
A
A
It can't be that difficult, surely, and once we do, what are we going to do with it once we get it working? So I guess I should explain who I am, and that might explain slightly more why we had the idea of doing this. I'm Andy Cobley. I've got the grand title, which I gave myself, of program director of the MScs in Data Science and Business Intelligence at the University of Dundee. I've got my Twitter account there, just in case you want to follow me. OK, so I guess the first question is: how many of you guys have got a Raspberry Pi? There's a couple of you, yeah.
A
What's really interesting is that when I first started giving this talk, there'd be like two hands going up, and by the end of the month it was like half the audience. It slowly... it didn't even go slowly. It just completely took off, the number of people with these things.
A
Which makes it particularly good if you want to make images. It's basically just like creating a virtual machine image: you can just create a whole bunch of SD cards from the one image you want and just boot it very, very quickly.
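As a rough illustration of the "one image, many cards" idea, here is a minimal Python sketch that copies a master image byte for byte to several targets. The function name, the file names and the 4 MB chunk size are assumptions made for this example; the talk itself only mentions using a simple Unix command for the copy.

```python
import shutil
from pathlib import Path


def clone_image(image: Path, targets: list[Path], chunk_bytes: int = 4 * 1024 * 1024) -> None:
    """Copy one master image byte for byte to every target path.

    Each target could be an ordinary file or, on a real system, the
    device node of an SD card (which needs root and great care).
    """
    for target in targets:
        with image.open("rb") as src, target.open("wb") as dst:
            # Stream in chunks so a multi-gigabyte image never sits in memory.
            shutil.copyfileobj(src, dst, chunk_bytes)
```

When writing to real cards, double-check each device path first, because a wrong target is overwritten without warning.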
A
It's got an Ethernet port, which makes it useful, and basically everything you're going to need to run a general-purpose computer. I've got a wee picture here of one of these things next to a pound coin.
A
A
A
A
OK, so there are some of the challenges. Cassandra is obviously designed to be quite fast at writing, very fast at writing. It's now fast at reading too. I mean, any of you guys who've used Cassandra for some time, back to something like 0.8: it used to be considerably slower at reading than writing, but they've done some great work in making that faster.
A
A
OK, so even more bad news. You can try it if you want to, but if you put an external USB drive on there, a proper full USB drive, even a nice fast SSD, you're going to find it runs even slower than just running off one of these disks. I'm going to try and pull you up a schematic, if I can; I don't know if I can.
A
A
A
No, that's OpenJDK. Of course, OpenJDK is considerably slower than Oracle's, probably in real life, in general life, even on real machines, but certainly on a Raspberry Pi. And that's important, because...
A
The official distribution of the operating system is Raspbian, which is basically Debian for the Pi, and it uses the hardware floating-point accelerator, which will make your Debian a bit faster. I say "much faster" there, but actually, let's be realistic: it's a bit faster. We're not talking about fast machines. And the current official Oracle JDK won't run on this Debian at all, so that's why people had to use OpenJDK; that's the official one. Fortunately, Oracle have released a beta version of JDK 8, Java 8, though you might prefer 6.
A
A
Sadly it doesn't... it's not entirely clear why, but if you do run JDK 8 on Debian, you don't get much of a performance boost at all from using the hardware floating-point accelerator.
A
You guys are all Cassandra experts, so you'll know that in 1.0 Cassandra started using compression for its tables, which gives you a two- to four-times reduction in the data size, a 25 to 35 percent performance improvement on reads and a 5 to 10 percent performance improvement on writes. This is one of the reasons why reads now run almost as fast as writes: the compression. Now, this gives us trouble too.
A
Until recently, Cassandra used two types: the Google Snappy compressor, which is faster for reads and writes, and the Deflate compressor, which is basically Java zip; it's slower but gives better compression. These are configurable in the cassandra.yaml file. The problem is, Snappy compression isn't available on the Pi.
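The speed-versus-ratio trade-off described for Deflate can be tried out with Python's standard zlib module, which implements the same DEFLATE algorithm. This is only a sketch of the general idea, not Cassandra's compressor code, and the sample data and function name are invented for the example.

```python
import zlib


def deflate_ratio(data: bytes, level: int) -> float:
    """Return original size divided by compressed size at a DEFLATE level."""
    compressed = zlib.compress(data, level)
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical, highly repetitive sample rows, similar in spirit to log data.
sample = b"timestamp=1382000000,sensor=pi-10,value=21.5\n" * 2000
fast = deflate_ratio(sample, 1)   # level 1: least CPU effort
best = deflate_ratio(sample, 9)   # level 9: most CPU effort
print(f"level 1: {fast:.1f}x, level 9: {best:.1f}x")
```

On a slow ARM core the extra CPU time spent at higher levels matters far more than on a server, which is part of why the choice of compressor is a real concern on the Pi.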
A
It requires native methods to get it to run, and actually there was a JIRA bug report filed, because compression was turned on by default in some places and it completely stopped. There was at least one version that just wouldn't run on a Raspberry Pi, because it just went bang when it tried to load up the Snappy compressor, and the Snappy compressor in fact wasn't producing the right exception.
A
Fortunately, Cassandra 1.2 started using LZ4 compression, which just works fine on the Raspberry Pi, no problem at all. And if you've actually looked into the startup script, you might find this curious line that says some processors, such as the Raspberry Pi, don't report the number of processors they have, and this is true. The Cassandra startup script will allocate memory according to how many processors it has.
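To make the memory allocation point concrete, here is a Python sketch of the kind of heuristic the startup script applies. The formula and constants (half of RAM capped at 1024 MB, a quarter of RAM capped at 8192 MB, 100 MB of young generation per core) are written from memory of the cassandra-env.sh of that era and should be treated as assumptions; the function name is invented for this example.

```python
def cassandra_heap_sizes_mb(system_memory_mb: int, reported_cores: int) -> tuple[int, int]:
    """Return (max_heap_mb, young_gen_mb) for a machine.

    Assumed heuristic:
      max heap  = max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB))
      young gen = min(100 MB * cores, max heap / 4)
    """
    # Some boards, such as the Raspberry Pi, report no cores: use at least 1,
    # otherwise the young generation would be sized at zero.
    cores = max(reported_cores, 1)
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    young_gen = min(100 * cores, max_heap // 4)
    return max_heap, young_gen


print(cassandra_heap_sizes_mb(512, 0))  # a 512 MB Pi that reports no cores
```

Without the guard on the core count, a machine reporting zero processors would get a zero-sized young generation, which is the sort of startup failure the curious line in the script avoids.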
A
A
This is one of the advantages of at least trying to do it on a machine like this. We don't know where Cassandra is going to be turning up in the future; at least now someone in production, for whatever job they're trying to do, isn't going to run into this problem.
A
If you're going to run these in a cluster, you have to, at least in cassandra-env, set up the JMX configuration properly, or else nodetool won't work and your cluster won't talk to each other. On most machines and clusters that you're running in production you don't need to do this, but on the Raspberry Pi you do have to do it.
A
A
Sadly, the JDK 1.8 that's available at the moment doesn't have support for this option, so when it tries to start up it's going to go bang again. So you just need to go into cassandra-env.sh and comment that out.
A
A
Well, there's one thing we've forgotten: Raspberry Pis cost about 25 quid. You need an SD card, which costs next to nothing. You've seen here, and you might be wondering, how I'm powering these Raspberry Pis. Basically, I've got a USB hub here. It's not actually doing any USB stuff; it's just acting as a four-port power supply, so that's nice and cheap, and these are standard Samsung phone charging cables at about one pound 25 each. So there's a data center power supply for 20 quid or thereabouts.
A
What you can't do is run more than four off one of these USB hubs. Or you can, but you'll soon find out it's not a good idea. What happens is, if you put five on there and then you start running a stress test, as the Pis start to use more and more power they actually drag down the voltage supply on this until they start resetting themselves. So four is about the maximum.
A
There's a 64-node supercomputer produced by Southampton University for less than two thousand pounds, and again they're giving that to their students to play with. It's not running Cassandra, but it doesn't matter; it's nice. Somebody's produced a 32-node Beowulf computer: that's Joshua Kiepert at Boise State University. He's done a particularly nice job, I think, of flashing lights. And it's not just universities as well. I came across this recently: this is LinkedIn.
A
They created, I think it's a 10-node, Hadoop cluster running on Raspberry Pis, and the nice thing I like about this is that they put little LEDs on there, so the nodes that are doing the map jobs flash red and the ones that are doing the reduce jobs flash blue. So you can see the progress of the Hadoop job across the processors.
A
As we all know, adding nodes is good and adds performance. It adds copies of the data. Make sure, though, that your ring is balanced. If you're trying this on Raspberry Pis, they really don't like being unbalanced: they start going into nasty garbage collection and compaction fits, and one of them will just die. So make sure you've got it all balanced.
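One common way to balance a ring is to space the initial tokens evenly. The Python sketch below assumes one token per node (no virtual nodes) and the RandomPartitioner token range of 0 to 2^127; the talk does not say which partitioner was used, so both are assumptions, and the function name is invented for this example.

```python
def balanced_tokens(node_count: int, ring_size: int = 2**127) -> list[int]:
    """Return evenly spaced initial tokens, one per node.

    Assumes a RandomPartitioner-style ring of size 2**127 and a single
    token per node.
    """
    if node_count < 1:
        raise ValueError("need at least one node")
    return [i * ring_size // node_count for i in range(node_count)]


# Tokens for a four-Pi cluster, one per node.
for node, token in enumerate(balanced_tokens(4)):
    print(f"node {node}: initial_token = {token}")
```

With evenly spaced tokens each node owns the same share of the key space, so no single Pi takes a disproportionate part of the write and compaction load.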
A
A
So what sort of performance do we get? That's the performance for a three- and four-node cluster. It's not particularly stellar. The blue line is three nodes, the sort of green line is four nodes, getting up to about 700 operations per second, and you can see at the end of both of those runs where it starts compacting the tables.
A
A
A
Essentially you just change to the boot directory and you copy out the start.elf file and replace it with arm224_start.elf, if you want to do this manually. If you don't want to do it manually, what you can do is boot up a fresh copy of Raspbian, plug it into a monitor, and it gives you a nice GUI where you can actually set all these options yourself. Sorry? Yeah, yeah, or you can run that manually, yeah.
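The manual file swap described here can be sketched in Python as follows. The backup file name start.elf.bak and the function name are assumptions made for this example; the talk only names start.elf and arm224_start.elf, and the boot directory is passed in rather than hard-coded.

```python
import shutil
from pathlib import Path


def switch_memory_split(boot_dir: Path, split_file: str = "arm224_start.elf") -> Path:
    """Replace start.elf with the chosen memory-split firmware file.

    Keeps a copy of the original as start.elf.bak (hypothetical name)
    and returns the path of that backup.
    """
    start = boot_dir / "start.elf"
    backup = boot_dir / "start.elf.bak"
    if not backup.exists():
        # Copy out the original once, so the change can be undone.
        shutil.copy2(start, backup)
    shutil.copy2(boot_dir / split_file, start)
    return backup
```

The change only takes effect on the next boot, and restoring the backup over start.elf undoes it.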
A
Some of these slides are really just for people who want to follow this through and do it themselves. I prefer static network addresses on these, just because it makes it easier, when you're doing it on a kitchen table, to know exactly which one you're writing to. If you actually look at these, you'll see that I've written 10 on there, so that one is 192.168.0.10, so I know which is which. If any of them fail, I can pull out the plug directly.
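The labelling convention (the number written on the board is the last octet of its address) can be expressed as a small Python helper. The 192.168.0.0/24 network follows the address mentioned in the talk; the function name and the range check are assumptions for this sketch.

```python
import ipaddress


def node_address(label: int, network: str = "192.168.0.0/24") -> str:
    """Map the number written on a Pi to its static IP address."""
    net = ipaddress.ip_network(network)
    address = net.network_address + label
    # Reject labels that fall outside the subnet or hit reserved addresses.
    if address not in net or address in (net.network_address, net.broadcast_address):
        raise ValueError(f"label {label} is not a usable host in {network}")
    return str(address)


print(node_address(10))
```

Keeping the label and the address in step means a misbehaving node seen in the logs can be found on the table straight away.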
A
I'm frightened of that. And these things are written onto an SD card, and there's a fairly simple Unix command to copy them. As I say, the nice thing is it means you can just make one copy, then copy it multiple times and just plug it in. If you're doing it in the sort of environment I think we'll be doing it in, with undergraduates,
A
it means that you can have a set of cards with different versions of Cassandra set up, so people can compare different performance and different configurations. You could do it with virtual machines, but this is nicer. If you want to start it as a service, there are a few differences from starting up on standard Debian. So I took an example startup file, hacked it around a bit, and you can find it up on GitHub there and use that instead, so you can start Cassandra as a service on this thing.
A
A
Stress-tested, generally abused: it doesn't matter. If a student runs off with one of these, we've lost 25 pounds' worth of computer, as opposed to however much a real computer is. We can simulate data racks, data centers, long network delays. And some of our undergraduate students are using these for playing with configurations at the moment, because we've just started a big data undergraduate course, an undergraduate module.
A
And I've lost lots of my diagram there for some reason. So imagine on switch one you've got a data center of Raspberry Pis, and on switch two you've got another data center of Raspberry Pis. You use a nice 10-megabit hub, as opposed to a switch, an old-fashioned repeater essentially, and inject some noise into there. So you can slow down the traffic between switch one and switch two, and that will simulate dirty lines, bad delays and all sorts of other things for students to play with. Somebody?
A
A
So if you run tc on them, all you're going to do is delay the network traffic between each Cassandra node. What you actually want to do is put the delay between those two switches. So one thing you could do, I guess, is get yourself a Linux box with two network ports and replace that hub with a tc thing.
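As a sketch of what the "tc thing" might look like, the Python helper below builds a Linux tc command that adds a netem delay to one interface. The talk only mentions tc; the use of the netem queueing discipline, the interface name eth1 and the delay values are assumptions for this example, and the helper only builds the argument list rather than running it.

```python
def netem_delay_command(interface: str, delay_ms: int, jitter_ms: int = 0) -> list[str]:
    """Build a `tc` command that delays all traffic leaving an interface."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem", "delay", f"{delay_ms}ms"]
    if jitter_ms:
        # An optional second value adds random variation around the delay.
        cmd.append(f"{jitter_ms}ms")
    return cmd


# Hypothetical example: 100 ms delay with 10 ms jitter on the port facing switch two.
print(" ".join(netem_delay_command("eth1", 100, 10)))
```

On the two-port Linux box the list could be passed to subprocess.run as root; forwarding or bridging between the two ports is a separate step that this sketch does not cover.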
B
A
So obviously, there are some recommendations for running Cassandra on machines in general, and one of the recommendations is to bind the Thrift interface to one card and the RPC interface to a different card. As you've seen, out here we can't do that.
A
So what about Cassandra 2? I guess you guys know that in Cassandra 2 there's inter-node compression as well as compression on the disk, so the data traffic between nodes is actually compressed using the Snappy compressor. And again, if you try doing that on these things, it's going to go bang, so you need to go into the YAML config file and turn inter-node compression off.
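A minimal Python sketch of that edit is shown below. It assumes the setting is the internode_compression key in cassandra.yaml and that the value none switches it off; the function name and the plain-text approach (no YAML library) are choices made for this example.

```python
import re


def disable_internode_compression(yaml_text: str) -> str:
    """Return the config text with internode_compression set to none."""
    # Match the setting whether it is active or commented out.
    pattern = re.compile(r"^[ \t]*#?[ \t]*internode_compression[ \t]*:.*$", re.MULTILINE)
    new_text, count = pattern.subn("internode_compression: none", yaml_text)
    if count == 0:
        # Setting not present at all: append it.
        new_text = yaml_text.rstrip("\n") + "\ninternode_compression: none\n"
    return new_text


sample = "cluster_name: 'Pi'\ninternode_compression: all\nrpc_port: 9160\n"
print(disable_internode_compression(sample))
```

Working on the text rather than parsing the YAML keeps the comments and ordering of the rest of the file untouched.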
A
I believe, coming in the next version of Cassandra, I think in 2.0.2 or perhaps a bit later, they're going to give you the option to use the LZ4 compressor instead of the Snappy compressor, so that'll give you a better compression ratio anyway. And that kind of came about because we found that this wasn't going to work on these and raised a JIRA issue.
A
What I've seen happening, and I've still not got to the bottom of it, I'll tell you now, is that as you run the stress test, the stress test is going to run for about five minutes and then one node is going to go into terminal garbage collection and compaction, and it's always a random node as well. It's really weird; I can't work out what's causing it. So one thing you can do: John Berryman, whom some of you may follow on Twitter, created this wonderful blog post because he had a customer
A
that said, "Can we run Cassandra in 64 megabytes of memory for our developers, so I can run a whole bunch of virtual machines and each one has got their own virtual one?" So there's a blog there on how to tune Cassandra down to very low amounts of memory, and if you do that with Cassandra 2 on these Pis, it will actually run. It just runs a bit slower than it would have done, but there's some really useful advice there.
A
A
OK, so 69 operations per second to begin with. It's immensely fast. That will ramp up after a certain amount of time, and then you'll see, obviously, some of the messages from tailing the log file up there. Up to 104, et cetera.
A
Class 10, so it's about the fastest that will run on these, yeah.
B
A
That's the other thing that will also cause it. We had a question: what class of SD card are you using? If you use a lower class, it's slower and you'll get a lot less speed, so class 10 is about the fastest you can use. We're up to 257 operations per second, and at some point that will start dying.
A
OK, so the takeaway message here: Cassandra wouldn't run on a Pi to begin with; it does now. It was only minor changes that needed to be done, but they basically shook out some minor bugs that would have bitten someone at a later time, perhaps. Some of my BI students and data science students, at least one of them works in a very secure lab, and to be able to get a new piece of software or a new machine into that lab...
A
It's a nice way of getting round officialdom, for instance. Mostly, though, the Pi is for fun. I mean, I started out doing this because it was just pure geeky fun: let's see if I can ever get this working. But someone was saying to me last night in the pub, you think about it: some kid in his bedroom
A
can now essentially run and play with a four-node cluster in front of him, and he can see what happens when you do that. You can take the cluster and turn it off and then turn it back on again, and actually it'll be fine; it doesn't bother it. I wouldn't do that with a real Cassandra cluster, by the way, just pull the power supply.
A
But that works. I have to do the obligatory plug. The university pays me to come, so I have to mention that we do actually have a data science MSc where we deal with big data and we do some Cassandra training, amongst other things. This is one of our students smiling in front of us, and you can ask him what it's like. But I have to do that.
A
A
Not yet. The new beta versions are coming out; I mean, it could be that they've just released one and I don't know yet.
A
B
A
B
A
B
A
A
B
A
Yeah, exactly, like the LinkedIn one on Hadoop, which showed different colored lights according to what they're doing. So, I mean, you've got the source code for Raspberry Pi; you could go in there and hack it. I mean, I've thought about having a Cassandra-driven robot or something, so you send CQL commands to Cassandra and it draws across the...
B
A
Yeah. And so, I mean, we're going very off topic, because this isn't standard, but any of you guys who know Arduino hardware: there's a converter you can plug into the top, and that allows you to plug Arduino boards directly into it, but it won't drive motors directly. So I think I've got it up somewhere on GitHub or somewhere.
A
I've written a pulse-width modulation controller for this that allows you to drive motors directly through the Arduino board. But we're very off topic, into hardware lessons. OK, I think I'll call it a day, and I think that's about half an hour. So thank you very much for coming, and I hope you've learned something.