Description
Speaker: Jesse Young, Director of Research at Zonar Systems
Slides: http://www.slideshare.net/planetcassandra/2-jesse-young
Come learn about how Zonar Systems uses Cassandra for logistics use cases such as tracking fleets of school buses and other fleet management services. Zonar uses Cassandra because of its ability to scale horizontally, its continuous availability and its operational ease. This talk will cover details about the implementation and our three-year journey that got us here, including the challenges along the way.
So, a quick overview of Zonar. We're a Seattle-based company, and we deal with heavy fleet telematics. Now, what is that? A heavy fleet is any vehicle that's over 10,000 pounds or carries more than eight passengers. Those are our specific target customers. We do deal with lightweight vehicles, such as Ford F-150s, but our main customer base is heavy fleets. Fleet telematics is just the collection of all the data from these fleets: GPS data, fault codes, any kind of data that's really going to help those fleets function. Really, what we are is a hardware-enabled software-as-a-service company.
What that means is we create a hardware device: a GPS receiver with a GSM modem in it. We do engine diagnostics and connect to the engine computers of all these vehicles as well, and then we offer a SaaS-based application for this. We host all of the customers' data, we offer a web front end for those customers to get the data, and we also offer a nice API for those customers to get the data and bring it into their back-end software.
We know that APIs are huge, and that's something we've always focused on. We're an open company, much like DataStax and Cassandra users. When we started out, we really noticed that customers needed access to their data, and we wanted to make sure we offered that to them.
So what kinds of data are we really dealing with over here at Zonar? Well, first and foremost, we started as a safety inspection company. We weren't dealing with GPS data to start; we were doing DOT-required pre- and post-trip inspections. These are inspections that drivers are required to do to make sure these heavy fleet vehicles are safe to be driving out on the road with all the rest of us commuters. We've now been tracking GPS data for the last eight years.
Oil temperatures, stop-engine lights, check-engine lights: really, anything that's out there on that engine computer, we're able to collect. We're pulling that information and doing all sorts of fun analytics for our customers on it. We're even starting now to get into photos with our inspection device, or at least an Android tablet, when these drivers are out doing their pre- and post-trip inspections. We know that a picture's worth a thousand words, so rather than the driver trying to explain exactly what they're seeing wrong with the vehicle, they're going to be able to take a photo. This is just opening up a world of opportunities for us. So there's all sorts of data that we're trying to collect and give quick, fast reporting on.
So, some of the technical challenges we've had at Zonar. Again, we've been around for about 13 years. We've got over a hundred database servers, which is over 3,000 databases due to the way we've sharded them. It's a large amount of data: it now constitutes over a hundred terabytes.
So what that starts bringing in for us is the big data issue. We've got all sorts of problems we're trying to deal with, and although your typical RDBMS is capable of handling them, it's not always the best solution. We need fast data replication, and I need it to be easy to do. I can do all sorts of replication with all the RDBMS tools out there, but they're not easy to administer.
I need to maintain fast inserts and fast retrievals from the same data store. I don't want to need both an OLTP database and a data warehouse; I need to be able to do all of these things in one fast place. I need to be able to scale horizontally, easily. We have this done fairly well with our RDBMSes now, but it's not as easy as just adding a couple of nodes, putting in some tokens, and being done with it. And last but not least, it has to be easy to administer.
We want a system where we don't need DBAs constantly involved. We don't want to keep adding system administrators and systems engineers just to scale out horizontally. So our typical RDBMS just starts not being relevant anymore. It's got its place, and it's a great solution, but we needed something better.
So our big data solution was Cassandra. As we all know, Cassandra has built-in data replication. This makes it very easy to continuously add nodes, add data centers, and add extra rings if we need more performance. We need fast data inserts, and this is something Cassandra has done very, very well. Same thing with fast retrieval of data: I can insert the data and pull it out at the same time and get very, very high throughput. It's very easy to administer; we've done this at Zonar for the last couple of years.
Now we have a ten-node ring running, and my administrators very rarely have to do anything with it. The only fix they typically need is to restart the Cassandra service, or maybe restart the server itself. There have never been any other issues beyond that.
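For readers newer to Cassandra, "putting in some tokens" refers to how, before virtual nodes, each node in a ring was given an `initial_token`. The sketch below assumes the older RandomPartitioner (MD5-based, token space 0 to 2**127) and SimpleStrategy-style replica placement. It computes evenly spaced tokens for a ring like the ten-node one described above and shows which nodes would hold a key's replicas. The key name and the replication factor of 3 are illustrative assumptions, not Zonar's configuration.

```python
import bisect
import hashlib

# Assumption: pre-vnode ring using RandomPartitioner, whose tokens span 0..2**127.
RING_SIZE = 2**127


def initial_tokens(num_nodes: int) -> list[int]:
    """Evenly spaced initial_token values, one per node, in ring order."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def key_token(key: bytes) -> int:
    """RandomPartitioner-style token: MD5 digest read as a signed integer, then abs()."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replica_nodes(key: bytes, tokens: list[int], rf: int) -> list[int]:
    """SimpleStrategy-style placement: the first node whose token is >= the key's
    token owns the key, and the next rf - 1 nodes clockwise hold the other replicas."""
    start = bisect.bisect_left(tokens, key_token(key)) % len(tokens)
    return [(start + i) % len(tokens) for i in range(rf)]


tokens = initial_tokens(10)
print(tokens[:3])
print(replica_nodes(b"vehicle-1234", tokens, rf=3))
```

Since Cassandra 1.2, Murmur3Partitioner is the default and virtual nodes (the `num_tokens` setting in cassandra.yaml) are available, so hand-computed tokens are rarely needed today; the sketch only illustrates why adding a node used to mean choosing tokens.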
One of the other things we happened to come upon with Cassandra was the need for TTLs, and Cassandra supports this very well. We're dealing with some DOT compliance data and other data that our customers are required by law to keep around for a specific amount of time, but after that required amount of time, some of them don't want that data around. If you're putting that into a typical RDBMS, you have to run delete statements or partition the data a certain way. This causes a lot of problems, and then you have to run vacuum statements and all sorts of fun stuff like that. We've been able to avoid a lot of that with Cassandra by using TTLs.
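A TTL is attached to the write itself, so expiry needs no later DELETE. The toy model below is a sketch of that behavior, not Cassandra's implementation: it assumes an injectable clock so the behavior can be checked without waiting, and it hides a value once its TTL has elapsed. The CQL in the comment uses a hypothetical `compliance_records` table.

```python
import time

# Equivalent CQL (hypothetical table; 3-year retention = 94,608,000 seconds):
#   INSERT INTO compliance_records (vehicle_id, recorded_at, payload)
#   VALUES (?, ?, ?) USING TTL 94608000;


class TTLStore:
    """Toy model of TTL semantics: a write may carry a TTL in seconds, and once it
    elapses the value is invisible to reads without any DELETE being issued."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._cells = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, ttl=None):
        # As in CQL, a missing or zero TTL means "never expires".
        expires_at = self._clock() + ttl if ttl else None
        self._cells[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._cells.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired: behaves as if deleted
        return value
```

In Cassandra itself, expired cells are treated like tombstones and physically removed by compaction once `gc_grace_seconds` has passed, so there is no separate cleanup job to schedule.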
So, some quick examples of how we're using Cassandra right now. We've got our photos that we're just starting to collect. Really, for this we need cheap storage, as we start collecting millions and millions of photos. We don't want to put that on big, expensive SANs or anything like that; it's just going to be cost-prohibitive. With Cassandra, we're able to use our commodity hardware; the data gets replicated, and it's relatively cheap. We can grow this capacity easily over time by just adding nodes. I don't have to buy all that initial infrastructure right up front; I can start out with three nodes, or six nodes, and continue to grow it.
This is another area where TTLs just make a lot of sense. Certain photos don't need to be around forever, and I don't want to have to continuously figure out what needs to be deleted. We set the TTL when we store a photo, and then the photos are gone when they're gone.
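Setting the TTL at write time means turning a retention policy into seconds. A small helper, sketched below under the assumption that retention is expressed in days, can also guard against Cassandra's maximum TTL of 20 years (630,720,000 seconds). The `inspection_photos` table and its columns are hypothetical.

```python
# Cassandra rejects TTLs above 20 years (630,720,000 seconds).
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def retention_ttl(days: int) -> int:
    """Convert a retention period in days into a TTL in seconds."""
    ttl = days * SECONDS_PER_DAY
    if not 0 < ttl <= MAX_TTL_SECONDS:
        limit = MAX_TTL_SECONDS // SECONDS_PER_DAY
        raise ValueError(f"retention of {days} days is outside 1..{limit} days")
    return ttl


def photo_insert_cql(days: int) -> str:
    """Prepared-statement text for a photo write that expires after `days`.
    Hypothetical schema: inspection_photos(photo_id uuid PRIMARY KEY, vehicle_id text, image blob)."""
    return (
        "INSERT INTO inspection_photos (photo_id, vehicle_id, image) "
        f"VALUES (?, ?, ?) USING TTL {retention_ttl(days)}"
    )


print(photo_insert_cql(90))
```

The statement text would typically be prepared once by a client driver and then bound per photo, so the retention policy lives in one place.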
Another big use case we've got is elevation data. We're starting to get really big into analytics with the data we use, and one of those key factors for us was elevation data: knowing where these vehicles are traveling and what elevation they were at. This is a really large data set that our engineers have had to work on. The data gets loaded into the system once, and then it's just read heavily. We might update it once a year if we have to, but elevations around the world aren't changing very often, at least, so we just needed to support very heavy reads.
We found what we were looking for: the solution was going to be a really quick key-based application. We needed scalable performance. We know that within the first year we're going to need bursts of up to six thousand reads per second and over a hundred and fifty million reads per day, and that's just in the first year. As we continue to add more devices to our system, that's even more reads and more performance we're going to need.
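A load-once, read-heavy lookup like this maps naturally onto a table keyed by a location cell. The sketch below is an assumption about how such a key could be built, not Zonar's schema: it snaps each GPS fix to a hypothetical 3-arc-second grid (1,200 cells per degree, roughly 90 m north-south) and uses an in-memory dict in place of the table.

```python
import math

# Hypothetical CQL table the dict stands in for:
#   CREATE TABLE elevation (lat_cell int, lon_cell int, elevation_m float,
#                           PRIMARY KEY ((lat_cell, lon_cell)));
CELLS_PER_DEGREE = 1200  # assumed 3-arc-second grid


def cell_key(lat: float, lon: float) -> tuple[int, int]:
    """Snap a GPS fix to the grid cell that serves as the partition key."""
    return (math.floor(lat * CELLS_PER_DEGREE), math.floor(lon * CELLS_PER_DEGREE))


class ElevationLookup:
    def __init__(self):
        self._cells: dict[tuple[int, int], float] = {}

    def load(self, lat: float, lon: float, elevation_m: float) -> None:
        """Bulk-load path: run once, or on a rare refresh."""
        self._cells[cell_key(lat, lon)] = elevation_m

    def elevation_at(self, lat: float, lon: float):
        """Hot read path: a single key lookup, with no scans or joins."""
        return self._cells.get(cell_key(lat, lon))
```

For scale, 150 million reads per day averages out to about 1,736 reads per second (150,000,000 / 86,400), so bursts of 6,000 reads per second are roughly 3.5 times the daily average.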
We've got another application that we've jumped into, which is Z Pass, and what this does is track bus ridership. We need to know when people are getting on vehicles and getting off of them. We need to know that these vehicles are being utilized to their fullest capacity: if you've got a bus driving around with five passengers in it, you're not being very economical with your vehicle or the fuel you're using. So for this we needed to be able to read and write very heavily.
At the same time, these are very short bursts of traffic throughout the day. There are typically two big peaks; a lot of people don't ride the bus at midnight or early in the morning. It's typically a couple of times throughout the day. So we needed a way for millions of users to actually access this data, and we needed to be able to do at least 20 million writes per day just in the first year. Again, we know that as we continue to add vehicles and passengers, there are going to be even more writes over time. We needed to be able to scale horizontally: the never-ending story for everybody.
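Ridership events are write-heavy and arrive in daily peaks, which suggests bounding each partition by vehicle and service day. The sketch below is a hypothetical model, not the Z Pass schema: it keeps boarding and alighting events per (vehicle, day) partition, derives the current load on a bus, and includes a rate helper for the write figures quoted above.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CQL table the dict stands in for:
#   CREATE TABLE ridership (vehicle_id text, service_day text, event_time timestamp,
#                           rider_id text, boarding boolean,
#                           PRIMARY KEY ((vehicle_id, service_day), event_time, rider_id));


def partition_key(vehicle_id: str, ts: datetime) -> tuple[str, str]:
    """One partition per vehicle per service day keeps partitions bounded."""
    return (vehicle_id, ts.date().isoformat())


class RidershipLog:
    def __init__(self):
        self._partitions = defaultdict(list)

    def record(self, vehicle_id: str, ts: datetime, rider_id: str, boarding: bool) -> None:
        self._partitions[partition_key(vehicle_id, ts)].append((ts, rider_id, boarding))

    def onboard_count(self, vehicle_id: str, ts: datetime) -> int:
        """Boardings minus alightings on this vehicle's service day, up to ts."""
        events = self._partitions.get(partition_key(vehicle_id, ts), [])
        return sum(1 if boarding else -1 for t, _, boarding in events if t <= ts)


def writes_per_second(writes_per_day: float, active_hours: float = 24.0) -> float:
    """Average write rate if the day's writes are spread over `active_hours`."""
    return writes_per_day / (active_hours * 3600)
```

Spread evenly, 20 million writes per day is only about 231 writes per second; if the same volume landed in two two-hour peaks (an assumption for illustration), `writes_per_second(20_000_000, active_hours=4)` gives roughly 1,389 per second, which is the kind of burst the cluster has to absorb.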
You want to be able to scale horizontally easily, and that was something we knew Cassandra had been able to do. One of the fun things for us was that we just took a look at Twissandra. We've got a very basic app that reflects a Twitter-type feed, and Twissandra is a big example that's constantly given out there. We were able to look at that code, adapt it, and use it to help us do some rapid development.
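Twissandra is commonly described as modeling a Twitter-style feed by denormalizing: each post is copied into every follower's timeline at write time, so reading a feed is a single partition read. The sketch below shows that fan-out-on-write idea in plain Python. It is an illustration in the spirit of the demo, not its actual code, and the counter stands in for the time-ordered (timeuuid) clustering column a real table would use.

```python
import itertools
from collections import defaultdict


class FeedStore:
    """Fan-out-on-write timelines (sketch in the style of the Twissandra demo)."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.timelines = defaultdict(list)  # user -> [(seq, author, body)]
        self._seq = itertools.count()       # stands in for a timeuuid clustering column

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, body: str) -> None:
        # Write amplification is the price: one copy per follower, plus the author.
        entry = (next(self._seq), author, body)
        for user in self.followers[author] | {author}:
            self.timelines[user].append(entry)

    def timeline(self, user: str, limit: int = 20) -> list[str]:
        # Read path: one "partition", newest first, no joins.
        entries = sorted(self.timelines.get(user, []), reverse=True)
        return [body for _, _, body in entries[:limit]]
```

The design choice is the usual Cassandra trade: extra writes at post time in exchange for cheap, predictable reads.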
As for the road to Cassandra usage, it's been kind of a long road for us, and we've seen this reflected over and over again; the talk by Accenture touched on this too. It was great to hear. There are many resources out there for you to start using Cassandra. One of our systems architects, Josh Hansen, really started with DataStax, or with Cassandra, very early on.
There's the Planet Cassandra community out there, which is really good. I'd highly suggest you go to the meetups; those have been great for getting more informed with the local community and people who can help you locally. Twitter is another great resource for everyone to use. We've got IRC, too; the IRC community for Cassandra is great.
One last thing I'd like to point out is a way you can rapidly develop with Cassandra, specifically using DSC and AWS. We found this to be really important for us. Using the AWS AMI, we're able to use a couple of Python scripts with about two or three lines each to bring up a DSC ring in five minutes, whether that's a six-node ring or a twelve-node ring, and just really rapidly bring those systems up, test them, and take them down. It's nice from a management and leadership position.
I don't have to acquire servers for my developers to use to test everything out. They can quickly do it and tear down the system, and it's very, very cost-effective for us, specifically because we can load up a hundred to two hundred gigabytes of test data and really start testing queries in under 30 minutes. That's a really huge thing for us.
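As a rough planning aid, loading up to 200 gigabytes in under 30 minutes implies a minimum sustained ingest rate. The sketch below does that arithmetic under stated assumptions: decimal units (1 GB = 1,000 MB), data spread evenly across nodes, and each replica counted as a separate write. The twelve-node ring and replication factor of 3 in the example are illustrative, not a measured setup.

```python
def ingest_rate_mb_s(gigabytes: float, minutes: float, nodes: int = 1,
                     replication_factor: int = 1) -> float:
    """Sustained MB/s each node must absorb to load `gigabytes` of raw data in `minutes`.

    Assumes decimal units and an even spread of data (and replicas) across nodes.
    """
    total_mb = gigabytes * 1000 * replication_factor
    return total_mb / (minutes * 60) / nodes


print(round(ingest_rate_mb_s(200, 30), 1))
print(round(ingest_rate_mb_s(200, 30, nodes=12, replication_factor=3), 1))
```

That works out to about 111 MB/s for the cluster as a whole, or about 28 MB/s per node on a twelve-node ring with three replicas of every row.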
That's primarily it. We are hiring, just like every other company. If you find geospatial data in Cassandra very interesting, please come talk to me. We're starting to really get into some fun analytics with this data, and we're looking forward to doing everything else with it. Thank you. Any questions?
[Audience question about average data volume per node.] Josh, do you know? 500 gigs? Yeah. It's one of those things where we really wanted to move all the GPS data into Cassandra initially, but we also found it's easier to start moving new applications, or bring new applications up, on Cassandra, and that really paves the way to start migrating data over.
It's somewhat random, I would say; certain manufacturers are better than others. We've got a very nice relationship with the manufacturer right now, and it's pretty easy. Heavy fleet vehicles are more standardized than, say, your lightweight vehicles with OBD-II. Those manufacturers rely very, very largely on the J-bus protocol, which is the heavy fleet protocol. It's a little bit more standardized, although some manufacturers have very custom engine fault codes and things like that.
We're doing the analytics on our own right now, and what we're really doing is joining multiple types of data, such as this elevation data. I'm really big into fuel analytics right now: as these vehicles are driving down the road, how can we help our customers save more money by improving their fuel economy? That could be things such as looking at the RPM ranges the vehicle is running in as it travels down the road, which is where elevation becomes very interesting.
If the vehicle is driving up a steep grade in a high RPM range, can we get the driver to downshift and actually save a little more fuel? It's this joining of multiple types of data that we're really starting to get into, and most of it is all in Cassandra. It's the easy way to scale and get that read access very quickly.
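The downshift question above is a join of elevation and engine data along a route. A minimal sketch, assuming each sample already carries an elevation (for example, from the elevation data set described earlier) and an RPM reading, computes the grade between consecutive GPS fixes using the haversine distance and flags high-RPM climbs. The 4% grade and 1,800 RPM thresholds are hypothetical placeholders, not Zonar's rules.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


@dataclass
class Sample:
    lat: float
    lon: float
    elevation_m: float  # joined in from the elevation data
    rpm: int            # read from the engine computer


def haversine_m(a: Sample, b: Sample) -> float:
    """Great-circle distance between two fixes, in metres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def downshift_candidates(samples, min_grade=0.04, min_rpm=1800):
    """Return samples where the vehicle is climbing steeply at high RPM.
    Thresholds are hypothetical placeholders."""
    flagged = []
    for prev, cur in zip(samples, samples[1:]):
        distance = haversine_m(prev, cur)
        if distance == 0:
            continue  # stationary: grade is undefined
        grade = (cur.elevation_m - prev.elevation_m) / distance
        if grade >= min_grade and cur.rpm >= min_rpm:
            flagged.append(cur)
    return flagged
```

Grade here is rise over ground distance between fixes, so sparse GPS sampling will smooth out short, steep pitches; a denser sample rate gives a more faithful picture.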
It's ramped up. We started using Cassandra very early on, just through development cycles. It did have a bit of a learning curve, and some developers are still learning to utilize it in the best way, but it was relatively quick for a lot of developers to start accessing it, especially for reads. The PHP developers typically aren't writing data into Cassandra; it's very easy for them to use phpcassa and really just start reading the data, treating it like a typical data store.