From YouTube: i2O Water: How Cassandra Helps i2OWater Save Over 235 Million Litres of Water Everyday
Description
Speakers: Mike Williams, Software and IT Director at i2O Water
In this presentation, I will give an overview of the SaaS platform and overall system that we have built at i2O Water to migrate our customers and help i2O scale its business. I will discuss its merits, and especially the benefits that technologies such as Cassandra bring in overcoming technical challenges that we faced with a more traditional architecture and tooling. I will also discuss some of the challenges we have faced using leading-edge open source software tools and how we have tried to overcome them.
A
A
... in the world. We were a small start-up in 2005. We've grown to about 60 people, so we're not a huge company by the standard of some of the people attending this conference. Today, we work with over 70 water utilities around the world to help them save water. Just as an example, water can be incredibly cheap to produce and deliver. Here in the UK, for example, those who are from the UK or Northern Europe know that we have a lot of rain.
A
Therefore, it costs less than a few cents per litre to produce water. However, there are certain parts of the world, the Middle East and the Far East, where water is extremely scarce and is also very expensive to produce. Potable drinking water for people can, by comparison, cost in excess of two to three dollars per litre.
A
We currently have about 2000 systems installed (I'll explain a little bit later what a system is), and this currently accounts for around two and a half terabytes' worth of data that we manage within our platform. Just to pick up on something from the keynote this morning: Billy was talking about how we might be doing things that can affect people's lives.
A
So, obviously, water is something that's very important to people's lives. But just as an example, just recently in Saudi Arabia there was a large festival where people go and perform their Hajj ceremony at Mecca. As you may know, quite a lot of people go and do that: around 5 to 6 million people go to the specially set-up camps. i2O worked with the customers in Saudi Arabia to deliver clean drinking water and air conditioning, and to control
A
all of that water delivery, with 750 of our systems, to deliver literally one weekend's worth of water supply, guaranteed. In previous years there have, unfortunately, been interruptions to that service, which have resulted in people dying. So what we do is really important, and I'm really happy to be here to tell you a little bit about it. As my initial slide said, our total daily savings of water for our customers around the globe is currently in excess of 235 million litres of water. Do you have any idea how much water that is?
A
Anybody have anything that they could compare that to? A lake? Okay, possibly. Okay, so it is the equivalent of more than 100 Olympic swimming pools' worth of water. If you think about that, that's a lot of water that's being wasted every day, and we can help save that. In fact, the amount that's being wasted is far in excess of that.
A
A
So how do we do it? Apologies, I'm going to just talk a little bit about water networks and water pressure, and I know this is not necessarily what we're all here for, but it just gives you a bit of background as to what we're doing. So, within a water network, the top graph shows how the pressure varies over time, and we have two days' worth of data shown there. When there's no active control of the network, the pressure into this zone is constant.
A
So the blue line at the top represents the pressure of the water entering the zone. Further downstream in the zone, at a point which is known as the critical point (usually it's the point where the pressure is lowest or the service is worst), the pressure varies. It follows this fairly well understood diurnal pattern.
A
The red line represents the minimum pressure that the water utility is committed to deliver so that their customers within that zone all have water throughout the day, whatever they are going to do with it. The excess in pressure between the wiggly green line and the red line is the pressure we look to remove from the network.
A
This excess pressure leads to high leakage in the network and lots of bursts. Most of the networks are old. They don't have very good infrastructure, and they're leaking constantly. And by delivering excess pressure, as you can imagine with a hose or a pair of coupled pipes, the more pressure you push through, the more likely it is that it's going to blow and burst, and major bursts are not what customers want. So when we apply an i2O system,
A
what I meant by that earlier is that we actually put two of our intelligent devices into that network, one at the inlet to the zone and one at this lowest point. By gathering the data from those points and crunching it through our machine learning algorithms, we learn the characteristics of that zone. In doing so, after a short period of time, usually only two weeks, we can reduce that wiggly green pressure by actively controlling the blue pressure in the network, and it ends up looking like this.
A
So instead of there being a constant pressure being forced into the network, we vary the pressure according to the demand. As the demand increases, at the peak times or in the morning when people get up to have showers, we will ensure that there is sufficient pressure downstream for the customer. The tighter we can put this green line to the red line, the more water we save, because there is a direct relationship between the pressure and the water lost through leaks and bursts, as I said earlier.
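As a rough illustration of demand-following control (not i2O's actual algorithm, which the talk does not describe), the sketch below learns the average pressure drop between the inlet and the critical point for each hour of the day from paired readings, then picks an inlet target that keeps the critical point just above the committed minimum. The function names, the hourly bucketing, the sample readings and the safety margin are all assumptions made for the example. It also ignores the fact that the pressure drop itself changes once the inlet pressure is lowered.

```python
from collections import defaultdict
from statistics import mean


def learn_head_loss(readings):
    """Average inlet-minus-critical-point pressure drop per hour of day.

    readings: iterable of (hour_of_day, inlet_pressure_m, critical_pressure_m).
    """
    drops = defaultdict(list)
    for hour, inlet, critical in readings:
        drops[hour].append(inlet - critical)
    return {hour: mean(values) for hour, values in drops.items()}


def inlet_target(hour, head_loss_by_hour, min_critical_m, margin_m=1.0):
    """Inlet pressure intended to hold the critical point at the minimum plus a margin."""
    return min_critical_m + head_loss_by_hour[hour] + margin_m


# Hypothetical readings: demand, and so pressure drop, is higher at 07:00 than at 03:00.
history = [(3, 50.0, 45.0), (3, 50.0, 47.0), (7, 50.0, 30.0), (7, 50.0, 32.0)]
loss = learn_head_loss(history)
night = inlet_target(3, loss, min_critical_m=20.0)
morning = inlet_target(7, loss, min_critical_m=20.0)
```

With these made-up numbers the inlet target is lower at night than in the morning, and both are below the constant 50 m that was being delivered before, which is the effect the charts describe.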
A
Just to give another concrete example: one of our customers, SYABAS, controls the water in Kuala Lumpur in Malaysia. Kuala Lumpur is very, very scarce on water. They often have complete lockdowns on water. Sometimes they have outages where they can't deliver any water, so water to them is extremely precious. i2O currently covers about seventy percent of their water network. We produced up to a forty-eight percent reduction in bursts in that network for them over a period of a year.
A
So this is really valuable. So how do we go about doing some of this stuff? Previously, when we were a start-up (we were formed, as I said, way back), what did we do? We had this very, very simplistic architecture. We had what you might consider to be a standard n-tier architecture, based at that time around a Microsoft .NET stack, because that was the background of the developers that were first with the company, and it was built on top of a clustered relational database.
A
Unfortunately, as we all know, putting prototypes into production is never a great thing, and it continues to be there today, alongside the desperate need that we had to address the challenges of not only the architecture but the growth of our business. As our business grows, we deploy more devices, we have more data, we have more customers, we have more users, etc.
A
So the problems of scale and maintenance, and basically trying to be a true SaaS platform, this architecture did not address. So how does it look currently? I don't know if any of you were in the previous talk, but our architecture has some similarities to the one that was demonstrated by the previous speakers. We re-architected our platform in 2011, and it uses a completely event-driven architecture. So are people familiar with event-driven architecture?
A
Up here with the bright lights I can't really see you guys, but do people know what event-driven architectures are? A few nods and a few hands. Okay, thank you. So, in a very simplified manner, we have a set of very loosely coupled collaborating services arranged into what we call our ecosystem. These services communicate very indirectly, very loosely, via a distributed set of brokers, and they raise and consume important events.
A
We also have, as we'll come to see a little bit later, various flavors of data stores that we use within our architecture to hold the data that we require to help us run our algorithms, to help the customers save their water, and to allow them to remotely control their network, which is also very important. It saves them a lot of manpower if they can control their water network from a web application rather than having to send men in vehicles to sites and locations, which quite a lot of them still do.
A
So we talked about services. Lots of people talk about services, so what are services to us? Services to us are logically a group of what we term single-responsibility handlers in software, in code, plus some infrastructure code that's present in all of our services, and some data stores. These data stores can be architected so that they are physically or logically separated between read stores and write stores, depending on the domain that the service is modeling. The services themselves can be scaled out across servers.
A
If we wish, they can be scaled by having multiple instances; they can work together completely independently. They can be scaled up: we can have multiple instances of handlers running, performing the same tasks, and we can also group handlers into thread pools to enable us to allocate work to groups of handlers. Should we find dynamically, through analysis, that some of the services are finding themselves under heavy load, we can dynamically spin up new handlers or groups of handlers with more threads.
A
This enables us to grow and shrink our ecosystem according to the demand of what's happening within it. Some examples of the domains that we use are: integrating data coming from the assets within the water network; allowing those assets to be configured, so that the customer can choose what settings they have to control the physical pieces of hardware they have in their network; and the key one for us, pressure optimization, which is how we optimize the pressures in the network.
A
We made some technology choices, which we'll come to see a little bit later, that allowed us to develop these services in a language-agnostic fashion. These services, let us say, loosely collaborate with each other via events. Events are raised when a service determines that something important has happened, and services consume events that may have been raised from one or more other services. They know nothing of the existence of the other services. Our services are not allowed to communicate with each other in request-response patterns, for example.
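The raise-and-consume pattern described here can be sketched with a tiny in-memory broker. This is only an illustration of the idea: the real platform uses distributed message brokers, and the class name, event names and handler functions below are hypothetical. The point it shows is that the publisher hands an event to the broker and never calls another service directly.

```python
from collections import defaultdict


class EventBroker:
    """In-memory stand-in for a message broker: routes events by type name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher never learns who, if anyone, consumed the event.
        for handler in list(self._handlers[event_type]):
            handler(payload)


broker = EventBroker()
alerts = []
optimizer_inputs = []


def low_pressure_alert(event):
    # Single-responsibility handler: only cares about pressure that is too low.
    if event["pressure_m"] < 20.0:
        alerts.append(event["location"])


def optimizer(event):
    # A second, independent consumer of the same event.
    optimizer_inputs.append(event["pressure_m"])


broker.subscribe("MeasurementReceived", low_pressure_alert)
broker.subscribe("MeasurementReceived", optimizer)
broker.publish("MeasurementReceived", {"location": "critical-point-1", "pressure_m": 18.5})
broker.publish("DeviceReset", {"location": "inlet-1"})  # no subscribers, so nothing happens
```

Adding a third consumer later means one more `subscribe` call; neither the publisher nor the existing handlers change, which is the loose coupling the talk is describing.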
A
So, just a little bit now on the technologies we use at i2O. At the front-facing side, we use a series of web-based technologies for providing our web presence, but we also use these for our communications with our devices. We also have a distributed cache that we use quite heavily. Even though some of our data technologies are pretty fast, some things are just not fast enough, so we have to use Redis as a cache.
A
At the back, as I mentioned, we have a multitude of data stores, Cassandra being the principal one, where the bulk of our data lives. We also still use a relational database: we use Postgres coupled with PostGIS, some extensions for geographic information. We hold the geography and the topology of these networks, so it's quite important that we can navigate that and the customers can see it. We also use Elasticsearch within our architecture for auditing and for free-format text searching against our events.
A
This is all glued together with some middleware based around RabbitMQ and distributed brokering in a pub/sub mechanism. There are two logos in the middle which you may or may not recognize. The one on the right is AMQP, the Advanced Message Queuing Protocol, which is a kind of language-agnostic message protocol. The one on the left, MQTT, is an open standard produced initially by IBM for the Internet of Things, so our devices use MQTT to communicate with our platform.
A
Why we became language agnostic is that all of the events that fly through our system are encoded using Google's Protocol Buffers, which in turn is itself somewhat language agnostic. So any of our services can be written in any language, as long as it can understand AMQP, for which most technologies have client libraries that connect to Rabbit, and Google Protocol Buffers, which is fairly ubiquitous, of course, across pretty much most programming languages. Within i2O,
A
we have an awful lot of .NET experience still, so we have a lot of our services still written in .NET and C#. That's where a lot of our business logic and some of our algorithmic work is written. But we also use Node.js in our ecosystem, which works largely around the web side, the web backend, and the integration with the services.
A
So how does Cassandra help us? We've been using Cassandra in production since 2011, from version 1, and we're now up to version 2.6. It gives us great write performance, but we're not using SSDs. The previous speaker also said that. Because of the nature of this data and the nature of the customers we work with, we have to be in a very secure, high-availability data center. We can't use the cloud; they won't allow us to keep their data in the cloud.
A
So it's not cheap for us to switch hardware and infrastructure very easily, so we've not yet moved over to SSDs, but we think we will soon. We get good read performance. We don't get lightning read performance (we're using spinning disks, by the way), but we use the cache very heavily to help with that from a customer user-experience perspective. And Cassandra itself has a superb scaling model, which I'm sure we all know about, and those who don't are here to find out.
A
So we're going to talk a little bit about our use cases. The predominant use case we have is time-varying data. I don't think of it just as time series data, although that is, sure, quite a large part of what we do. We also have to track spot events that occur in the water network; they occur at different points in time and we have to correlate them together, so we also take care of those in Cassandra. Tree evolution is something I'll spend a little bit more time on later, but that's something else
A
we use Cassandra for, and it's extremely helpful in solving a problem that we have. For our algorithm development, we were streaming data out of Cassandra before they had Spark, so we wrote a lot of our own code to do that, and we've been examining Spark as to whether that's an alternative. We have our machine learning and optimization algorithms, which also pull their data from Cassandra and store their data as needed in Cassandra, to enable them to very quickly recompute and re-optimize the water network. I think, to ourselves,
A
it seemed at the time a fairly unique use case. I'm not sure if it is today, and I may find out more from others. But within our ecosystem we have a very key feature, which is that we have an auditor service, similar to the event service that was mentioned in the previous talk, which enables us to perform historic replay, and I'm going to talk a little bit about that later.
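The talk gives no implementation details for the auditor service, so the following is only a minimal sketch of the historic-replay idea: every event is kept in timestamp order, and a consumer can ask for everything from a chosen point in the past. The `Auditor` class, its method names and the sample events are hypothetical, and the in-memory lists stand in for a durable store.

```python
import bisect


class Auditor:
    """Keeps every event in timestamp order so it can be replayed later."""

    def __init__(self):
        self._timestamps = []
        self._events = []

    def record(self, timestamp, event):
        # Insert in timestamp order; events may be recorded out of order.
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._events.insert(index, event)

    def replay(self, since, handler):
        """Feed every stored event with timestamp >= since to handler, oldest first."""
        start = bisect.bisect_left(self._timestamps, since)
        for timestamp, event in zip(self._timestamps[start:], self._events[start:]):
            handler(timestamp, event)


auditor = Auditor()
auditor.record(300, {"pressure_m": 31.0})
auditor.record(100, {"pressure_m": 30.0})
auditor.record(200, {"pressure_m": 29.0})

# A newly deployed consumer asks for history from t=200 onwards.
seen = []
auditor.replay(since=200, handler=lambda ts, ev: seen.append((ts, ev["pressure_m"])))
```

Because replay goes through the same handler a live event would reach, a consumer does not need a separate code path for historic data.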
A
So I chose to show some of these by hand rather than using a graphics package. As I say, we've been using Cassandra for some time, so our data models have had to evolve as Cassandra has evolved, and like many people using Cassandra, we've made many mistakes. We're a small group. We don't necessarily always know what the best practice is, and the best practice often changes too. So we start with some of the simpler events that occur within our network, where we are just using combinations of fields as primary keys and clustering keys.
A
These are things such as when channels of data go high or go low, if a device resets or switches power source, or its battery is getting low. Our devices are low-powered. They run off batteries, they're under the ground, and they live for up to five years without being maintained by a human. So it's very important we know what's going on with them in terms of their power.
A
A
Time-varying data: this is, more recently, the work that we've been doing on the measurements that we hold. We hold measurements at different levels in the network, at location level and at area level, and so we now have the ability to shard the data by time a lot more easily than we were able to previously, and we're a lot better at being able to cluster and partition the data in such a way that we can pull this data back for our algorithmic work in a much more efficient manner.
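The actual table definitions are on slides that the transcript does not capture, so this is a hedged model of the general technique rather than i2O's schema: measurements are partitioned by location, channel and a day-sized time bucket, and ordered by timestamp within each partition, the way a clustering column orders rows in Cassandra. The plain Python dictionary stands in for the database, and the class, the key layout and the one-day bucket are assumptions. All timestamps are assumed to be UTC.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def partition_key(location_id, channel, timestamp):
    """Bucket by (location, channel, day) so no single partition grows without bound."""
    return (location_id, channel, timestamp.date())


class MeasurementStore:
    def __init__(self):
        # partition key -> list of (timestamp, value) kept sorted by timestamp,
        # mirroring rows ordered by a clustering column inside a partition.
        self._partitions = defaultdict(list)

    def write(self, location_id, channel, timestamp, value):
        rows = self._partitions[partition_key(location_id, channel, timestamp)]
        bisect.insort(rows, (timestamp, value))

    def read_range(self, location_id, channel, start, end):
        """Rows with start <= timestamp < end, touching only the day buckets needed."""
        result = []
        day = start.date()
        while day <= end.date():
            rows = self._partitions.get((location_id, channel, day), [])
            lo = bisect.bisect_left(rows, (start,))
            hi = bisect.bisect_left(rows, (end,))
            result.extend(rows[lo:hi])
            day += timedelta(days=1)
        return result


utc = timezone.utc
store = MeasurementStore()
store.write("loc-1", "pressure", datetime(2020, 1, 1, 23, 45, tzinfo=utc), 41.2)
store.write("loc-1", "pressure", datetime(2020, 1, 2, 0, 15, tzinfo=utc), 40.8)
store.write("loc-1", "pressure", datetime(2020, 1, 2, 0, 0, tzinfo=utc), 41.0)
store.write("loc-1", "flow", datetime(2020, 1, 2, 0, 0, tzinfo=utc), 3.1)

window = store.read_range(
    "loc-1", "pressure",
    datetime(2020, 1, 1, 23, 30, tzinfo=utc),
    datetime(2020, 1, 2, 0, 15, tzinfo=utc),
)
values = [value for _, value in window]
```

A query for a time window only reads the buckets that overlap it, which is what makes pulling data back for algorithmic work cheaper than scanning one ever-growing partition per location.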
A
So examples of things that we might use for areas and locations are flows and pressures, the sorts of things you've seen on the charts, but we also have to look at things like the voltages of our devices and GSM signal strength. I didn't mention it, but our devices communicate with our platform over GPRS/GSM mobile phone networks, which are highly unreliable for one thing, and the signal strength grossly affects the energy usage of the device.
A
So it's quite important for us to utilize some of our modeling to enable us to have predicted views of how the life of the device is going to behave, given the current environmental conditions it finds itself in. Whilst we designed them for five years at a particular usage case, if it's in a very weak signal strength area, then the device will draw much more energy from its batteries in order to communicate.
A
Temperature also has a dramatic effect, not only on our devices but also on the consumption of water, as I'm sure those who are fortunate enough to live in a country where the weather occasionally gets hot will know. We tend to use a lot more water during those times, and so we use temperature data recorded by our devices to also spot correlations and patterns in usage and consumption. That would enable us to change our control models and re-optimize.
A
So when we use our algorithms: of course, with data within Cassandra you are encouraged to denormalize it, but also within our ecosystem we encourage our domain developers to duplicate data. There's no real downside to duplicating data, so our services design the data models that best fit their purpose. When we're doing some of our analysis work or optimization work, we have to time-sync our data. What I mean by that is, we have to take in data from various locations within the network.
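One simple way to picture time-syncing data from two locations is to snap each reading to a common interval and keep only the slots where both locations reported. This is a sketch under assumptions, not the method used on the platform: the 15-minute interval, the "last reading in a slot wins" rule and the sample values are all made up for illustration.

```python
def align(series_a, series_b, interval_s=900):
    """Pair readings from two locations that fall in the same time slot.

    Each series is a list of (unix_seconds, value). Readings are snapped down to
    the start of their interval; slots missing from either series are dropped.
    """
    def by_slot(series):
        slots = {}
        for timestamp, value in series:
            slots[timestamp - timestamp % interval_s] = value  # last reading in a slot wins
        return slots

    a, b = by_slot(series_a), by_slot(series_b)
    return [(slot, a[slot], b[slot]) for slot in sorted(a.keys() & b.keys())]


# Hypothetical readings: the inlet misses the 2700 s slot, the critical point misses 900 s.
inlet = [(0, 50.0), (905, 49.5), (1800, 49.0)]
critical = [(10, 31.0), (1810, 28.5), (2700, 27.0)]
paired = align(inlet, critical)
```

Dropping unmatched slots is the conservative choice here; interpolating across gaps would be an alternative, at the cost of inventing values that were never measured.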
A
As I say, the minimum is two, and they have to be time-synchronized correctly. Otherwise, the algorithms won't produce the optimal output in terms of pressure savings. We have also found, as time has gone on and we've worked with larger and larger water utilities, that we have to integrate a lot more tightly with their in-house SCADA systems. Those SCADA systems have to have elements of quality of data, so we've had to evolve
A
our data models to include some of these other fields on there, which are things like the normal values and normal ranges that they would expect to see on those data. This is one of the examples where we use Cassandra indexes, for outlier detection and also for checking the validity of data. This is probably one of the only areas where we use Cassandra secondary indexes. We haven't found them to be of great benefit, I have to confess.
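To make the idea of normal ranges concrete, here is a minimal sketch of range-based classification. It does not use Cassandra or secondary indexes at all; it only shows the kind of check that stored normal-range fields make possible. The field names, labels and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelQuality:
    """Expected range stored alongside a measurement channel."""
    normal_min: float
    normal_max: float


def classify(value, quality):
    """Label a reading against the expected range for its channel."""
    if value is None:
        return "missing"
    if quality.normal_min <= value <= quality.normal_max:
        return "normal"
    return "outlier"


pressure_quality = ChannelQuality(normal_min=15.0, normal_max=60.0)
labels = [classify(v, pressure_quality) for v in (42.0, 75.5, None, 15.0)]
```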
A
This allows the service to catch up and behave as though it had been in the ecosystem since epoch. An example of this would be deploying a new algorithm, a new learning algorithm. We can deploy the algorithm into the ecosystem, we can request historic data of various types from the past, and we can construct the learning that we require to then test out that service. Other services are completely unaffected by the existence of the new one. The other use case we have for Cassandra is evolving trees.
A
What do I mean by that? Well, within a water distribution network (I showed you the diagram earlier, as I sort of laid out), the customers quite often represent it as tree-like directory structures. So when they start off using our system, they might try it on a few areas, a few zones, and so they'll have effectively just a very simple tree of their water utility network, but over time it grows.
A
It grows for lots of reasons. They buy more systems from us. They change things. Networks change: the customer's network gets re-zoned as they take on new customers of their own. Within our solution, data arrives constantly, but it's often late and it has gaps in it, and the reason for that is the mobile GSM communication. It's not very reliable, as I said earlier. So we have to expect data to turn up at any given point in time, and the packets of data that arrive relate to a time
A
that was some way before. So we have to track how the network looked at the point in time related to the data that arrives, and so we have to keep this historic evolution of the network trees over the whole course of time since epoch. That's quite important, because we have to perform certain types of aggregations on our data: not just time aggregation, in terms of looking at data every minute, 15 minutes, hour or whatever,
A
but we have to carry out calculations associated with how the water moves in and out of zones of that network, and that enables us to analyze the network to look for problems within it. If we don't track these trees properly, we run into issues where we have mis-balances of water within the network.
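The need to read the network tree "as it was" when a late packet was measured can be sketched with a small versioned structure. This is an illustration of the idea only, not the Cassandra data model from the talk: the `EvolvingTree` class, its methods and the node names are hypothetical. Each node keeps a history of (valid-from time, parent) entries, and a lookup picks the entry in force at the requested time.

```python
import bisect


class EvolvingTree:
    """Records each node's parent over time so the tree can be read as of any instant."""

    def __init__(self):
        self._history = {}  # node -> (list of valid_from times, list of parents)

    def set_parent(self, node, parent, valid_from):
        times, parents = self._history.setdefault(node, ([], []))
        index = bisect.bisect_right(times, valid_from)
        times.insert(index, valid_from)
        parents.insert(index, parent)

    def parent_at(self, node, timestamp):
        times, parents = self._history.get(node, ([], []))
        index = bisect.bisect_right(times, timestamp) - 1
        return parents[index] if index >= 0 else None

    def children_at(self, parent, timestamp):
        return sorted(n for n in self._history if self.parent_at(n, timestamp) == parent)


tree = EvolvingTree()
tree.set_parent("meter-7", "zone-A", valid_from=0)
tree.set_parent("meter-8", "zone-A", valid_from=0)
tree.set_parent("meter-7", "zone-B", valid_from=1000)  # network re-zoned at t=1000

# A late packet measured at t=900 must be aggregated under zone-A, not zone-B.
late_packet_zone = tree.parent_at("meter-7", 900)
current_zone = tree.parent_at("meter-7", 1500)
zone_a_then = tree.children_at("zone-A", 900)
zone_a_now = tree.children_at("zone-A", 1500)
```

If the late packet were attributed to the current tree instead, its flow would be counted in the wrong zone, which is one way the mis-balances mentioned above could arise.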
A
I'm going to talk a little bit about migration now. We've had to deal with effectively two forms of migration within Cassandra since we started using it. The first one is fairly straightforward, and most people will face it, no doubt: our schema changes. Sometimes those schema changes are because we adapt our models, or we've had changing requirements.
A
A
The second type of migration we have had to deal with is, as I'm sure you've seen, that we have to migrate customers from our old architecture to our new architecture. This time we use either specially written tools, or we use our event-driven architecture again in certain spots to pull data from our old system. So we actually mimic the devices sending the data to our new platform as though they were doing that originally, even though they were always sending it previously to our old platform.
A
A
The final part, the thing at the bottom, the device switcheroo: we have these devices, as I'm sure you're now aware, and they're communicating with our platform. When they're communicating with our legacy platform, we have to remotely tell them to start communicating with the new one, so we call that the switcheroo. We have a very, very nervous point in time when that happens, when the devices get some new instructions as to which platform to start talking to. So what challenges did we have with Cassandra? The biggest challenges are these.
A
A
There's a smallish talent pool. We're based on the south coast of England, in Southampton, and so it's very hard to find people with these skills. Upgrading versions in Cassandra has been a challenge. Minor version upgrades, generally no issues, but major version upgrades, we've had some challenges there.
A
Data modeling: it's been mentioned numerous times yesterday in the training sessions, and I'm sure it gets mentioned lots in the talks. It's quite a hard subject with Cassandra. It takes a different mindset for people to work with. There are many choices, and those choices evolve over time, as you've seen even in some of our examples. And we didn't always get all the patterns right. We thought we did. We put them in place, we ran through the data using some of the techniques which I've explained, and it didn't behave the way we expected.
A
A
A
A
So that's a good question. We typically see somewhere in the order of 1 to 2 orders of magnitude. We also use protocol buffers between our devices and our platform, and so there's a huge amount of compression that we want to gain there, because the more data we send across the mobile phone network, the longer the modem is on and the more energy you use. The low battery weakens you. So those are the typical orders of magnitude.
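The question being answered is not in the transcript, so the following is only a rough illustration of why a compact binary encoding matters on a battery-powered modem. It does not use Protocol Buffers: Python's standard `struct` module stands in for a binary wire format, and the payload (one day of 15-minute pressure readings), the field names and the centimetre scaling are all invented for the example. The ratio it computes describes this made-up payload only, not the figures quoted in the talk.

```python
import json
import struct

# Hypothetical payload: 96 pressure readings at 15-minute spacing (one day).
start = 1_400_000_000
readings = [(start + i * 900, 30.0 + (i % 8) * 0.25) for i in range(96)]

# Verbose form: every reading repeats its field names and a full timestamp.
as_json = json.dumps(
    [{"timestamp": ts, "pressure_m": value} for ts, value in readings]
).encode("utf-8")

# Compact form: start time, interval and count once, then each value as an
# unsigned 16-bit count of centimetres of head.
header = struct.pack("<IHH", start, 900, len(readings))
body = struct.pack(f"<{len(readings)}H", *(round(value * 100) for _, value in readings))
as_binary = header + body

ratio = len(as_json) / len(as_binary)

# Decoding recovers the original values.
count = struct.unpack_from("<IHH", as_binary)[2]
decoded = [raw / 100 for raw in struct.unpack_from(f"<{count}H", as_binary, 8)]
```

Most of the saving in this sketch comes from not repeating field names and timestamps, which is also where schema-based formats such as Protocol Buffers gain over self-describing text.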
A
Yes, they're deployed as services, so they sit there. They react to events, and events include things like data packets that have arrived. They will analyze that data, they will relearn any characteristics, and they will reproduce their outputs, which might be control models for the devices. It could be anomaly detection for asset condition monitoring, looking at assets in the water network: are they going to go bad? We heard this morning about health monitoring for people. We also have health monitoring on physical hardware assets that the water companies hold.