From YouTube: Apache Cassandra at British Gas Connected Homes with Josep Casals, Lead Data Engineer 1
Description
Speakers: Josep Casals, Lead Data Engineer at British Gas Connected Homes & Christopher Batey, Technical Evangelist for Apache Cassandra at DataStax
A
Okay, welcome to the EU Cassandra Summit. My name is Christopher Batey, I'm a technical evangelist for Apache Cassandra, and I'm here with Josep from British Gas. So thanks for agreeing to talk to us.
B
Thank you very much for having me.
A
Yeah. So do you want to start off by giving us an overview of what you do at British Gas and how you're using Cassandra?
B
Yes, so I work at British Gas Connected Homes. This is a company which is fully owned by British Gas, but we are independent. We're in the center of London, and we very much deal with stuff relating to devices connected in people's homes. Our most popular product is Hive heating, which is a mobile application that lets you control your heating remotely, but we also deal with all sorts of other stuff, like connected boilers, smart meters, and gas and electricity readings.
B
All that kind of thing. So we have started to use Cassandra to process all the data we collect from people's homes.
A
B
Per day, it will depend on the service. For example, we have a trial that collects energy information from people's homes every 10 seconds. That's quite fast: a smart meter typically would send you information every half an hour. We get every-minute information from other kinds of devices. We get lots of battery measurements, temperature changes. I don't know, it depends on the device, but it can go from seconds to fractions of an hour.
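To put the reporting intervals mentioned above side by side, here is a minimal arithmetic sketch. It assumes each device reports at a perfectly fixed interval; the function name and labels are illustrative and do not come from the interview.

```python
# Readings per device per day for the reporting intervals mentioned in the interview.
# Assumption: every device reports at a perfectly regular interval.
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(interval_seconds: int) -> int:
    """Number of readings one device sends in a day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


# Labels are illustrative; the intervals are the ones quoted by the speaker.
intervals = {
    "energy trial (every 10 s)": 10,
    "other devices (every minute)": 60,
    "typical smart meter (every 30 min)": 30 * 60,
}

for name, seconds in intervals.items():
    print(f"{name}: {readings_per_day(seconds)} readings/day")
```

Under these assumptions, a device on the 10-second trial produces 180 times as many readings per day as a smart meter reporting every half hour.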
B
A
B
A
Exactly. So how did you go about selecting Cassandra? How do you see it as fitting your use case?
B
Well, there are many reasons why we are working very much with Cassandra. The most important one is its proven scalability. I mean, we do need to collect a lot of data, and we do need to scale very, very big.
A
B
B
Actually, we work with many technologies. Even now we work with many technologies. One of the main decisions we have taken has been between Cassandra and Riak. At some point we were doing a lot of stuff with Riak. I would say the decisive factor that made us turn more towards Cassandra is the analytics capability introduced by Spark.
A
Oh, okay.
B
That is something that we do really need, because we do collect data, but we also need to analyze this data.
B
We have data scientists in our teams producing very smart algorithms that we then have to apply to this data, and being able to use Spark for that really makes a difference.
A
So how long have you been using Spark for, then?
B
Well, not very long, actually. We have used standalone versions of Spark for, I would say, two or three months already. We're just starting to use it with DataStax Enterprise. We have found that version 4.5 of DataStax Enterprise wasn't really working very well with Spark, because it packaged, I think, version 0.9 of Spark, which is not very optimized when it shuffles data, and with the volume of data we use we found that we were running out of memory very quickly. We've been testing 4.6, and it looks like this problem has now been solved.
A
Brilliant. So you say some data comes in every 10 seconds. What kind of write and read throughput are your Cassandra clusters handling at the moment?
B
As I said, it is a combination. Fortunately, this 10-second thing we're just trying out right now. We don't have many customers on it, but we expect more next year. We work in different ways to ingest this data. Sometimes we just copy it from files we get from another system, and otherwise we stream from queues.
B
We use Spark Streaming to pick this information up from Rabbit queues and save it into Cassandra, and for next year we're building up our capability to be able to handle something like 15,000 messages per second, that kind of load.
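The queue-to-database flow described above can be pictured as micro-batching: messages are pulled off a queue in groups and each group is saved in one go. The sketch below is a plain-Python stand-in under stated assumptions, not the Spark Streaming, RabbitMQ, or Cassandra API. The names `drain_in_batches` and `save_batch` are hypothetical, an in-memory `deque` plays the queue, and a list plays the database. Spark Streaming groups messages by time interval; a fixed batch size is used here only to keep the example deterministic.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Message = Dict[str, object]


def drain_in_batches(
    queue: Deque[Message],
    batch_size: int,
    save_batch: Callable[[List[Message]], None],
) -> int:
    """Pop messages off the queue in batches of at most batch_size and pass
    each batch to save_batch. Returns the number of batches saved."""
    batches = 0
    while queue:
        # Take up to batch_size messages from the front of the queue.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        save_batch(batch)
        batches += 1
    return batches


# Hypothetical messages: ten readings spread over three made-up devices.
incoming: Deque[Message] = deque(
    {"device_id": f"dev-{i % 3}", "reading": i} for i in range(10)
)
stored: List[Message] = []  # stand-in for the database

n_batches = drain_in_batches(incoming, batch_size=4, save_batch=stored.extend)
print(n_batches, len(stored))
```

With ten messages and a batch size of four, the queue is drained in three batches (4, 4, and 2 messages).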
A
Yeah, oh nice. So did you have a lot of Cassandra experience in the team when you started?
B
We do make an effort to train everyone as quickly as possible. DataStax is helping with that a lot, and I would say now everyone understands very well how Cassandra works. But no, not everyone has been working with Cassandra for a long time.
A
B
It is different. It is a simple concept: you have to model for what you query. But it's counterintuitive.
B
If you come from a relational database world, you'll naturally try to do things that you cannot do with Cassandra, and then you'll learn through error that some of the things you try to do don't work, and it is going to be painful. So I would say: focus on understanding very well how keys and partitioning work and how you have to model data, and accept that duplicating data, denormalizing, is not a problem. It's actually the way it should be.
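The advice to model for what you query and to treat duplication as normal can be illustrated with a small in-memory sketch. This is not Cassandra driver code. The two dictionaries stand in for two tables, each keyed by a partition key chosen for one query, and the same reading is written to both. All identifiers, IDs, and the date are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Each "table" maps a partition key to rows kept sorted by timestamp,
# loosely mirroring a partition key plus a clustering column.
DeviceRow = Tuple[int, float]      # (timestamp, value)
HomeRow = Tuple[int, str, float]   # (timestamp, device_id, value)

# Query 1: "all readings for one device on one day"
readings_by_device_day: DefaultDict[Tuple[str, str], List[DeviceRow]] = defaultdict(list)
# Query 2: "all readings for one home on one day"
readings_by_home_day: DefaultDict[Tuple[str, str], List[HomeRow]] = defaultdict(list)


def record_reading(home_id: str, device_id: str, day: str,
                   timestamp: int, value: float) -> None:
    """Write the same reading to both tables (deliberate denormalization)."""
    device_rows = readings_by_device_day[(device_id, day)]
    device_rows.append((timestamp, value))
    device_rows.sort()

    home_rows = readings_by_home_day[(home_id, day)]
    home_rows.append((timestamp, device_id, value))
    home_rows.sort()


# Hypothetical sample data.
record_reading("home-1", "thermostat-1", "2015-01-01", 20, 19.5)
record_reading("home-1", "boiler-1", "2015-01-01", 10, 61.0)
record_reading("home-1", "thermostat-1", "2015-01-01", 10, 19.0)

# Each query is a single lookup by partition key; no join or scan is needed.
print(readings_by_device_day[("thermostat-1", "2015-01-01")])
print(readings_by_home_day[("home-1", "2015-01-01")])
```

The trade-off shown here is the one described in the interview: every write is duplicated, and in exchange each read touches exactly one partition.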
A
B
We're building a cluster to be able to work with a data set of 30 terabytes right now. So we don't have everything in place. We're starting with six very big nodes right now, and the way we are going to work is that we separate this cluster into an analytics cluster and a pure transaction-serving cluster.
A
B
At the moment we're building everything in the same data center. Security is very important for us. We are handling sensitive personal data, so for some applications it is fine to have data in AWS, in the cloud, but for many it is not. So right now we are making a big effort to build our capability in one data center.
A
So you're deploying it to your own hardware rather than somewhere like AWS or Google?
B
At the moment, yes.
A
Okay. And as you've been getting going with Cassandra, how have you found the community in London and more widely online?
B
B
They are full. Also, I would say we've started to go to Spark meetups, and these are twice as full.
B
A lot of interest, and it is really great to be in London working on that, because really everyone who is involved in Cassandra sooner or later comes here and you can talk to them first-hand.