Description
Speaker: Stefan Piesche, Chief Technology Officer at Constant Contact
Slides: http://www.slideshare.net/planetcassandra/data-stax-presentation-stefan-1-0
During this presentation Stefan Piesche, Chief Technology Officer at Constant Contact, will discuss how he and his team were able to grow and scale Constant Contact's technology infrastructure by aligning technology with horizontal business growth to improve performance and reduce costs. He will share some of the lessons learned, best practices, and recommendations for other technology executives looking to transform their technology infrastructure to business.
A
All right, guys. Gripping audience, quick topic. Okay, my name is Stefan Piesche, I'm the CTO at Constant Contact, and I'm here to talk a little bit about how we at Constant Contact use DataStax, specifically from a technology strategy perspective. So if you expect to see Cassandra source code, this is the wrong session.
B
A
A
Just a quick word about who we are. Constant Contact is not that well known here on the West Coast. We are a company that provides digital marketing services to about half a million small businesses: everything from email marketing, social media, deals, and event marketing to everything a small business needs to do to promote and market itself. We have B2Cs, B2Bs, and non-profits. These are the small businesses you interact with on a daily basis: your local coffee shop, your print shop, maybe even your church.
A
If you go to a church. You see many of these small businesses on a daily basis. 70% of our customers have fewer than 10 employees, so they are very, very small, and we engage these types of customers about 10,000 times on a day-to-day basis. There are the calls, we have free seminars all over the country, and we work a lot with them on teaching them how to be better marketers.
A
So I joined Constant Contact about four years ago, specifically because Constant Contact wanted to grow from a single-product, vertically growing company into a true multi-product company that has more customers and that acquires companies. So the goal for our SaaS company was to become a multi-product company.
A
We wanted to go from one to many products, expand the organization with more development teams, and grow by mergers and acquisitions. As you can imagine, that requires a somewhat different technology approach than what you have when you start out. When I joined, Constant Contact had already been in business for a decade, with a fairly large install base, lots of customers, several million lines of code, and various things.
A
So, at the end of the day, what we found when we got started on this journey was a single product and a fairly monolithic architecture, and you probably all know what that means: gigantic builds, gigantic deployables, very, very database-centric, and a very narrow technology stack, really only using J2EE, in this specific case WebSphere and DB2.
A
A
A
Velocity, right: gigantic builds, gigantic code bases. Development velocity is fairly slow in that scenario, and the build times take forever. I mean, if you have one gigantic build of a million lines of code, it takes like 30 minutes for developers to do anything. So we decided we were going to change all of this and roll the change across all tiers, from the application tier and technology stacks even down to the persistence tier.
A
Many single points of failure, I forgot that one: when something breaks in a monolithic architecture, even in the data architecture, everything goes down. So our technology strategy goals were these. We wanted to support multiple data centers; that drove a lot of the technology choices we made, Cassandra specifically. Horizontal scalability: instead of going from a 32-CPU database server to a 64-CPU database server, you want the ability to add servers and scale out instead of scaling up. And extensibility.
A
If you want to go from one product to two, three, four, five, six products, the architecture has to be suited for that across all technology choices. Cost effective and predictable: what does it take to take on the next hundred thousand customers? Today we have over half a million customers, and we want to be sure what it will take to get to the million-customer mark. Improving development velocity is a big topic here.
A
If you have one database, everybody steps on each other. And fault tolerance, which goes back to this: with a single large-scale database, somebody writes a rogue query and everything goes down. So our technology strategy approach had multiple facets, and Cassandra is a big part of this, but let's talk about all the components. Service-oriented architecture: not monolithic but distributed. Open source: a big drive away from using vendor technology from the big ones, in this specific case IBM, to go more open source.
A
So, an interesting story here. When we got started with Cassandra, I was interviewed by some magazine about our choice, because we started very early with Cassandra, with version 0.7. They were asking us about why NoSQL and how we planned to use it, and I explained all of this, and I was misquoted in the article as saying that I planned to replace all DB2 with Cassandra at Constant Contact. So you can imagine I got the call from the IBM rep saying, "I read you have replaced all the DB2 with Cassandra. Oh my God."
A
"What are you doing?" I said, "Don't worry, don't worry. Cassandra is not a tool for all of it, so we're not going to replace all DB2 with Cassandra. The rest we'll replace with MySQL." And so an interesting journey really began for us, going away from these very proprietary, high-cost solutions like the IBMs and Oracles of the world. So, DB2 replacement. We started out with MySQL as the first choice, and integrated Ruby on Rails along the way.
A
We made an acquisition about two and a half years ago of a company that had a CRM product, which was written in Ruby on Rails. Sharding: the typical approach to scaling data tiers specifically is to shard out the data. So instead of one database, you build 10 databases, and we'll talk about that in a second. Continuous integration and continuous delivery of software: all the technologies across all the tiers, including the data tier, have to support this, with schema tolerance and all of these types of things. Test-driven development, and going from WebSphere to JBoss.
A
So we actually started out not looking at Cassandra. We said, okay, we're just going to replace DB2 with MySQL, we're going to shard this out, we're going to build many databases here. The advantages are clear: it's a semi-predictable way to scale a data tier. But it's pretty difficult to introduce in a monolithic architecture. You can imagine that you have a bunch of code, a bunch of applications, that think all data is in one place right now.
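Editor's illustration: the talk does not show how the sharded MySQL tier would have routed requests. A minimal sketch of the general technique, hash-based shard routing, might look like the following. The shard names, the count of ten, and the use of a customer ID as the shard key are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard names; the talk only says "instead of one database, you build 10".
SHARDS = [f"mysql-shard-{i:02d}" for i in range(10)]


def shard_for(customer_id: str, shards=SHARDS) -> str:
    """Map a customer ID to one shard using a stable hash of the ID."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for cid in ("customer-1", "customer-2", "customer-3"):
        print(cid, "->", shard_for(cid))
```

Every piece of application code that touches customer data has to call something like `shard_for` before it can open a connection, which is the code-change cost the speaker describes for applications that assume all data is in one place.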
A
If you have 20 or 30 database instances, imagine what operations has to do when the schema changes, and it's not always cost effective. Okay, Cassandra to the rescue. This was a little over two years ago, when Cassandra was on version 0.6. We realized that changing to a sharded database is pretty much the easy part on the infrastructure side, but changing all the code would have resulted in about a one-year project. We have to keep our customers happy and keep building products at the same time.
A
A
When we looked at our large-scale uses of data, we realized a lot of our data is not actually relational in nature. There are two major types of data we're dealing with here. The first one is content. This is all the core marketing content our customers produce. We send out about four to four and a half billion emails a month, and this comes across, I would say, probably about three million marketing campaigns.
A
So every month our customers produce three million pieces of large content, which are composed of smaller pieces: a newsletter has images and text and links and all of these types of things. So there are hundreds of millions of pieces of content every month, but this is stuff where you don't go, "Okay, find me the five million newsletters that use the word Bob."
A
A
I said, okay, so we really don't need a traditional RDBMS here. The second one was tracking events. When we send out our four billion plus emails a month, we keep track of everything that happens after that. It's not only emails, it's social engagements too: email sends, opens, clicks.
A
A
Why did we pick Cassandra? We looked at all of the vendors; this was a little over two years ago. Remember what we talked about on our technology strategy side. High performance: at any point in time we have somewhere between 50,000 and 75,000 people working on our system, and that doesn't seem like a lot.
A
But these are people who actually create content, look at reports, mine data, and manage their contacts in the CRM system, so high performance and low latency is most certainly a requirement here. Transparent sharding: as I said, to all the monolithic applications that we still had at that point in time, Cassandra looks like one data store. There is no need to break it apart artificially and teach the applications that their data is in many places now. Horizontally scalable, of course, fault tolerant, multi-data-center support, and cost effective.
A
A
A
If you use MySQL, you know it's not that easy at all, actually. Cassandra attracted us because at that point in time, and even today, its multi-data-center replication features are the most mature in the NoSQL market, period. It works really, really well, and we'll talk a little bit about the cluster layout at the end of the presentation. And fault tolerance is a must for us with that many customers.
A
Now, if we make a mistake and something breaks, it's not like our customers lose a ton of money, each being a small business, but we ruin the lives of half a million people on a day-to-day basis. We really try to have four nines of uptime at all points in time and be available to our customers, so they can run their business.
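Editor's illustration: to make the "four nines" target concrete, this short sketch converts an availability figure into a downtime budget. The function name and the 365-day year are assumptions for the example; at 99.99% it works out to roughly 52.6 minutes of downtime per year.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Return the minutes of downtime allowed over the period at the given availability."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


if __name__ == "__main__":
    # Four nines: 99.99% availability over a 365-day year.
    print(f"{downtime_budget_minutes(0.9999):.2f} minutes per year")
```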
A
All right, so what we did is we replaced the DB2 database that hosted the content. This was our first project. The DB2 database server alone was like 200 grand, and it was attached to a SAN for a million dollars, in one data center. We replaced it with a Cassandra cluster that cost us a quarter of a million dollars, had 10 times the performance, and supported two data centers. That was a huge win for us.
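Editor's illustration: the figures quoted for this first project can be checked with a little arithmetic. The sketch below assumes the old setup cost about $200,000 for the server plus about $1,000,000 for the SAN, and the Cassandra cluster about $250,000; those are round readings of the spoken numbers, not audited costs. Under those assumptions the hardware saving comes out near 79%.

```python
def cost_reduction(old_cost: float, new_cost: float) -> float:
    """Return the fractional cost reduction when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost


if __name__ == "__main__":
    old_total = 200_000 + 1_000_000  # DB2 server plus SAN, as quoted in the talk
    new_total = 250_000              # Cassandra cluster, as quoted in the talk
    print(f"{cost_reduction(old_total, new_total):.1%}")
```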
A
So now, as part of this Cassandra move, we put ourselves in a position to really make technology choices based on what the need is: what's the perfect way to solve this problem? And this is where it goes way beyond the data tier and Cassandra as an architecture. It really goes to building an SOA and building abstraction layers, so you can replace pieces as you move along.
A
We'll talk a little bit about that cluster in a minute. So, data: what are we storing in those clusters today? It's one billion pieces of content in total that we need to manage here. Our customers stay with us for four years, so you multiply 500,000 customers by what they produce over 40 or 48 months, and it comes to something like a billion pieces of content that you need to manage and move around, and that you can't just back up. I want real-time access to that.
A
A
Clicks, Facebook likes, Twitter posts: the things that the end consumers of our customers really produce, which we track, measure, and ultimately report on, and then, as part of this, we derive analytics for them. The total number of records we look at, most of which is stored in Cassandra, is about 150 billion records that we derive analytics from. So we're not Netflix, okay, we're not eBay, but we still have a lot of data that we need to look at on behalf of our customers.
A
A
And what that boils down to on the content side is a 17-node cluster with 36 terabytes usable for content, and another 72-node cluster with the same capacity for those events, which will expand with more nodes over time. Those hundred billion events actually don't fit into those 36 nodes, so that cluster will grow first to double the size and then to about triple the size over the next two to three years. And then there's a fairly small cluster with higher density.
A
A
So what this looks like today is that the cluster really spans those two data centers: 36 nodes in Boston and 36 nodes in Santa Clara, with triple replication each. So each data element actually exists six times, and this gets us a lot of the performance and fault tolerance. The downside is, of course, that if you add a third or fourth data center, each data element will then exist 9 or 12 times.
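Editor's illustration: the copy counts above follow from multiplying the number of data centers by the replicas kept in each. The sketch below assumes the same replication factor of three in every data center, as described in the talk; the function name is made up for the example.

```python
def total_copies(data_centers: int, replicas_per_dc: int = 3) -> int:
    """Return how many copies of each data element exist across all data centers."""
    return data_centers * replicas_per_dc


if __name__ == "__main__":
    for dcs in (2, 3, 4):
        print(f"{dcs} data centers -> {total_copies(dcs)} copies of each data element")
```

Raw storage grows by the same factor, which is why each added data center is a capacity decision as well as an availability one.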
A
The point for analytics here is actually not having to move the data. Those tracking events, that's like 100 billion events, and they come in at a rate of like 10,000 a second. When we talked to big data vendors in the Hadoop space, from the EMCs and the IBMs down to the startup cloud providers, we asked, "How do you move that data out of Cassandra into a Hadoop cluster?"
A
I got a lot of silence. So what we decided then is, okay, I guess we'll just leave it in Cassandra and use DSE's Hadoop functionality to analyze the data. That now means we have to move all the other data we're going to analyze, which is in RDBMSs, in there as well, but I'd say that's a small price to pay, and it actually works pretty well.
A
So we're probably one of the first adopters of Hadoop on DSE, working with DataStax here, and with pretty good success so far. The results of our Cassandra strategy really are these. We reduced the cost of the data tier infrastructure by 80%, which is substantial, and this is just the infrastructure, not even talking about DB2 licensing and maintenance costs. Higher fault tolerance: we have had no outages yet, at least no major ones; there are always little glitches here and there you have to deal with. We had five to ten times the performance, which we were stunned to see.
A
If you use Cassandra, you probably look at operational metrics and all of a sudden see response times in the sub-millisecond range. You look at something like a 300-microsecond response time for data retrieval and think that can't possibly be right, but it is. And it enables use cases we really couldn't do before. Take the 100 billion tracking events.
A
The ones I was talking about, we are actually only now able to do because of Cassandra. Before that, using traditional DB2 or even MySQL, it would have been impossible to scale. Think of the number of MySQL instances we would need to store 100 billion records, at about 120 terabytes of data across the data centers. It's just way too hard.
A
So the learnings are: living on the edge can pay off. We started very early with Cassandra, when it was very small. Fortune favors the bold, but also the prepared. When we pick strategies like this and open source technologies like this, we really pick vendors like DataStax that help us figure these things out. So we don't go into those things blindly.
A
We do a lot of prototyping, and from a migration process perspective, we stand up the new system, the Cassandra system, in parallel to DB2, do dual reads and dual writes on both systems, and compare the results to make sure that, okay, it does what the old system does, and eventually I can turn the old system off.
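Editor's illustration: the talk describes the dual-read, dual-write migration only in outline. A minimal sketch of that pattern is below, with plain dictionaries standing in for the DB2 and Cassandra clients. The class name, method names, and the choice to keep serving reads from the old store are assumptions for the example, not Constant Contact's implementation.

```python
class DualStore:
    """Writes to both stores, reads from both, and records any disagreement."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old_store = old_store  # stand-in for the existing DB2-backed store
        self.new_store = new_store  # stand-in for the new Cassandra-backed store
        self.mismatches = []        # (key, old_value, new_value) tuples to investigate

    def write(self, key, value):
        # Dual write: every change goes to both systems.
        self.old_store[key] = value
        self.new_store[key] = value

    def read(self, key):
        # Dual read: fetch from both and compare before answering.
        old_value = self.old_store.get(key)
        new_value = self.new_store.get(key)
        if old_value != new_value:
            self.mismatches.append((key, old_value, new_value))
        # The old system stays the source of truth until it is switched off.
        return old_value


if __name__ == "__main__":
    store = DualStore(old_store={}, new_store={})
    store.write("campaign:1", "Spring newsletter")
    print(store.read("campaign:1"), store.mismatches)
```

Once the mismatch list stays empty under production traffic for long enough, reads can be pointed at the new store and the old one retired.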
A
A
You don't, right? Actually, we have been doing this at Constant Contact even on the relational side: all changes we make on the data tier are always backwards compatible. And there's, of course, a lot of testing that goes ahead of that. We have a staging cluster, and we test all of those changes beforehand.
A
We do, right, and it was a little hard to understand at first. The first conversation I had with our IT team went, "Well, we want to back up your Cassandra cluster," and I said, "But why? I have every data element six times. How often do you need it?" But the reason really is human error. Let's say somebody makes a mistake and changes.
B
A
So during this Cassandra migration project, our customer base grew from 250,000 to 500,000 small businesses, and all of these businesses create content and send email.