Description
Speaker: Boris Wolf, Lead Engineer CMB Project at the Comcast Silicon Valley Innovation Center
Slides: http://www.slideshare.net/planetcassandra/c-summit-2013-cmb-an-open-message-bus-for-the-cloud-by-boris-wolf
The Comcast Silicon Valley Innovation Center has developed a general purpose message bus for the cloud. The service is API compatible with Amazon's SQS/SNS and is built on Cassandra and Redis with the goal of linear horizontal scalability. This presentation offers an in-depth look at the architecture of the system and how they employ Cassandra as a central component to meet key requirements. Latest feature enhancements and performance data will also be covered.
It's a general-purpose message bus infrastructure built on top of Cassandra and Redis. Now, if you look more closely, you will find that CMB really consists of two distinct web services. One is a queuing service called CQS that allows you to create queues and enqueue and dequeue arbitrary messages, and then there is a second, complementary service, CNS, which is a topic-based publish-subscribe service. The noteworthy thing about these two services is that they are fully API compatible with their Amazon counterparts, SQS and SNS.
So if you're familiar with those Amazon web services: there is a simple queuing service called SQS, and there is another pub/sub service called SNS, the Simple Notification Service. Our implementation is fully API compatible with those Amazon services, and that compatibility goes so far that you can actually use the Amazon SDK, the Java SDK for instance, to interact with our implementation of those services.
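A minimal sketch of what that compatibility means in practice: plain AWS SDK for Java (v1) client code with only the endpoint swapped out. The endpoint URL and credentials below are placeholders, not values from the talk.

```java
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.CreateQueueRequest;
import com.amazonaws.services.sqs.model.SendMessageRequest;

public class CqsHello {
    public static void main(String[] args) {
        // A standard AWS SDK SQS client, pointed at a CQS deployment
        // instead of Amazon. The endpoint URL here is a placeholder.
        AmazonSQSClient sqs = new AmazonSQSClient(
                new BasicAWSCredentials("accessKey", "secretKey"));
        sqs.setEndpoint("http://cqs.example.com:6059");

        // From here on, this is ordinary SQS client code.
        String queueUrl = sqs.createQueue(
                new CreateQueueRequest("testQueue")).getQueueUrl();
        sqs.sendMessage(new SendMessageRequest(queueUrl, "hello CMB"));
    }
}
```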
But first of all, the question: why would we even copy this API? Why would we even go through the effort of building our own version of it? To explain that, I first want to highlight some of our key requirements. One of the requirements was that we wanted a message bus that replicates messages across data center boundaries. So imagine you have clients in different data centers.
They are interacting with the same queue, writing messages to it and reading messages from it. In other words, we want messages to be able to flow freely among data centers; that is what we would call operating a queue in active-active mode. Then, in addition to that, scalability: obviously we want the system to scale. Ideally, we are looking for linear or near-linear horizontal scalability. And what do we want to scale? Well, first of all, we want to scale the message throughput.
That is, the number of messages we can pump through an individual queue. You could imagine, for instance, a very busy back-end service that runs across a single queue or just a handful of queues, so we would need to scale to thousands of messages per second of throughput and beyond. But there is also a second aspect of scale. You could imagine a completely different use case: at Comcast we care a lot about the set-top boxes that we ship to customers' living rooms.
Essentially, you could imagine a back-end service designed around a one-to-one relationship: one message queue assigned to one set-top box. In such a scenario, those queues probably would not see a whole lot of traffic, because each is limited to a single user or a single household, but we would need to scale to millions of queues. So we want to scale in both of these dimensions. And then, finally, there are the latency requirements, which we set somewhat arbitrarily at the beginning of this project: 95th percentile response times of 10 milliseconds.
So what is so beneficial about this API? This slide gives you CQS; I'm starting with SQS, and later I will go into SNS. This is the SQS queuing service in 60 seconds, if you will. First of all, the service focuses on guaranteed delivery; everything is centered around it.
Message loss is not an option, which, by the way, is also one of our key requirements for the service. Whereas orderly delivery, meaning that messages come out of the queue in the same order you put them in, and also uniqueness of messages, meaning that you don't see any duplicates as you pop messages off your queues, those things are secondary: the service makes a best effort there, but there are no guarantees. Then there are a few simple core APIs.
There is send message to put a message on a queue and receive message to take one off, and then there's a third one, delete message. What do you need delete message for? Imagine a scenario where you have a single message queue and two clients competing for messages from that queue. Those two clients, identical redundant clients, might call receive message every hundred milliseconds or so to check for messages. As a message arrives, one of those two clients will get it, first come, first served. So what happens then? In the happy day scenario, the client processes the message and then calls delete message to permanently remove it. But suppose the client crashes right after receiving the message.
That also means the client will never come back to call delete message, and for that case there is a defined timeout period that we call the visibility timeout, by default 30 seconds. After 30 seconds, that message, which otherwise would have gotten lost, will magically reappear on the queue for the second client to pick up and process adequately. So, as you can see, the service basically does not trust the message recipients.
Not only does the service guarantee that it doesn't itself lose any messages, it even tries to guarantee that the recipients, the clients of the service, do not lose any messages. This understanding of guaranteed delivery is built into the core API; we thought that was very elegant, and we adopted it.
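Here is a minimal sketch of the consumer contract that falls out of this: receive, process, then delete, relying on the visibility timeout for crash recovery. This is illustrative code written against the AWS SDK v1 interface, which CQS is compatible with.

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class QueueConsumer {
    // Poll the queue and delete each message only after it has been
    // processed successfully. If process() throws, or this JVM dies,
    // the message is never deleted, so it reappears after the
    // visibility timeout for a redundant consumer to pick up.
    static void consume(AmazonSQS sqs, String queueUrl) {
        while (true) {
            for (Message m : sqs.receiveMessage(
                    new ReceiveMessageRequest(queueUrl)).getMessages()) {
                process(m.getBody());
                sqs.deleteMessage(queueUrl, m.getReceiptHandle());
            }
        }
    }

    static void process(String body) {
        // Application logic goes here.
        System.out.println(body);
    }
}
```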
And here I just want to spend 60 seconds talking about the advantages of adopting an existing API versus building or designing one of your own.
Typically, when you set out to do something like that, you have a specific first use case in mind, right? Otherwise you wouldn't do it. And what most often happens is that somehow, as much as you try to come up with a generic, multi-purpose API design, the requirements of that first use case find their way into your API design; it will be somewhat biased towards that first use case. And even if you manage to steer clear of that issue, it's almost guaranteed you won't get it right the first time.
So overall, it's really all the advantages of implementing an existing standard over trying to set one on your own. In hindsight, following this pattern has been very, very successful for us, and we felt it was a very good approach. But back to the original question: why didn't we just go with SQS? Obviously, we like the aspect of guaranteed delivery, we like the simplicity and robustness of the API, and obviously Amazon scales pretty well. You could have some business concerns, right?
The costs associated with running these Amazon services, or losing control over your data. But those were not really our concerns; our concerns were really technical ones. One of the major concerns was latency, not so much throughput but latency. By definition, if you interact with an Amazon service, it's a cross data center transaction. So what about what I said earlier, that we want a 95th percentile 10 millisecond response time?
You will probably never get a 10 millisecond response time if you interact with an Amazon web service. The only alternative would be to host your entire ecosystem, everything you own, in EC2 in the same networking zone as SQS, and at least right now that's not an option for us. So latency was a big issue. Then there are also a number of limitations put on users of these Amazon services.
There is a maximum message size, for instance, and there are a number of timeout periods that have certain maximums that you cannot get around. Of course, we reimplemented all those same limitations, except that to us those limitations simply become properties in a configuration file. So if you deploy the service, you can adjust those numbers and, in a way, deploy your own flavor of this service, tailored to your own needs.
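For illustration, such a configuration might look like the following. These property names and values are hypothetical stand-ins, not the actual keys from the CMB configuration file.

```properties
# Hypothetical configuration knobs; see the CMB repository for the real keys.
cmb.cqs.maxMessageSizeBytes=65536
cmb.cqs.maxVisibilityTimeoutSeconds=43200
cmb.cqs.maxMessageRetentionPeriodSeconds=1209600
```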
So, to summarize: we set out to build a horizontally scalable queuing service on top of Cassandra and Redis that is API compatible with Amazon's SQS API. Going into the architecture and the details of the design: we chose Cassandra, of course. The main reason was its cross data center persistence and replication ability, and we know it scales horizontally very well. In fact, we had an original implementation that was purely based on Cassandra, and then we realized we couldn't meet the latency requirements we had set for ourselves, so we added Redis into the picture, at first as a cache to improve latency. But then we realized that it's also very useful in helping out with the best-effort ordering of messages, and we are using Redis to manage that visibility timeout mechanism I was describing earlier, that whole business of hiding messages and making them reappear if you don't delete them. All of that we handle in Redis, not in Cassandra.
So first I want to talk a little bit about data modeling. How did we model the column families in Cassandra that sit behind the CQS queuing service? Whenever you do a project in Cassandra, the recommendation usually is: figure out what your key data structures are and what the most frequently asked questions are. As we did that, we went through a number of iterations, and I just want to highlight three of those. The first approach was somewhat naive. We said: okay, how about this:
We represent the messages for a single queue in a single column family, one column family per queue. In this scenario we have a column family with a single column, which has the static column name "message body". The row keys would be the timestamps when the messages were received, and the column values would be the actual payloads, the actual messages. At that point in time we didn't really know what we were doing; we came from a relational database background and didn't know much about Cassandra. The problem here is this:
What is the query that you typically ask of a queue? You probably want to know what's at the head of the queue or the tail of the queue, right? In Cassandra terms, with this design, that means you would want to do a range slice query, and you can only do that if you use what's called the order preserving partitioner. Again, at the time we didn't even know what that really meant, but we consulted people who did, and everybody told us: basically, stay away from the order preserving partitioner at all costs, if you can at all. So we said okay, that doesn't work, and we quickly rejected this approach; in fact, we never implemented it. That was our first iteration. So we said: if that doesn't work, how about we do it the other way around?
Let us store the messages sitting in a queue in a row instead of a column. The first advantage you see here is that now we can store messages for multiple queues in a single column family. The row key now is an identifier for the queue, a queue ID if you will; the column names are the timestamps when the messages were received; and the column values are the actual message payloads.
So now we're taking advantage of a nice standard Cassandra feature, which is that it automatically sorts columns by their column name for us. If we configure this right, messages will be sorted in chronological order, in this visual from left to right, because the column names are timestamps.
This is good, because now we can actually do column slice queries, and we can ask for the head of the queue or the tail of the queue or anything in between, really. But there's still a problem with this approach: as you can see, everything for a queue goes into a single row, and everything in a single row will end up on a single node in your Cassandra ring. So for a very busy queue, you could bottleneck on that node.
We have an equal number of write, read, and delete operations, so we are producing a ton of tombstones as we do this. Again, by putting all of this in a single row, we have the potential for bottlenecking on a single node in the Cassandra ring and facing some issues. In fact, I believe in his keynote yesterday Jonathan Ellis said something like (I'm paraphrasing): don't do this; basically, don't try to build a queuing system on top of Cassandra.
Well, we did anyway, so now I'll show you how we're doing it. We didn't go with that design; we went with a modified version of the design I just showed you. This is essentially the same, except that now we are sharding the messages for a single queue across a number of rows; in this configuration, across up to 100 rows.
So you see, now we have a composite row key consisting of the queue ID concatenated with an index number that ranges from 0 through 99, and we write messages to random shards, to random rows out of these 100. Within each row, messages will still be sorted chronologically from left to right, but they are written to random shards, and when we read, we read from a random shard. That works pretty well, because now we are actually distributing the messages for a single queue across up to 100 nodes in the Cassandra ring.
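A minimal sketch of that sharding scheme; the names and the key format here are illustrative, not the actual CMB code.

```java
import java.util.concurrent.ThreadLocalRandom;

public class ShardedRowKeys {
    static final int NUM_SHARDS = 100;

    // Compose a row key of the form "<queueId>_<shardIndex>" with the
    // shard index in 0..99, so a single queue's messages are spread
    // across up to 100 Cassandra rows (and therefore up to 100 nodes).
    static String randomShardRowKey(String queueId) {
        int shard = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
        return queueId + "_" + shard;
    }
    // On send: write the payload under column name = receive timestamp
    // into the row randomShardRowKey(queueId). On receive: pick a random
    // shard row and column-slice its head; Cassandra keeps each row's
    // columns sorted by timestamp.
}
```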
If your ring even is that large. The only problem is that we are now missing out a little bit on orderly delivery, because we are reading from random shards; it's all random. It could happen, for instance, that the third message sits over in shard 99: if we happen to read from shard 99 first, we would get message 3 out before message 1. That would be out of order.
However, here is something that would never happen. Say the first message and the last message happen to sit in the same shard, shard number zero. You would never read message six before message one, because message one is essentially in front of it in the same shard. So what this really means is: globally, if you look at a high-traffic queue, you have orderly delivery, but locally, any two consecutive receive message calls may be out of order. We are using Redis to straighten that problem out, and I'll show you how that works.
Let's look at a simple data flow example involving a single queue. We send three messages, messages 1, 2, and 3, in that order; then we make one call to receive message and hopefully get the first message we sent, message one; and then, assuming the happy day scenario, we come back to delete that message. In this visual, what you have in the upper left is the web service endpoint.
In fact, that is the only component of the system that you, as a user of this service, will ever see and interact with. This one here is the Cassandra column family, the architecture I just described, and on the right side we have our Redis caching layer. You can already see that Redis is not just a simple key-value store like memcached, for instance; it actually offers rich data structures. You can create lists, hash tables, sorted sets, all kinds of things, and we're making heavy use of that.
So, if I call send message for the first time, the call comes in on the service endpoint, the message is written to a random shard in our Cassandra column family, and the message ID goes into a Redis list. Notice (I hope you can see it) that the ID we store in that list is again a composite key, this time consisting of the row key from the Cassandra column family concatenated with the column name. So this is still very little, very dense information, but it's enough to pinpoint the message payload in the Cassandra column family.
Then I send the second and the third message; as you can see, they trickle into random shards in the Cassandra column family, but they line up perfectly in that Redis list. So we have one Redis list per queue. Then we call receive message.
What that does is essentially pop the message ID off the Redis list, read the payload out of Cassandra, and also put that message ID into a hash table in Redis, along with a future-dated timestamp 30 seconds into the future. A background process checks this hash table periodically, and if we don't delete the message within that 30-second window, it is placed back on the list; that is what constitutes making the message visible again. But here we are assuming the happy day scenario.
We are calling delete, which means that ID will be removed from the hash table, and finally we delete the message from Cassandra as well. I should step back one slide, because you should notice what happens in Cassandra when I actually receive the message: nothing, really. By handling this entire visibility timeout in Redis, we're saving ourselves some churn in Cassandra.
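A rough sketch of that Redis bookkeeping using the Jedis client; the key names and the shape of the background check are illustrative assumptions, not the actual CMB implementation.

```java
import java.util.Map;
import redis.clients.jedis.Jedis;

public class VisibilityTimeout {
    static final long TIMEOUT_MS = 30_000;

    // receive: pop the head message ID off the queue's list and park it
    // in an "in flight" hash with a redelivery deadline. The caller then
    // reads the payload from Cassandra using this composite ID.
    static String receive(Jedis redis, String queueId) {
        String msgId = redis.lpop(queueId + "-Q");
        if (msgId != null) {
            long deadline = System.currentTimeMillis() + TIMEOUT_MS;
            redis.hset(queueId + "-H", msgId, Long.toString(deadline));
        }
        return msgId;
    }

    // delete: the happy day case; the ID leaves the hash and is never
    // redelivered. (The payload is also deleted from Cassandra.)
    static void delete(Jedis redis, String queueId, String msgId) {
        redis.hdel(queueId + "-H", msgId);
    }

    // Background task: any in-flight ID past its deadline goes back on
    // the list, which is what makes the message visible again.
    static void revisibilityCheck(Jedis redis, String queueId) {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, String> e : redis.hgetAll(queueId + "-H").entrySet()) {
            if (Long.parseLong(e.getValue()) < now) {
                redis.hdel(queueId + "-H", e.getKey());
                redis.lpush(queueId + "-Q", e.getKey());
            }
        }
    }
}
```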
I will take questions after the talk. Okay, so, architecture recap: we have a Cassandra persistence layer, and messages are sharded across 100 rows per queue. We are mainly doing that to avoid wide rows, by which I mean rows wider than 500,000 entries, to minimize churn and the creation of tombstones and, most of all, to distribute queues among Cassandra nodes.
As for the key Cassandra features we are using, I just want to highlight two things on this slide. The main reason to choose Cassandra in the first place was its ability to replicate across data centers. We are using local quorum reads and writes, which we find strike a very nice balance: we still get the data-center-local response time characteristics we want, while making sure that eventually the data will arrive in all data centers.
So that was the main reason to choose Cassandra. But as you get into the nitty-gritty implementation details, there are also a lot of nice little features that help us out. I just want to point out one of those here: TTL. You can set a time-to-live on column values, and that fits very nicely with a feature of the queuing service, the SQS service, called the message retention period.
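Here is a hedged sketch of what those two settings, local quorum and TTL, look like from a client. This uses the DataStax CQL driver with an illustrative table, which is not necessarily the client or schema CMB itself used.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class LocalQuorumTtlExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("cqs");

        // LOCAL_QUORUM keeps the write inside the local data center while
        // replication ships it to the other data centers eventually, and
        // USING TTL expires the column after the retention period
        // (4 days = 345600 seconds here) with no explicit cleanup.
        SimpleStatement write = new SimpleStatement(
                "INSERT INTO messages (row_key, column1, value) " +
                "VALUES (?, ?, ?) USING TTL 345600",
                "queue42_17", System.currentTimeMillis(), "message payload");
        write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        session.execute(write);

        cluster.close();
    }
}
```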
So that's just one example. Scalability: we implemented the core APIs, send, receive, and delete, as constant time operations, so they should scale linearly, and they scale with the size of your Cassandra ring, the number of service endpoints (API servers, which are stateless) you have, and the number of Redis shards you're using; we're using sharded Redis, by the way.
The only thing I should point out here is that an individual queue cannot be sharded across Redis servers; the whole point of using a Redis list is to get perfect ordering.
Availability: we're banking, of course, on the availability of our Cassandra ring, but I should point out that the service does function without Redis around. If Redis crashes or is unavailable, the service will still be there. You will see out-of-order delivery, and you will probably see a lot of duplicates because of that hiding mechanism we use, but all of that is okay.
The service guarantees at-least-once delivery; duplicates and out-of-order delivery are allowed, so we still fulfill that contract even if Redis is not available. And here I have a slide that visualizes how we do data center failover right now.
What you see here are two data centers, data center one and data center two; both of them have the CQS service fully deployed. In this example there are three service endpoints behind a local load balancer, three Redis shards, and then, of course, a Cassandra ring that spans those two data centers.
All clients (clients are at the top here) go through a global load balancer that routes all traffic to the currently active data center, in this case data center one. Should that data center go down for any reason, from one moment to the next the global load balancer will fail over and route all traffic to the second data center. That's okay, because all the data stored in the queues is already available in the Cassandra ring.
The only problem is the Redis shards over there: they are completely unaware of anything that has happened so far; they are blank, basically. So on a cache miss we have, again, a background process that kicks in and heats up those caches, bootstrapping them with data read from the Cassandra ring, and depending on how much traffic you have, after a few seconds or a couple of minutes the performance characteristics of the service return to normal.
So this is not quite the active-active mode I was proposing earlier on my requirements slide, but you can see that by building this service on Cassandra, we've laid a great foundation for moving in that direction. It's really not Cassandra that stands in the way; in fact, the only reason we cannot do active-active right now is the Redis component, and we do have a roadmap and some ideas on how to get around that and improve it.
So we can actually do active-active in the future, but for now this is the recommended deployment and the way to use failover with this service. Okay, with that, that was my talk about CQS. Now I'm going to move to the topic-based publish-subscribe service: SNS, and CNS, our implementation of it. SNS is the Simple Notification Service, a topic-based pub/sub service.
Again we have a set of simple core APIs: you create topics, you subscribe endpoints to those topics, and then really the only API you use once you have all that set up is publish. You publish a message on a topic, and that message will be fanned out to all the subscribers for that topic at that point in time. Again, the service does not really trust message recipients, so there is a redelivery policy built into the service; this applies mainly to HTTP endpoints.
So if your endpoints are temporarily unavailable, respond with a 404, or are completely off the network, the service will attempt redelivery. I think by default it's 3 retries over the course of 60 seconds; that's the default policy, but you can adjust it to whatever you see fit.
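A sketch of what such a redelivery loop might look like; the fixed-interval schedule and the HTTP client shown here are illustrative assumptions, not CNS internals.

```java
import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Redelivery {
    // A fixed-interval redelivery loop for one HTTP endpoint, mirroring
    // the described default of 3 retries spread over 60 seconds.
    static boolean deliver(HttpClient http, HttpRequest notification)
            throws InterruptedException {
        int retries = 3;
        long intervalMs = 60_000 / retries;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                HttpResponse<Void> r = http.send(
                        notification, HttpResponse.BodyHandlers.discarding());
                if (r.statusCode() / 100 == 2) {
                    return true; // delivered
                }
            } catch (IOException e) {
                // Endpoint off the network: fall through and retry.
            }
            if (attempt < retries) {
                Thread.sleep(intervalMs);
            }
        }
        return false; // policy exhausted; delivery gives up
    }
}
```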
Now, I don't think we have the time to go into the same level of detail as before, but I want to give you a dataflow example here as well, to explain how our implementation of this service functions.
I can't go into the same level of detail as I did with CQS, so don't worry if not every detail makes sense to you immediately; I just want to give you a rough idea of how we implemented the pub/sub service. Here is an example: the idea is that you have a single topic, and that topic has four subscribers, S1, S2, S5, and S6. The first two are HTTP subscribers, and the second two are CQS queues as subscribers. Again, I have a visualization of that: on the left is your service endpoint, where you make API calls.
On the far right you see those four subscribers for the topic: S1 and S2, the HTTP endpoints, and S5 and S6, the queues. In the middle, this blue box is what you, as a user of the service, never get to see: our internal implementation. The main point I want to make here is that CNS is implemented on top of CQS; we are using CQS queues internally to distribute the work of fanning out messages to subscribers.
So I call publish to publish a message on my topic T, and all the API server does is place a message on the first internal queue, the publish job queue. This message contains the payload we actually want to publish, as well as an identifier for the topic we want to publish it on.
From there, the message travels to the second queue, the endpoint publish queue, and as it does, we look up all the subscribers for that topic and replace the topic identifier with the list of all subscribers. Now, this list can be very long, and if it is, we actually split it up into multiple messages, as sketched below. Publish workers then pick these messages up from the endpoint publish queue and deliver them to the individual endpoints.
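A minimal sketch of that splitting step; the batch size and message shape are illustrative assumptions, not the actual CNS wire format.

```java
import java.util.ArrayList;
import java.util.List;

public class FanOut {
    static final int BATCH_SIZE = 100;

    // One endpoint-publish message: the payload plus a batch of
    // subscriber endpoints. The record shape is illustrative.
    record EndpointPublishJob(String payload, List<String> endpoints) {}

    // Split one publish job into batches so that no single message on
    // the endpoint publish queue carries an unbounded subscriber list.
    static List<EndpointPublishJob> split(String payload, List<String> subscribers) {
        List<EndpointPublishJob> jobs = new ArrayList<>();
        for (int i = 0; i < subscribers.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, subscribers.size());
            jobs.add(new EndpointPublishJob(
                    payload, List.copyOf(subscribers.subList(i, end))));
        }
        return jobs; // each job is enqueued on the endpoint publish queue
    }
}
```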
So, an architecture recap on CNS: we are using CQS queues internally. Those queues preserve messages when the publish workers I just mentioned are down or unavailable. Another aspect is that CQS has the visibility timeout I talked about, and that helps if publish workers crash. Say, on my last slide, the publish worker that picks up this message to publish it to endpoints five and six takes the message off the queue and then crashes, taking the message down with it.
If that happens, after 30 seconds the message will reappear on the queue (that's a feature of CQS), and another publish worker, a redundant one that has not crashed, can pick it up and redeliver. That may produce some duplicate deliveries, but again, that's okay, because the service only guarantees at-least-once delivery. And then, finally, we have the retry policy that helps with delivery when the receiving endpoints are temporarily unavailable.
So, comparing Amazon's SQS and SNS with our implementation, CQS and CNS: the goal, of course, is full API compatibility, and we're pretty much there. We have all APIs implemented and most parameters implemented; there are really only a few exceptions. For instance, we do not implement the latest version of Amazon's digital signatures: versions one and two are okay, but version 4 we don't support. But really, the things we don't implement are very few.
We do, however, have a few enhancements. We offer a number of APIs for management and monitoring of these services; I'm sure Amazon has these too, except they don't make them available to normal users. And all the limitations I mentioned earlier, like maximum message size, maximum message retention period, and maximum visibility timeout, all of these you can configure if you deploy CQS and CNS. Let's see where we are on time right now.
Well, first of all, we have made the source code available as open source, so it's up on GitHub for you to check out. If you do, you will notice that the CMB core code, basically all the architecture I just described, has really not seen a lot of change recently, not a lot of feature enhancements or bug fixes for that matter. We've done an extensive amount of testing, including scalability testing, and we're using it at Comcast for an increasing number of production systems.
I want to spend the rest of this talk telling you a little bit about the testing we did and the performance metrics we gathered, and then, finally, give you a single use case as an example. Testing: we've done functional testing, by which I mean unit tests, so we have pretty good code coverage. We've done stress testing, by which I mainly mean error case and corner case testing: we have simulated Redis outages and data center failovers.
We've done endurance tests where we ran the service for days or weeks or even months at a time. So we've done all that, but what was most important to us was actually proving that the system has that property of linear horizontal scalability. It took us a long time to get around to doing that, and there were several reasons why; I'll just give you one. If you think about it, you need a bunch of machines for this deployment.
You need API servers behind a load balancer, you need Redis shards, you need a Cassandra ring, and you probably also need extra machines to generate the load for testing. So even for a small deployment, you end up requiring probably ten machines or so, and then you want to scale up, probably to 50 or 100 machines. You can't really do that with a handful of boxes anymore; you need to go to the data center.
Well, I don't know if you have any experience with requesting a sizable chunk of capacity from a data center in a large organization; for me it was a first, and it turns out there is some process and overhead involved. From the time you actually request the capacity to the time you get access to it, some time can pass. So while we were waiting for that, I guess we got somewhat impatient.
At some point we were even contemplating doing this scalability testing in EC2, but then we figured it would be just too silly to first clone an Amazon service and then host it on EC2. We could have done that, but we didn't. What we actually ended up doing was testing it in an OpenStack cloud; it turns out that at Comcast we operate a number of data centers based on OpenStack.
Some of them already run production systems, but there are also a few that are still fairly experimental and only have a few tenants. The project group that operates these data centers actually approached us and said: hey, don't you want to test your service in our OpenStack data center, so we can see at the same time whether our infrastructure works? It was a win-win for both of us, so that's what we ended up doing, and I'll share some of the results we found.
The way we went about this: we wanted to make sure that all components of the system were essentially oversized except for the Cassandra ring, so that as we ran our tests, everything else would be underutilized, not exactly idle but not stressed, while we put the load on the Cassandra ring.
We would take whatever throughput number we measured, then increase the ring size and run the same test again. We played around with Cassandra ring sizes from four to twenty-four nodes and had the necessary capacity of API servers and Redis shards around it. This bar chart shows a typical test run; we did many, many of those. Each bar represents one test minute, and what you see here are absolute numbers: how many API calls we sent to the service.
The color coding tells you which calls they were: the blue ones are send message calls, putting messages on queues; the green ones are receive message calls; and the brown ones, I believe, are delete message calls. One interesting thing you already see here is that apparently sends are faster than receives and deletes, and this is actually a characteristic we would expect from a system backed by Cassandra: writes are cheaper than reads and deletes. By the way, these tests are shown from the perspective of a single API server.
We have ten of those, so what you're seeing here is ten percent of the load. Those tests pushed two million messages each, in roughly 15 minutes or so. Two million messages translates to 6 million API calls, because each message takes a send, a receive, and a delete, and a single API server sees ten percent of that load, which translates to 200,000 messages. For that very same test we also captured the response time percentiles.
The first chart here shows the response time percentiles on the API server: the end-to-end time each API call took, for that mix of send, receive, and delete calls. The resolution is 10 milliseconds, so each change in color means an increment of 10 milliseconds in response time.
The greener it is, the faster it was; the more red or purple, the longer it took. The second and third diagrams show the response time percentiles for those same calls, but broken down into how much of the time was spent in Redis (the second diagram) and how much in Cassandra (the third), and the resolution for those two is one millisecond, not ten.
Without going into too much detail, what you can see is that the API servers as well as the Redis shards were still pretty happy, whereas Cassandra was stressed: during the middle of this test, more than fifty percent of the calls came back purple, which in this diagram means 10 milliseconds or more, potentially way more; if one of those calls took a second or two, it would also fall into this purple category.
For each and every test we ran, you see this starts out perfectly linear, then at 12 nodes, as we add nodes, there is a slight drop-off, and then it continues linearly. If you're asking why that slight drop-off is there: we're still investigating. This is brand-new data, by the way; we took these numbers over the past couple of weeks, so you are actually the first audience to see it. I would say, first of all, that we are not really testing just a Cassandra ring under perfect lab conditions.
We are actually putting a realistic load on a relatively complex system that contains a Cassandra ring, and as much as we tried to make all the other factors disappear, we may not have completely succeeded. From the numbers we looked at, it still looked like there was some room in Cassandra for more, so this line could have been more linear. To give you one example of those factors: we didn't have a real load balancer at our disposal.
We were actually resorting to a software load balancer, HAProxy, and as we increased the load we saw some issues with it, so we had to tweak it a little; there may very well be other such factors in those tests. But overall, we wanted near-linear scalability, and it doesn't look like we are entirely off here; we were actually pretty happy with these results.
Then we did the same thing for CNS, the pub/sub service. We wanted to see throughput scalability here as well, and here the most important metric is end-to-end latency: how long does it take from the moment you publish a message on a topic until all the subscribers have received that message?
In this case, we wanted to make sure that the underlying CQS service was underutilized and not stressed; we wanted to stress the publish workers instead and see how throughput increased as we increased the number of publish workers handling the fan-out. I'll spare you the numbers and just show you the visuals; the most interesting thing here is circled in green.
We could double the throughput to eight thousand messages per second by doubling the number of publish workers, while retaining an end-to-end latency of roughly 200 milliseconds; so, again, near-linear scalability. That was the testing and performance data I wanted to share with you. We are almost at the end; I'll conclude with a use case: the sports app.
This runs on the X1 set-top box, which we now deliver to all new Xfinity customers, and one of the nice things about the X1 box is that you can run apps on it. This is a screenshot from one of our most popular apps, the sports app. On the left-hand side, where it says Travel HD, is where you would continue to watch your television program as you bring up an app, and on the right-hand side you see the current scores.
You can drill into a detail screen for a live game and see some more statistics about it. Behind the scenes, we get all this data from a third-party data service called stats.com, and in our back end we use a CNS topic to fan out game event notifications to all active instances of the sports app. I have another screenshot that shows you what kind of traffic went over that topic; this one was taken over a period of about one month last year.
You see these little red clusters here: those are evenings or afternoons with game activity, and typically we have maybe 10 and up to 20 messages per second when there is game activity. So it's not really a very high throughput application; over that entire month I think we collected roughly a million publish calls, a million messages. There were some evenings with more games and therefore higher loads, but the interesting point is this spike.
It went up to 130 messages per second for a while. Now, this was not a production system; it was actually a QA system that was not terribly redundant. In fact, it had a single publish worker, and that publish worker failed at that point in time. As you can see, when that happens, when you have no publish workers around, no message gets lost, because the messages are all parked in an internal CQS queue.
So when we fired up the publish worker again, it basically just caught up and published all the messages it had missed, simply reading them from the internal CQS queue. I thought that was interesting. So, this is my last slide.
Moving forward, our CQS and CNS services are still under active development. We will continue to follow the SNS and SQS APIs as they evolve. We still have plans for more load and stress testing, and one recent focus has been ease of deployment and scaling.
We don't have those Puppet scripts up on our GitHub repository yet, but we will make them available in the near future. And then, finally, we are looking at more in-house production deployments, which are currently all isolated by application, and eventually maybe we can even offer CQS and CNS as a service. And yeah, that's all I've got for you. I don't know if we have time for a couple of questions. We do? Okay, great. If there are any questions, could you come up to the mic? Thanks.
[In answer to an audience question about message sizes:] Those were one-kilobyte messages, and we actually found that in that range, anything from a few bytes up to about two kilobytes, we get pretty much the same numbers. You would start to see latencies shift and throughput go down with message sizes north of two kilobytes, basically. Okay, other questions? Yes, go ahead.
That's an excellent question. I was simplifying the story a little bit: between our topic that publishes and the set-top box, there is an app server in between. The set-top boxes are entirely cloud-based, so they are pretty simple in their design, and the actual app runs on an app server in a data center.
We have a number of these app servers as we scale up, but each of those app servers takes care of a whole bunch of set-top boxes. So we are actually publishing to the app servers, and they then, in a second layer, fan out to the actual set-top boxes. So the HTTP endpoint is on the app server, not on the set-top box.