From YouTube: C* Summit 2013: Big Architectures for Big Data
Description
Speaker: Eric Lubow, CTO and Co-founder at SimpleReach
Slides: http://www.slideshare.net/planetcassandra/2-eric-lubow
Having many different technologies within an organization can be problematic for developers and operations alike. Structuring those systems into discrete modules not only abstracts away a lot of the complexity of a heterogeneous architecture, it also allows the evolution of systems using common access and storage patterns. This session will discuss how to think about, architect, and maintain a service architecture for a big data system.
All right guys, let's get started. So yeah, there's that. My name is Eric Lubow, I'm the CTO of SimpleReach, and I'm going to talk to you today a little bit about how we built our architecture to process the amount of data that we see.
These are the things I'm going to talk to you about: I'll tell you a little bit about the company, I'm going to talk about what the goals of having that architecture are and what the tools that we use to build it are, and then I'm hopefully going to have time for some questions, but there's no guarantee there.
So, first of all, one of the things that we learned when we were starting to work with really large amounts of data is that most of it is absolutely useless, and you have to get through the useless stuff in order to get to the good stuff. But the actual way to phrase that, which I prefer a little bit more, is the Borat version: even with the right tools. Okay.
So how did we do it? We use all of those technologies in order to be able to deliver to our customers, and deliver internally to the team, all the data that they need in a timely fashion, presented to them in the fashion that they need. We use Vertica, Redis, Cassandra, and Solr, and in terms of programming languages we use Ruby, Ember.js, Node, Python, and Go. You'll notice Java is not on there, and I'm super thrilled about that.
So, in order to build this system to process all this data, we had a couple of goals in mind. We want consistent, non-data-storage-layer access patterns, and I'll show you what that means in a little bit. We want data accuracy across storage engines: if you're storing the same data in one store as you are in Cassandra, you want to make sure that the numbers match, and there are really interesting ways that it appears when they don't. You want to minimize downtime, or minimize the cost of downtime, because there will invariably be downtime.
So you want to minimize its impact. You want highly available systems. You want to allow access to many tool sets: if you're using all these systems, and you're using Cassandra, you want access to the toolkits and tool sets that each one of those systems will provide you, because there are just some cool tools that only work on Cassandra, and some cool tools that are only backed by Redis, so you want to be able to take advantage of all those things. And all clients that need to talk to your architecture (that's not people clients, that's consumers of data, algorithmic clients, system-type clients) should have minimal knowledge of the underlying architecture.
You want authentication, tracking, and throttling, and I'll show you why that's important in a minute, and you want to control your data flow patterns. These are the tenets that were very important to us. So, consistent data access patterns: for us, we have something called the real-time score.
In Cassandra it gets stored as a composite column, and it can be stored as real_time_score, real_time, or srt, depending on whether it's a short document or a long document. So rather than having the client ask for which one it is, you just have a consistent access pattern and always call it the real-time score. So one is good, one is bad, and this will become important later on.
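To make that concrete, here is a minimal sketch of what that kind of consistent access pattern can look like. The column names (real_time_score, real_time, srt) come from the talk, but the helper function and the dict-shaped rows are hypothetical stand-ins for the real Cassandra client, just to show the idea: callers always ask for the real-time score and never care which physical column it lives in.

```python
# Hypothetical sketch: clients always ask for the "real-time score"; the
# storage layer decides which physical composite-column name holds it.

# Physical column name per document type (the mapping itself is an
# assumption for illustration).
_RT_COLUMN_BY_DOC_TYPE = {
    "short": "srt",
    "long": "real_time_score",
}

def get_realtime_score(row, doc_type):
    """Return the real-time score regardless of how it is physically stored.

    `row` stands in for a row fetched from Cassandra (here just a dict),
    so callers never need to know the underlying column name.
    """
    column = _RT_COLUMN_BY_DOC_TYPE.get(doc_type, "real_time")
    return row.get(column)

# Example usage with fake rows shaped like the stored data.
short_doc = {"srt": 0.82}
long_doc = {"real_time_score": 0.37}

print(get_realtime_score(short_doc, "short"))  # 0.82
print(get_realtime_score(long_doc, "long"))    # 0.37
```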
So why are authentication, tracking, and throttling important? Well, it's really easy to have services run amok when you're deploying all the time: different consumers, different data producers, you're ingesting different types of events. It's really easy to have these things just over-process and ask too many questions, or you can just end up writing some bad code. Not that anyone here writes bad code, but just in case someone did, the services could, you know, DoS us internally. So you want per-service access keys; every single one of our services has its own internal access key.
You want to track call volume, because you need to know: do I need more API endpoints? Do I need more capacity to handle certain types of requests? Do I need more capacity in the data storage layer for this type of request than, say, another request? Again, you want to prevent internal denial-of-service attacks. These are unintentional, but they can happen, and when they do happen it really sucks to have to blame yourself. And you also want to monitor availability and performance of those calls. If you typically see that your call for a bit of account data takes 10 milliseconds and all of a sudden it's taking 40, well, that might only be a 30 millisecond difference, but that's kind of a big deal.
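As a rough illustration of what per-service keys, call tracking, and throttling can look like together, here is a minimal in-memory sketch. The key registry, the per-minute budgets, and the metric bookkeeping are all assumptions for illustration; in practice this sits in the service layer and feeds the monitoring stack rather than a Python dict.

```python
import time
from collections import defaultdict

# Hypothetical registry of internal services and their per-minute call budgets.
SERVICE_KEYS = {
    "social-consumer-key": {"service": "social_consumer", "limit_per_min": 6000},
    "score-writer-key":    {"service": "score_writer",    "limit_per_min": 1200},
}

# service name -> list of (timestamp, latency_ms) for recent calls
_calls = defaultdict(list)

def authorize_and_track(api_key, latency_ms):
    """Check the per-service key, record the call, and throttle if over budget."""
    entry = SERVICE_KEYS.get(api_key)
    if entry is None:
        raise PermissionError("unknown service key")

    now = time.time()
    recent = [t for t, _ in _calls[entry["service"]] if now - t < 60]
    if len(recent) >= entry["limit_per_min"]:
        raise RuntimeError("throttled: %s is over its per-minute budget" % entry["service"])

    _calls[entry["service"]].append((now, latency_ms))
    return entry["service"]

# Example: a consumer identifies itself on every internal API call.
service = authorize_and_track("social-consumer-key", latency_ms=12)
print("call accepted for", service)
```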
So how do we control our data flow? We use something called NSQ, and I'll show you what NSQ is on the next slide. But for us, NSQ is a really interesting piece of software because it allows a couple of things.
One, it allows us to queue up all of our requests, all of our events, and all of our to-dos at every stage, and it allows us to do multicasting of those requests. So, for instance, say a new tweet comes in. It gets stuck at the edge, our social data consumer will pick that up, and we'll multicast it to three different consumers. The first one will pick up the batch-and-write data: it'll just take a whole bunch of tweets, put them together, process them, update a total count, and write it to disk. This way you're not doing one write for every tweet; you can group them up, and in Cassandra you get to take advantage of things like batch mutate, for instance. Raw data, same thing: you just take the raw data, put it in a batch mutate, and have Cassandra write it.
This way, you turn many writes into one write. And on our end, we do something where every time we see a social event, we like to calculate a new score, because that obviously updates the value of a particular piece of content if it was tweeted about. So we then update the score, which will create a new NSQ job for writing the score to the various data stores that it needs to go to.
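Here is a rough sketch of that batch-and-write consumer pattern using the pynsq client. The topic and channel names, the batch size, and the flush_batch function (which in production would be the single Cassandra batch mutation) are assumptions for illustration, not the production code.

```python
import json
import nsq  # pip install pynsq

BATCH_SIZE = 100
_pending = []

def flush_batch(events):
    # Placeholder for the real write: in production this would be one
    # Cassandra batch mutation covering all of the buffered events.
    print("writing %d events in one batch" % len(events))

def handle_message(message):
    """Buffer each social event; flush to storage once the batch is full."""
    _pending.append(json.loads(message.body))
    if len(_pending) >= BATCH_SIZE:
        flush_batch(_pending)
        del _pending[:]
    return True  # ack the message

reader = nsq.Reader(
    message_handler=handle_message,
    lookupd_http_addresses=["http://127.0.0.1:4161"],  # assumed local nsqlookupd
    topic="social_events",   # assumed topic name
    channel="batch_writer",  # one of the multicast consumers
    max_in_flight=BATCH_SIZE,
)

if __name__ == "__main__":
    nsq.run()
```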
Controlling the data flow allows us to create maintenance windows with no downtime. So, for example, where we were writing that raw data, we can just let the queues back up while we do, say, a Cassandra upgrade, run through all of the upgrade nodes, and then let that stuff process. And the processing will be quick, because we're not writing single events; we're batching them up into groups of, say, you know, a hundred or a thousand, and letting them write in groups.
It also has a really cool feature, which is the ephemeral channel. So if you want to just take a look at what's coming down those message queues, you can look at it, and you won't lose those messages and won't ack them; you can take a look at what's going on, those messages still get processed by your actual system, and you don't have to worry about the delivery.
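In NSQ that peek-without-disturbing behavior comes from channels whose names end in #ephemeral: every channel gets its own copy of each message, and an ephemeral channel is not buffered to disk and disappears when its last client goes away, so tapping the stream this way does not affect the real consumers. A minimal sketch (the topic name is an assumption):

```python
import nsq  # pip install pynsq

def peek(message):
    """Print what's flowing through the topic. Acking here only affects this
    throwaway channel, not the channels the real consumers read from."""
    print(message.body)
    return True

nsq.Reader(
    message_handler=peek,
    lookupd_http_addresses=["http://127.0.0.1:4161"],
    topic="social_events",     # assumed topic name
    channel="peek#ephemeral",  # ephemeral: not persisted, vanishes on disconnect
)
nsq.run()
```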
So what does that look like when it comes to our actual system? We have a bunch of different ways that we collect data; the specifics are not really important, but from the internet we make sure that all data does what we like to call flowing downhill. It comes from the internet to our edge collectors and hits a set of queues, and the consumers pull from those queues. The consumers will then go to the internal service architecture to get any data that they need to process their jobs; that internal service architecture is responsible for talking to the individual data storage layers and then passing the information back up, and then the consumers will write, again going downhill. I'll show you what the internal service architecture looks like in a minute. So how did we get here? I'll tell you.
It was everything from how often we access each type of data store, to the types of messages we produce, to the types of messages that come in. Everything revolved around knowing where we could be accepting of read latency, knowing where we could be accepting of write latency, knowing where the real-time patterns were important, and knowing where the sort of offline jobs would be important. We built the service-oriented architecture, and we also had to do a lot of data accuracy checks.
So again, if you have data stored in Cassandra and in Vertica, like in our case, and actually Redis as well, and those things are not all identical, then when you pull from one and you pull from another and they don't match, and you can show them both on the same page, for instance, a customer is going to look at that and be like, well, how come this thing says 20 tweets and this thing says 25? There's clearly a discrepancy.
So one of the things that we had to build, going down through the line, is that all the consumers needed to be aware of what was being written to the other data stores, so that we could ensure accuracy in one data store versus the other data store. And we built out a framework for testing different engines. What this means is that the service architecture sits in front of all the data stores.
So, as we were trying to decide which data store was going to be best for a particular feature or feature set, we needed to see, you know: did we want to use, for instance, Vertica? Did we want to use Infobright? Did we want to use InfiniDB? All of these are column storage engines, and for us to find the right one that fit our business and fit those features, we had to figure out a way to do it.
So what we were able to do was just plug and play. We put all the storage engines behind that service architecture and ran the query against the service architecture. The architecture would then say, okay, I know I need to run this query against Vertica and InfiniDB and Infobright, and the results should all be identical. It'll log the response times and write them off to a different place for us to look at later, and when we decided which one we wanted to go with, which was ultimately Vertica, we just pulled InfiniDB and Infobright out. Zero downtime, nobody was the wiser, we made our decision, and we had an entire testing system. And we can do the same thing with queueing engines, if we wanted to try Resque versus NSQ or RabbitMQ, because that service architecture gives us the ability to have a consistent access pattern and put space and time in front of requests.
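A toy version of that plug-and-play comparison might look like the sketch below: fan the same logical query out to every candidate engine, time each one, and flag mismatches. The engine callables here are hypothetical stand-ins for real Vertica, Infobright, and InfiniDB clients.

```python
import time

def run_against_engines(query_fn_by_engine, *args):
    """Run the same logical query against every candidate engine,
    record response times, and report any result mismatches."""
    results, timings = {}, {}
    for name, query_fn in query_fn_by_engine.items():
        start = time.time()
        results[name] = query_fn(*args)
        timings[name] = (time.time() - start) * 1000.0  # milliseconds

    baseline = next(iter(results.values()))
    mismatched = [name for name, result in results.items() if result != baseline]
    return results, timings, mismatched

# Hypothetical engine clients standing in for Vertica / Infobright / InfiniDB.
engines = {
    "vertica":    lambda account: 25,
    "infobright": lambda account: 25,
    "infinidb":   lambda account: 24,  # pretend this one disagrees
}

results, timings, mismatched = run_against_engines(engines, "account-123")
print("timings (ms):", timings)
print("engines disagreeing with the baseline:", mismatched)
```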
The other thing that we did that gave us the ability to build this architecture out is we made sure everything looked the same, every chance we get. So the base image starts out with an Amazon AMI. On top of that we have our organizational information, which, as you can see, is users and the application-specific configuration, and then application groups, meaning: does this thing need to be on there for sharding?
Does the nsqd client need to be on there for consuming or for producing, or is it a lookup? And then we have whatever application sits on top. This provided us the ability to say, any time we need to launch a new image, we just launch that base image, we put the organizational-specific stuff on there, and then beyond that we decide what application group it fits in. Is it a database? Is it an application? Is it a web server? And we put the appropriate things on there.
This is all because we have a great systems guy. I'd love to say team, but we're a small company, so it's really just one guy with a lot of headaches, and I can't get him to wear the hat; I've tried. So we make extensive use of AWS. AWS has some great stuff like OpsWorks, which is a Chef-based system for configuring machines and application types. We also monitor everything very heavily, and anybody who's had any level of monitoring experience knows that no matter how much you do, it never seems to be enough.
So we've got Nagios for the base monitoring and StatsD for application-specific instrumentation, and I would love a replacement for Graphite, so if anybody knows one, please tell me, because it's not as awesome as we'd like it to be. So again: Chef, OpsWorks, and Vagrant, which, for anybody who does systems in the room, allows you to spin up a small machine setup on your local machine, so you can have a production-like scenario or production-like setup locally for development.
cssh is a cluster SSH client, which is basically what we used before we had the ability to use Chef and deploy across everything: we would just spin up cluster SSH sessions, SSH into a hundred machines at a time, run the command, and then close that out. That is exactly as painful as you would imagine. Deployment we all do now with Chef.
So I put this slide up here because anybody working in AWS typically sticks to the standard, EC2 and EBS volumes, and that's good if you're a small shop, but it's also not good. And the reason it's not good is because there are all these other features that make your life significantly easier, and we were kind of hesitant to use some of them.
You know, putting our Vertica cluster inside of the Virtual Private Cloud meant we were able to reduce latency between machines. Understanding what external tools are available: so, for instance, we run a lot of offline jobs, we run a lot of MapReduce stuff, and Elastic MapReduce is good, but you have to have everything on S3. So what we did to get around that was we found a company called Mortar Data, which uses Elastic MapReduce under the hood.
You give them access to some of your S3 buckets, and even other methods of access, and you get to take advantage of the AWS services without even having to, you know, understand EMR. Elastic Beanstalk: every time we're doing new Rails development or testing new apps, we just spin up an Elastic Beanstalk app, which basically comes with all the Rails pieces built in, and it's so much less work for us to do on the system side.
So all these things are tool sets that we get to take advantage of, because you can just plug and play right into our architecture, and the developers do not need to become aware of additional things in the architecture. So this is actually what it looks like in a very superficial sense; I promise there are more machines than that, though. So what does the service architecture look like? Again, we start with our base image layout.
I can't stress how important having that base image layout is, especially when it comes to the monitoring and instrumentation. So the proxy machine, and you'll see on the next slide what the proxy machines are for, sits in front of any storage machine, and it tells the requesting app to hold on a second while it gets the information.
It knows, I need to get this type of data from Vertica and this type of data from Cassandra, and it packages it all up and sends it back in JSON format to the querying machines. The reason that this is a very good methodology is because it does not force the existence of a giant monolithic service architecture, where every time you do a code deploy, for instance, if somebody makes a typo, it takes down your entire internal service architecture. That's, for obvious reasons, a problem.
So what we did was we broke it all down into little chunks, so each API endpoint is its own tiny system: ten-minute content is its own little Python app, hourly content is its own little Python app, account is its own little Python app, and so on. And it's not just Python; I was using Python as the example, but we use whatever language is actually best for the data storage layer. So in some cases we use Go, in some cases we use Python, and in some cases we use Node.js. We did this for availability.
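For a sense of scale, one of those little per-endpoint apps can be as small as the sketch below (Flask is used here purely as an example micro-framework; the route, port, and data lookup are assumptions). Each endpoint being this small is what keeps the one-app-at-a-time deploys and restarts cheap.

```python
from flask import Flask, jsonify  # pip install flask

app = Flask(__name__)

def fetch_ten_minute_counts(content_id):
    # Stand-in for the real storage call (a read through the service layer
    # to Cassandra, for example); returns fake data for illustration.
    return {"content_id": content_id, "tweets": 25, "interval": "10m"}

@app.route("/content/<content_id>/ten_minute")
def ten_minute_content(content_id):
    """One endpoint, one tiny app: look the data up and hand back JSON."""
    return jsonify(fetch_ten_minute_counts(content_id))

if __name__ == "__main__":
    app.run(port=8301)  # assumed port; each endpoint app gets its own
```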
We did it again for the consistent access patterns, which is another feature that I can't stress enough, and again for minimal downtime on your changes. If you want to deploy just a change to, say, one endpoint, there's no need to deploy a giant, again, monolithic service architecture WAR file or whatever it might be. You just deploy that one little app change and restart that one app. It becomes unavailable for, you know, however long it takes to restart that app, and if it's a small one, it'll probably take a second, maybe, if that, and you've got yourself a newer version of the endpoint. Smaller code deploys: clearly I like that word.
So how do we keep ourselves available? We made sure that, even though we spin up and auto-scale quite frequently, the distribution within Amazon, at least in us-east, is such that we can lose an entire data center. Like, us-east-1a could just drop out of existence and we'll still be good, because every time a new machine comes up, it checks the other availability zones to make sure that we're doing evenly balanced machine deployments. And here's an interesting fact that we learned the hard way last week.
If you do not have an even number of machines across availability zones, Elastic Load Balancers will push a disproportionate amount of your traffic into the availability zones with more instances. So, for instance, if we have four instances of an endpoint in us-east-1b and three in 1a, then 1a will get, like, ten percent less of the traffic than the other.
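Checking that instances stay evenly spread across availability zones is easy to script; here is a rough sketch with boto3 (the tag filter and region are assumptions, and in practice a check like this would run when a new machine comes up or inside the auto-scaling tooling).

```python
from collections import Counter

import boto3  # pip install boto3

def instances_per_az(endpoint_tag):
    """Count running instances per availability zone for one endpoint group."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:endpoint", "Values": [endpoint_tag]},  # assumed tagging scheme
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    counts = Counter()
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            counts[instance["Placement"]["AvailabilityZone"]] += 1
    return counts

counts = instances_per_az("ten-minute-content")
if counts and max(counts.values()) != min(counts.values()):
    print("unbalanced across availability zones:", dict(counts))
```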
That might not seem like a big deal, but when you've got millions of events, it's pretty easy to take down a small number of machines; hence that internal denial-of-service attack, you know, accidental stuff. So you're always going to run into problems. And anybody who's seen any of my talks knows I just love this unicorn, so I put them in everything. It really has no meaning, but you're always going to end up with problems, and this guy clearly has them.
So the reason we built that architecture the way we did is because we've been in scenarios where us-east-1a, for instance, has gone down and we haven't. In fact, when I was creating this slide, I guess about six months ago, prior to another conference, us-east-1a did go down, which is why I actually used this one in particular, and we did not go down with it, because of our distribution.
Every time we want to create a new service, we have a very specific subset of questions that we have to ask, and we ask ourselves the same set of questions along with a few others. First: can the host of the service be completely homogeneous, meaning does it fit with our pattern? Does it fit with that architecture pattern of the Chef base, the organizational users, then the application group, and then the application itself? Can it accept downtime, and what does downtime look like?
Can you create a scenario where you can let the producing of messages back up in a queue with minimal impact? Because that's always our goal: to have the minimal amount of impact from downtime. Does it fit into an existing service? In other words, would it be better to lump it in with another service, would it create a large code base for that service, or does it really need to be its own? Does it require data center distribution? The answer to that is almost always
yes, but again, you just have to ask, because you need to know what trade-offs you may need to make when creating a new service. How should it be instrumented or monitored? Again, this is also very critical because, as I said, denial-of-service attacks, whether internal or external, you're going to want to be aware of, and you're going to need to know what normal usage patterns look like.
So, just to tell you everything that I already told you again, only in a few short bullet points: you need to know what you're looking at, and you need to know what you're working with, and that's sort of an evolution. We made a ton of mistakes over the past few years to get here. We're quite happy with where we're at, but we're still working towards, you know, smaller, more efficient code bases and deployments. So build, use, and integrate the external tools, again.
Those are the big things that really need to be thought about. This is the new thank-you slide, I've decided, so if any of this sounds even remotely interesting, feel free to come find me. The last thing I have is just a little announcement: myself and this gentleman right here are writing a Cassandra book, which is hopefully going to be published in September, called Practical Cassandra, so be on the lookout for it. If you have any questions, I've got two or three minutes, so I can probably take a few.
So he asked: in the service architecture, and in the storage layers under the service architecture, are we storing different information? And the answer is yes. We find one store is much better at storing aggregates or counts; its incrementers are faster and way more reliable than counters in Cassandra, which we try to avoid (we can't entirely, but you know). And we use another, for instance, for handling our users and our accounts, because it's got a great object layer, an ORM layer, that Rails plugs into very easily. So it's just different.
Yes, yes, we do have processes to reconcile the differences at the end of every hour, at the top of every hour. We try to make everything within our system trigger-based, so every time a new event comes in or a new message is created, that will, you know, kick off n number of other events, but there are just certain things that that really won't work with.
So the question was: how do we know that the URLs... yeah, actually, we don't. The question was: how do we know that a URL has become inaccessible? Actually, we don't really care. All we care about is knowing how many events we've seen for it to date. If it doesn't exist anymore, it's kind of not our problem.