From YouTube: NYC* 2013 - "The Big Data Revolution is an Evolution"
Description
Speaker: Eric Lubow
SlideShare: http://www.slideshare.net/planetcassandra/big-data-revolution-is-an-evolution-ny-cassandra-20130320
Dealing with data doesn't only require a data store, it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high volume, high velocity data ingestion with real-time analytics to ad-hoc style historical analysis with search capabilities. To communicate effectively between applications, data stores sit behind a service architecture for consistent data access patterns and failover/redundancy. This talk is a story of how we came to this architecture and some of the lessons we learned along the way.
How users are engaging with content on a social level is time-series data, and we've got a lot of it: everything that happens in terms of tweets, Facebook Likes, LinkedIn shares, anything across the major social networks. These are all things that content creators want to know about. But what's the important stuff? That's something people really haven't figured out, and that's something we're working on figuring out, because there are all these disparate data sources. You get a little bit from Facebook. You get a little bit from Twitter.
We also had to try to figure out what the best thing to do across social media was. That creates a situation where everyone has to know everything about everything, and that's not what we're after. That's not what anybody's after. So what we've done is create a dashboard that allows content creators to take a look at what they've published, and it's sorted by...
We look at millions of URLs every day, so right off the bat that's a big data problem, because you're not only finding out what's going on with each individual URL, you're also finding out all the social actions that take place around each individual URL. That translates to over a billion page views per month. The b-word is a fun word, because it means that everything you've done that previously worked will probably no longer work. It's a total change in scope, which again is why things are an evolution, because everything you did that worked with a few hundred thousand URLs is not going to work with a few million URLs. Unless you did it right the first time, in which case I'd like to hire you, because that doesn't happen very often.
We use the Amazon cloud, and we do that because everything we do needs to scale up and down with traffic patterns. At any given time we're using between 90 and 130 machines, across the peaks and lulls of the day, and with all that information we built a predictive algorithm to measure the social web. So where did that start?
This is not exactly what one would call a homogeneous environment, but what we have works for us to get things done, and this didn't happen overnight either. This is two years in the making. So how did we get there? Well, we knew that we needed something that could handle a lot of volume very, very quickly.
That works well for us: if we want to pull all the tweets that happened on a particular article in a particular time frame, we can do that, because we can just say give me all the tweets. If you tried that in MySQL on a table that had five million rows, good luck, because we did, and it didn't work too well.
We don't need to keep all the data forever. We can just say this data is good for 30 days, and then at the end of 30 days a terabyte of data just disappears and is gone. If you're not sticking around cleaning up a terabyte of data at the end of every 30 days, your life gets a lot easier, especially when you'd otherwise have to do it across multiple systems.
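Cassandra exposes that expiry as a per-write TTL. As a rough sketch (the table and column names here are invented, not from the talk), the 30-day window is just a `USING TTL` clause on the insert; this snippet only renders the CQL rather than talking to a cluster:

```python
from datetime import datetime, timezone

THIRTY_DAYS = 30 * 24 * 60 * 60  # TTL in seconds: 2592000

def insert_with_ttl(table, row, ttl=THIRTY_DAYS):
    """Render a CQL INSERT whose row Cassandra will expire on its own."""
    cols = ", ".join(row)
    placeholders = ", ".join("%s" for _ in row)
    cql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) USING TTL {ttl}"
    return cql, list(row.values())

# Hypothetical table and columns: after 30 days this row simply disappears,
# with no cleanup job required on our side.
stmt, params = insert_with_ttl(
    "social_actions",
    {"url": "http://example.com/a",
     "action": "tweet",
     "ts": datetime.now(timezone.utc).isoformat()},
)
```

With a real driver the statement would be executed as a prepared write; the point is that the expiry rides along with the data itself.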
That keeps us up. We actually went through the Amazon outage that happened a couple of months back, where an entire data center, one of the availability zones, went down. We didn't even notice. We got a few alerts, but none of our systems complained other than "hey, I can't talk to those Cassandra servers," and we ran just fine.
You wouldn't put them in the exact same category. One of the things we really like about Mongo is the atomic increments. It's incredibly fast, because a lot of the stuff we do is with JavaScript and in Node.js: it drops down to the wire, sends it across in the BSON protocol, and as a result it is fire-and-forget. Those increments have been faster than on most memcache or Redis servers; I'll get to why that happens in a minute. Yesterday they released their latest version, which includes hashed shard keys. This was a real problem before.
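As a sketch of what those counter bumps look like (the collection schema and field names are hypothetical; the talk doesn't show them), the atomic increment is a single `$inc` update, and fire-and-forget corresponds to a write concern of `w=0` in the drivers; this snippet only builds the documents:

```python
def social_increment(url, network, amount=1):
    """Filter/update pair for one atomic counter bump on a URL document."""
    # $inc is applied atomically on the server, so concurrent workers
    # never read-modify-write over each other's counts.
    return {"url": url}, {"$inc": {f"counts.{network}": amount}}

flt, update = social_increment("http://example.com/a", "twitter")
# With pymongo this would be applied roughly as:
#   coll.with_options(write_concern=WriteConcern(w=0)).update_one(flt, update)
# where w=0 is the fire-and-forget mode: the client doesn't wait for an ack.
```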
Mongo gives us B-tree indexes. Until Cassandra provides some support for deterministic indexing, we can't do range queries there. You can do range queries where you reduce the cardinality, like with secondary indexes, but on really, really large data sets your query response times are just not going to be sufficient, or at least they weren't for us. So being able to have access to B-tree indexes over aggregate data, which is what we store in Mongo, is very handy. We also get to take advantage of document-based storage.
We store mostly aggregated statistics collections in Mongo, and that's why documents work very well for us. And again, TTLs: we don't exactly have terabytes in there, but when you can just get rid of tens of gigs, or in some cases hundreds of gigs, of data at the end of 30, 60, 90 days, that also becomes very handy. Redis: why Redis?
Everything you do with it is going to be fast when you query it. It's got great support for transactional operations; you can roll back if you really need to, because you can do that on the application side. And it's a centralized queuing and locking system. I know John talked this morning about the attempt to bring that locking system into Cassandra, and they're still working on that, but in the interim we use Redis as our queuing and locking system.
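As an illustration of that pattern (the key names and TTL here are invented for the sketch), a Redis-backed queue is usually LPUSH/BRPOP and a lock is SET with the NX and EX options; this sketch only constructs the commands, so no server is needed to follow it:

```python
import uuid

def acquire_lock_cmd(resource, ttl_secs=30):
    """Command for a best-effort lock: SET key token NX EX ttl succeeds only
    if the key is absent, and the TTL keeps a dead worker from holding the
    lock forever."""
    token = uuid.uuid4().hex
    return ("SET", f"lock:{resource}", token, "NX", "EX", str(ttl_secs)), token

def enqueue_cmd(queue, payload):
    """Producers push work onto the left of a list."""
    return ("LPUSH", f"queue:{queue}", payload)

def dequeue_cmd(queue, timeout=5):
    """Consumers block on the right end, so nobody busy-polls."""
    return ("BRPOP", f"queue:{queue}", str(timeout))
```

The token returned by the lock acquire is what lets the owner, and only the owner, safely release the lock later.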
What do we use Infobright for? Well, if you don't know what Infobright is, it's a MySQL-based column store, which basically means storing the columns vertically rather than horizontally. The normal way that MySQL stores data, the way you envision it and the way it comes up to you in the console, horizontally, is exactly the way MySQL stores it on disk. In most cases a column store does the opposite: if you were to just slice out a vertical section, that's the way it stores it on disk. So it can do things like aggregating along a column much, much more quickly, which allows for ad hoc queries. It works with the standard MySQL driver, it's great for ad hoc analytics, you get great compression on your data, and it allows for pre-aggregation.
Why so many programming languages? Well, we've got a few reasons for that. The biggest is that each language has its own benefit. For us, in order to really understand the social space, we found that although we started with Ruby, and Ruby's great for quick development and especially for a lot of the web work, it just didn't support all the data science stuff we needed. If there were a little graphic for R rather than just the letter R, I probably would have put that up there too, but frankly the letter R is not as cool as a gopher. So we use Python for a lot of our data science, and that's what our data science team spends most of their time with. We do a little work in C as well, but for the most part we stick with Python. So anything we do that needs to access any of our data stores, we need to be able to access in Python.
Things do not always come out exactly the way you want them to, so there's a problem with doing all this: each data store and each language comes with its own cost. With Redis you can only use a single core, and if you want to get one of these beefy eight-core machines with 64 gigs of memory in Amazon and just start throwing everything at it, that's great, except it will still only use one core. So the way we solved that was...
We took one of those great eight-core machines and we stuck seven instances of Redis on it, did CPU pinning, and made sure that for anything we wrote, for any transaction, we knew that we were going to have to pay the serialization/deserialization price for anything string-related.
That was much, much faster than the run that took 45 minutes. So again, you need to know the trade-offs that you're making. The other big trade-off that we had with Cassandra is that it didn't have B-tree indexes, and I explained a little bit about why that was important. If you want to be able to run range queries, that's going to be a problem, and it becomes more of a problem as your data set size gets larger. If anybody's used it at all regularly, there are two things that you'll probably notice.
You have a replica repair time that's forced on you, and Cassandra has this idea of the least amount of tunables possible for a working system, which is great in theory, unless you want a working system in something that's not ideal, say the cloud, where you can't predict what the latency is going to be between two machines at any given time.
We were a Node.js shop, and we still use JavaScript very heavily as well, but we found that the exception handling is rather limited. Having to bubble everything all the way up to the top if you get an unknown or unexpected exception is not only very costly to the application, but most programmers eventually start to hate it, even those who love JavaScript. So what's the takeaway? Well, the takeaway, very simply, is that all this stuff takes a lot of work: all these lessons that we learned and all these things that I'm presenting in a single line.
We built a service-oriented architecture, what we like to refer to as our internal API, and what that is, is basically a layer that you can ask any question to, and it knows where to get the data from at any given point. We have all these data stores and all these languages, so you just use that JSON layer: you communicate over HTTP and ask the API for whatever you want.
Say you ask for the most recent hour: it'll go to Cassandra for everything that happened within this UTC time period, then prior to that it'll go to Infobright for the rest of the data, and it'll return all of it to you in a single JSON response. That alleviates the developers from having to know anything other than "I need to ask one particular location this question."
The other thing you have to worry about is your data accuracy. When you're dealing with multiple data stores, you need to know what the data looks like in one place versus what it looks like in the other place, and the only way you find out the first time that it's wrong is when someone else, most likely external to your organization, says A doesn't match B. So how did we handle that? We started out writing checks programmatically, and we found out that sometimes that works.
A
Sometimes
it
doesn't
so
we
had
to
spot-check.
Once
we
figured
out
what
we
were
spot
checking
regularly,
we
were
able
to
write
slowly,
more
programmatic
and
ultimately
algorithmic
tests
to
see.
Does
this
data
make
sense,
and
the
biggest
reason
for
this
was
when
you're
dealing
with
things
that
are
external
to
your
organization,
like
data
that
you
cannot
control?
For
instance,
in
our
case,
we
deal
with
all
the
social
api's.
You
know
Twitter
Pinterest,
Facebook
LinkedIn.
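A minimal version of such an algorithmic sanity check (the 2% tolerance is an arbitrary illustration, not a number from the talk) just compares the same count from two stores within a relative tolerance:

```python
def consistent(count_a, count_b, rel_tol=0.02):
    """True when two stores' counts for the same metric agree within a
    relative tolerance; exact equality always passes."""
    if count_a == count_b:
        return True
    return abs(count_a - count_b) / max(count_a, count_b) <= rel_tol
```

Anything that fails the check gets queued up for a human spot-check rather than silently trusted.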
Then it says seven, and you're like, "I don't know about that, you just told me it was 500," and then you ask it again and it's like, "no, it's 504." That's really confusing, especially when you're looking at it against multiple data stores. So in order to understand how to make those differentiations, you need to first figure out where the problems are, and most times that's a lot easier to do visually than it is programmatically. So the other thing that the service-oriented architecture allowed us to do was build...
It allowed us to build a framework for testing, and what's really cool about this is that any time we want to test something new, we don't actually have to bring our system down. If we want to test a new queuing system, say: right now we use Redis, but we're putting NSQ in there, and we tried RabbitMQ for a little while, and we didn't have to bring anything down.
The example I like to give is that 10gen, the folks who make Mongo, built this great system called MMS, and it works very well. It's just not great to look at: it tells you how your systems are doing on a superficial scale, shows you some graphs, tells you your flush times, things that are important on the system-level side. Then DataStax built this thing called OpsCenter, and if you use DataStax Enterprise, OpsCenter is basically this interface...
...that tells you, on a low level or a high level, exactly what your systems look like, and it's much, much nicer to look at. The benefit is that we get to look at how the systems are performing in a nice way, and if we really want to feel bad about ourselves, we can go look at them en masse and be like: all right.
That's just how the systems look. The internal API is the guard: nothing gets past it, nothing queries those things directly. The internal API holds the drivers for all of those systems, and as long as the developers don't need direct access, and there really isn't a need for them to have direct access, they can go straight to the internal API, ask any question, and have it returned to them concurrently.
The other lesson we learned is to try to keep the path of the packet consistent down the line. What does that mean? Well, for example, if you have what comes in from the internet on one side and the data stores on the other, traveling down the first set of diagrams that you see that are yellow, we take a look at the Twitter firehose...
...or the Facebook and Twitter APIs for things that they don't pass down the firehose, and then we have collection systems for the likes of Google+ and other tiers that don't give us direct access, that we have partners for. Everything that comes down the line gets immediately dropped onto the queuing system; once it's on the queuing system, it's picked up by a consumer, or a worker.
A
Accesses
the
API
to
make
an
equal
to
ask
any
questions
that
it
needs
to
process
the
data.
The
internal
API
will
then
ask
the
questions
of
the
data
stores
and
then
pass
it
back
to
the
consumers.
This
way,
if
there's
a
breakdown,
you
know
roughly
where
the
breakdown
happened,
because
if
the
packet
made
it
this
far,
it
had
to
follow
a
certain
path.
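That fixed path is what makes breakdowns localizable: each hop can stamp the payload, so the last stamp tells you the last hop that succeeded. A toy sketch (the hop names are invented, mirroring the collector-to-worker flow above):

```python
def run_pipeline(payload, hops):
    """Run a payload through an ordered list of (name, fn) hops, recording
    the path it actually followed."""
    for name, hop in hops:
        payload = hop(payload)
        payload.setdefault("path", []).append(name)
    return payload

hops = [
    ("collector", lambda p: {**p, "collected": True}),
    ("queue",     lambda p: {**p, "queued": True}),
    ("worker",    lambda p: {**p, "processed": True}),
]
result = run_pipeline({"tweet_id": 1}, hops)
# result["path"] records every hop the packet followed, in order
```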
The other thing that we did is how we distributed our architecture. When we put everything into the cloud, like most people, we said: us-east-1a, just spin up more machines in us-east-1a. And that was great until us-east-1a went down, and then so did we; everything went down. So what we did was take a look at the distribution of our data and the distribution of our systems and say: okay...
You won't even notice, and that's actually what happened to us, and the same thing goes for all of the other system types. But the key here for us was having this internal API be accessible across any one of the availability zones, because when you ask any one of the internal API servers, it knows about the systems in the other availability zones.
For instance, we tried to offload our queuing system onto Amazon, because we figured, hey, these guys probably got it right; they work with a lot of large-scale systems. So we tried to offload our queuing, remove it from Redis, and we figured out that for us to maintain our own queuing system would cost us roughly twenty-five hundred dollars a month at the time, just given our workload and the machine cost. So then we tried offloading it onto Amazon...
...and it was not cheaper or easier; it came out closer to 10 or 13 grand a month. On the other hand, when we wanted to offload some of our scaling, when we just needed to do simple web-service scaling, Beanstalk would allow us to spin up a new Rails app with the Ember front end on it, with all of our cached data sitting on that machine, and it cost us less than half of what bringing up a new instance would cost just to handle that traffic, period.
So the reason that people run in the cloud isn't just because it's cheaper, or, you know, more highly available. For us, it's because the services that they offer, and are continuing to offer, make the jobs of both the developers and the people that do the ops on our systems a lot easier. Just going down the list...
A
You
can
see
that
that
most
of
these
features
weren't
wearing
around
six
to
nine
months
ago
and
I
think
other
than
maybe
the
queuing
service
and
an
redshifts.
We
probably
use
all
of
them
at
simple
reach
in
at
least
one
form
or
another,
but
to
really
take
advantage
of
all
of
that,
we
we
really
needed
to
figure
out.
You
know
a
good
way
of
expanding
the
role
of
what
one
person
was
capable
of
doing
so
we
have
one
really
really
smart,
DevOps
guy,
who
is
insanely
overworked
and
probably
not
too
happy
at
that
fact.
For those of you who don't know, Chef is a configuration-management system. We started out with me logging into every machine and installing everything that was necessary by hand: copying binaries, bringing the data over, and ultimately hating every single person that I came in contact with after setting up a machine once.
A
Scalable
for
me,
or
anybody
in
my
immediate
vicinity,
we
decided
that
you
know
we
would
shift
over
to
chef
chef
allowed
us
to
bring
up
a
machine
just
by
saying
I
want
another
one
that
looks
like
this,
and
then
amazon
came
in
and
said.
You
know
what
we
can
do
better
than
that
you
can
just
build
these
things
yourself.
You
can
build
a
template
yourself
and
rather
than
saying,
I
want
another
machine
that
looks
like
this.
...any time that we needed to deploy something, it didn't just become a matter of spinning up the machine, building nginx, building Rails, and deploying the latest code base. We would just say: give me one that looks like this, and deploy this code hash from GitHub. The other thing we did is make really extensive documentation, because if our DevOps guy gets hit by a bus, and I know everyone loves that scenario...
You also should know when to build your own tools, when to use existing tools, and how to integrate the tools that are out there. There are a lot of folks that stick strongly to use-only, and there are a lot of folks that stick strongly to build, but I feel like the best environments involve the best tools that not only work for your organization, but that work for the people in your organization.
The last thing is: monitor everything. Automation does not come easy. It takes a lot of time to build up to the point where you can automate, but it will ultimately make your life easier. So once you figure out the best processes for you, and once you figure out the way that your system works and the best way to keep your systems working, automate ways to get yourself there. And I just want to leave you with something I alluded to at the beginning.
Nobody is going to be at the same company for 15 years. I mean, it would be great, but it's just not going to happen. So when you set things up, set things up as if you were going to be the next person walking into them. Evolve the systems so that when it's time for you to move on, somebody can walk in and say, "I understand what this person was thinking," not, "I would love to kill the last person who had this thought." That's the new thank-you slide: thanks for listening.
I know where this data ultimately needs to end up. If it's a piece of data that needs to be available immediately as a real-time number, then it's going to go to Mongo, because our system uses Mongo for almost everything that we do in real time. If it's going to need to be cached at some layer, or invalidate a cache, it's going to go to Redis, or make a query to Redis. And anything that needs to be sort of ETL'd, in the sense where it's going into a more permanent store...
...something like MySQL, where you can't do the adjustments, you can't do the updates, you can't do the deletes, then we put it all off to the side in a separate queuing section where we say: come back later, process this differently, and if there's anything that we've learned about this that may invalidate it, let us know. For instance, to go back to my example with Twitter: Twitter says, you know, hey, this thing's got 500 tweets...
...now it's got seven, and then it's got 504. If, during that period of time, say we use the granularity of a day, that's changed, and then we go to do our daily roll-ups and we're like, whoa, this makes no sense, you will never see tweets drop by that much, then we process it and say: okay, we're going to get rid of this middle data point, most likely.
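That roll-up sanity pass can be sketched as a filter over successive samples (the 50% drop threshold is an invented illustration; the talk only says the count will "never drop by that much"), assuming the counts are cumulative and should not crater and recover between samples:

```python
def drop_implausible(points, max_drop_ratio=0.5):
    """Remove middle data points that crater far below the previous sample
    and then recover, which signals a bad API response, not real data."""
    cleaned = [points[0]]
    for prev, curr, nxt in zip(points, points[1:], points[2:]):
        crater = curr < prev * max_drop_ratio and nxt >= prev
        if not crater:
            cleaned.append(curr)
    cleaned.append(points[-1])
    return cleaned
```

On the talk's example the bad middle reading is discarded while a genuine small dip survives.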
A
Our
clients
say
we
need
real-time
data
on
XYZ
and
we
need
it
and
real-time
has
its
own,
but
in
some
cases
they
need
it
within
a
minute.
In
some
cases
they
needed
within
30
seconds.
It's
not
like
the
financial
social
data.
In
this
case,
it's
not
like
the
financial
world.
It's
not
microseconds,
it's
you
know.
Sometimes
a
minute
I
mean
you
can
get
away
with.
You
know
minute
minute
and
a
half
in
some
cases,
but
for
the
most
part
the
workers
decide
what
goes
where
when
and
that's
decided
in
advance.
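The "decided in advance" part is essentially a static dispatch rule in the workers. A hedged sketch (the flag names are invented, mirroring the Mongo/Redis/ETL split described above):

```python
def dispatch(item):
    """Route a unit of work to its destination, using rules fixed ahead of
    time rather than decided per-request."""
    if item.get("realtime"):
        return "mongo"        # needs to be visible as a real-time number
    if item.get("affects_cache"):
        return "redis"        # cache fill or cache invalidation
    return "etl_queue"        # deferred, reprocessed into the permanent store
```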
We wouldn't, because for the problems that we have to solve there is no one-size-fits-all. There is no single solution that does everything that we need. In some cases we need to organize the data in a purely chronological way, and that's pretty easy, because with Cassandra you just store it by date. Well, what if you want to store it by date and by...
So that piece of the architecture is also very important, and is also a requirement that's given to us. I'd say that the only thing that we probably would have done sooner is build the intermediary layer, that service architecture, earlier, because early on we were doing everything direct to the data store, which meant that every engineer not only had to know how to write effectively to every data store, but needed to know the schema and the storage pattern, and all that stuff probably took longer to figure out and/or document.
That's why we went to something like Chef, because Chef just says: here's what the machine should look like. I mean, we're very particular about using Amazon Linux, but if you wanted to use something like CentOS as your base image, you could say, "I'm going to use CentOS 5.6," and CentOS 5.6 is going to look pretty much the same on Rackspace or Amazon. So you say: I want CentOS 5.6 with all these packages; spin me up.
A
Well,
the
cueing
itself
is
just
merely
an
application
that
sits
on
top
of
one
of
those
systems.
If
we're,
if
we're
using
an
Amazon
specific
service,
then
yeah,
then
we
have
locked
ourselves
in
a
little
bit,
but-
and
we
don't
do
that
in
all
cases
there
are
yes
that
can
become
a
problem,
but
it
has
not
yet
to
date
real.