Description
Speakers: Matt Pfeil (DataStax), Eddie Satterly (Splunk), Edward Capriolo (Media6Degrees), Matt Conway (Backupify), Russell Bradberry (SimpleReach), Jake Luciani (BlueMountain Capital)
What we're handling today is really a name for things that go wrong, and we're going to teach you how to avoid some of those things. Just to show off really quick: my name's Matt Pfeil, I'm the co-founder and VP of customer solutions at DataStax. Why don't we go down the panel real quick, and everyone will do a quick introduction of who you are, what company you work for, and how they interact with Cassandra.
So we were initially implementing search components for the front-end web site. We built a data model that we were all sure was exactly perfect and was going to get the data out the way that we intended, and then we figured out that we had represented it wrong. From a performance perspective, retrieving the data became quite painful and considerably slower than what had been promised, and we had to basically scrap it, restart from the very beginning, and get the data model correct the next time.
There's this mindset of, like, oh yeah, we're going to be able to stuff four terabytes of data on a Cassandra machine and everything's going to read super fast magically. You really have to benchmark these things and figure out how much data you have, how much random read is going to be in your data, and then try to size hardware appropriately. I don't really think anyone's running Cassandra on four-terabyte disks. I could be wrong.
The list of things we did right is a lot shorter, but in terms of doing things wrong: writing millions of rows instead of millions of columns, i.e. data modeling, that's very important, it really impacts performance; not leaving enough RAM for the kernel to cache; giving too much RAM to the JVM, which kind of falls over after a certain size. Things like that, and the list goes on. We started on Cassandra 0.6, and not as much was written about it back then.
I heard a common theme, though, that data modeling is pretty important, and just to make sure everyone's clear on the implications: Cassandra is very, very good at large amounts of data ingestion, but you have to plan how you're going to read that data. So putting a little bit of homework in up front solves a lot of pain after the fact. At DataStax, working with customers of ours, we actually see that that's probably eighty percent of the issues, and if you get that right up front, it actually sort of just works after that.
Obviously there's the thing Jonathan talked about in the morning, about how they're moving to a JBOD setup instead of RAID. It was initially hard to figure out what the right RAID was for Cassandra servers, and there was the assumption that a RAID of 12 disks will give you more seeking capability. You know, rotational disks don't really seek faster in bigger RAIDs; they just stream faster. So that can be an early-on hardware mistake.
We're on EC2, so no hardware per se, but, you know, choosing the right instance size helps a lot. There's more choice these days; you can use SSDs or rotational disks. I think the sweet spot when I was looking into it was the extra larges, EC2 m1.xlarges. Those work pretty well: just enough disk space, just enough RAM, and you can scale out pretty easily.
SSDs were mentioned in there, and one of the common things that we also see today is, I think, this lack of belief about how many random seeks a spinning disk can do. Just to put this in perspective: a 15k SAS drive, that stands for 15,000 rotations per minute. If you do the math, that means it rotates a little over 200 times per second, which means that if you're doing random seeks on spinning media, the most random seeks you can get out of a single hard drive is about 200.
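The arithmetic above can be checked with a back-of-the-envelope sketch. The model here is a deliberate simplification (each random read is assumed to cost roughly one platter rotation, ignoring head-seek time and command queueing), so rotations per second serve as an upper bound on random IOPS; the exact figure for 15,000 RPM is 250, in the same ballpark as the "about 200" quoted.

```python
# Rough ceiling on random reads for a spinning disk: if each random read
# costs about one rotation of latency, rotations/second bounds random IOPS.

def max_random_iops(rpm: int) -> float:
    """Rotations per second for a drive spinning at `rpm`."""
    return rpm / 60.0

print(max_random_iops(15_000))  # 15k SAS drive: 250 rotations per second
print(max_random_iops(7_200))   # commodity 7.2k SATA drive: 120
```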
And people say: wow, SSDs are so much more expensive. Well, first of all, if you look at it from an IOPS perspective, SSDs actually give you at least 10,000 IOPS, but the cost is only about 2 to 3x, especially on the commodity side. So we highly, highly recommend starting with commodity SSDs, because your price point is actually lower when you're comparing to spinning media from a pure input/output standpoint. SSDs are sort of God's grace to databases, it seems like.
Tuning the JVM correctly and making sure you understand how it's configured. Again, that has changed substantially since then, so it's a little easier now, but we ran into knocking over nodes because of JVM garbage collection several times, so that was probably the biggest pain point. The biggest thing we had to play with was changing around the new-gen sizing and making sure we had the heap set correctly. As far as the CMS settings, we actually went to production with that.
Yeah, I would follow up on that; those were a lot of good points. That was an early-on thing. It's a very interesting mix of how much RAM you have in a machine and how much you want to give to the heap, and it seems very appealing to use, like, the row cache. Everyone's like, oh, memcached, memcached, just put everything in memory, but you can do yourself a disservice; like Jonathan mentioned earlier, there are JVM fragmentation issues.
Give yourself that free overhead room, because what will kill you in performance is, like, your 95th-percentile latency. If a hundred requests go in two milliseconds but one takes 10 milliseconds, that's going to hang up your clients, possibly, and hang up your web application. So you're really trying to optimize not for the hundred reads that are fast, but for the one that may be slow.
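The point above can be made concrete with a tiny sketch: a page that fans out to many backend reads is as slow as the slowest one, so the outlier dominates what the client sees even though the average barely moves. The numbers below are the speaker's hypothetical, not measurements.

```python
# 100 fast requests plus one slow one, as in the example above.
latencies_ms = [2.0] * 100 + [10.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
worst_ms = max(latencies_ms)

print(round(mean_ms, 2))  # the average barely notices the outlier
print(worst_ms)           # what a client waiting on all of them experiences
```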
On top of all that, there are a number of things. Really, you know, we're still doing everything manually, we're not using vnodes, so tips along that front: understand how your token ring works. Know where the data for a range is and where the replicas are. One thing we do that's proven useful is, you know, starting at token zero, naming that node cassandra01, the next one cassandra02, so a node's name matches relatively where it is in the ring. It helps you visualize it when you're doing maintenance tasks. Get familiar with jconsole; you can dig in quite a bit there, and there's a lot of data you can pull out and look at. Even better would be to use something like collectd or Graphite to automatically collect important stats, so you can graph them over time and then sort of aggregate them across your cluster or drill down to individual nodes.
An important note on knowing your ring: especially if you're in the Amazon cloud, knowing where your replicas are is extremely important. We have our ring set up with an RF of three, one replica in each of three different availability zones, so that we could theoretically lose two availability zones and still have an entire set of the data.
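The survivability claim above can be sketched in a few lines. This is a toy model, not Cassandra's actual placement code: it simply assumes every key gets one replica in each of three availability zones (RF=3), which is what the speaker describes.

```python
# Toy placement: one replica of every key in each of three AZs (RF = 3).
azs = ["az-a", "az-b", "az-c"]
keys = ["k1", "k2", "k3", "k4"]
placement = {key: set(azs) for key in keys}

def full_copy_survives(lost):
    """True if every key still has at least one live replica after losing `lost` AZs."""
    return all(placement[k] - set(lost) for k in keys)

print(full_copy_survives(["az-a", "az-b"]))  # losing two AZs: the third still has everything
print(full_copy_survives(azs))               # losing all three: data gone
```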
So the question was about the maintenance tasks. Yes, well, you know, we're using Cassandra 1.2, so fortunately we don't have to worry about the token placement problem, which is certainly one of the largest ones. I would say our biggest maintenance thing, or, you know, sort of our current headache, is really around compaction problems, and this is something I'll be talking about in my talk later on today. But going back to monitoring.
With Graphite and all those things, being able to basically track every stat on every column family on every node in the cluster, and having a tool like Graphite to kind of build your own dashboards, is really, really useful. It really helps us spot and pinpoint problems and fix them.
As we've gone across the group here, we've had a few things come up about running in either Amazon or the cloud. Why don't we touch on that a little bit more deeply and talk about some of the things you recommend, whether it's Amazon or a different cloud provider, that you either highly recommend to do or highly recommend to avoid at all costs. Well, since I don't know which of you guys actually run on Amazon, start with Matt and we'll go over there.
Sure. What I haven't said yet: obviously, you know, get the right instance size for your needs; Cassandra doesn't work too well with too much disk space. We found that the extra larges worked great. It was like a terabyte of disk and, you know, RAM, half of which went to Cassandra and half of which we give to everything else. That works really well. Similar to what was said before, we also spread out across availability zones.
Definitely do not use EBS. Use the ephemeral stores, or, if you can afford it, I think you can do provisioned IOPS, which are effectively SSDs, I think, or you can get SSD instances. What we do is we use the ephemeral stores and we RAID them together in a stripe configuration to get the maximum amount of bandwidth out of them.
We have a very similar setup. We're using the extra large machines that have about 15 gigs of RAM; we give nine to Cassandra and the rest to the operating system. Like he mentioned, EBS is a no-go there. The instances we use come with four ephemeral disks that we stripe all together and just write to those, because it's a lot faster. SSD machines have definitely proven to be faster.
Actually, we don't have any traditional, I guess, backup plan. Our backup plan is our multi-data-center replication and the replication factors between the nodes. So we don't back data up anywhere; if there's a disk failure on a node, we consider that node lost and we replace it.
Since we're a backup company, we also back up our Cassandra cluster, I think nightly or weekly. We basically just tar up the data volume: take a snapshot of the data store, back it up, and dump the tar files in S3. It's just an extra security blanket; we've never needed it, but it's there. Okay.
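The tar-up step described above might look something like the sketch below. This is a hypothetical illustration, not the panelist's actual tooling: it assumes a snapshot directory already exists (for example, as produced by `nodetool snapshot`), and the S3 upload itself is omitted.

```python
# Archive a Cassandra snapshot directory so the tarball can be shipped to S3.
# The directory path is illustrative; the upload step is intentionally omitted.
import tarfile
from pathlib import Path

def archive_snapshot(snapshot_dir: str, out_tar: str) -> str:
    """Tar+gzip the snapshot directory and return the archive path."""
    with tarfile.open(out_tar, "w:gz") as tar:
        # Store the directory under its own name inside the archive.
        tar.add(snapshot_dir, arcname=Path(snapshot_dir).name)
    return out_tar
```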
I believe, also, here in the cloud, Netflix has its Priam tool, and I believe Priam has some features to deal with that. At a really low level, since SSTables are write-once, if you're clever about it you could kind of watch the directory and see when new SSTables and bloom filters are created and when old ones are deleted, and you could copy them incrementally somewhere else. So you don't always have to do the full "let's get everything and move it"; you can usually be slightly more intelligent about it.
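The incremental idea described above leans on SSTables being immutable: a file name you've already copied never needs re-copying. A minimal sketch, with illustrative file names and none of Cassandra's real directory layout:

```python
# Copy only SSTable data files we have not backed up yet. Because SSTables
# are written once and never modified, a previously seen name is already safe.
import shutil
from pathlib import Path

def copy_new_sstables(data_dir: str, backup_dir: str, seen: set) -> list:
    """Copy unseen *-Data.db files to backup_dir; return the new names."""
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(Path(data_dir).glob("*-Data.db")):
        if f.name not in seen:
            shutil.copy2(f, dst / f.name)
            seen.add(f.name)
            copied.append(f.name)
    return copied
```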
Netflix does everything from taking those snapshots periodically and pushing the history, like every 30 or 60 minutes, down to every single operation, depending on how important the data is and how quickly you need to be able to recover. The nice thing, on the human-error recovery aspect, is that if you are going to recover, you want that data on the machines already; having the snapshots locally on the machines allows you to recover very, very quickly, versus first copying the data over and recovering from somewhere else. And I didn't really comment on the general Q&A policy: if you have a question about what we're doing up here, please raise your hand and we'll get a mic over to you, and I'm going to save some time at the end for general questions all together. So if it's about this sort of topic, just raise your hand and someone will come running with a microphone. So, we talked a little bit about backups and the cloud.
Why don't we bring that right into multi-data-center, and the fact that Cassandra as a whole does have a true multi-data-center offering, which, in my opinion, means you can have as many data centers as you want and they're all active for writes, which is an extreme rarity and a huge advantage for this technology. What are some of the things that you guys have encountered with multi-data-center scenarios that either make your life better or worse, or things you can do to really make that easier? Let's start down there with Jake.
Yeah, we currently run with two data centers, and I would say, you know, the benefit of that is our compute nodes sort of run in both data centers, so it basically means that to access data, the compute nodes don't have to pull data across the WAN. On the other hand, the problem is that currently, you know, we're running at RF 6, so we have six copies of the data, three in each data center. So, you know, that's the downside of that.
The fact of the matter is that all those writes are ultimately going to happen on that three-node cluster, and if the load, the I/O wait, on that three-node cluster goes through the roof, it will ultimately slow down your larger cluster. And be aware of your latency; the latency between the clusters is also something to be aware of.

We're not using any of the multi-DC stuff. You know, for our needs, we just wanted a little extra reliability, or, I guess, fail-safety, so we spread out across the availability zones and that works for us. It was a lot simpler. So if you don't need it, it may not be worth the extra overhead and complexity; kind of go with the simpler approach.
I was just thinking of a scenario that a lot of people run headlong into, which is that Cassandra's logic can infer both a data center name and a rack name, and unless you really, really know what you're doing, you don't want to have different rack names. People kind of think sometimes, oh, this thing's really in rack two, so I should make it rack2, but the replication strategies have some special logic around racks, and it may not do what you expect.
Well, exactly the opposite of that is what we did, and it worked out great. We actually implemented across two data centers. Previously we had two separate SQL clusters that were doing the work, and someone had to go in and aggregate a bunch of the data across them and then figure out how to combine the two.
Traffic was going to one data center and the other. What we got as a benefit, when we rolled DSE out with the multi-data-center strategy, was that we were able to handle rack failures within the data center, because we had had several issues with going over power limits and other things within the space, with machines that shouldn't really have been there. We also had at least one occasion where only one or the other of the data centers could be online at a given time.
We were doing patches and doing testing, so having that capability, and never having to worry about being able to serve it up, because we had a large link between the two sites. So if compute nodes had to connect directly to the other data center, it wasn't that big of a deal, but in certain scenarios it was horrible. So having the capability to run one side or the other, or both, and being able to have all of the search result data written into either side, was huge.
I'm a cloud hater, I guess, really bad, but there's a lot of benefit in it, and it really depends on who you are and what you want to do. I think the cloud is not new-new, but it's pretty new, and everything new has a lot of hype, which makes people tend to use it the wrong way sometimes. Let's keep that in mind.
Well, personally, I don't feel strongly either way, but it depends on the use case; you really have to look at it. Because, from my previous life, we could not possibly have gotten the performance we needed, or the scalability we needed, in a way that made sense financially, versus buying the high-performance computers that we did and deploying them into the data centers. So from a pure cost perspective, as well as your use case and what you need from a performance perspective, I mean, it's easy to make a decision.
You know, I have a five-node cluster running in AWS, and it was great for the use case we were using it for, but we had 144 nodes running in data centers that were necessary to handle the scale and volume, and the cost of that would have been like half a million dollars or something with Amazon, versus, you know, buying the gear, which was considerably less, even with power and everything else to run it yourself.
We've definitely seen, in our customer base, everything from startups to Fortune tens. You know, once you get into roughly the Fortune 500, or maybe even the Fortune 1000, the cost scale is just so much better when you already have the infrastructure in place to go ahead and use your own infrastructure, so it makes sense. But as opposed to Ed, I love the cloud, and I think it makes sense for getting started very quickly. So the lifecycle of machines, especially in these clusters: they generally start smaller and they end up bigger.
Surprisingly, Cassandra does work fairly well with different mixes and matches of servers, as long as they all have enough capability for your load. And now, like I think we mentioned before with SSDs, they're not even really that expensive anymore. You may be buying a four-to-eight-thousand-dollar server, and you're going to either spend, what, forty dollars on a SATA disk, or, you know, three hundred dollars on a SCSI, up to an SSD disk.
C
It's
not
like
a
huge
cost,
so
I'm
pretty
much
all
on
the
SSD
market
as
well.
These
days
are
not
always
easy
to
get
your
hands
on,
especially
the
really
high
fusion-io
cards.
It
sounds
great
and
then
they
like
it
might
take
a
month
to
ship,
but
for
the
general
purpose
SSDs,
you
could,
you
know,
get
those
with
the
servers
and
no
issues
there.
Well, I don't have a particular model, because I'm not really involved in which ones we get and why. But it's interesting; there's a lot, and they're very new. So there are different details: like, some say you should use TRIM, but you can't use TRIM if you're using hardware RAID, but then some disks have their own TRIM. So there's a lot of new stuff; you know, some people suggest you should use a different disk scheduler, since Linux has its own disk scheduling and it works differently.
Maybe you should do LVM instead of software RAID. So there's a lot of it; it's not as cut and dried as it was with hardware RAID, where you just said, I'll get a big RAID, that's it, hardware controller, we know it's fast. If you get to a question where someone's scratching their head, then you have to investigate your decision and say, maybe we should have used TRIM with LVM and ext4. So, those things.
I think a big thing to look at is that you have to determine if it's a cluster-wide issue, or if it's just something going wrong with your application. Maybe it's a hot spot, maybe it's something else that isn't indicative of capacity, in which case growing your cluster will help fix that problem in the short term, but it's ultimately going to resurface.
Yes, I'll do that, and I'll do the same shameless plug for my new employer, because when I was in my previous role, I actually used Splunk very heavily to monitor the environment, especially on our Java web services side. We pulled a lot of stats, but here's the thing that kind of hit harder than anything else.
So we were constantly having to look at it to see how we scaled up, and we basically set a bunch of alerts to start looking when we saw trends that were outside what we expected, and adapted that way. And we used Puppet to deploy all the systems, so it was very easy to just turn up new nodes if we needed to, and we had extra HP nodes sitting around that we could bring up when we needed either to swap out, because we had a failure, or to add capacity.
To your question: what I noticed, especially back pre-SSDs, was that the first thing to go was kind of IOPS, but you can tell that very easily. If you're running top, right, top breaks down your CPU into user, system, and wait, and wait is really kind of the key for those types of systems.
C
It
says
I'm
a
process
running
and
I'm
waiting
on
disk
right
and
then
the
other
thing
to
look
at
is
if
those
are
all
good,
you
have
to
just
count
the
size
of
different
column
families
because
how
they
perform.
If
each
know
it
has
10
gigs
of
data,
there
is
even
with
SSDs
and
lots
of
RAM.
There
is
a
difference
between
how
it
would
perform
with
10
gigs
of
data
and
20
gigs
of
data.
You
know,
there's
bloom
filters
and
every
read
has
to
check
all
the
bloom
filter.
So
all
these
things.
So if you watch the size of column families, the cache hit rate, and those latency numbers, you can see when things, hopefully, degrade slowly, and then you can react in time. But if you turn on a new feature and then you get a ton of load, you have to think about the new feature and what it means to the system as a whole.
Each machine could still, like, be allocated more, but the GC pauses would get a little too high. It kind of depends on your use case. So if you have a small column family with an extremely high write throughput, scaling up might work for you, but if you have very, very large column families, I think scaling out with smaller nodes might be a better idea.
The best part is I don't get woken up at 4 a.m. when a disk dies. You know, it's very hard to totally lose the data. There have been times when you think, like, oh, this went horribly wrong; you know, I lost two nodes at the same time, writes are failing, I'm losing data. And then, when you finally bring it back up and repair, everything that said it wrote at a quorum is still there.
We key rows off of, you know, the hour that the event happened, and that way I can say: okay, if I want to rebuild an hour's worth of data, I just pull this one row out and rebuild it across my cluster. And in addition to that, I also write, you know, a number of counters at the time that each event comes in, for the different ways that I want to read the data. So we have this.
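The hour-bucket-plus-counters scheme described above can be sketched in plain Python. The key format and structures here are illustrative assumptions, not the panelist's actual schema: each event lands in a wide row keyed by its hour, and per-hour counters are maintained at ingest time so reads don't have to scan the row.

```python
# Hour-bucketed rows with counters written at ingest time (illustrative).
from collections import defaultdict
from datetime import datetime

def hour_key(ts: datetime) -> str:
    """Row key for the hour bucket an event belongs to, e.g. '2013061210'."""
    return ts.strftime("%Y%m%d%H")

rows = defaultdict(list)     # hour key -> the wide row of event payloads
counters = defaultdict(int)  # hour key -> event count, kept as we ingest

def ingest(ts: datetime, payload: str) -> None:
    key = hour_key(ts)
    rows[key].append(payload)  # append to the hour's row
    counters[key] += 1         # counter updated at write time, not read time
```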
On top of that, sharding also just works; it shards for scale anyway. Aside from what you have to know about it for getting your data model right, I know that I can just keep putting data in there, and chances are I'll never hit the ceiling. I guess technically there is a ceiling, but we have not come remotely close to it yet, and we write a lot of data.
Peer-to-peer is really the best part, because we have a lot of systems and they all have their pluses and minuses, but the things that Cassandra guarantees are good to me. For example, we have MySQL servers, and the replication breaks, and then all these apps are reading from the slaves. But what are they getting? They're getting something old, because those applications don't know that the three-tiered replication structure is somehow behind.
Unless you build a lot of logic to try to handle that, and your client has to ask all these questions. But Cassandra's consistency model is pretty basic: you read at quorum and write at quorum, and you have consistency even if a node fails. A lot of systems don't have that; like, whenever I have to kick certain things over, you know, I wonder, what happens to the commit log?
Sometimes I really don't know, because there's just one of that thing, and I'm not one hundred percent confident that that one thing is right in all situations. You know, if you have a MySQL master and slave, what if, you know, the disk it swapped on was bad? Could a query be off, or something? Who knows. It's just nice to know that no single piece of something going wrong can make the whole system go really bad. So that's what I like about it.
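The quorum guarantee the speaker is relying on boils down to overlap arithmetic: with replication factor RF, a read is guaranteed to see the latest write whenever the read set and write set must intersect, i.e. R + W > RF. A minimal sketch of that check:

```python
# Quorum overlap: reading and writing at quorum forces the read set and
# write set to share at least one replica, so a quorum read sees the
# latest quorum write even if a node fails.

def quorum(rf: int) -> int:
    """Smallest majority of RF replicas."""
    return rf // 2 + 1

def read_sees_write(rf: int, w: int, r: int) -> bool:
    """True when any R replicas must overlap any W replicas (R + W > RF)."""
    return r + w > rf

rf = 3
print(quorum(rf))                                   # 2 of 3 replicas
print(read_sees_write(rf, quorum(rf), quorum(rf)))  # True: quorum/quorum
print(read_sees_write(rf, 1, 1))                    # False: ONE/ONE can miss
```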
I'd say the write performance was probably the biggest thing. We had 45 SQL servers, which we replaced with basically eight nodes of Cassandra. In order to get the scale of writes we needed, there were 45 SQL servers in a cluster, which is insanely expensive, on boxes that had ninety-six gigs of memory and twenty-four cores, a crazy amount, just to be able to handle the load that was out there. And we replaced it with, well, we used HP blade nodes, but we replaced it with basically eight blades.
Well, it basically depends on what your workload is, but I would definitely recommend getting three or four nodes in the environment. I've been working with a university that doesn't have a lot of funding, so they're standing it up on basically six-year-old Dell nodes that were retired because they were end-of-life, and they're doing a research project on that with a considerable volume of data. So you can run it on anything.
I have an instance of a three-node cluster that runs on my MacBook for when I need to stand it up, test an application, and test failure scenarios. It's a three-node virtual cluster that runs on VirtualBox on my MacBook Pro, and, you know, I can test failure scenarios, I can knock over nodes of the cluster, and it can handle reasonable write volume. There's an SSD in the machine, so it does reasonably well.
I don't have a lot of disk space because of it, but, I mean, you can start very, very small. At home I have three Mac minis that run a cluster. So you don't have to have a lot of expensive hardware; you don't have to have a lot of requirements, depending on your use case, I think.
If you don't want to go through all the trouble of setting up virtual machines and joining them into an actual cluster, there's a great tool out there called CCM, the Cassandra Cluster Manager. It's a very easy-to-use command-line project that will bring up a localized cluster of n nodes on one machine, and you can get started playing.
One of the other things, too, one thing I really like, is that you can use Cassandra embedded very easily if you're developing in Java, like in Tomcat. I actually find it very hard when using MySQL; you can't have an embedded MySQL in Hudson or Jenkins, so you kind of end up hacking in a lot of workarounds, like, I'll use H2, but in H2 the statements aren't exactly the same, and H2 may have a slightly different date-time function. What's nice about Cassandra, for me, is that I'm a Java guy, so I guess I like it this way, but I can bring up my app and Cassandra in a single JVM, test, and tear it down, and I don't need to do anything outside the realm of my JVM, or worry about how Hibernate is going to turn this query on this database and what it means. I can actually test on the real deal.
I mean, if you pull out, you know, the color palette or a histogram, it might make sense to basically store that metadata per row, or you could use a composite table in CQL 3; that might be useful. Also, the graph talk that's going on now probably has, you know, some useful things if you're building a graph of similar features to build a recommendation engine. I know there's a use for that as well.
In an organization that doesn't have Cassandra, and you have an existing application, let's say, for example, it's a MySQL application with memcached or something: what would you consider the best win to get a successful adoption of it? Is it, you know, things you've talked about, like server-side cookies or logging?
What we were using it for previously is basically caching our search results, because we did several common searches that happen on regular windows, so the refresh times were very low. That was a huge win; as a matter of fact, you can watch the presentation from last year's summit. It was a very huge one for us, and basically it helped it grow, and it's continuing to grow its footprint within the Expedia environment.
I haven't used Brisk particularly, but I do use DSE heavily, which is the DataStax edition of Brisk, and it's wonderful; it does exactly what you would expect it to do. We particularly use the Hive portion of it heavily, and it really breaks down the column families for you easily. Especially if you have wide rows, it puts them into a long-and-skinny type table for visualization purposes.
Okay, well, I want to say thanks to everyone, just because we're about five minutes over at this point. If you have any other questions, some of these guys will be available right now, and then we'll have a room full of experts at, I think, 12:50, when lunch starts up. Thank you.