Description
Ooyala has been using Apache Cassandra since version 0.4. Their data ingest volume has exploded since then and Cassandra has scaled along with it. In this webinar, Al will share lessons he has learned across an array of topics from an operational perspective, including how to manage, tune, and scale Cassandra in a production environment.
Speaker: Al Tobey, Tech Lead, Compute and Data Services at Ooyala
Al Tobey is Tech Lead of the Compute and Data services team at Ooyala. His team develops and operates Ooyala's internal big data platform, consisting of Apache Cassandra, Hadoop, and internally developed tools. When not in front of a computer, Al is a father, husband, and trombonist.
A: Welcome to this edition of our Cassandra community webinar series. I'm delighted today to have with me Al Tobey. Al is the tech lead at Ooyala and has been using Apache Cassandra since the very early days, and there are very few people in the world who have as much experience with Cassandra as Al, especially running systems in production. So Al is going to pass along some great lessons learned around extreme Cassandra optimization. Al, thank you very much for joining us today. We will pass you the ball — take it away, Al.
A: Oh, one more thing — I forgot my little housekeeping item. As always, we will be taking questions via WebEx; use the WebEx Q&A panel. If something is very contextual, Al said he doesn't mind being interrupted during the flow, but generally we will reserve the last 10-15 minutes of the session to take your questions. Okay, and with that, take it away, Al.
B: We provide analytics and a full video solution to our customers, but that's not what you're here for, so I'll start with what I'm going to go over: how not to manage your Cassandra clusters. We've learned a lot of lessons over the years working with Cassandra and making various mistakes.
We know how to fix those situations now. I'm going to talk a little bit about performance tuning and how I approach it — not as a scientist, but in what I call a heuristic way — and then some of the tools that you need to really do the system side of performance tuning: tuning your Linux machines and making everything flow a lot smoother. I'll go over a few other things at the end.
We have about 100 Cassandra nodes — actually, it's closer to 200 now. We just turned on a new cluster that's 114 nodes. And obviously Ooyala is always hiring, just like everybody else, and I already talked about Ooyala, so if you're interested in that stuff the slides will be out soon. We've been with Cassandra since 0.4 — that was before my time; I came in at about the time we were running 0.6. We use it for our analytics data; we have a Hadoop system that processes all of our raw logs.
Our analytical product is fast because when it goes to get those statistics, they just come out of Cassandra — you know, something like 100 milliseconds; it's really better than that, but there's network stuff in the middle that makes it more complicated. I'll come back to that. We use it in various places as a highly available key-value store, as opposed to a memcached or Redis. We've replaced Redis in a few places with Cassandra simply because we needed the high availability. We use it for time series data.
We have an internally developed monitoring system that currently writes about 50,000 to 150,000 inserts per second into Cassandra. I mentioned playhead tracking, which we also do for some of our customers — if you're watching on Netflix, I believe they actually use Cassandra for the same purpose. Part of what the player reports back to us is where you are in your video, and we record that in Cassandra.
So if you go to a different device and pick up the same video, it'll pick up where you left off. We also have some machine learning stuff; all the output of the machine learning system gets stored in Cassandra so that it can be used by our edge infrastructure, which has five nines availability.
What I've been describing thus far — our primary and original use case for Cassandra — is now what we call our legacy platform. We have a new platform coming out toward the end of this year where we're offering third-party analytics, so you can use it with other people's players. But this older one has been around for a long time, pretty much the entire history of the company. If you start in the upper left-hand corner of the diagram, we have all the players everywhere.
All of the players across all of our customers report that information to what we call our loggers. All that data gets stored in log files and put into S3, and our Hadoop cluster runs a MapReduce job that sucks those files down into HDFS. Then our pipeline fires up, processes the log files, and writes the output into Cassandra. You'll note the label I have on that arrow between the orange Hadoop boxes and the blue Cassandra boxes: read-modify-write. You can translate that and just say "evil."
That's getting better in Cassandra 2.0, which I'll get to in a few minutes. Then we have a service that sits in front of Cassandra with a Thrift API that basically knows what our schema is and abstracts it away from our edge web applications. So, read-modify-write — here's the problem with read-modify-write in Cassandra, and as I said, when Cassandra 2.0 hits the ground there's going to be CAS support that actually has a built-in way to do it safely. I see a question: HDFS vs. CFS — it's just legacy.
Here's an example. At the Cassandra conference that was earlier in July — I took a lot of my team to that conference — I said, well, let's talk about how much we're all going to drink, so that we can coordinate and make sure that at least one of us is sober the next day.
What I have here is a column family where each of our names is the row key — and I'm going to close the Q&A for a second so I can get through this. So the row keys are the names: Al, Evan, Frank, Calvin, Kristoff, and Phillip. Then the columns are on the right, where I have Tuesday as the column key and the number six as the column value. So I just fire all this data into Cassandra.
And then I say, well, you know, I don't really want to drink that much because I've got a talk on the second day of the conference, so I'm going to update this. I go and write over the same value — same row key and column key — and update it with a new value. Now Cassandra is holding two different values in memory.
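As a rough sketch of that overwrite pattern — expressed here in CQL for brevity, with a made-up keyspace and table — the second insert simply writes a newer cell version over the first:

    # Hypothetical illustration only: same partition key and column, new value.
    # 'demo.drinks' is not from the talk; it is invented to show the shape.
    echo "
      INSERT INTO demo.drinks (name, day, count) VALUES ('al', 'tuesday', 6);
      INSERT INTO demo.drinks (name, day, count) VALUES ('al', 'tuesday', 2);
    " | cqlsh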
If you do this really fast with modern Cassandra, it's not such a bad thing, but you have the potential for race conditions where a value hasn't replicated yet. So you need to do this with read repair at a hundred percent, and you've got to do a bunch of tricks to make sure you get consistent values — like I said, CAS fixes this. So now let's say that my memtable has been flushed out to an SSTable, and now my memtable has a different value.
What this does is a couple of things. Until compaction fires, I've got all three of those values in the Cassandra system, and there are a bunch of side effects of this. One of them is that you have a lot more work for compaction to do as time progresses. If you have old data that you just consistently update over a long period of time, you're going to be compacting files that otherwise wouldn't need compacting. Ideally you want your older data to sit at rest.
That's where you're going to get the best performance; that's why things like time series really kick butt on Cassandra. After compaction, this all compacts down and you've got a nice clean SSTable you can read from. Our system does this every 20 minutes, forever, so it just causes a lot of extra work on the system. There are other ways to approach writing to Cassandra and designing your schema — I highly recommend Patrick McFadin's talk on schema design, and there's another article.
If you look on Twitter, I'll retweet it later — it recently came out on the DataStax website and covers how compaction really works. I highly recommend that as well if you're interested in these things. I'm going to move on, but the gist is: if you can find a design for your software that avoids the read-modify-write cycle, that's where you're going to get the best performance, even with CAS down the road.
CAS is going to be a lower-performance option than straight write-only design patterns. So around 2011, when we were doing the 0.6 to 0.8 upgrade, we had some problems, and it's all Cassandra's fault — and I mean that in the best possible way. What happened is Cassandra was just ticking along while our MapReduce job was hammering it like crazy every 20 minutes, and it just worked. We had an 18-node cluster; it just sat there and cranked along, and people just forgot about it. Literally, they just forgot.
It was just there, so repairs didn't get run — and that used to be really, really important; it's not as bad today. None of the maintenance stuff was done, backups weren't really running, and we kind of got backed into a corner where we had this five-nines system sitting there that we had to service in place,
while the bus is hurtling down the road at 70 miles an hour: figure out a way to clean up all the data, get it into a new cluster, make sure we scrubbed all the old data and got rid of all the old tombstones, all in one pass, and do it without taking anything down — we couldn't schedule downtime in the system. So what we did is we played this dirty trick. Because the kernel on those systems was old and didn't have what we needed,
we ended up just using GlusterFS point-to-point mounts, which actually worked really fine. Say what you will about GlusterFS — it has its problems — but the point-to-point mode is occasionally very handy for exporting a filesystem from one cluster to another, if you just ignore all the distributed-systems part of it. So what we did is all 18 nodes exported their Cassandra data filesystems to all 80 nodes of our Hadoop cluster, and it was this big cross-matched mesh.
Then we ran a MapReduce job that went over all of the SSTables. We pulled the code out of Cassandra itself — one of the beauties of open source is that we were able to do that — read the data out of the SSTables, scrubbed it using things we know about the data that even Cassandra didn't know (because we know the business logic and what the schema really means), and then wrote it back into the front end of a new Cassandra cluster.
The other thing we did: we discovered that our indexes are manually built indexes — we're not really using secondary indexes today — so for our inverted indexes and things like that, rather than copying them, we just scanned back over all the data and rebuilt them. As a bonus we got all of our indexes cleaned up and straightened out. This took about three months from start to finish; most of it was developing the software and testing it multiple times beforehand.
We moved from — I think at the time it was a 16 GB heap — to a 24 GB heap. I don't recommend that other people do that anymore, especially with Cassandra 1.2 and upwards, because of off-heap caches and all those things; most people should be able to sit at eight gigs. Although, if you are running into problems where your Cassandra nodes are crashing when you're loading huge rows, it is something to consider trying. We updated to the latest Java 1.6.
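For reference, the heap knobs being described live in conf/cassandra-env.sh; a minimal sketch, with example values rather than recommendations:

    # Sketch only: pinning the JVM heap in conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"      # the "most people can sit at eight gigs" case
    HEAP_NEWSIZE="800M"     # new-generation size; usually scaled with CPU core count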
I think it used to be on OpenJDK, which is a really bad idea — at least OpenJDK 6; OpenJDK 7, if you're really adventurous, can be done, but it's not recommended. We moved to Linux kernel 2.6.36 — at that time, that was a custom in-house kernel built from upstream — and we moved to MD RAID 5 on XFS. So, a word on RAID: while Cassandra is a distributed system and has its own replication, one thing that's really nice about having RAID underneath it is that your ops
people will be a lot happier. The reason why is that disks are the most likely thing in any distributed system to fail. I've been working in data centers over the last 15 years, and by far nothing fails more than hard drives, so it's really common for a hard drive to fail in a larger cluster. If you have RAID 5 or RAID 10 — really anything but RAID 0 — underneath your system, you don't have to rebuild the whole node every time a disk fails.
Today, if you're on RAID 0 and one of the disks fails, that node is offline; you've got to replace the disk and then rebuild the node — Cassandra does that just fine. So it's up to you and your business to decide whether you can live with that exposure: if it takes 24 or 48 hours to rebuild that node and you're OK with that, then great, go with RAID 0 or go with JBOD.
But if that's not OK and you need maximum availability, consider using RAID 5 or ZFS, which I'll talk about later, and protect your database from single-disk failures — your ops people will be a lot happier. Then, by far the most important tuning thing — and I recommend everybody take this to heart — is that if you're dealing with any kind of database — Cassandra, MySQL, Oracle, even MongoDB — the most important thing you can do on any Linux system is disable swap entirely.
That tells the Linux kernel to never swap out my applications to make space for VFS cache. Let me explain that a little more: as you read files off of the disk, Linux will load all the pages from those files into memory as an optimization, so that if you go back and read the same page again, it can come straight out of memory rather than going back to disk. This is how we get good performance out of hard drives.
What Linux will do — and a lot of people have noticed this over the years; a lot of the old-school sysadmins you'll still see will disable things like locate/updatedb because of it — is that those jobs scan over all the disks, the page cache wants more memory for that, and Linux will actually swap out your applications to make space for VFS cache. That's totally wrong in my opinion, but it does work really well on desktops, and that's what the defaults are all set up for.
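A minimal sketch of what "disable swap entirely" usually looks like on a Linux database host (assuming root; this is the common approach, not necessarily Ooyala's exact config):

    sudo swapoff -a                                          # drop any active swap immediately
    echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf  # never page apps out for cache
    sudo sysctl -p                                           # apply the setting now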
So, moving on: last year we decided to expand the system again, going from 18 nodes to 36 nodes. I had gone off to do a different project for a little while, so I wasn't operating the Cassandra clusters; I'd handed them off to some other people and they got distracted as well. Again it was just sitting there ticking along, so people forgot about it. We wrote notes down in a document saying you need to run repair, and there were scripts in place, but at some point they failed and we ended up in the same boat.
Again, it's pretty embarrassing, but that's what happened, so we took the opportunity. We already knew what to do; we just did the same process again — GlusterFS plus point-to-point — except this time we threw in a twist. Instead of using our production Hadoop cluster, which runs at about 115 percent busy all the time (I realize that's a silly number, but if you look at our Ganglia graphs, that's actually what it says), we actually used DSE MapReduce this time.
Since it was a Scala job, it was really trivial for us to load up. We loaded up DSE 3.0 and ran the MapReduce there; that way only the new cluster had resources being spent on running the MapReduce job, and it was actually writing back to itself. That worked really, really nicely — we were doing about 20 gigabits per second of transfer from one cluster to the other. It was really fun to watch, and we got an opportunity to do a whole bunch of performance tuning.
One of the problems we ran into was that, prior to moving to leveled compaction, we were on size-tiered and we ran out of space — we crossed that fifty percent threshold and people got really nervous. We convinced ourselves that RAID 0 was a good idea, and that's why I spent time talking about it earlier. We moved to RAID 0 — big mistake — because I swear a week later I lost two drives on two different nodes that were right next to each other, and I had to scramble down to the datacenter.
I grumbled, dragged my team down so that we could get this all fixed up, get the drives replaced, and rebuild those nodes before we had an outage. It was all right — we've had pretty close to a hundred percent uptime with this cluster — but that's the lesson learned: if it's really important that you have five or more nines, which Cassandra will happily do, just put RAID under it. The other lesson was that we should have gone up to Ubuntu Precise.
B
Don't
want
to
listen
to
really
long
in
the
tooth
now,
not
as
three
LCS
is
old
and
also
some
of
the
native
stuff
in
DSC
three-point.
Oh,
won't
even
work
on
there
because
it
was
compiled
on
debian
six,
which
is
a
perfectly
reasonable
choice.
So
that's
just
one
thing
is
Bunty
precise
or
onwards.
Debian
six
is
good
or
Ralph
six
and
I
think
is
supported.
So we made a couple of config changes when we did this load. We switched to leveled compaction. An important thing to remember about leveled compaction: if you love your ops people, or you want them to not hate you, I recommend using leveled even if it is a slightly lower-performance option. If you're doing large-volume writes, size-tiered compaction can still be a higher-performance option, but leveled compaction is always going to be a lot easier to operate,
just because you don't have that space constraint where you have to have 2x your largest column family (or keyspace — yeah, I think it's column family) free to be able to do compaction. The other change was the bloom filter false positive chance, and I know they've changed the defaults in recent releases.
I haven't revisited this in the last 12 months, but what happened was that value was set to 0.007, I think, by default, and it uses a ton of heap space in the JVM. That's why we had to have such large heaps — those really big column families would consume a ton of space for bloom filters. The other thing we ran into, and I've seen a few people on IRC run into this, is the default sstable size in megabytes in Cassandra 1.1.
The default is all fine and good, but when Cassandra starts up and opens and mmaps all those files, you start to reach the edge of what the JVM and even the kernel can support reasonably, and you spend a lot of time in the kernel while Cassandra is just doing bookkeeping on all those files, even if they're not being accessed. So I recommend increasing it — I'm running at 256 megabytes and my systems perform really nicely.
We've tuned things so that our memtable flush rate is pretty steady, so it works really well for us. It's probably too big for a lot of people unless you're on SSD, so maybe 128 megabytes or 64 megabytes might be better for you. It's definitely something you should decide based on your environment's needs.
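A hypothetical sketch of those two per-table knobs in CQL3 (Cassandra 1.2-era syntax; the keyspace and table names are invented for illustration):

    echo "
      ALTER TABLE analytics.events
        WITH bloom_filter_fp_chance = 0.01
        AND compaction = {'class': 'LeveledCompactionStrategy',
                          'sstable_size_in_mb': 256};
    " | cqlsh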
We also switched to enable Snappy compression. Thus far we have been fairly happy with it, but I've actually gotten better compression out of using filesystem-based compression — that's something to consider if you're willing to experiment with filesystem technologies. And in cassandra.yaml we disabled compaction throughput rate limiting. That was because, as we were doing this huge data load — I think the total was 30 terabytes across from the old cluster to the new cluster —
we got really far behind on compaction. The first thing I tried was setting compaction throughput to, say, a thousand megabytes a second — which is more than the array can do — thinking that would take care of it. That actually didn't do it; it was still consistently behind. So on the next try I rolled through the cluster again and disabled it entirely, and then, boom, all the problems with compaction just went away.
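The same throttle can be flipped live with nodetool (0 disables it); compaction_throughput_mb_per_sec in cassandra.yaml is where it persists across restarts:

    nodetool setcompactionthroughput 0   # 0 = unthrottled; give a MB/s value to re-enable the limit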
This time it's not because we screwed up; it's because of good things: we built this brand-new 114-node cluster, we decided to run DSE on the entire cluster, and we're hoping to replace our old Cloudera cluster with it. That's yet to be seen, because we've got a lot of stuff going on right now — we're even doing this migration again. There are a couple of problems with these migrations, and we've been working really closely with DataStax on migrating this data.
We don't want to run that big MapReduce process again because it's just a lot of work for us, so we're actually doing a different kind of migration. We're working on a post about how we're doing that — it will probably be on the Ooyala blog in a few weeks — and we have to do this with no downtime.
The reason we're doing this is that we bought a new cage — I believe with Equinix — and the new one is much bigger: we have newer racks and newer networking and all this stuff, and the old one costs a ton of money. So we're going to deprecate that, shut it down, and move everything over. So we have another migration, and that's going to move us to DSE 3.1 at the same time. We also have a couple of new use cases coming. We have those events that I mentioned earlier.
I don't have a diagram of this yet, but for those events coming from our players to the loggers, the new architecture has the loggers forward them in real time to Kafka, which is a really nice queuing system written by LinkedIn and open-sourced. From Kafka they're pulled down into a custom ingest system written in Scala that pulls in those events,
processes them, does a little bit of massaging — just lightweight normalization — and inserts them directly into Cassandra in real time, updating two or three indexes in real time as well, so that all of our raw data ends up in Cassandra. Our goal is about three years of retention. That's basically the gist of the new architecture.
Cassandra is really good at this kind of thing, where you have a pure insert load — there's no modification happening in the database after it's inserted, it's write-only — and the ingest can fail independently of our query system. It's nicely separated; it's a nice design pattern, I think.
So we have this nice platform where that arrow up to the upper right-hand corner is the demarcation between what my team manages and what the development teams manage and develop, so that they don't have to deal with all the gory details of how to set up a Cassandra cluster, set up Spark, and connect to it. All of that stuff is abstracted away, so we really do have a platform as a service.
Now a little bit of time on performance tuning in general, and then I'll come back to some more specifics. Tuning for performance — there's a lot more to it than just performance. As the slide says, you've got to think about a lot of different dimensions, and I run into this a lot where people forget — or maybe they just haven't run into it before — but it's really important to consider security first. It should always be first.
You don't want to compromise the security of your systems to get one percent more performance; it's just not worth it. You've got to think about the cost of goods sold: I can build you a cluster that can do two-millisecond response times under extreme duress, but it's going to cost you a few million dollars. So you need to think about that — what's my budget, do I spend more time on tuning, or do I just go buy more hardware?
If I have money and no time, then I spend the money; if I have lots of time and no money, then I spend the time — that's just one of those trade-offs. And think about your operations people. If you're a software engineer, you've got to think about — well, if I design it this way, if I choose leveled compaction versus size-tiered, if I'm doing a write-only design or I'm doing read-modify-write — all of that has an impact on your operations team and your DBA teams, whoever's operating Cassandra for you.
And if it is you, you should care even more. Those are things to consider, because they've got to deal with the fallout if the system gets loaded. If you're on size-tiered and they're new to Cassandra and don't know about compaction — about doing major compactions — then you get backed into corners. So coordinate with your operations people.
Go read all the DataStax posts about compaction and these things and how to set up for clean operations, and make sure your operations people are on board and that you're considering their work life — because if you help them, they'll help you. I've been in operations for 15 years; I can tell you that's always true. When developers come and ask me or any of my colleagues and say, hey, I'm doing this new thing,
I'm really thinking about doing it this way, and these are the trade-offs — when you discuss it and everybody agrees, you're going to have a much easier time going forward. Another big one is developer happiness. There are some choices you can make in schema design and tuning settings that will give you a little bit more performance, but if you just go after the performance and you don't consider the fact that it's going to make future developers insane, it is not a worthy trade-off.
Obviously: how many racks do you have? Are you in the cloud — Amazon or Joyent or whatever? Those are decisions that you've got to take into account; what's available goes back to cost. These things are all interconnected: reliability and resilience. Three-node clusters — if you've got a small shop and you're trying to contain costs, three is the bare minimum.
Ask what your SLA is and make the decision based on that, rather than a wild guess or a Google search. And then always be ready to compromise. As I've been saying through these slides, these things are all trade-offs. Performance is awesome — I love tuning for maximum performance — but there always needs to be that compromise where you pull it back and make sure that you keep it secure and available and maintainable enough.
So, as I mentioned earlier, I'm just kind of playing with this word and passing it by people to see what they think, but the way that I approach performance tuning — and not everybody does it the way I do, but most of us in the ops trenches end up doing it this way, whether we like it or not — is this: I would really love to be scientific about it. I want a lab with hardware identical to production, and to be able to just set up clusters,
tear them down, try all kinds of different stuff, run load tests, track all the numbers in a spreadsheet, be very scientific, and make the best possible decision. The reality is that almost none of us have the time or resources to do that, so we have to rely on heuristics and educated guesses. That's the big difference in how I do performance tuning — and not being afraid of doing it
that way. You'll hear people get on the soapbox — which I'm doing now — and say you have to be scientific about it, do clean measurements and get all the noise out of the system, and it's just BS except in really large environments that have the resources. So what I recommend is leaning into the database: it has replication in place, so you can do things like make changes to single nodes, observe the performance over a couple of days, and back out the change.
And the quickest way to achieve better performance in this kind of situation is just to go after your bottlenecks. Look at your applications, see where the latency is building up, do traces through your system if you can, and go after those first. If you have one particular insert load that's got high latency, or you've got a read that's taking too long, go after that
sucker. Use Cassandra tracing, look at all your systems, and see where it's hanging up; generally you can get to pretty acceptable performance rather quickly just by going after that. That's basically where this whole approach comes from, and a great way to start with all of this.
If you have non-production systems, obviously do it there first, but nothing is going to behave the same as your production systems. Most of us have non-production systems that are scaled back — they're smaller, sometimes down to single nodes. I don't recommend relying on that, because it changes just about everything about how the system is going to behave. And yes, keep iterating. On the same subject: I really love testing shiny things — it's kind of what I do. I mess around with new kernels when they come out.
The latest 3.x series — the 3.10 and 3.11 kernels — have a lot of really cool features that benefit Cassandra, things like transparent huge pages, which will make your large JVM heaps a lot more efficient in terms of how many pages the kernel has to track. The filesystems have all been improving steadily over the 20 years that Linux has been in existence; they're constantly fixing bugs and performance regressions — and sometimes giving us new ones.
I've had some experience in the past with ZFS, and it happens that there's a native port of ZFS to Linux now, so we're experimenting with running that in production — it's really awesome in terms of management. We've also got some nodes running on btrfs, which is like ZFS's little brother: it has a lot of the similar features, and one of the really neat ones for Cassandra is that it has built-in RAID subsystems.
I think now you can do RAID 5 and RAID 6 underneath btrfs without using MD RAID, but the more important feature is that you can use the filesystem compression. We started with this before Snappy and LZ4 were available in Cassandra, and I've found very, very good compression rates using LZ4 through ZFS or btrfs under Cassandra.
We've done some things with OpenJDK 7; it's just not quite as reliable. If you're really passionate about using the entirely open-source stack, it'll work, but like I said, DataStax doesn't recommend it, and I've noticed that the Oracle JDK is just a lot more consistent. And with all of these things — you know, not everybody has larger clusters and has this luxury —
but if you have 20 nodes, you can take one node and try out btrfs on it. I don't recommend doing it on more than one node at a time when you're first getting started, but you can just replace that filesystem with btrfs, rebuild the node, and then watch and see what happens and get some experience with it. Rather than saying "ooh, scary new filesystem," throw it on one node — you've got a distributed database, so if something fails, who cares, just rebuild it and go back to ext4 or XFS or whatever your poison is.
I was going to build this chart and then I realized it was already done for me earlier this year at SCALE. Brendan Gregg at Joyent is probably one of the world's experts on DTrace, and he gave a really nice talk there. I didn't get to see it, but I got the slides afterward, and you can see the link here.
I highly recommend printing out this chart if you're not familiar with all these tools already, because almost all of them will make your life better if you're debugging systems or performance problems in production. My favorites — I'll come to those. My very favorite tool for Linux system performance monitoring is dstat. It's written in Python, and plugins are really easy to write — I haven't had to write any yet because it's got support for just about everything.
"dstat -lrvn 10" is kind of what I do in the morning: I fire up screen, it logs into all my clusters and fires this command up automatically, so I can flip through the screens and see the last hour of activity. I actually run those at 60-second intervals. What happens is the line you see will update every second, and then it flips to a new line every 10 seconds — which is that command-line argument — and you can see everything on a page.
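That invocation, as best it can be reconstructed from the talk:

    # load average, disk I/O requests, vmstat-style columns, and network traffic,
    # printing a new line every 10 seconds; run it under screen/tmux to keep history.
    dstat -lrvn 10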
You can see your disk I/O happening, your network I/O, your CPU. Another one that's kind of more advanced is context switches, which can be indicative of a lot of performance problems. On the left-hand side you have your typical load average and processes, and then your memory is in the middle there, where most of the green is.
This next one is a tool I wrote a number of years ago, and I just keep dragging it around. It's on GitHub, and I haven't seen many tools like it. What it does is log into all the systems with a persistent SSH connection, rip the performance metrics out of /proc on Linux every two seconds, and compute this display. It just updates every two seconds and rolls the screen; it's not fancy at all.
It's something I've probably spent a total of about four hours on over the last five years, but what I've found is that when I'm tuning large distributed systems, having a global view like this is really helpful for seeing what's going on. When you spin up load on your Cassandra cluster, you can see the network I/O take off; you can see how that falls down onto the disks.
The load average gives you kind of a loose idea of how busy the CPUs are, and those totals on the bottom are just really fun — when you've got total network traffic on your Cassandra cluster of 20 gigabits a second, that's just kind of cool. So, like I said, it's on GitHub; I'm probably going to push it to CPAN just because I've had a lot of complaints about installation, and I'll update GitHub when I do that. Now, on tuning a RAID system — I've got about a few minutes
left, and I want to leave time for questions. When you're tuning a RAID system especially, or even when you're on JBOD, one of the most common things to go wrong and cause performance to go belly-up is that you have one drive — especially with spinning rust — that's starting to go bad. It hasn't gone bad yet, but you'll see the latency on that one drive start to climb, and the best place to see that is with iostat.
"iostat -x 1" is what I use, and what I'm looking at is the await and r_await and all those numbers on the right-hand side. I don't really care about the throughput necessarily; what I'm looking at is the latency per drive, and I'm not looking for any particular number — what I'm looking for is outliers. You'll see, you know, I've got one there that's like 5.6 and 4.4, but if I see one in that same group that's up at 100, then I know
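The command being described, with extended per-device stats once a second:

    # the await / r_await / w_await columns (milliseconds) are where a single
    # dying drive shows up as an outlier against its neighbors.
    iostat -x 1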
that drive is the problem. Volumes in EC2 are wildly variable in latency, and when I spin up new clusters there, one of the first things I'll do is actually test all the drives with dd — I'll spin up something like four or five times as many as I want to end up with and cull all the slowest ones right away — and I've had very good luck getting good performance by doing that. A lot of people don't know about htop, and I
think that's crazy, because it's way better than the default top that's installed on systems. By default it breaks things down by threads — you can see here what we've got running, and it shows all the processors — and it just looks really neat; your colleagues will think it looks cool. It has a bajillion options for configuring the display exactly the way you want. Check it out
if you haven't used it before. JConsole and VisualVM are the other two big tools that really help you tune Cassandra clusters, or anything to do with the JVM. You want to take a look at these and start to get an idea of what healthy and unhealthy garbage collection patterns are — that varies by cluster — but what you're really looking for is that you don't want to see stuff like this, where you're filling up your new-generation size and then it's going down.
OpsCenter is really great — we really love this view and we keep it up on dashboards. With the 114-node cluster it looks really awesome there. I'm working with DataStax to get some of the bugs worked out, just because there aren't that many clusters of this size. We're really happy with the tool, and my engineers are very happy with the schema browser — it's quite handy for being able to go and figure out what data is in your database.
Obviously there's nodetool ring. I'm going to skip ahead real quick now so we can do questions. cfstats is another one to learn to look at. It takes some time to learn what all these values mean, but the interesting ones are, you know, the SSTable count, which can be useful when you're looking at whether compaction is behind, things like that, and the bloom filter numbers that I mentioned earlier — this is where you can look and see how much space you're spending on bloom filters.
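Both of those — SSTable counts and bloom filter space — come out of the same command:

    nodetool cfstats    # per-column-family: SSTable count, bloom filter space used, read/write latencies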
If you see that the false positive count is zero and the false ratio is 0.000, but there's a bunch of heap being used for bloom filter space, you should probably tune the false positive chance upwards. proxyhistograms is something that we talk about on IRC a lot in terms of performance tuning and figuring out where all of your latency is. Now, compactionstats:
if you see that the I/O subsystem on your servers is being thrashed, this is a good one to look at to see where the I/O is coming from. Very often it's just compaction that's causing the I/O thrashing — memtables write straight out to disk as streams and go very fast, even on spinning media, so it's usually compaction that's causing the I/O to be saturated. Stress-testing tools: I'm not going to go into these, but cassandra-stress is a really nice tool, quite easy to use, and we're experimenting with it.
Obviously, as I said, production is the best place for that. TeraSort on DSE is something we've used for performance benchmarking; it's a really good way of figuring out how your MapReduce is going to perform. And then we have some homegrown tools — I have one on GitHub, written in Go, that basically just tries to hammer the database as hard as possible.
These sysctls: kernel.pid_max is just something I like — it's kind of a vanity thing, but if you're running MapReduce jobs it basically keeps your PIDs from rolling over too fast. It's more cosmetic than anything, but I really like it and I haven't found anything it breaks yet. The rest of these are really nice to have on any Cassandra cluster. vm.dirty_ratio and vm.dirty_background_ratio — I recommend reading the kernel documentation on those before you mess with them. vm.swappiness — put that in every system you have, even if swap is disabled.
That way, if somebody comes along and decides to enable swap, it won't screw everything up. fs.file-max and vm.max_map_count are the important ones if you're running at large scale, and then those TCP settings are just kind of the defaults that you should put on almost any server, just to give the TCP subsystem in Linux a lot more memory to work with — it's more sophisticated than that, but that's the short version. Almost done. rc.local — I put these things in there, but the quick and dirty ones are around the I/O scheduler: CFQ.
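A sketch of the /etc/sysctl.conf entries being described; the values are placeholders to show the shape, not Ooyala's numbers:

    vm.swappiness = 0              # never page applications out in favor of cache
    kernel.pid_max = 999999        # keep PIDs from rolling over under MapReduce load
    fs.file-max = 1048576          # lots of open files for lots of SSTables
    vm.max_map_count = 1048575     # room for mmapped SSTable segments
    vm.dirty_ratio = 10            # read the kernel docs before touching these two
    vm.dirty_background_ratio = 5
    net.core.rmem_max = 16777216   # give the TCP stack more buffer memory
    net.core.wmem_max = 16777216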
The CFQ scheduler is really good if you want to use cgroups; if you don't, deadline is probably the best throughput option still on Linux. You can echo "deadline" into that second line from the bottom, /sys/block/*/queue/scheduler. Then the stripe_cache_size thing at the bottom is really important if you're doing RAID 5 — I recommend Googling it and reading up on it — and then there's nr_requests. I'm going in totally random order here.
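The runtime versions of those tweaks look roughly like this (device names and values are examples; they would typically go in rc.local to persist across reboots):

    echo deadline > /sys/block/sda/queue/scheduler       # per-device I/O elevator
    echo 512      > /sys/block/sda/queue/nr_requests     # request queue depth, value illustrative
    echo 8192     > /sys/block/md0/md/stripe_cache_size  # md RAID 5/6 stripe cache, in pages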
So, tuning is a multi-dimensional thing; you've got a lot of things to consider. Production load is going to tell you more than any other benchmark, and lean into the database, because it really will take good care of you. I've had zero data corruption failures — I've never lost a bit of data — and all of the outages we've had have been our own fault. So thank you; there's my contact information, and hopefully we can get through some questions.
A: [question]
B: I prefer software RAID because it's just easier for me to manage. Hardware RAID can have some advantages over MD RAID — batteries, for one — but it also has a higher cost, and since the database does such a good job of doing linear writes and all these things on its own, we've never had a real problem with performance on our software RAID. So that's why. If you like hardware RAID, go for it; it's just an extra cost that I don't need.
That's a little bit longer of an answer. If you're really curious about that, I believe the recording of Evan Chan's talk at the Cassandra Summit is online — if you go to the DataStax website you can find it. I'll try to find the link and tweet it a little later, but he talks about that in depth. We're using that for all of our new analytical systems.
A: [question]
B: For us, we want that compaction to happen really quickly and be over with, because our reads are coming in pretty much all the time. For our workload it works really well; we haven't seen any issues. On our newer, bigger cluster we're running into a few more interesting problems, a lot of that just due to the size of it. But no, we haven't seen any issues — and that's because we spent a lot of time really tuning our I/O subsystems to run as fast as possible.
A: [question]
B: Yeah, that's a schema-level setting — sorry, I forgot to mention that. I think it's when you define the column family that you set it; it's an option. I have some gists on GitHub, and I'll tweet out links to all the stuff I'm mentioning a little later.
A: [question]
B: We're kind of waiting on that. We want to wait for 3.1.1 — not because we don't trust DataStax, but just because we have a lot of really important data and we're doing some migrations and merging some clusters, and it's a lot easier if they're all on the same version. So once we get done with all those merges, then we'll go back to upgrading to DSE 3.1.

A: Great. [question]
B: It varies by cluster. Our older legacy cluster is at, I think, about 200 gigs per node right now — two or three hundred. On the new cluster we're going to be pushing it a lot further; that's going to require DSE 3.1. Actually, we've had a lot of varying issues with having really deep nodes — I think the recommendation generally is to keep it under 200 gigs — but yeah, on average most of our larger clusters are around that per node, and the smaller ones vary from a couple of gigs up to, you know, 40 or 50 gigs.
B: We haven't had a lot of trouble with that in our primary production systems, mostly because they're fairly mature and most of them are doing very small reads. Most of it comes down to what I was talking about earlier: having that collaboration with your Cassandra operators, if that's not you, and really thinking through making sure your software is designed to do the right thing in the first place — obviously, in a perfect world, that's what would happen. The Cassandra tracing stuff will help with it.
A: [question]
B: There have been a couple of times where I had queries that were crashing nodes because they were trying to read, you know, 20 or 30 gigabytes of data in one query over Thrift and just filling up the heap, and it just happened that we stepped through the software manually and found it. I don't know what tools there are for that, just because we haven't needed them.
B: Like I mentioned, I recommend RAID if at least five nines of availability is important to you, and if you have a separate operations team that has to service the disks — if you have a physical data center and a different crew that actually goes and does that work — putting a RAID subsystem in is the primary reason to do it. It just gives you a little bit of a buffer. Disks fail all the time — I mean literally: on our new cluster,
B
It's
only
a
couple
months
old
I've
already
had
five
discs
fail
right.
So
having
raid
in
place
just
protects
me
for
a
little
bit
longer
indicates
the
disk
failure
and
provides
an
extra
level
of
redundancy.
It
wastes
a
little
bit
of
space.
What
the
reality
is.
You
don't
want
to
fill
up
three
disks
anyway,
so
raid
five
works
fairly
well,
the
md
raid
5
is
pretty
it's
fairly
performant
raid
10
is
going
to
give
you
the
best
performance
and
that's
why
rajat?