From YouTube: DataStax: Extreme Cassandra Optimization - The Sequel
Description
Speaker: Al Tobey, Partner Architect
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
People hit the wall and everybody's scratching their heads as I go help them figure out what's going on, so this talk is going to be a little different from a lot of the performance advice you get around Cassandra. It's more my angle, coming at it from the system, and just a different way of thinking about it.
So this is specific to Cassandra 2.0 or 2.1; a lot of this stuff is going to apply to that.
A few things that I'll say are considered apocryphal in the Cassandra developer community, so use them with caution; I'll try to call that out. My focus is usually finding the low-hanging fruit, and in most clusters there is a lot of low-hanging fruit in terms of performance tuning: little things.
You can do little tweaks here and there, small, safe things, and open up a lot of performance. There's this thing called the OODA loop, which is out of military jargon. I'm not going to talk about that today; I'm just mentioning it because what I'm really talking about is kind of a watered-down version of it. It looks more like this, and it is to say: start with thinking.
Think about the hardware you have at hand, the database that you're running, the fact that it is replicated to other nodes, the fact that you're running on the JVM. How much memory do you have? How much CPU? These are the kinds of questions you ask before you even touch anything, before you make any estimations of what kind of config you need. You think through that and go: I have eight cores. Are they real cores? For example, in EC2, an i2.2xlarge says it has eight cores.
There are really only four cores plus hyperthreads, and that's a really important difference if you're trying to estimate how much throughput you should get out of a machine. That's where you lead into: what is the performance potential of this machine? Your goal is to get Cassandra to use as much of the performance potential of the machine as possible.
A lot of you: how many people are running in the cloud today, Amazon, Google, Azure? How many people are on bare metal? All right, so most of you who've run bare metal will probably recognize this.
So this is a quad-socket NUMA machine, and this is important to keep in mind when you're talking about modern machines, even if you're in the cloud, because the reality is that Amazon, Google, and those guys are all buying these servers to run your VMs on. The fact is that you're on a shared platform, but a lot of times you can do cool things with NUMA, because each one of these squares is like a little embedded computer that has a fast network to the other squares next to it. That's what NUMA is.
All the RAM on a socket is connected to the memory controller on that CPU, and so on for all four of them. So when you divide a NUMA box up, you can actually consider those as separate computers. There's certainly NUMA crossing and things like that, but the network is already shared. With things like NVMe, you can do things like assign an NVMe drive to a particular socket by putting it in the right slot, and then all access to it is local to that CPU.
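On Linux you can see this layout for yourself and experiment with pinning; a quick sketch, assuming the numactl package is installed (the commands and node numbers here are illustrative, not from the talk):

```shell
# Show the NUMA topology the kernel sees: nodes, per-node CPUs and memory.
numactl --hardware

# Pin a process and its allocations to one socket, e.g. node 0,
# so its memory accesses stay local to that CPU's memory controller.
numactl --cpunodebind=0 --membind=0 java -version
```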
This is a topic called mechanical sympathy that Martin Thompson has been popularizing. It actually comes from an F1 racer, Jackie Stewart. It's understanding the hardware, understanding the environment that you're operating in, so that you can push it harder and safer.
A little word on hypervisors: if you're in the cloud, you're sitting on a hypervisor. If you're running on VMware, you're sitting on a hypervisor. Hypervisor is just a word for virtualization; it means the little microkernel that sits underneath the VMs and arbitrates all the access to the hardware. There are some important things to know about hypervisors.
That's just one thing to keep an eye on. I've seen it a few times where you go into an environment and they're going, we can't get this cluster to go, and then you go look and they're using an emulated NIC driver, for example; that's just one thing to knock out of the way. It also happens in Amazon: if you're outside of a VPC, or if you're in one and you forget to check the little box that says enhanced networking, you're going to be sitting on unenhanced networking.
We discovered this at CrowdStrike. If you saw Jim Plush's talk this morning: we kept hitting a wall going up, hitting like seven thousand transactions sometimes. We tried everything and just kept hitting it. Finally we noticed, hey, enhanced networking isn't enabled, flipped it on, and all of a sudden we broke through that barrier, along with other things they had fixed too.
But you know, that's what you're doing: you're finding all these little things that you can fix, and you keep fixing all the little ones, and eventually they add up into a much bigger gain in performance. When you enable enhanced networking, what's happening is that modern Intel and AMD chips have something called an IOMMU on them. What that lets you do is take a piece of hardware and map it through to a VM, and modern NICs, and even some storage hardware, are aware of this and will let you assign shards of the hardware through to a VM. That's what EC2's enhanced networking does, and I think Google's got it and Microsoft's got it. So that's just something to watch out for: when you see those kinds of things that are called something like enhanced networking, make sure you turn it on, because it can enable a lot more performance by reducing the overhead.
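A quick way to check whether an EC2 instance is actually on the SR-IOV path rather than an emulated NIC; a sketch, assuming the interface is named eth0:

```shell
# If the driver reported here is "ixgbevf" (the Intel virtual-function
# driver), enhanced networking is active; an emulated driver like
# "vif" or "e1000" means you are on the slow path.
ethtool -i eth0

# Confirm the VF driver module is present on the instance at all.
modinfo ixgbevf
```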
Anybody heard of Docker? Yeah, you might be tired of hearing about Docker; I'm not, I like it. So, with Docker, if you're running Cassandra in Docker: I'm not going to spend a lot of time on this, but it's an important thing to remember. People will talk about how Docker's networking performance stinks, and it's not Docker's fault. It's the fact that Docker's default networking is a virtual device that passes packets out to the host OS, which then goes through a NAT translation layer so it can send packets out on the network.
That has a lot of performance overhead, a lot of data copying, and is lower performance. So whenever you're running Cassandra or DataStax Enterprise inside of Docker, or any kind of container technology, we highly recommend running with --net=host, or host networking, which says: do not do what's on the right-hand side here, do not actually go through the NAT layer; just let the application inside the container talk directly to the interface, bind to that interface, and do its network traffic directly.
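The recommendation above boils down to one flag; a sketch, where the image tag and host path are illustrative, not from the talk:

```shell
# Run Cassandra with host networking (no NAT layer) and a host volume
# for the data directory so data survives the container.
docker run --name cassandra -d \
  --net=host \
  -v /var/lib/cassandra:/var/lib/cassandra \
  cassandra:2.1
```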
So that's kind of a whirlwind; I've got a lot to go through, so this is going to go fast. Now we're talking about configuration, which is probably what you're here for. So: the Java Virtual Machine. First thing to remember: Java 8 is supported by Cassandra 2.1 and up, and supported by DSE 4.7 and up, and just switching the JVM without touching anything else will probably open up a few percent of performance right out of the box.
If you're going to use Java 8, I highly recommend going to u45 or higher. You usually want to be on the very latest for security fixes and things like that, but the other reason you want u45 is if you're going to look at G1 GC. G1 GC in Java 7 is junk, and in u45 and below it's junk, and some people will say it's still junk.
Even now, I tend to use it just about everywhere, because it's a lot less work; we'll talk about that in a minute. OpenJDK is supported by Cassandra now, and supported by DSE 4.7. If you're going to use it, watch your distro: if you're using, say, Ubuntu, they'll ship OpenJDK 8u20 or some really old version and then never update it for that release of Linux. Again, you can find PPAs and things to fix it. That's all I can really say about it today, because we haven't worked through our testing with it yet, but I'm sure they would love to hear from you if you have data on that.
Moving on: G1 GC. Anybody here running G1 GC in production for any app? Maybe running it with Cassandra? Cool, cool. DSE 4.8 is going to ship with an option to switch to G1; whether it's the default, I don't remember what the decision was, it just recently happened. And Cassandra 3.0 is going to ship with these options in cassandra-env as well, so yeah.
What G1 does is a completely different garbage collection scheme from CMS, which is what we've traditionally used. It breaks the memory up into a bunch of regions and does compacting and a bunch of other cool stuff. The really cool feature of G1 that I like is what they call ergonomics.
As I said, setting the heap and eden size is a black art that almost nobody understands; it takes a lot of testing. The other thing is that machines, especially in the cloud, have different hardware characteristics. You might think that two motherboards with the exact same memory and exact same CPU are exactly the same, but especially as they age, that will change.
You have memory cells failing or whatever; it's just weird, especially in the cloud. That's where it happens: you get 20 i2.4xlarges and they'll have completely different performance characteristics. One of the things that G1 does really well is that it's adaptive to the environment it's running in, and you don't have to go and mess with flags all the time. So if I'm running a cluster of identical machines, I might see that the eden on different machines differs: one might be at two gigs, one might be at like fifteen hundred megabytes, and it'll shift all over the place as the workload changes.
And that's the really cool part: it adapts as your workload changes. How many people deploy an application and then never deploy it again? Exactly. So, a few things to tune. The whole point of G1 is that it's mostly auto-tuning, but Cassandra is kind of an outlier in that we push the JVM really, really hard, so you've got to add some other stuff to get the best throughput out of it.
The next one is InitiatingHeapOccupancyPercent. You can see on the slide it says start collecting sooner: otherwise it waits until the heap is forty-five percent full to start collecting, and oftentimes that's too late, and then you end up hitting a stop-the-world. Sometimes you want to push that down even lower if you have CPU headroom, but 20 is a good starting spot. That's true of almost all of these settings: if I put a solid number instead of a range, it's just a starting point, and it usually needs a little bit of tuning.
Then G1RSetUpdatingPauseTimePercent, which is just about how much work it tries to do in stop-the-world. If your p99 and p99.9 are dominated by garbage collection, and on almost all of your clusters they're going to be, lowering that will bring them down a little bit, because it will do a little less remembered-set work during stop-the-world and will instead do it during run time in background threads.
And then these two, ParallelGCThreads and ConcGCThreads: for whatever reason, the JDK stops adding threads at eight for parallel GC, and at a fourth of that for the concurrent threads. Whoever designed this a long time ago probably thought eight was the most cores we'd ever see on x86. And wow, was that wrong, right? So if you want more than eight threads for GC, you've got to bump that up, and I recommend pushing it up to your count of real cores.
Don't count hyperthread cores. I haven't tested beyond that, but it seems to work really well that way. MaxGCPauseMillis is the number one, like the main tuning knob for G1. The default is 200 milliseconds, and the important thing to remember is that it's not guaranteed; it's a target for the average pause time. So you can push it down below 200 milliseconds and it won't do squat; it won't go below 200 milliseconds no matter what you do.
Setting it up to 500, it'll get a little less aggressive in terms of garbage collection and you get a little bit higher throughput. I've pushed it as high as 2,000, and that does open up a little bit more throughput, but you also get those kinds of outlier GCs that nobody wants. So if you do push it up over a thousand, make sure you look at your timeouts in cassandra.yaml and make sure they're set accordingly, so that if you get a two-second pause you don't have timeouts in your database.
Finally, ParallelRefProcEnabled. It's not usually a big problem; the reference processing in Java 8 with Cassandra isn't usually a lot, but I've seen it creep up over five percent of the stop-the-world time before. Turning this on just means that it'll do reference processing in the background before it hits stop-the-world, which speeds things up a little bit. So it's a good thing to have on.
And then kind of a final note on G1 GC: it really has a sweet spot around twenty-six gigabytes of heap, which sounds like an awful lot. I wouldn't go more than thirty-two gigabytes for Cassandra. It does work OK at eight gigabytes; it's just that at eight gigabytes, CMS can still often beat G1 for some workloads. So just keep that in mind if you're going to switch a node over to G1.
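Pulled together, the G1 settings discussed above would look something like this in cassandra-env.sh; a sketch with the talk's starting-point numbers, not guaranteed values, and the thread counts assume a 16-core box:

```shell
# G1 settings from the talk; every number is a starting point to tune.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=20"  # collect sooner than the 45% default
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"               # pause-time target, not a guarantee
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"               # set to your real core count
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"                   # lift the small default too
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"            # reference processing off the STW path
```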
So, for CMS on 2.0 or 2.1, there are a few options you can add to open up a little bit more throughput. The main thing is to watch the default new gen size in cassandra-env; you've all probably seen it, it recommends 100 megabytes per core. This is dead wrong; you need far more than that. Usually, if you keep the new gen low and you have frequent garbage collections, they'll be nice and short, because there's not much memory to collect in eden, but pushing that up to about two gigs, or a starting point of about twenty to twenty-five percent of the heap, is a much better starting point than, say, eight hundred megabytes.
The other ones: ParGCCardsPerStrideChunk. I've been using 4096; a lot of people use 32k and have really good luck with that. The point is to get it well above the default of 512, and this is actually reported to open up a lot of GC performance on CMS. The last two are from the CASSANDRA-8150 ticket that a lot of people have been using for this kind of guidance.
Basically, the assumption in the JVM was that most Java threads' locks are not contended. But we're talking about Cassandra here: if you're doing a lot of throughput and a lot of parallel operations, locks are contended a lot in Cassandra because of the SEDA architecture, so biased locking is actually a net loss in performance. By flipping it off, if you're looking at the system stats, you'll see your system time come down a little bit.
Sometimes it doesn't really show up in the latency figures, as long as you're not at saturation load, but it does help make things more efficient. The next one: UseTLAB is on by default, but ResizeTLAB isn't, and what that does is tell the JVM to go ahead and adjust the size of the thread-local allocation blocks. Does everybody know what a thread-local allocation block is? All right.
A thread-local allocation block is a trick in a threading environment where you basically allocate some memory to a thread, and that thread owns it, so it doesn't have to do any kind of locked operations to allocate memory. It can just go and allocate, because it knows it owns that memory range. And when you have, you know, Cassandra with three, four, or five hundred threads sitting there trying to allocate memory, it's a huge performance benefit.
So the resize says: some of these threads are doing a lot of allocation and some are only doing a little bit. We can make the TLABs smaller for the threads that only need a little bit of allocation, which frees up memory for other threads to use, and on the other side it can make them a little bit bigger for the threads that do a lot of throughput.
So it's a good thing to have enabled. It isn't really easy to measure the difference with Cassandra running, just because there's so much other noise in the system, but it seems to be one of those things that, if you think about it, really helps with performance. The last two are to prevent just kind of dumb things in the JDK that haven't been fixed yet from happening. The first is DisableExplicitGC.
I think it's in JMX somewhere: there's a 24-hour timer that, just once a day, will go and do a full GC whether you want it or not. Putting that flag in turns that off, so that doesn't happen. If you've been seeing this one mysterious two-second pause every 24 hours, that's probably what it is. And then PerfDisableSharedMem.
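For reference, the CMS-era options discussed above would look something like this in cassandra-env.sh; a sketch using the talk's numbers, with a 2 GB new gen as the example value:

```shell
# CMS options from the talk; values are starting points, not defaults.
JVM_OPTS="$JVM_OPTS -Xmn2G"                            # new gen ~20-25% of heap, not 100MB/core
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096" # well above the default of 512
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"             # Cassandra's locks are contended
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"      # let TLABs adapt per thread
JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"            # blocks the daily System.gc() full GC
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
```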
I'm sorry? Oh, so the default setting for these concurrent reads and writes is 32 threads, and this is a thread count. You can verify it by looking at ps -eLf: change one of these to some ludicrous number and you'll see that many more threads. ps -eLf, with a capital L, will show you all the threads in a Cassandra system, or anything that's running on your system. Bumping that to about 128 on SSDs can open up a lot of performance, especially on reads.
So my starting point is 128, unless I'm on hard drives; then I usually stick at 32 or 64, just because hard drives can't keep up. On Google, if you're using Google persistent disks, you may want to go even higher, as high as 256, just because they like a lot of parallel I/O to be happening. The next one is memtables, if you're increasing your heap over 8 gigabytes.
The memtable space is set automatically to, I think, twenty-five percent of the heap, so if you're going over eight gigabytes, you really want to set this to a static number rather than twenty-five percent of the heap. You don't want eight gigabytes of memtable space; that's going to cause garbage collection problems. So I like to set that to a static size, just so I don't have to think about it. I like it when things are nice and locked down and they don't change under my feet.
You know, like when you've forgotten about it six months later and go change something else, and it also blows up, right? So that's what I do there. Then you move on to these next two. memtable_flush_writers is a really important one to tune, especially on high-I/O systems. The default, I think, is one or two, and that's way too few for bigger, more powerful systems, especially SSD-based systems. I start at four on most boxes; that seems to be pretty good across the board.
You may want to go as high as eight. I haven't seen a lot of cases where you want to go much higher; I could be wrong on that, but I haven't run into it myself, so I'm not going to say that you should. If you set memtable_flush_writers higher, note that there's an automatic algorithm in Cassandra that sets the memtable cleanup threshold, and it's a factor of the flush writers.
So if you're going to set the number of flush writers, you should always set the cleanup threshold too, or you're just going to get unexpected sizes. The nice part about setting this is that now you know how big your flushes will be. If you set all three of these, this says: flush a memtable when it hits ten percent of the memtable space.
So that means, with this setting, I'll be flushing roughly two-hundred-megabyte SSTables. That's just a nice size for when compaction comes along: I'm not reading a one-gigabyte file, I'm reading a bunch of 200-megabyte ones, and compaction is fairly efficient that way. Then, for write-heavy workloads, you might want to consider offheap_objects, in Cassandra 2.1, for the memtables, which is off-heap allocation.
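Putting the concurrency and memtable settings above into one cassandra.yaml fragment; a sketch with the talk's starting points, where the 2 GB memtable space is an illustrative static value for a larger heap:

```yaml
# cassandra.yaml sketch (Cassandra 2.1); tune all numbers per hardware.
concurrent_reads: 128              # 32-64 on spinning disks, up to 256 on GCE PD
concurrent_writes: 128
memtable_flush_writers: 4          # up to 8 on fast SSD boxes
memtable_cleanup_threshold: 0.1    # flush at 10% of memtable space
memtable_heap_space_in_mb: 2048    # static size instead of 1/4 of a big heap
memtable_allocation_type: offheap_objects   # for write-heavy workloads
```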
Cassandra 2.1.9 just got released. In Cassandra 2.1 there was a bug: writes to the commit log aren't aligned on a 4k boundary, so it ends up causing Linux to do a read-before-write when you do writes to the commit log. The old workaround is to set the commit log segment size one megabyte larger than the commit log size in megabytes, so if you can't get to 2.1.9, you can do that.
So just keep track of that one. I think it's fixed by default as of 2.1.9, but I like to be explicit about it, so that it doesn't surprise you later, and that can bring down latency on write-heavy workloads. There's something else I've noticed in Cassandra clusters, and it's not really clear how to address or fix it; it's just kind of a thing that happens in distributed systems sometimes: when you're doing a lot of writing and you push a whole lot of data through Cassandra.
It's writing the commit log, it's writing the memtables, but it doesn't call fsync, at least in the default setting; it doesn't call fsync for every single write, it does it on a ten-second timer. What'll happen, after a little bit of time on some clusters, is they'll buffer a whole bunch of data in memory, and when that fsync hits, it has to flush.
So what you'll see, if you're watching the throughput in real time, is all the throughput takes a dip, and then it comes back up, and then it dips, and it comes back up; that's basically that ten-second commit interval. By setting trickle_fsync to true, it syncs at a low value. The default is 10 megabytes, and that's pretty good; I like it at one megabyte, just because I like to see it flush all the time. That way I have less data at risk.
There's also a Linux alternative, and I think we should do this on any database system, especially if you're benchmarking it: set the Linux kernel setting dirty_background_bytes. I've set it to about eight megabytes on most Amazon machines, including my personal machines. What that does is tell the Linux kernel that once it gets over eight megabytes of dirty data in memory, it should start flushing in the background, so it doesn't block applications; it just starts flushing.
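The eight-megabyte threshold described above can be computed and set like this; a sketch, where the live sysctl call needs root and is commented out:

```shell
# 8 MB dirty-data threshold for background writeback.
BYTES=$((8 * 1024 * 1024))                       # 8388608 bytes
echo "vm.dirty_background_bytes=$BYTES"          # line for /etc/sysctl.conf
# sysctl -w vm.dirty_background_bytes=$BYTES     # apply live (needs root)
```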
So that's a pretty good setting to have in place. Sometimes you will see throughput come down a little bit when you do that, but that's just your point-in-time throughput: your peak numbers will come down, but it will smooth out over time, and that's what you want. You want nice, smooth, consistent performance rather than spiky performance.
A couple more in cassandra.yaml. The default for vnodes is 256. You can't change this on a running cluster, but if you're building a new cluster, I recommend starting at 32, or if you're kind of old-school, you can start with one, but that's kind of not vnodes anymore, right? Some shops really like to stay away from vnodes, because there is an availability trade-off. I really like the operational side of vnodes, where you can just add nodes one at a time and not have to think too much about it. So 32 is a good starting point.
Some people like eight, even as low as that, and then you still have the advantage of vnodes without a lot of the performance overhead that comes with a lot of vnodes.
The default in the open-source version for internode compression is all, and if you're seeing low throughput and high CPU usage, especially on an open-source cluster, just check and see if this is turned on, because you don't really need compression on most LAN networks; you've got 10 gig in your data center, you really don't need it. So just set it to dc: across data centers you probably want compression, but inside of a single data center it's kind of a net loss. So that's dc, or none. And then, I was mentioning the network throughput thing before.
What this does, otc_coalescing_strategy, was added in 2.1.5, and it tells Cassandra to go ahead and group a bunch of mutations, or transactions, together and then try to push them out in a single packet. It does that on a, by default, 50-microsecond window, so it doesn't really hurt your latency too much, but it can really bring throughput up.
If that's your problem, that is. I don't recommend it for all clusters, especially if you have a really hot 10-gig network, but it can help in some cases. Then the other one in the yaml is streaming_socket_timeout_in_ms, if you're doing a lot of repair or replication across data centers. The default is zero, which means it doesn't time out, and that means wedged streams force you to restart things like repairs. They're changing this, I think; very soon a non-zero value is going to be the default.
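The yaml-level knobs from this stretch of the talk, gathered into one fragment; a sketch, where the coalescing strategy name and the one-hour timeout are illustrative values, not numbers from the talk:

```yaml
# cassandra.yaml sketch of the networking/topology knobs discussed above.
num_tokens: 32                            # new clusters only; can't change later
internode_compression: dc                 # or "none"; "all" wastes CPU on a LAN
otc_coalescing_strategy: TIMEHORIZON      # 2.1.5+; batch mutations into one packet
streaming_socket_timeout_in_ms: 3600000   # non-zero so wedged streams time out
```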
A lot of people are talking about schemas at this conference, but if you're really struggling with performance and you've tried all the other things in the system and tuning, maybe you need to look at your schema again; that does happen an awful lot in the field, so I just want to make sure I put that out there. The default compression block size is 128 kilobytes if you don't specify it. Compression is on by default, and that's a good thing, but you may want to bring that number down.
Bring that number down to maybe 64, depending on your record size and your schema and things like that. I can't give you a default setting, but it's something you might want to take a look at if you're seeing a lot of throughput issues. I stick with size-tiered compaction for most things. There are some cases where you want LCS or date-tiered compaction, but for the most part, start with STCS and move on from there.
Finally, sysctls. I'm going to go through these fast since we're short on time, and we already talked about dirty_background_bytes. These two, fs.file-max and vm.max_map_count: I just want to mention them because a lot of people already know about this and it's well documented; you need to bump these up.
Something we found months ago was that if you set these two to max values, like the maximum integer value, the Linux kernel freaks out. So don't go crazy, and my general rule now is to never use min or max values in the Linux kernel, because it just freaks out too much and crashes. Well, except vm.swappiness = 1: you can set it to zero, but then the kernel starts doing weird crap, whereas if you set it to one, you get 99.9 percent of the effect and you don't get any of the weird stuff.
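The sysctl advice above as a persistent fragment; a sketch, where the one-million limits and the drop-in filename are illustrative "large but not INT_MAX" choices, not values from the talk:

```shell
# Write the kernel settings to a sysctl drop-in and reload (needs root).
cat <<'EOF' > /etc/sysctl.d/99-cassandra.conf
fs.file-max = 1048576
vm.max_map_count = 1048576
vm.swappiness = 1
vm.dirty_background_bytes = 8388608
EOF
sysctl --system
```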
Storage tuning: use the deadline scheduler, always. There's a lot of advice out there that says to use noop on hardware RAID or on virtualized systems.
Sometimes the kernel's compiled so it only has noop, and that's fine. Read-ahead defaults to 128 kilobytes, which is way too high for almost every SSD in existence, except some little M.2s that don't have a lot of cache on them, where maybe more read-ahead is OK. I set it to eight, again according to that same rule: don't use min values on Linux. Eight kilobytes is two blocks on modern drives; it's not that big a deal in terms of read-ahead. So that's what I use in almost every case.
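Both block-device settings above come down to two commands; a sketch, where "sda" is an illustrative device name (these need root on a real box):

```shell
# Deadline I/O scheduler for the data drive.
echo deadline > /sys/block/sda/queue/scheduler

# Read-ahead is counted in 512-byte sectors: 16 sectors = 8 KB.
blockdev --setra 16 /dev/sda
```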
If you have money to burn, or say you have a data center out in Timbuktu and can't go replace a drive every time one fails, then you probably want some kind of RAID underneath in case a drive fails, so that you're covered until you have time to go replace the drive. And then RAID 5 and 6: it happens. It's not recommended, but sometimes you need it for capacity.
There are a few things you can do to tune that; I'm not going to go into them, they're complicated. And 2.1 does have JBOD support. It's by far the highest-performance option, but it does have some caveats that you need to read about before you choose it; don't just throw it on there. But especially on huge SATA drives it's a good choice, because then each of the drives works separately.
Now, make sure that all the power-saving stuff is adjusted right for a server workload. You don't always want to turn it all off; you can turn most of it off with just idle=poll, which tells the kernel to sit there and spin on the CPU when it's idle. But what that does is it can burn up your CPU if you don't have the cooling; it shortens the lifetime and it wastes a ton of power.
So look this up as a separate exercise; there's a bit more about it in the guide that I published just about an hour ago. Disable CPU frequency scaling. It's not on a lot of server CPUs, but on some of the newer things, like the Xeon Ds, sometimes it is enabled by default, and it's really bad for Cassandra; just turn it off. You need to know if it's on: there's a tool called i7z that will show it, or you can run the cpufreq tools or the powertop tool.
You can just yum install powertop and run it, and you'll be able to see it pretty quickly. Docker, like I said: always --net=host, and always use a volume for your data; otherwise you've got a good chance of losing it. The guide covers a lot more about Docker. There's a JVM flag called -XX:+AlwaysPreTouch, which tells the JVM to actually go and zero out all of its memory as soon as it starts up. What does that do?
It forces page faults in the kernel, so all the memory is faulted in and you get nice big contiguous regions of memory. That can be especially good in virtualized environments, where you have two levels of indirection in your memory management. It's very dependent on the hypervisor whether it has any effect at all, but there's another impact that this has.
If huge pages are enabled, then you will get transparent huge pages on the anonymous memory automatically; it'll get nicely laid out, and that can improve your performance. Unfortunately, I ran out of time for benchmarking, but those results will be posted online soon. I did post a guide; it has almost everything in here in prose form on my blog, and I'll skip to that slide here now. So thank you very much, and good luck.