Description
Speakers: Patrick McFadin, Chief Evangelist at DataStax & Al Tobey, Open Source Mechanic at DataStax
It's time to play "Stump the Experts", with Al Tobey, Open Source Mechanic at DataStax, and Patrick McFadin, Chief Evangelist at DataStax. Bring your urgent Cassandra questions to this session and have our expert panel answer them for you.
A: So it's only a one-day event too, which is really cool. I'm Patrick McFadin. I'm the chief evangelist, but I'm also a solution architect with DataStax, so I get to run into all kinds of crazy stuff.
B: I'm Al Tobey. I also evangelize for DataStax, and I've run, and broken, Cassandra in production.
A: So we had our first question in the hallway, in transit. Wow, we packed the room right off; hold on, thanks, Jeff. The first question was: how do I scan all the rows in my column family?
A: You can do it... aha, I didn't say it was pretty. No, it's not pretty, and it's not performant, but you can do it. There is actually a recipe in the Astyanax driver called "get all rows", and the idea is that it's not going to be in order; that's probably the thing I have to qualify. This is not going to be an in-order operation.
A: If you're using the random partitioner, it's random; it doesn't actually generate all the tokens up front. What happens is that whenever you create a row key, it gets hashed into a token value, and those token values are assigned to nodes, because each node owns a token range. So what you do is just iterate over that range.
A: So if you have 16 threads doing it, you can break the full token range into 16 sub-ranges and scan them in parallel, one range per thread.
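The recipe amounts to splitting the partitioner's token space into sub-ranges and scanning each one. Below is a minimal sketch of that idea using the DataStax Python driver rather than Astyanax; the keyspace, table, and column names are made up, and it assumes the Murmur3 partitioner's token range (RandomPartitioner spans 0 to 2^127 - 1 instead).

```python
# Hypothetical parallel full-table scan by token range (a sketch, not the
# speakers' exact recipe). Assumes keyspace "my_keyspace", table "events"
# with partition key "id", and the Murmur3 partitioner.
from concurrent.futures import ThreadPoolExecutor
from cassandra.cluster import Cluster

NUM_RANGES = 16
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3 token space

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')
scan = session.prepare(
    "SELECT id, payload FROM events WHERE token(id) > ? AND token(id) <= ?")

def scan_range(bounds):
    lo, hi = bounds
    # Rows come back in token order within the range, not in key order.
    return list(session.execute(scan, (lo, hi)))

step = (MAX_TOKEN - MIN_TOKEN) // NUM_RANGES
ranges = [(MIN_TOKEN + i * step,
           MAX_TOKEN if i == NUM_RANGES - 1 else MIN_TOKEN + (i + 1) * step)
          for i in range(NUM_RANGES)]

with ThreadPoolExecutor(max_workers=NUM_RANGES) as pool:
    for rows in pool.map(scan_range, ranges):
        for row in rows:
            pass  # process each row here
```

Each sub-range comes back in token order, which is why the scan as a whole is not an in-order operation.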
E: Lists are not efficient for many use cases because they are limited. And as for the docs, regarding whether lists are safe: when we have to delete from a list, what do I do? I read all the values, then find the index, then delete. And from the documentation, it says that this operation...
A: You and I talked about this before; you were going to send me the link outlining how it worked. I'd like to try it out, because if that's really a bug, I could fix it, right?
B: The values... I mean the index inside your collection, yeah. When you write the tombstone, then you write a new value, then you delete, and then you write a new one... if that happens really, really fast, you could get the same timestamp value ordering two writes at the same time.
A: But okay, the value is the value; it's the index inside the collection that isn't necessarily the exact original one.
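For reference, the read-then-delete-by-index pattern being discussed looks roughly like the sketch below, using the DataStax Python driver; the table and column names (posts, tags, post_id) are hypothetical. The gap between the read and the delete is exactly where a concurrent writer can shift the index.

```python
# Hypothetical list delete-by-index (a sketch of the pattern under discussion).
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')

# 1. Read the whole list to locate the element we want to remove.
row = session.execute("SELECT tags FROM posts WHERE post_id = 42").one()
idx = row.tags.index('obsolete')

# 2. Delete by index. Cassandra itself does another read-before-write here,
#    and if the list changed since step 1, this index may now point at a
#    different element than the one we looked up.
session.execute(f"DELETE tags[{idx}] FROM posts WHERE post_id = 42")
```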
G: We've got a nice little production cluster of 12 nodes, and what we're seeing is that sometimes, out of the blue, nodes start CPU spiking left and right. It looks like routing issues, but AWS hasn't quite been transparent about it. The problem is that one slow node really manages to bring the whole cluster down; as soon as we disconnect that node, the cluster behaves perfectly, but that slow node really brings the whole cluster to a crawl. How on earth do you debug that?
A: Okay, so a couple of questions: what client are you using?
A: Okay, I've seen that before, when all of your connections seem to be going through one server without you realizing it. That's one thing to look for, because you really have no awareness of what node you're connecting to unless you ask, so you can get a hot spot where everything goes through one coordinator. If that's the case, that would be the first thing to look for, because I've seen it where one node gets all the connections, and also where one node goes through GC and it looks like it's the whole cluster.
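As an illustration of one fix (not the client the questioner was using), here is a small sketch with the DataStax Python driver of a token- and datacenter-aware load-balancing policy, so requests spread across replicas instead of all funneling through one coordinator. The contact points, data-center name, and keyspace are placeholders.

```python
# Sketch: spread requests across the ring instead of hammering one node.
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(
    contact_points=['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    # Pick a replica that actually owns the data, round-robining within
    # the local data center when more than one replica is available.
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='DC1')),
)
session = cluster.connect('my_keyspace')
```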
G: We haven't completely checked that out yet.
B: I've seen that in EC2 before, where what basically happens is nodes will be working fine one day, then the next day you get a couple of noisy neighbors, and all of a sudden your disks go from performing just fine to... they just kind of drop off and never come back again, and you have to replace the node.
A: You're always bringing up problems, all right. Well, those kinds of problems... computing is a hard problem, and these are hard problems to troubleshoot. But I would look at: is it really the entire cluster going bad? Is it the OS? Is it disk? Is it at that level, or is it increased latency from something else? And what size instance are you on?
B: Yeah, what I noticed is that we started out on the c1.larges and those were terrible. They would drop like flies; you'd lose one a week, or at least one every other week. The m1.xlarge is quite a bit better, and if you go up to the m2s, or just a little bit bigger, what happens is you get into a different class of machine at Amazon. You get into the newer, shinier part of their data center, and those machines don't have as much variability in them as the older instances do.
B: Amazon can't exactly evict all their customers off of the old gear to fix it. So that's why I would recommend, whenever possible, going with the bigger and newer instance types: not because they're shinier, but because you're going to get the better infrastructure, and then your database is going to be a lot more consistent.
A: I think it means that the guaranteed IOPS, the provisioned IOPS, are not that good either.
B: Well, IOPS are almost irrelevant; it's transfer speed that matters. How long a sequential operation can you run, and what's your max megabytes per second? Yeah, you can get some really good numbers with EBS, but they're not consistent, and they just don't keep up as well as what you can do with a stripe set on ephemeral disks.
A: ...worst-case scenario. So, just to level-set: hinted handoff, which this is all about, is a very normal part of Cassandra ring operation. When you have multiple nodes running, you have coordinators: whenever the client connects to a node, that node may not be the home for the data, and the data needs to go to other replicas, so that node will act as a coordinator.
A: If one of the replicas is down at that time, let's say you just rebooted it, or you restarted the server to do maintenance or something, which is totally fine, then the coordinator's job is to store what's called a hint. It just says: okay, I'll hold on to the data until you're back, and then, when the node comes back online, the hints get replayed.
A: That gives you consistency; that's the way it's supposed to work. The bad thing that can happen is, if your cluster... let's say you were using those c1.larges, which is like the worst idea ever, and you start hammering your cluster with a lot of extra load, then bad things start happening everywhere. Hints will start piling up in different places because nodes are blinking on and off, and you really have a bad situation on your hands.
B: ...condition, and you need to be able to remove those hints. I've had to do it before, where you go in and remove them. It's very rare, and you shouldn't do it unless you really have to recover that node rather than just rebuild it with a repair. But that's where hints are stored; that's how the storage engine does it, it actually stores them in a system table.
J: Does the hint contain the actual data, or does it just kind of say, hey, you need to talk to your replica?
A: You have to be partition-tolerant. What if you can't get to some nodes, but you can get to this one? Still, replaying hints is not something you want to count on all the time, because they're not going to last forever, especially if you're running at consistency level ONE. If you're doing really high-volume inserts, you want to be at CL ONE; that's where the best performance is.
B: ...and that's where hinted handoff really starts to pay off, because even if one of the replicas fails, the hints will get queued up and then replayed onto that node when it comes back online, so you're not constantly repairing your cluster because of single-node failures. You should be able to remove any node in your cluster at any time.
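As a hedged sketch of that write path, here is what issuing inserts at consistency level ONE looks like with the DataStax Python driver; the keyspace, table, and values are invented for illustration.

```python
# Sketch: high-volume writes at CL ONE, letting hinted handoff cover a
# replica that is briefly down. Table and values are hypothetical.
from datetime import datetime

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('metrics')

insert = SimpleStatement(
    "INSERT INTO events (sensor_id, ts, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE)  # fastest acknowledgement

session.execute(insert, ('sensor-17', datetime.utcnow(), 42.0))
```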
A: Go ahead, question.
F: So, if you have a row you have deleted, is it possible to reinsert it with an old timestamp?
B: ...if you want to do, say... if you've invented your own version of vector clocks, or some of these other fancy distributed-systems things like CRDTs, you can actually put your own value inside the timestamp and use it, and then you get all the same conflict resolution, but using a timestamp that means something to your application. It's a big pattern if you're going to go there.
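A minimal sketch of what that looks like in practice, using CQL's USING TIMESTAMP through the DataStax Python driver; the table and the ordering value are hypothetical stand-ins for whatever vector-clock-like scheme the application maintains.

```python
# Sketch: supplying an application-chosen write timestamp.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')

app_version = 1378944000000123  # your own ordering value (microsecond scale)

session.execute(
    "INSERT INTO documents (doc_id, body) VALUES (%s, %s) USING TIMESTAMP %s",
    ('doc-1', 'new contents', app_version))
```

Conflict resolution stays last-write-wins, but "last" is now judged against the values you supply, which is also what decides whether a re-insert beats an earlier delete's tombstone.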
L: What about heap space? I've experienced a lot of trouble with heap space.
B: It doesn't set the heap size inside; it's just the shell script, the actual command you run, and then it loads the Java program, right? Yes: when we start Cassandra, we set the heap size, typically to eight gigabytes, but most of those tools don't actually set the heap size, so they get the default JVM heap size, which I think is one gigabyte at the top. So sometimes you need to edit those scripts, find the java command, and add a -Xmx with a bigger max.
L: It's the same heap space problem: one node stopped working and the server went down, and when I restarted it, it was not able to start because of heap space and a lot of cache, because the cache was saved in the file system.
B: Okay, so you're going to look at a file in /etc/cassandra called cassandra-env.sh. There are two variables in there; it's documented in the file, there are comments. It's just a blob of shell, and you'll see there's a max heap size you can set, and you want to set that. I would say probably two gigs on a four-gig machine, but maybe three if you really need to.
N: What's the best approach if we need to do a significant amount of deletes from a table? We've realized that if you do this just by deleting objects, it significantly drops performance after even just a couple of thousand deletes.
B: Are you deleting across...?
N: The only thing that helped was dropping the garbage collection grace period. That helps, but of course it has the drawback that when a maintenance restart or something happens, these old deleted rows, the zombies, start appearing again.
A: In the database, delete operations are just writes, and my guess is that you are not keeping up with compaction when you're doing your deletes. When you write your regular data, you're probably keeping it at a certain pace, but whenever you're doing your deletes, you're turning it all the way up and running them as fast as possible.
A: When you do a delete, it's going to fill up the memtable, which needs to get flushed. I would almost bet you that if you look at your system while you're running a full-speed delete like that, your compactions are backing up and your heap will start filling up because the flush writers are backing up. Look at tpstats.
A: tpstats will show you very clearly that the flush writers have blocked threads, okay? I would almost guarantee that's what you're seeing: your disks are not keeping up with that load. I've seen people turn deletes on at full blast, and that's basically like a Cassandra stress test, because it's not your normal application mode, right? You're just pushing that data in there as fast as you can.
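One way to act on that is to pace the delete job rather than running it flat out. The sketch below is a rough, hypothetical example with the DataStax Python driver; the table, key, and rate are invented knobs, not recommendations.

```python
# Sketch: rate-limited bulk deletes so flushes and compaction can keep up.
import time

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')
delete = session.prepare("DELETE FROM events WHERE id = ?")

MAX_DELETES_PER_SEC = 500  # made-up pace; tune while watching nodetool tpstats

def throttled_delete(ids):
    for n, row_id in enumerate(ids, start=1):
        session.execute(delete, (row_id,))
        if n % MAX_DELETES_PER_SEC == 0:
            time.sleep(1.0)  # crude throttle
```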
B: A full-blast delete job is not going to try to be nice to anything else in the system, or even to anything else inside Cassandra. If you throttle it, you'll have a much better chance of keeping up with the compactions. And if you're on LCS, just make sure you're careful about your SSTable size, because you're going to be doing a lot of compaction.
B: The SSTable size in megabytes is in your schema; you set it there. I'll talk about it in my talk too. I usually recommend bigger, like 256 megabytes or 128 megabytes, but you might even want to go smaller depending on your total data size, and go with 32 or 64 if you don't have a lot of data per node; then you'll get a lot more efficient compaction.
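Concretely, that knob is the sstable_size_in_mb option in the table's compaction settings. A minimal sketch, with a hypothetical keyspace and table and a size chosen only as an example:

```python
# Sketch: setting the LCS target SSTable size in the schema.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

session.execute("""
    ALTER TABLE my_keyspace.events
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 128}
""")
```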
A: When I hear the words, and this could almost be a joke, "I'm doing a lot of writes and everything starts slowing down": compaction. Nine times out of ten, that's the issue. Let's see, so we had that one... you remember the order: you, and then in the back. Right, back first; go ahead, red checkered shirt.
O: When you run a repair, it creates a Merkle tree for each column family, and then the Merkle trees get copied to the node that started it, which compares them. I would like to know what percentage of the column family is represented by one leaf, and if one leaf doesn't match, how much data will be transferred and re-inserted by the nodes, because I feel sometimes it's a really big amount; a lot more than what I would guess is broken gets re-transferred.
A: Yes, the beginning of the repair operation builds the Merkle tree for comparison. So it's not transferring data yet, because it's building, and then once the Merkle tree is filled, it says: here, now, this is what I need. The Merkle tree creation is the heavy CPU part of that.
O: So it asks that node to go off for a while, calculate the Merkle tree, and then transfer the whole thing over, and then the node doing the repair does the comparison in memory?
A: Yeah, and after that it knows all the elements that are out of sync, and then it will start asking for the stuff it needs.
O: Yeah, but it just knows the range, covering multiple rows.
A: I don't know exactly how much it's grabbing. I do know that the network traffic is rarely in the Merkle tree; it's almost always in the streaming.
A: The heavier operation, and this is actually a big deal with streaming 2.0, which is in Cassandra 2.0, is how the streams get managed by the system. The other part of repair that people don't really get is that when you do a repair, what gets streamed in is immediately flushed to an SSTable, which will create a lot of compaction at that moment, a lot of the time.
A: You can change the type of repair you do, like -pr for a partition-range repair, or you can change the streaming: there's a stream throughput setting, and changing that to a really low value means it just trickles the data over very slowly on the stream.
M: For multi-data-center setups I've been using SSL; you can use SSL to connect them up, and the documentation on using SSL points to a keystore, along with a certificate and so on, for every single node, which seems like quite a bit of pain. If you want to add one node, you have to go around every single box and add that in there. Are there any recommended approaches for doing SSL with multiple data centers that make it easier?
B: I've done some experiments using tinc instead, which is a little VPN daemon for Linux. The documentation is terrible; I've been working on a blog post for almost six months just trying to figure out how to do it, so that I can give it to you guys and you can run with it. So that's one option if you really need encryption: do something like that, tinc or IPsec, if you're into that stuff. That's just me.
B: So there's that, or finding some kind of certificate management system that might help you with it. Something like certmaster, an open source Python project that generates certificates and passes them around for you. I'm not sure if it fits this application exactly; it's on my to-do list to try it, actually. So yeah, that is a pain point, but it's a pain point in every part of crypto.
B: ...as VPN devices do. Using Cassandra's built-in crypto for node-to-node traffic works; it's there and it's a feature, but you're going to get way better performance out of a hardware VPN device; you shouldn't have any real latency on hardware. And even some of the local Linux VPN software, OpenVPN or tinc, will use the OpenSSL stuff, and if you're on a Westmere or later, or a Haswell, you'll have the AES-NI instructions on the chip, so it'll actually be accelerated in hardware.
A: All right, well, compact storage, because composites are... I'm sorry: CQL is not something that's understood by Solr yet; that's a different type of structure. Compact storage creates a CQL-less column family, and the secondary indexes are how Solr finds data. The data that's being managed by the schema in Solr has to be understood from a column-family perspective, meaning it needs to know where all the columns are and the values in them, so those secondary indexes are created by the schema creation on the Solr side.
A: So it's just a different way of managing data on top of Cassandra. Solr should be super fast, but it has to index the data first, and that's how it gets at the data.
A: Yeah, you should really consider that column family as a Solr column family, mostly. You can use the data in there normally, with CQL or without, with Thrift, even some CQL stuff, but mostly Thrift. The idea, though, is that if you're going to be doing CQL-like operations, you should really denormalize into a different column family.
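For context, a COMPACT STORAGE definition is essentially the classic Thrift-style wide row. The sketch below is only a hedged illustration with invented names; consult the DSE Search/Solr documentation for the actual integration requirements.

```python
# Sketch: a Thrift-style wide-row column family declared via CQL.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')

session.execute("""
    CREATE TABLE IF NOT EXISTS wide_rows (
        row_key  text,
        col_name text,
        value    blob,
        PRIMARY KEY (row_key, col_name)
    ) WITH COMPACT STORAGE
""")
```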
D: Small clusters, with virtual machines and storage area networks, or is that a completely separate world?
D: Right, the thing is, it's a shared array, and we're wondering if that would kill performance on the small cluster we are building.
B: So maybe do a RAID 10 set, do another RAID 10 set, do another RAID 10 set, even inside the same array, but assign each one to a different node. You can zone that all off in the switches and all that happy stuff, and then you're pretty close to having locally attached disk. And if it's just a science project, you know, or a dev cluster, then I guess go for it, but it's definitely not something I'd recommend.
A: As a solution consultant, I get onto phone calls with people who are in a really bad place; they've gone down this path and then maybe got to a point where they're having more than a little bit of trouble. We had a customer, and I can't name names, but we had a customer call us on a Sunday night and say: we just went live and there's a fire in our data center, help us now. We were there the next morning. The problem was they were using this really shitty shared storage system.
F: It's about repairs and operations and maintenance in our cluster. We have a cron job that runs a repair, and it takes a lot of time; it's quite heavy. It's really two questions: first, is there a way to orchestrate this, a better way to manage it? And the other is, when should we do it? Because I'm not sure whether one week is enough or it should be shorter.
B: ...days. We screwed that up, so they're at a year now, but we would do the repairs every month. Because, you know, when you have a 36-node or larger cluster and you're rolling through it doing one node a day, it took about 24 hours to do a node, so it would take about 36 days. So we just basically kept it going all the time, around and around the ring. In terms of orchestration, I've heard that certain products might be adding that.
F: But if it fails, will it retry or something?
B: Well, there's probably an upper limit. It used to be that 200 gigs was the recommended upper limit per node, and back in that day I was blowing way past it on all my clusters; my main, or oldest, cluster is at about 500 gigs per node right now. I've put five terabytes on a node; yeah, if you're on SSDs, five terabytes isn't a big deal. It's just, you know, if you need to scan over all your data, realize that your spindles are going to be limited. It's all about I/O, right.
A: The limit is when you can no longer keep up with what you have. If you have a lot of cool data, maybe your latency requirements are different; there are so many different variables. You can store a lot of data on there and it's not going to fail. It's just: what is your SLA? If you're looking for a 10-millisecond SLA on all your reads and you put 10 terabytes on a node, you're not going to get that. It's just not going to happen.
B: If you're sitting on a really hot SSD, you know, a high-end Intel or something like that, 32 gigs will probably do you fine, as long as the latency of...
A: Some of the new compaction strategies that are in 2.0.2, which is coming out next week, well, actually some of them have been there since 1.2, look at time-series data and never recompact it. Those will make it so you can really put a lot of data on a node, because compaction is really the name of the game. If it's sparse access, then...
C: Actually, yeah, that's actually an idea: throw a lot of storage at it and sort of build a cluster design with a huge amount of storage. The other thing I really...
A: ...the driver, which is alive and well. Punit is the guy who's maintaining it now; Ron is no longer doing it. Punit and I are working well together, and we're trying to blend the two things together. Netflix doesn't want to be in the driver business forever, but Astyanax is a very robust driver for what it does.
A: Oh, go ahead. An evil question? Oh man, we had to have one.
G: Yeah. There's a very well-hidden line of fine print that's relevant mostly to clusters where you just have one huge column family with size-tiered compaction: if you manage to run over 50% disk usage and you can't compact the largest SSTables anymore, then you're basically totally in trouble. We basically ran into that for the second time.
G: Is there any way to save the cluster? Because you can't even compact the stuff away; even if you delete the rows, it won't compact.
A: What I would do with that data is take those largest SSTables and move them. You know, expand the size of your cluster, like adding new nodes, take the existing SSTable to a different node and run the bulk loader, and it will restream that data into a new data structure across your cluster. That's kind of a bad way; I mean, it's a long way around, but it's doable. You could do stuff, I mean, if you're...
B: ...really strapped, like you can't get more hardware or something like that, you could do things. I suppose you could replace a node, start rebuilding it with repair, and then switch to LCS, so you get the cluster completely healthy except for the fact that you can't compact. What do we call it, shooting a node in the head?
B: Rebuild it, but put it on LCS and then repair, so you basically rebuild onto LCS, and then you don't have that problem anymore. You could do it without bringing your cluster down. That would be totally evil; I don't think that's supported, but it would probably work. I've done weird things in consulting, especially when you have a customer saying: no, you don't understand, our business is running on this, we...
A: ...can't bring it down, help us; make a deal with the devil and help us get out. So, I mean, that is doable, because it really comes down to the fact that the SSTables are your data. You have a lot of options because they're available, and that is all your data, and you can take them. There's a ghetto trick where you can take all those SSTables, put them on a different server, and reboot it, and it'll just read them, like: oh, I own all these files. So there are some options.