From YouTube: Utilizing Apache Cassandra at UltraVisual
Description
Cassandra has been an integral part of UltraVisual's infrastructure since launch, allowing us to rapidly prototype and build new features that enhance the user experience. Over the course of this discussion we will cover three key topics: how Cassandra came to be used at UltraVisual and the key problem it solved; how the usage of Cassandra in our stack has evolved alongside the product; and finally, some of the experiences we've had deploying and running Cassandra in a production environment.
Hi, I'm Sky. I'm the lead systems architect at UltraVisual, and we are a visual network for inspiration, expression and collaboration, which sounds a little aspirational, but nevertheless. What that means is that you can not only share photos and share videos, but you can also create collections, invite people to collaborate, and make great content together. That's what we hope for, anyway.
So the feed is very flexible, and it's kind of a constant work in progress. We're never really happy with what we have in there; we're always looking at analytics, trying to figure out the best way to get users engaged. So it's been one of the toughest things to keep steady in the app. When we started our beta, it was really just a very simple "get the posts from collections you follow," a really straightforward SQL query. But over time we said, well, these social notifications can help engagement.
"People recently followed by people I follow," and you can imagine where that goes: collections recently followed by people I follow, collaborations, etc. Before you knew it, we had like 12 queries running to generate this real-time social feed, and it was really bogging us down. We were looking at hundreds of milliseconds just to generate your feed when you first get into the app. It was also hitting Postgres pretty hard, and when you use pgpool, it's sitting on the master node.
So even though you're distributing your actual work over read replicas, you're still sitting on the master and taking up threads, and that's really not great news. So we rethought it, and to us at least it looked like the problem was that we were just constantly bolting on all these different parts. It needed to be something a lot slimmer and more flexible.
So we designed the model first. If you read the Cassandra docs, everything about them is model-first: concern yourself with how you will access the data, over and over and over again. So we thought hard about that and said: yeah, it's only time-series data, we're only getting it in reverse chronological order, and it's only ever a user getting their own data. So we could actually start with something pretty stable. And around this time,
A
Rick
branson
from
instagram
did
a
great
talk
at
the
cassandra
summit
last
year
and
one
of
his
great
quotes
was
try
to
model
data
as
a
log
of
user
intent,
which
does
really
run
true
because
there's
so
much
going
on
that
it's
really
hard
to
keep
generating
all
these
new
types
of
activity.
So
it's
really
the
same
type
of
activity.
It's
something
a
user
did
that
affected
another
user.
So this is pretty close to our current model. The partition key, the first part of the primary key if you're creating the table, is the user id, followed by a status (which I'll talk about in a few minutes), a timeuuid, and a story identifier, which is pretty self-explanatory: in the first row it's "user three followed user five," or somebody created a post in a collection they follow. And the last column, which turned out to be a bad idea, was to actually cache the entire JSON blob for that item.
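As a rough sketch, that model translates into CQL along these lines; the table and column names are illustrative guesses, not our actual schema, and the exact key layout may differ from what we run in production:

    -- Hypothetical reconstruction of the feed table described above.
    -- One partition per user; rows cluster newest-first by timeuuid.
    CREATE TABLE user_feed (
        user_id  uuid,      -- partition key: whose feed this is
        status   int,       -- active vs. negated; discussed below
        time     timeuuid,  -- clustering column: reverse-chronological order
        story_id text,      -- e.g. "user 3 followed user 5"
        payload  text,      -- the cached JSON blob (the part that went badly)
        PRIMARY KEY ((user_id), time, story_id)
    ) WITH CLUSTERING ORDER BY (time DESC, story_id ASC);

Keeping status as a regular column rather than part of the key is one way to let a later write flip it in place, which matters for the negations discussed further down.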
So I thought, man, it'd be really great if we could skip Postgres altogether when we're generating your initial feed: let's just cache all the data. And it had this kind of interesting added benefit of being slightly stale. If somebody updated their avatar, you were actually seeing this sort of back-in-time version of what it was, and for some reason we thought that was interesting. Now we think that's ridiculous.
And it was really, really fast. We got down to like 30 milliseconds from client request time to getting a response back, so really cool. We were really excited, and we thought, yeah, you know, Cassandra is great, we're going to keep doing this. This was still in beta. Then we launched, and Apple featured us, and within a week we had used 74% of the data space in our cluster. And we said: oh, okay, so caching JSON is probably not a great idea.
Even though we were compressing it, it was like hundreds of gigs streaming in, and we were getting write failures. Because if a hundred thousand people are following a collection and you write a post to it, you're now writing gigs and gigs of data in the course of a few minutes. And then somebody else posts to it, and you get the same thing.
So to do this, we had to go back to our SQL model, again look really hard at how it was built, and try to make those queries faster. We got smarter about batching, because Cassandra was really forcing a denormalized data model on us. By having these story identifiers, we were able to group queries together every time you requested a feed, and that was actually a huge improvement.
But we renamed deletes to "negations," and one of the reasons for that is what happens when you write a tombstone: once it passes gc_grace_seconds, which by default is 10 days, it's going to be cleared out of your cluster, never to be seen again. So if for some reason you have a node down for 10 days and nobody looks at it or fixes it, then you could lose deletes, and that's probably a bad thing.
So we decided: okay, everything stays, and we write the negation, which is basically a flag that just says "this data was there and now it's not applicable." There's also a race condition it avoids, one that Rick talked about in that Summit talk, which you should definitely watch: if a node's down, or even if it's not, and you send a delete and it gets there before the write, you can definitely hit race conditions where deletes never get recognized.
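Concretely, and again using the hypothetical table from earlier, a negation is an ordinary write that flips the flag, not a CQL DELETE:

    -- Instead of DELETE (which writes a tombstone that gets purged
    -- after gc_grace_seconds), mark the story as no longer applicable.
    -- An ordinary write can't be lost to tombstone collection if a
    -- node stays down past the grace period.
    UPDATE user_feed
    SET status = 1      -- 1 = negated, 0 = active (illustrative values)
    WHERE user_id = ? AND time = ? AND story_id = ?;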
So on the left is the original Instagram design, and on the right is what we did, with the status flag for whether it's deleted or not. What Instagram is doing is splitting it into two separate lists, and one of the reasons they can do that is because they only have like 100 items in their inbox, so it's really a quick thing to just get everything. For us, we have just about unlimited feeds. I mean, I looked the other day,
and my feed has something like 70,000 items in it. What makes that sane is keeping the negations inline in the time series, so that you can do one select, with a padding of maybe 25 percent to just catch everything, and get enough stories that haven't been negated to still populate the feed. That way you're not doing two trips to Cassandra. For us that's really important, because again, the feed is the first thing you hit when you enter the app.
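Put differently, with the sketch table from before, a 100-story feed page becomes a single over-fetching query, with negated rows filtered on the application side:

    -- One round trip: over-fetch by ~25% so that, after dropping
    -- negated stories client-side, enough remain to fill the page.
    SELECT time, story_id, status, payload
    FROM user_feed
    WHERE user_id = ?
    LIMIT 125;          -- 100 wanted, times 1.25 padding for negations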
When you sign up to the app (I'm actually going to talk more about this in the next slide), we put these sort of sticky cells at the top of your feed, which you can dismiss. You go through a sequence, eventually that sequence dismisses, and you just become like a normal user. Reshare stats are just basic analytics tracking: we have a bunch of counters with UUIDs as the key, and that works pretty well. We haven't seen too much of the problem Patrick was talking about, with commit logs and double increments and stuff.
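A minimal sketch of what such a counter table can look like, with invented names:

    -- Counters live in a dedicated table keyed by the tracked uuid.
    CREATE TABLE reshare_stats (
        item_id uuid PRIMARY KEY,
        hits    counter
    );

    -- Counter columns only support increments and decrements.
    UPDATE reshare_stats SET hits = hits + 1 WHERE item_id = ?;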
API stats are something that we rely on heavily, which is just a bunch of counters for different versions of the app. Every single time anything happens on the app server, we send an increment for whatever version it was, and we can get really, really detailed about how fast people are updating and how fast the endpoints are responding. As for user onboarding: out of all those other use cases, I thought this was probably one of the more interesting ones.
So again you have a partition key for your user id, and a timeuuid, and the sequence is, you can think of it like a campaign. So we have onboarding version one, two, three, four, and we might eventually have other cases for it, like, you know, daily featured or something. And then we have a step, which starts at one and goes until there are no more steps left. So when the user requests their feed, they get a couple of these things at the top.
When they dismiss one, we move on to the next one, so the next time they get the feed they're going to see onboarding v2, step two, and so on. This has definitely been something that has upped our engagement since we started doing it (I know because we didn't launch with this), and it would not have been especially easy to write with any other system.
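Here is a sketch of what the sticky-cell storage could look like; the names and exact layout are assumptions, not our literal schema:

    -- One sticky cell per row at the top of a user's feed.
    CREATE TABLE user_sequences (
        user_id  uuid,
        time     timeuuid,
        sequence text,   -- the campaign, e.g. 'onboarding_v2'
        step     int,    -- starts at 1; advances on each dismissal
        PRIMARY KEY ((user_id), time)
    );

    -- Dismissing a cell writes the next step as a fresh row.
    INSERT INTO user_sequences (user_id, time, sequence, step)
    VALUES (?, now(), 'onboarding_v2', 2);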
Astyanax gets talked about in a lot of talks, so we thought it looked interesting. The DataStax driver didn't really seem that complete at the time, so we started working with Astyanax. It used Thrift, which was already sort of out the door, but we were so new to Cassandra that we didn't really know that yet, and we started writing the code.
A
It
felt
like
this
was
too
difficult
for
what
cassandra
is
supposed
to
be.
So
when
datastax
driver
version
two
started
coming
out,
we
looked
at
migrating
and
using
native
transport
and
that's
definitely
been
a
step
up
using
it.
It's
just
it's
cleaner
code.
You
can
write
cql
right
in
the
driver,
which
is
more
native
syntax
to
somebody
who
knows
sql,
and
it
seems
a
little
lighter
just
in
footprint
in
terms
of
what
the
driver
itself
uses
which
no
complaints
there
for
node.js.
A
We
have
a
few
node
services
running
around
like
a
link
shortener
and
we
use
the
node
cassandra
cql
package
or
npm
package.
It's
pretty
nice.
It
has
batch
support,
support
cql3,
there's
not
a
lot
of
dependencies,
so
big
thumbs
up
for
that
one
and
that's!
This
is
actually
what's
recommended
on
the
datastax
website
for
a
node
driver.
So.
The first thing we hit when we moved to the DataStax driver was this guy. We were sending hundreds of thousands of batch updates, because we saw that the DataStax driver was so fast. We said, okay, let's throw as many batches at it as we can, and the first time we threw like 300,000 writes at it, it crashed with this really, really cryptic error that I'm not even going to read, because there's no point. We were also still on the pre-release driver.
So when 2.0 came out, it addressed issue 229, which was that they were basically not telling you that the limit was 64k in terms of batch statements. The reason for that is actually down in the underlying Cassandra protocol, where you can't put more than 65,535 statements in a single batch, so just don't even try it on any driver. We hadn't hit this on Astyanax, because at that point we were still sending JSON, so we knew we couldn't support that many and were only doing a few hundred at a time.
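So the pattern that works is many modest batches rather than one giant one; a hedged sketch against the earlier feed table:

    -- The native protocol encodes a batch's statement count in a
    -- 16-bit field, hence the hard 65,535-statement cap. Stay far
    -- below it:
    BEGIN UNLOGGED BATCH
        INSERT INTO user_feed (user_id, time, story_id, status)
        VALUES (?, ?, ?, 0);
        INSERT INTO user_feed (user_id, time, story_id, status)
        VALUES (?, ?, ?, 0);
        -- ...a few hundred statements at most...
    APPLY BATCH;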
For compression: as of 1.2, I believe, LZ4 is available, and as of 2.0 it's the default. We had a little bit of a wrestling match with this. LZ4 compresses over 64k windows, so the big issue, we thought, was: we're caching this JSON that's 700k, and you're going to get way better compression if you can compress over the entire thing. So we did an experiment and ran Snappy against LZ4, and LZ4 was still faster and still smaller than Snappy.
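Compression is a per-table option; switching the earlier sketch table to LZ4 looks roughly like this (1.2/2.0-era syntax):

    -- chunk_length_kb is the window the compressor works over;
    -- 64 is the default discussed above.
    ALTER TABLE user_feed WITH compression = {
        'sstable_compression': 'LZ4Compressor',
        'chunk_length_kb': 64
    };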
This will fail; actually, it fails in all versions. The idea here is that unless you're setting created_at, basically the second clustering key, to an equals, you can't do any filtering after that.
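For illustration, take a hypothetical table with PRIMARY KEY ((user_id), created_at, event_type); the restriction looks like this:

    -- Rejected: a range on created_at followed by a condition on the
    -- next clustering column.
    SELECT * FROM events
    WHERE user_id = ? AND created_at > ? AND event_type = ?;

    -- Accepted: every clustering column before the range is pinned
    -- with an equality.
    SELECT * FROM events
    WHERE user_id = ? AND created_at = ? AND event_type > ?;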
CASSANDRA-4851 introduced this idea of tuple operations. If you've done any 3D graphics, this probably looks a lot like a vector comparison, and it operates a lot the same way, too. In the DataStax example, you can do something like: where it's greater than this hour and this minute, and where it's less than this hour and this minute, and you can get interesting results by getting a more specific slice.
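Assuming a table clustered by (hour, minute), the DataStax-style example reads roughly like this:

    -- Multi-column (tuple) slice over clustering columns, per
    -- CASSANDRA-4851: one contiguous range in a single query.
    SELECT * FROM timeline
    WHERE day = ?
      AND (hour, minute) > (3, 50)
      AND (hour, minute) <= (4, 37);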
Unfortunately, we haven't actually gotten this to work correctly. If you copy and paste their example, it works great. If you then write your own table and try to get it to work with your data: not so great. So I'm actually hoping to get some time to talk to some DataStax people today and get that sorted out, because it's a really great feature.
As far as upgrades go: as Jake was saying before, manual package installs are not that bad to do. We use the DataStax repo, and we use their dsc20 package, which is really just the community Cassandra.
The reason we use the repo is that it also has the OpsCenter and DataStax agent packages, so that makes it pretty easy to install everything at once. The way we do it is we upgrade one node at a time. I'll have OpsCenter open on one screen, and the nice thing about that is it auto-refreshes, so you don't have to keep hitting refresh or constantly running nodetool status. On the other screen I'll have a command line open to the node, with a screen session running and the tail of the system log on the right, and I'll just do apt-get update, apt-get install, wait for it to restart, replay the commit log, and handshake again, and then go back to OpsCenter and just wait for it to become healthy. Usually it wants to do a repair.
So here the first node, basically the node you're working on, gets its own IP address and puts it into a seeds array, and then it goes through the rest of the stack JSON that Chef provides you. Whether it uses the private IP or the public IP is decided by whether you're using the multi-region snitch or one of the plain EC2 snitches, and then we write that out to the cassandra.yaml. It works pretty darn well; we haven't had any big issues with it. Yeah, that's open source.
We have another tool: if anybody uses OpsWorks, we have an OpsWorks command-line tool that one of these days we're going to open source. Now that I'm talking about it, it'll be one of these days like today or tomorrow, I guess. But that makes working with this a lot easier, because when a node goes down, it's as easy as "opsworks add production cassandra size c3.large", and it just adds the node and starts it, and it joins the cluster automatically. And that is that.