From YouTube: Stormpath: Infinite Session Clustering with Cassandra
Description
Speaker: Les Hazlewood, CTO at Stormpath and the Apache Shiro PMC Chair
In this session Les Hazlewood, the Apache Shiro PMC Chair, will cover Shiro's enterprise session management capabilities, how it can be used across any application (not just web or JEE applications) and how to use Cassandra as Shiro's session store, enabling a distributed session cluster supporting hundreds of thousands or even millions of concurrent sessions. As a working example, Les will show how to set up a session cluster in under 10 minutes using Cassandra. If you need to scale user session load, you won't want to miss this!
Okay, so this presentation is about using Cassandra as a data store for the storage and maintenance of sessions, and when I say sessions, I really mean anything session-like. It can be a user session, it can be a device session, it can be anything for which you want to maintain state that is temporal in nature. So it's not necessarily limited to user sessions, but that's primarily what we're going to talk about and demonstrate in this presentation, just because it seems to be one of the most widely adopted use cases.
So my name is Les Hazlewood and I'm from Stormpath. We are a cloud-hosted identity management service. We focus on authentication, user management, security workflows, automating all of the stuff that tends to go wrong when you're implementing applications exposed to the world at large. How many people here were here for the presentation just before this one, on OAuth2 as a service? Okay, cool. So if you didn't want to build and maintain that yourselves, you could use Stormpath.
For example. Now, we do a lot of other things in addition to that, but we handle all of these things and offer developer tools and SDKs and libraries, and of course, as you might expect, we use Cassandra on the back end. During the course of this presentation we'll be talking about Apache Shiro. That's the implementation I'm going to use to demonstrate how to do session clustering in this particular use case. How many people here have not heard of Apache Shiro?
Okay, a few people. So Apache Shiro is the Apache Foundation's open-source security framework for the JVM platform. It handles user management at the application level, session management, authentication, authorization and best practices, cryptography and encryption and security digests. But in the course of this presentation, we're only going to be talking about session management. The other things, the web support and auxiliary features like testing support and the crypto features, are out of scope for this presentation. Of course, this is what I do most of the day.
So if you have any questions on this stuff, please feel free to ask me after the presentation is over. Okay, before we jump into Cassandra-related code, let's cover some quick Shiro concepts. The idea with Shiro is that, as a security framework, everything is user-centric, or in this case, subject-centric. "Subject" is just a security term that means the currently executing user, or the currently executing individual or thing, that's interacting with a service or an application.
So this can be a human being; most of the time it is. But it can also be a device, a daemon, a third-party service; it's anything that's currently interacting with the service. And Shiro, in almost all of its security operations, takes a subject-centric approach to its API. We find that people tend to think about things in user use cases: you know, if I'm accessing this REST API endpoint, can I do X, Y or Z? Or am I allowed to click this button?
So everything takes a user-centric approach, and there are all sorts of convenience methods on a Subject that represents the current user. For example, you can log in the user, you can do permission checks, authorization checks, role-based access control and whatnot. We're going to be referencing the Subject here to show you what this looks like in code when we start doing session management. So what do I mean by session management? It's really managing the life cycle of a subject-specific temporal data context, and that's a big mouthful.
That just means state that is associated with some identity over a period of time. Most people think of HTTP web sessions when they hear about session management, and that is the most widely used use case. But of course, as I mentioned before, this can actually pertain to any kind of state managed over time. So devices, time-series data attributed to a particular field device — anything that fits that mold can apply here. So what can Shiro do in regards to session management?
One of the cool things about Shiro is that it supports this notion of heterogeneous client access, and that means you can access the same session from different devices or different web browsers. This is a feature that doesn't exist in Java or JEE anywhere that I'm aware of. For example, you can't access the same stateful session bean from a web browser as you can from a server-side component, because there's inherently connection state and other things associated with those access patterns. But Shiro allows you to do this via session ID access.
Everything in Shiro is POJO-based and J2SE-based, so it's very IoC-friendly for things like Guice and Spring and JBoss and other dependency injection frameworks. It's got an event listener mechanism, so you can listen for relevant session events: expiration, attributes being added or removed, creation, all sorts of good stuff. It also supports this thing called host address retention.
So, unlike most servlet requests or other generic session infrastructure, you can retain the IP address from where the session was initiated, and sometimes that can be useful for access control policies, especially on intranets. There's, of course, inactivity and expiration support, as you might expect, but Shiro also supports this notion of a touch method.
So what that means is that if you already have code that programs to the session API in the servlet spec, you don't have to change any of that code; everything still works as expected. Shiro implements the servlet specification, so you don't have to change your source code. And one of the biggest reasons people use Shiro — and this presentation kind of reflects that — is that you can get container-independent clustering.
So if you drop Shiro in an app and point it to a clustered session store like Cassandra, that will work the same in any web container, whether it's JBoss or GlassFish or Tomcat or Jetty; you don't have to change how you cluster sessions. Today, almost all of those mechanisms are container specific and require you to know how that container's configuration operates. With Shiro, you can test in Jetty and deploy Tomcat in production, or any other combination, and you don't have to change your source code.
That's a really big benefit for teams, to minimize fluctuation during development. Okay, so how do you acquire a session? How do you create them? How do you access them? As I said, Shiro has a subject-centric API, and there are really two ways. You can call the subject's getSession method; that will guarantee a session exists.
So if one does not exist at the time this method is called, a new one is created and returned; if one already exists, or is already associated with that subject, it just returns the existing one. And then, of course, just like the servlet API, you can pass in a boolean to indicate whether or not a session should be created. So in certain cases, for example REST APIs, you want to ensure that your REST API can remain stateless, but the caller may already have a session via the user interface, for example an admin console.
Paralleling the servlet request API, you can get and set attributes, you can set the timeout for an individual session, and there is this notion of touch, which the servlet API does not have. But these are pretty common things, things that you would expect.
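Put together, the Subject and Session calls described above look roughly like this in Java. This is a sketch based on the Shiro API as presented in the talk; `SecurityUtils.getSubject()` is Shiro's standard entry point, and it assumes a configured SecurityManager is already in place:

```java
import org.apache.shiro.SecurityUtils;
import org.apache.shiro.session.Session;
import org.apache.shiro.subject.Subject;

public class SessionApiExample {
    public static void demo() {
        Subject currentUser = SecurityUtils.getSubject();

        // Guarantees a session exists: creates one if necessary.
        Session session = currentUser.getSession();

        // Like the servlet API, pass false to avoid creating one
        // (e.g. to keep a REST endpoint stateless). May return null.
        Session existing = currentUser.getSession(false);

        // Attribute access and timeout parallel HttpSession.
        session.setAttribute("accountId", "12345");
        Object id = session.getAttribute("accountId");
        session.setTimeout(30 * 60 * 1000L); // 30 minutes, in millis

        // touch() resets the inactivity clock without other side effects;
        // the servlet API has no equivalent.
        session.touch();
    }
}
```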
So let's talk about how this actually works inside Shiro, the internal architecture, because this is going to be important when we talk about how to plug in Cassandra.
So when you call subject.getSession and it returns a Session — this is an interface, by the way, not a concrete class — what's actually returned is a very lightweight proxy to Shiro's internal SessionManager. The SessionManager, as you might expect, manages all sessions for a particular application, and so all operations on the Session interface itself are actually delegated to the SessionManager.
The SessionManager does the real heavy lifting, and it in turn uses a SessionFactory to create brand-new session instances at the time they're requested. It also has this notion of a SessionDAO. Is everybody here familiar with the data access object design pattern? It's just a thin tier layer to access your underlying data store. So there's a SessionDAO, which in turn references a session ID generator, so you can customize what your session IDs are.
Also, you can utilize a session cache, so you don't have to hit the data store every time and can keep things in memory. That's especially useful if your cache is a clustered, distributed cache like Coherence or Hazelcast or any of these other clustered cache frameworks. And then finally, the cache can proxy to a data store; of course, if a cache is not enabled, the data store is accessed directly. This data store, as you might expect, is completely pluggable. It can be almost anything; it can be a disk.
Session managers also have a session validation scheduler, which I'll talk about in a little bit, and they support the notion of session listeners: the ability to listen to various events during a session's lifetime, so you can react to them and perform business logic. So all of this represents how you access the session, how the session works, and the underlying architecture of how Shiro thinks about sessions. But the most relevant parts for this conversation are the purple parts.
A
You
know
sorting
by
most
recently
accessed
or
oldest
session,
or
what
have
you
the
datastore
is
Cassandra.
We're
not
worried
about
that,
and
one
of
the
interesting
things
is
this
notion
of
a
validation
scheduler.
So
it's
every
session
framework,
that's
out
there,
whether
a
Shiro
or
tom
cat
or
anything
else,
has
this
thing
called
a
validation
scheduler.
The
main
name
might
be
different
across
the
frameworks,
but
ultimately
the
scheduler
is
required
for
running
periodically
and
deleting
orphan
data
out
of
the
data
store.
So you want to make sure that old sessions — those that have expired or have been implicitly terminated, but not explicitly terminated — are cleaned up, so they don't fill up your data store. If you don't have this scheduler, or some kind of validation mechanism, the store will fill up over time, and that's never a good thing: your disks will run out of space and whatnot. So this is an important concept; everything, Shiro included, has it. And we'll talk specifically, in regards to Cassandra, about how this is important, or whether it's important at all.
So how do I enable this inside an application? Because we're talking about web apps in this particular case, you can do everything we're talking about here with simple web.xml configuration — or, granted, with the advent of the Servlet 3.0 spec, you can do these things programmatically as well. What we want to do is make sure that we protect all URLs, so any request that goes into the system, we want to intercept it and make sure that we can leverage Shiro's session implementation instead of the servlet container's.
Because, again, as I said, Shiro implements the servlet spec, so we need to make sure that is used instead of the default servlet behavior. Shiro can also, in addition to sessions, protect all URLs; of course, there's authentication and access control and authorization that you can do at a URL-specific level. Shiro supports this notion of defining filter chains in a very concise manner. It's probably the most succinct and easiest-to-use mechanism I've ever seen for a web app, much easier than defining a ton of filters in web.xml, and we'll talk a little bit about what that looks like in a minute.
There's also JSP tag support, so you can control whether or not elements in JSPs or JSF pages are rendered based on session state or user state, and all sorts of other things. And, as I mentioned before, we implement the servlet spec, so you don't have to change your session-based code. So this is how you enable Shiro in a web app: either XML or programmatic config.
You enable a listener that loads up Shiro and its environment, and then you specify a filter so it can intercept all requests that come into your web app. In addition, you want to make sure that the filter mapping for that filter can intercept every type of request that is performed by the servlet container dispatcher. So there are request types, forward types, include types, error handling — Shiro wants to filter all of them so it can inject the appropriate behavior.
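A minimal web.xml setup along those lines looks like this; the listener and filter class names match Shiro's documented web support, and the four dispatcher entries cover the request types just mentioned:

```xml
<listener>
    <listener-class>org.apache.shiro.web.env.EnvironmentLoaderListener</listener-class>
</listener>

<filter>
    <filter-name>ShiroFilter</filter-name>
    <filter-class>org.apache.shiro.web.servlet.ShiroFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>ShiroFilter</filter-name>
    <url-pattern>/*</url-pattern>
    <dispatcher>REQUEST</dispatcher>
    <dispatcher>FORWARD</dispatcher>
    <dispatcher>INCLUDE</dispatcher>
    <dispatcher>ERROR</dispatcher>
</filter-mapping>
```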
So this just gets it set up, and then finally, Shiro has its own config format. We use INI as our lowest-common-denominator format; if you're using Spring or annotation-based configuration, those are definitely a great choice and probably preferable, but this will always work regardless of what kind of programming environment you might be deploying in. The breakdown of this is that Shiro has a main section, which is basically an object-graph config; you can define some static users and some static roles; and the URLs section is for defining filter chains.
So for any given request that comes in, you can define a set of filters that will perform security operations: either guarantee a session exists, or make sure users are authenticated and, if they're not, redirect them to a login page. All sorts of that stuff can be configured here in a more succinct way than using web.xml. So how do we leverage this config to implement session clustering, specifically using Cassandra?
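A small shiro.ini illustrating those sections might look like the following; the user, role, and URL entries are the placeholder examples from Shiro's own quickstart, not from this talk's sample app:

```ini
[main]
# object-graph config: instantiate objects and set JavaBeans-style properties

[users]
# username = password, role1, role2, ...
lonestarr = vespa, goodguy

[roles]
# role = comma-separated permissions
goodguy = winnebago:drive:eagle5

[urls]
# URL path pattern = filter chain to apply
/login.jsp  = authc
/account/** = authc
/**         = anon
```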
You want to be able to handle this at extreme scale, so we're going to leverage Cassandra for that. How do you do that? There are two approaches in Shiro. You can either write a SessionDAO — as I mentioned before, that's the thing that manages access to a data store — or you can leverage Shiro's out-of-the-box EnterpriseCacheSessionDAO implementation and just write a cache manager. So again, a cache manager can talk to a clustered, distributed cache; that's perfectly fine.
The only things we have to worry about with this particular approach are the DAO implementation, the session ID generator, and then the actual data store that we have to talk to. The best approach when you do this is probably to extend Shiro's AbstractSessionDAO, which implements a lot of logic for you, like session ID generation and delegation to an ID factory and whatnot. So you basically just have to override a couple of methods that perform the CRUD operations against the underlying data store.
The way to enable native session management — session management controlled by Shiro instead of the servlet container — is to just tell Shiro: hey, I want the session manager that's in use to be a DefaultWebSessionManager. This overrides the configured or implicit default manager that's used in web apps at startup, which piggybacks on, or rather wraps, the servlet container's session manager. So now we're saying: I don't want to use the servlet container's sessions, I want to use Shiro's, so I can leverage these clustering features.
Then we're going to do some more config that allows us to interact with the actual underlying Cassandra data store. These are just some utility classes; this is from a sample application I wrote. It's all available, Apache licensed, on GitHub, and I'll show you the URLs at the end of the presentation if you want to download it yourself and hack on it and play with it. We're creating a Cluster object — this really creates a Cluster object using the DataStax Cassandra driver — and this Cluster object is then used to perform all of our queries and CRUD operations. So the DAO here is just being configured: here's the cluster; here's the keyspace I'm going to interact with; this is the particular table, or column family, that I want to interact with. These are all the defaults, pretty simple stuff. Once this DAO is defined, you just have to configure it on the session manager.
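That wiring looks roughly like this in shiro.ini. `DefaultWebSessionManager` is Shiro's real class; the CassandraSessionDAO class and its property names here are illustrative stand-ins for the sample project being described:

```ini
[main]
# use Shiro's native session manager instead of the servlet container's
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager

# hypothetical names mirroring the sample app's Cassandra-backed DAO
sessionDAO = com.stormpath.shiro.cassandra.CassandraSessionDAO
sessionDAO.keyspaceName = shiro
sessionDAO.tableName = sessions
sessionManager.sessionDAO = $sessionDAO
```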
So, as you can kind of see from some of these things, the main section of Shiro is really just a simple object-graph definition: I'm defining objects, I'm defining properties on objects. This is nothing particularly interesting, but it is very convenient as a text config. These all use JavaBeans properties, as you might expect; again, if you have Spring or Guice or annotation-based config, that's probably better, but this works for everybody. Okay, so now we've got our DAO plugged into Shiro. What does the table look like? This is a CQL definition of what our sessions column family looks like in Cassandra. Of course, there's an ID; every session has an identifier that can be used to access it on subsequent requests. But we also keep track of other information that might be useful, especially for sorting reasons. So there's a start timestamp, a stop timestamp, and the last time the session was accessed by an end user — that's really important for session validation reasons.
The host is really just an optional thing: the host from where the session was created, you know, its IP address. And the serialized value is really the serialized attribute map from within the web app — what you'd get back from session.getAttribute. These are all app-specific things. And it might be prudent to mention that, as is the case in this implementation, as well as pretty much every session implementation I've ever seen, you want to keep your serialized attributes — the session state itself — extremely minimal.
The more session state you have, especially if it's large, the more you have to spread that state over the cluster, and the more I/O you incur serializing it and reading it into memory. Session state can kill large-scale apps. So the only things that should really be stored in sessions, if you want to scale, are simple pointers: an identifier, or a couple of identifiers, to something else that you can look up from a cached data store. The idea is to keep this stuff minimal, and your performance with Cassandra will be much, much better.
No big, huge object graphs in the session; don't store the entire UI state in the session. Okay. So we talked before about a session validation scheduler, and if you noticed in the config previously, we didn't specify any session validation scheduler. One is defined by default, but we don't need one in this case. There's a lot of overhead in querying a data store, finding all of the sessions that have expired and, if they have, either purging them from the data store or marking them as deleted.
Again, as I mentioned, every session mechanism out there has to do this to prevent orphans. But we don't need one when we use Cassandra, which is really great, because we can use Cassandra's TTL feature. If you specify a TTL equal to your session timeout for that particular session, you can trust that Cassandra will automatically delete that session from your data store, and these orphans won't accumulate over time. So again, this pertains to sessions, it pertains to OpenID tokens.
It pertains to anything that is temporally stateful, and so we're going to leverage the same behavior here. The way you tell Shiro not to enable its default scheduler is to just turn it off. So we're telling the session manager: hey, I know session validation is really important, but in this case I'm explicitly turning it off, because I know that my data store will do it for me, and that way the application doesn't bear the overhead of having to do it itself.
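In shiro.ini that's a single flag; `sessionValidationSchedulerEnabled` is Shiro's documented property for this:

```ini
[main]
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
# Cassandra's TTL handles expired-session cleanup, so disable Shiro's scheduler
sessionManager.sessionValidationSchedulerEnabled = false
securityManager.sessionManager = $sessionManager
```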
So, given the table, what happens when a new session is created? Here is a CQL query we can use to populate a row in that table: we're going to update the session table using a particular TTL. The dollar-sign timeout is just my own little identifier token for this presentation, to show you where the real value would go.
That is not valid CQL syntax, but it shows you where the real value would be substituted at runtime — ideally it would be something like a question mark, but that's where you want the TTL. Then you set all the values, and then you specify the ID of the row where those values should be saved. This is commonly referred to as an upsert: Cassandra doesn't care whether this is the first time the record is being written or whether it's an update. Upserts work really well with Cassandra.
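The upsert described above looks roughly like this, with `$timeout` again standing in for the bound TTL value as on the talk's slide, and the column names assumed to match the table sketched earlier:

```sql
-- $timeout is a placeholder for the session's timeout, bound at runtime
UPDATE sessions USING TTL $timeout
SET start_ts = ?, last_access_ts = ?, timeout = ?, host = ?, attributes = ?
WHERE id = ?;
```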
So the question is: if you have these TTLs, what about tombstones? That's an amazing question. What about tombstones?
The reality is that, yes, if there is a lot of reading and writing and deleting, you're going to create tombstones, and tombstones invariably take up more space and need to be cleaned out as well. So when using Cassandra as a time-based, or temporal, data store for session state, you want to be careful with your GC grace and the compaction strategy that you use.
As many of you probably know, GC grace by default in Cassandra is 10 days, which means tombstones can live up to 10 days in the data store. If sessions are expiring with, say, a session timeout of 30 minutes or an hour, that's an awful lot of time to keep around data that's probably never going to be used. So we recommend a GC grace of, in this case, 86400 seconds, which is one day, 24 hours, but you can go even much lower than that depending on your load.
The more concurrent sessions you have, and the more scale you have, the more tombstones you might be incurring, so you might want to reduce this even further. We found that a day works pretty well in practice for most people, but your mileage may vary. Tuning is important, so get some data and make adjustments as necessary. Compaction here is really important, though. How many people here do not know the difference between size-tiered compaction and leveled compaction in Cassandra?
Okay, so we've got some folks. Size-tiered compaction basically compacts SSTables — for more efficient disk storage and utilization by Cassandra — based on the size of the SSTable files themselves: as tables reach a certain size, more SSTables get created and merged. That's a fine strategy, and efficient for space utilization, when you have a write-mostly workload, like time-series data that's not manipulated afterward.
That's actually a really good use case for the size-tiered compaction strategy. Leveled is really important if you have a read-heavy workload or an update-heavy workload. What it does is compact SSTables based on kind of an LRU strategy, or frequency of use: the things that are most recently accessed go into level 0, then level 1, then level 2, and so on. So there are different levels that act as persistence tiers for the SSTables.
That's really important for sessions, because the things that are most active, or most frequently used, you want to have in level 0 if possible. The number of SSTables you'll have to read on average is about 1.1111: you have a ninety percent chance of finding your data in level 0, and if it's not there and you hit level 1, you have a ninety percent chance of hitting it there.
So leveled is much more efficient for a read-heavy or update-heavy kind of flow — make sure you're using the leveled compaction strategy. That being said, this does incur more I/O: compaction happens more frequently, there's more I/O on disk, and you have to be aware of that when you're customizing or testing your sessions at scale. In practice this isn't really that big of a deal, especially if you're on SSDs, but just run some tests and make sure this works for your particular scenario.
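Both tuning knobs just discussed are standard CQL table properties and can be applied like this; `86400` is the one-day GC grace recommended above:

```sql
ALTER TABLE sessions
WITH gc_grace_seconds = 86400
AND compaction = { 'class' : 'LeveledCompactionStrategy' };
```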
What about row caching? This is a question that comes up sometimes: hey, can I turn on the row cache and have it work sort of like a memcache, so I don't really need to be hitting disk all that often? Until today's announcement of 2.1, you probably didn't really need it; most of the time, row caching is used in very, very specific use cases, and the DataStax core engineers are probably the best people to ask about when it's valid to use.
The SSTable is already likely to be in the operating system page cache, and the cache is off-heap in Cassandra, so your data is already likely to be in memory; you don't really need the row cache — at least, for versions less than 2.1. You do want to use the key cache: it's really important to make sure your keys are cached so they can be accessed efficiently when accessing a partition row.
This is really important, but it's enabled by default on Cassandra 1.2 and above, so most of the time you don't have to worry about it. That being said, I'm going to run some tests now that 2.1 is out, using the row cache for these specific use cases, because the row cache in 2.1 is really interesting: you can keep the most frequently used entries in a particular memory cache, which makes a lot of sense for sessions.
Those are the things that are constantly being updated and accessed, and so for this particular case, maybe you might want to keep a thousand or 5,000 sessions in the row cache, and you can do that with the 2.1-specific row cache. The code is here; feel free to check out the project, download it and update it. I don't really have a whole lot of time left.
What's my cut-off time — 11:15? Anybody know? 11:30? Perfect, so I can show you a demo. Awesome — all the perils of real-time demos. Okay, so I'm going to show you guys basically how this project is set up, what the code looks like, and how you can run it yourselves. So, why did I choose Cassandra?
So the question is: why did I choose Cassandra, given the alternatives? There are different alternatives, and for me one of the important things was that we wanted session state to be persistent and geographically distributed in case of a data center disruption. That means our sessions are fully fault tolerant across geographic zones, and in my experience that calls for a persistent, eventually consistent data store.
Additionally, the maintenance overhead for the devops team is so low with Cassandra. There are other alternatives, other systems, that require more operational overhead, and that just wasn't worth it for us. And we found Cassandra scaled to hundreds of millions of sessions with no noticeable impact on performance. It was unbelievable, so it was kind of a no-brainer for us. Also, we were using Cassandra for other things, so the more we could consolidate to a single data store, the easier it was for operations.
Yes — how many replicas? We run a minimum of four replicas across our data cluster; we have seven nodes that are online at all times. We write with quorum consistency, but we read with a consistency level of one, which is actually interesting. If you guys get a chance, go see the Netflix talk — I don't know if Christos covers this in his presentation — but the probability of something going wrong at a consistency level of one is so incredibly low, like at the millisecond level.
The odds are high that it's not really worth incurring the network overhead of quorum consistency, especially if it's data that's not super critical — it depends on what's in your session, in this case, but most of the time it's not that big of a deal. You can also implement retry logic on the client side: if anything fails at consistency level one, then you retry at quorum, and that usually fixes everything. But we found performance to be much, much better, as did Netflix, by using a consistency level of one. Yes?
So the question is: if I'm writing with a consistency level of quorum, how do I handle failures? Almost always, you're expected to implement that logic in your application, and it's really easy. For example — I don't know if you guys use Guava, the Java toolkit from the Google team — you can use that, or you can write it yourself, but it's really easy to wrap a call in a function that implements exponential backoff. So the first thing is: try. If it fails, try it again.
You know, 50 milliseconds later; if that fails, try again 200 milliseconds later, and you keep doing that up to a certain max. There are utility functions — I know Guava has them — and I'm sure it's easy enough to write yourself if you don't want to depend on the library. But that's definitely the best way to do it, and your code can block until it succeeds. When you have that many nodes online, the odds are very, very high that it's just going to work. It doesn't always work that way.
So this is basically the DAO. It's a POJO, nothing very special; again, it subclasses that abstract one. It's got some lifecycle methods associated with it. This isn't ideally designed — it was just done quickly for the demo, and I'd probably extract some of this stuff out into proper OO components — but the idea here is that we're setting a time-UUID session ID generator, and if I go into it you'll see that it just uses the DataStax library's time-based UUIDs.
There's nothing special about this; you don't have to depend on a third-party library, but — let me go back here — that's already built into the DataStax driver, so you can leverage time-based UUID generation. Cluster is the DataStax driver object; then there's what keyspace I'm using and what table we're going to persist this stuff into, and the init method kind of lazily creates the keyspace if it's not there.
Let me see if I can find that. You know, in this case I'm just using a simple strategy with a replication factor of one, because this is a local test. Ideally, you'd use a replication factor of three, or whatever N divided by 2 plus 1 is for your cluster. And we're going to go ahead and create our table lazily as well; this is what I was showing you guys before.
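The lazy keyspace creation just mentioned could look like this in CQL; the keyspace name is illustrative:

```sql
-- local test: SimpleStrategy with replication factor 1; in production,
-- use 3 (roughly N/2 + 1 for the cluster, as noted above)
CREATE KEYSPACE IF NOT EXISTS shiro
WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
```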
All right, we're going to create a table with a timeuuid for the identifier, some different values, here's my GC grace and the leveled compaction strategy, and this is going to create the table on startup if it does not exist. And here's some of our creation logic: we're going to generate a session ID and assign it to that particular session.
We're binding our argument, which is the session ID, to that statement; we're going to serialize some of the data — these are the session attributes — and store it all as a byte buffer; and then we execute the statement against the data store. And here, actually — we saw this already. The idea is that all of this is encapsulated in a single class. It's really easy to use, and I'll show you quickly how to use it.
We didn't test it extensively to really find out if there is a difference between the two. For us, upsert worked fine in all cases, especially for Shiro, because when a session is not there, we want to add it to the store, and if it is there, we just want to update it. Because that's implemented by Cassandra by default, there was no extra logic we had to add in our app. So we didn't even try to test the performance difference of insert versus update, because it always just worked.
Yes — honestly, I don't know; I would actually ask a DataStax engineer. I'm not sure insert and update are any different at all, with the exception of the WHERE clause; it's basically using the same identifier to hit the row key in the database. So I don't know for sure, but I would venture to say that there's no performance difference at all — again, confirm that with a DataStax engineer. So this is just the very simple sample application for Shiro.
I'm going to log in; these are some sample accounts. This is our normal Shiro demo web app quick start. So, as you just saw, I just logged in; I can visit an account-specific page; I can return to the page; I can see some roles and things that this account has and doesn't have; I can log out; and I can go back in and log in as a different account.
And so now you can see my roles have changed, because my user account is different. All of this is using Cassandra under the hood to store sessions, and you could fire up any number of web nodes and point them all at the same Cassandra store. In practice, this is crazy fast — really, really fast. Let's see if I can run a quick demo.
Stormpath actually runs one hundred percent on Amazon, so ours are not physical machines, but there you go: I just created 10,000 sessions. And I want to indicate what's going on here. For each session, I'm creating a new one, then I'm reading it out of the data store, I'm updating the data and making that update to the data store, then I'm deleting it, because I don't need it anymore, and then I'm doing yet another read to assert that it's gone. That's five I/O operations on the DataStax driver, each one of them independently hitting Cassandra, and I'm doing that for 10,000 different sessions. So that basically equates to 50,000 operations, on a 2011 laptop with a crap-ton of stuff running, and it did it in about four seconds. That is a ridiculous quantity of operations for a non-optimized application on a non-optimized platform. You can take this and scale it linearly — and we've tested this with millions of sessions — with almost no noticeable impact on the application.