Description
Speaker: Colin Charles, Chief Evangelist at Monty Program Ab
Slides: http://www.slideshare.net/planetcassandra/5-colin-charles
The Cassandra Storage Engine allows access to data in a Cassandra cluster from MariaDB. Learn what the Cassandra Storage Engine is, how to make use of it, and how we implemented it using dynamic columns in MariaDB. We'll also look at CQL, data and command mapping, use cases and benchmarks.

It's probably important to note that Monty Program, the company you see me representing today, is a major sponsor of MariaDB, but MariaDB is governed by a foundation, so it is not a Monty Program project as such. It is an open source project backed by an open source foundation. The reason I say that I now work at SkySQL is that SkySQL and Monty Program agreed to merge towards the end of April, so we're going to be a much larger company offering services as well as engineering. Monty Program was a completely engineering-oriented organization.

We focused on making a better MySQL with lots of links to other databases. Today's agenda is pretty simple: I'm going to talk a little bit about what MariaDB is and a little bit about the MariaDB architecture. After a while you'll understand why I don't say MySQL: there's one particular feature we've added to MariaDB that isn't in MySQL, so to use it you need to fully migrate off MySQL.

Since only one person here has heard of MariaDB before: what is it? MariaDB is a community-developed, feature-enhanced, backward-compatible version of MySQL. It is a 100% drop-in replacement for MySQL. If you are running Linux distributions like Fedora, Ubuntu or SUSE, lately with many of them, when you do a yum install or a zypper install and ask for MySQL, you're getting MariaDB by default. So it's the new default, and we have a whole bunch of enhanced features.

You don't have to deal with Oracle to get MySQL Enterprise features: you get the thread pool for free. A thread pool is very useful for the many short-running queries that typically come from web apps. You open up a few threads to run many queries, and the results come back on those same threads. We also have things like table elimination, which is the basis of anchor modeling.
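
By analogy only, the thread-pool idea can be sketched with Python's standard ThreadPoolExecutor: a small, fixed set of worker threads serves many short tasks, and results are collected as they come back. This is not MariaDB's server-side thread pool; short_query and run_with_pool are hypothetical stand-ins for web-app queries.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def short_query(n):
    """Hypothetical stand-in for a short web-app query; returns result and worker name."""
    return n * n, threading.current_thread().name


def run_with_pool(queries, workers=4):
    """Run many short queries on a fixed, small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="worker") as pool:
        results = list(pool.map(short_query, queries))  # map keeps input order
    answers = [value for value, _ in results]
    threads_used = {name for _, name in results}
    return answers, threads_used


if __name__ == "__main__":
    answers, used = run_with_pool(range(100))
    print(len(answers), len(used) <= 4)  # 100 True
```

The point of the design is that the number of threads stays bounded no matter how many queries arrive, which is what makes a pool attractive for bursts of short requests.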

We've made a huge number of changes in replication, which still don't match the ease of use you get out of Cassandra. You still can't do things like multi-master replication yet, but we've made group commit in the binary log happen. That means that if you have more than three queries committing in parallel, instead of calling fsync() for each one, you call fsync() once for the whole group, and at that point you get great performance improvements, because fsync() is expensive on Linux.
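
A toy model of the effect, which only counts how many fsync() calls are needed when eight transactions are ready to commit at the same moment, with and without grouping. BinlogModel and commit_batch are hypothetical names, and the real group commit implementation is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class BinlogModel:
    """Toy binary log that counts fsync() calls (not MariaDB's implementation)."""
    group_commit: bool
    fsync_calls: int = 0
    log: list = field(default_factory=list)

    def _fsync(self):
        # Stand-in for the expensive fsync() system call.
        self.fsync_calls += 1

    def commit_batch(self, transactions):
        """Commit transactions that are ready to commit at the same moment."""
        if not transactions:
            return
        if self.group_commit:
            # Write every pending transaction, then make them durable together.
            self.log.extend(transactions)
            self._fsync()
        else:
            # Write and flush each transaction on its own.
            for trx in transactions:
                self.log.append(trx)
                self._fsync()


if __name__ == "__main__":
    concurrent = [f"trx{i}" for i in range(8)]
    naive = BinlogModel(group_commit=False)
    naive.commit_batch(concurrent)
    grouped = BinlogModel(group_commit=True)
    grouped.commit_batch(concurrent)
    print(naive.fsync_calls, grouped.fsync_calls)  # 8 1
```

Both logs end up with the same contents; only the number of expensive flushes differs.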

We've also been playing a lot with NoSQL and links to other systems. HandlerSocket is a NoSQL interface to InnoDB, the storage engine, which lets you do simple create, read, update and delete operations. It completely bypasses the SQL layer and goes directly to the engine, so it's very, very fast. We also integrate with the Sphinx storage engine, so you can do full-text search using Sphinx, because MySQL or MariaDB isn't really made for full-text search. We will also allow multi-source replication.

If you've come from the MySQL world, you probably have many little groups of masters and slaves, and you may want to aggregate all the data from those masters because they run separately; that's what multi-source replication is useful for. Then there are dynamic columns, which I'll talk about when I get to that slide. This is the MySQL/MariaDB architecture diagram.

Your application sits all the way at the top. It connects through any of the myriad languages available, so take your pick: Perl, Python, Java, etc. The request then goes to the connection pool, which does authentication. We've extended authentication so that you can also authenticate against PAM, against LDAP, or against Active Directory. After the connection pool, you hit the SQL interface, then the parser, then the optimizer. It may already pick results up from the cache, but if they're not cached, the query goes straight down to the pluggable storage engines, which sit right on top of the file system.

Engines like MyISAM do not offer transaction support but are very good for quick inserts. InnoDB is fully transactional, and we ship something called XtraDB, which is InnoDB of the kind that generally runs at Google and Facebook: InnoDB that runs at scale. If you're familiar with MySQL, you're probably familiar with the Percona toolset, and XtraDB comes from Percona. But we've always had engines that speak not only to the local file system but also over the network. NDB has always been a network database; it's commonly referred to as MySQL Cluster. FederatedX was commonly embedded inside Cisco routers, allowing SNMP log data to be sent across the wire to your MySQL server.

We now have an engine that integrates directly with LevelDB. LevelDB is a key-value store. How many of you use the Chrome web browser? Okay, most of you. The version of LevelDB inside Chrome is what implements IndexedDB, so technically you've been running LevelDB without even knowing this database is sitting there. IndexedDB is part of the HTML5 spec, and each browser implements it differently; Firefox uses SQLite. And Cassandra SE is the engine I'm here to talk to you about.

We've been extending the storage engine layer, and we're looking at many different storage engines, including engines for MongoDB and so forth, so you can use the same SQL interface to speak to other databases. Then there's also the replication API: MySQL now has an API for binary logs, which we call binlogs.

With that API you can have a Hadoop applier that writes directly to HDFS, which, from what I understand, is also kind of useful to folks. This is not something we developed ourselves; Oracle worked on it to make the applier happen. So now you have two ways to connect to different engines: the replication API and the pluggable storage engine API, both of which are quite unique to the MySQL world.

Okay, so I tried to explain the whole MySQL architecture, but you don't go through any extra layers yourself: you basically still write SQL to connect to your Cassandra cluster. There are some use cases where this is actually useful, especially user tracking and so forth. You don't really notice going through all of this; it possibly happens in less than a microsecond. But I needed to explain what the architecture looks like so that you know how we're connecting and that there's no black magic or voodoo happening. If I just told you that your applications could write SQL and connect directly to Cassandra, you might presume I was lying to you. Yes, yes, that's the answer; I wanted to show you.

We also have memcached access directly to InnoDB. If you're using memcached, which is really, really common if you have a web app nowadays, you might want to make that data persistent, and you can save your memcached information inside InnoDB or NDB. So besides memcached, LevelDB, Cassandra and this, the list is going to go on and on: we plan to integrate with other storage engines so you can continue writing SQL, and your current knowledge of SQL will not go away.

This is the reason I did not mention MySQL: we have extended MariaDB with something called dynamic columns. Dynamic columns allow you to store a different set of columns for each and every row in a table. It's arbitrary storage, like a blob (it is stored in a blob), but it comes with lots of handling functions: COLUMN_CREATE, COLUMN_GET, COLUMN_ADD, COLUMN_DELETE and, most recently, COLUMN_JSON, which gives you the columns in JSON format. JSON seems to be a relatively good interchange format.

It's very commonly used by many new systems. Many people like to write JavaScript from the get-go, so now you can get data out in JSON as well. You can nest dynamic columns, and you can also name dynamic columns and use the column name. Previously you could not name dynamic columns; they could only be referred to by number.
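
To make these function names concrete, here is a small Python sketch that renders a Python dict as a MariaDB COLUMN_CREATE(...) expression using named columns, turning inner dicts into nested dynamic columns. The helper names and the items table with its attrs blob column are hypothetical, and building SQL by string concatenation is only for illustration; real code should pass values as bound parameters.

```python
def sql_literal(value):
    """Render a value as an SQL literal; a nested dict becomes a nested COLUMN_CREATE."""
    if isinstance(value, dict):
        return column_create(value)
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)  # assumes finite numbers
    # Doubles single quotes; backslashes are NOT escaped here (illustration only).
    return "'" + str(value).replace("'", "''") + "'"


def column_create(columns):
    """{'color': 'blue'} -> "COLUMN_CREATE('color', 'blue')" with named columns."""
    parts = []
    for name, value in columns.items():
        parts.append(sql_literal(str(name)))
        parts.append(sql_literal(value))
    return "COLUMN_CREATE(" + ", ".join(parts) + ")"


if __name__ == "__main__":
    attrs = {"color": "blue", "size": "XL", "dims": {"w": 10, "h": 20}}
    print("INSERT INTO items (id, attrs) VALUES (1, " + column_create(attrs) + ");")
    # Reading values back uses the other dynamic-column functions, e.g.:
    print("SELECT COLUMN_GET(attrs, 'color' AS CHAR) FROM items;")
    print("SELECT COLUMN_JSON(attrs) FROM items;")
```

COLUMN_GET takes an AS type clause that tells the server which type to return, while COLUMN_JSON returns the whole set of columns for a row as a JSON document.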

This particular dynamic column feature is not available in MySQL, so the connection to Cassandra is not available via MySQL; you actually have to use MariaDB. Luckily for you, if you're already using MySQL, the upgrade to MariaDB is really easy. You can just do yum install MariaDB-server and it will replace MySQL in situ. It reads the same data files and uses the same socket and port number, so your application doesn't change per se. You generally just get all the additional benefits and performance fixes that we have.

For the few people who don't use Cassandra, this is roughly how we mapped it: column families are exactly like tables.

Okay, so most of you are still on Cassandra 1.1. That's good, because this implementation is based on Cassandra 1.1. CQL looks like SQL at first glance. It doesn't, however, do joins or subqueries; then again, MySQL doesn't do subqueries well either, so you're probably already used to not having them, but MariaDB does, which is why I bring it up. You don't have GROUP BY or ORDER BY in CQL. As for CQL 3, I'm going to attend the talk on it later to learn more.

It's relatively new; I think it was only released in February of this year, so there are still some changes we haven't made to make this more CQL 3-ready. In CQL, WHERE clauses need to be represented as index lookups. Our simple goal today for the Cassandra storage engine is to provide a view into Cassandra's data from MariaDB.

That means inserts, reads, etc., but we don't want to replace CQL. We want this to be a good path for you to use and access Cassandra without having to use CQL yet. Maybe down the line, if you're re-architecting your application or your data model, you can use this as a stepping stone to migrate as well.

Getting started is really, really easy. We released MariaDB 10.0.3 yesterday, so this slide said 10.0.2 right up until yesterday; you just download it from there.

We have binaries available for all forms of Linux, and I'm presuming you're going to be testing most of this on some form of Unix. You need to load the Cassandra plugin; all storage engines are plugins at the end of the day, and this is even true for InnoDB and so forth.

You can do INSTALL PLUGIN cassandra SONAME 'ha_cassandra.so'; this will install it. You can also load it from my.cnf: under the [mysqld] section, make sure the plugin-load line is there. Or, if you install it via your Linux distribution's packages, make sure you install the MariaDB Cassandra storage engine package, because if you don't install that particular package, the engine will not be there when you list your engines. Also make sure the Cassandra storage engine is actually there; you can do that with SHOW PLUGINS or SHOW ENGINES, both of which work just fine.
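
The loading steps above can be captured in a small sketch: the runtime INSTALL PLUGIN statement as quoted in the talk, a my.cnf edit done with Python's configparser, and a check over the rows a client gets back from SHOW ENGINES. The helper names are hypothetical; configparser only handles simple INI-style files (no !include directives) and does not preserve comments.

```python
import configparser
import io

# Runtime loading, as described in the talk.
INSTALL_SQL = "INSTALL PLUGIN cassandra SONAME 'ha_cassandra.so';"


def mycnf_with_plugin(existing=""):
    """Return my.cnf text whose [mysqld] section loads ha_cassandra.so at startup.

    Note: this overwrites any existing plugin-load value in [mysqld].
    """
    cfg = configparser.ConfigParser(allow_no_value=True)
    cfg.read_string(existing)
    if not cfg.has_section("mysqld"):
        cfg.add_section("mysqld")
    cfg.set("mysqld", "plugin-load", "ha_cassandra.so")
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


def cassandra_engine_available(show_engines_rows):
    """Check (Engine, Support, ...) rows from SHOW ENGINES for a usable CASSANDRA engine."""
    for row in show_engines_rows:
        engine, support = row[0], row[1]
        if engine.upper() == "CASSANDRA" and support.upper() in ("YES", "DEFAULT"):
            return True
    return False


if __name__ == "__main__":
    print(INSTALL_SQL)
    print(mycnf_with_plugin("[mysqld]\nport=3306\n"))
```

With a DB-API client, the rows for cassandra_engine_available would come from running SHOW ENGINES on a cursor and calling fetchall().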

Now you can create an SQL table, which is basically a view into a column family. You need to set the global Thrift host, or you can set the Thrift host per table. We did this on Amazon, so you can also try this on Amazon, and I'll show you how later; I'm not going to do a live demo on Amazon because the internet is kind of flaky. You need to create a table, specify that the engine is CASSANDRA, and make sure there is a keyspace.

You must have a Thrift host, because this uses the Thrift API, and you must have a column family name as well. The cassandra_default_thrift_host variable, which is up there on the slide, allows you to repoint the table to different nodes dynamically, without changing the table DDL, when Cassandra's IP changes. This is similar to how you'd connect FederatedX, SphinxSE or any network-related database in the MySQL world: you always end up specifying addresses, or pools of addresses, so to speak.
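
A sketch of generating the DDL just described, assuming the table options thrift_host, keyspace and column_family and the global cassandra_default_thrift_host variable mentioned in the talk. The table t1, keyspace mariadbtest, column family cf1 and the address 10.0.0.5 are hypothetical examples.

```python
def quote_sql_string(value):
    """Quote a value as an SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def cassandra_table_ddl(table, columns, keyspace, column_family, thrift_host=None):
    """Build CREATE TABLE ... ENGINE=CASSANDRA for a view onto one column family.

    columns: (name, sql_type) pairs; the first one is made the PRIMARY KEY and
    must correspond to the Cassandra row key. If thrift_host is None, the
    global cassandra_default_thrift_host setting is relied on instead.
    """
    if not columns:
        raise ValueError("a Cassandra SE table needs at least a row-key column")
    col_lines = [f"  {name} {sql_type}" for name, sql_type in columns]
    col_lines[0] += " PRIMARY KEY"
    options = []
    if thrift_host is not None:
        options.append("thrift_host=" + quote_sql_string(thrift_host))
    options.append("keyspace=" + quote_sql_string(keyspace))
    options.append("column_family=" + quote_sql_string(column_family))
    return (f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n) "
            + "ENGINE=CASSANDRA " + " ".join(options) + ";")


if __name__ == "__main__":
    # Repoint every table without a per-table host (hypothetical address).
    print("SET GLOBAL cassandra_default_thrift_host=" + quote_sql_string("10.0.0.5") + ";")
    print(cassandra_table_ddl(
        "t1",
        [("rowkey", "VARCHAR(36)"), ("data1", "VARCHAR(60)"), ("data2", "BIGINT")],
        keyspace="mariadbtest",
        column_family="cf1",
    ))
```

Leaving thrift_host out of the table definition is what lets a single SET GLOBAL repoint tables when a node's address changes, without touching the DDL.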

There are potential issues you may face, though to be fair, I ran this against 10.0.3 yesterday as well, just to make doubly sure you would not have a problem. If you run on Fedora or RHEL, you may have SELinux issues, and if you're on Ubuntu, you may have auditd issues, so you may see a permission denied error. You can turn SELinux off or stop auditd.

Okay, none of this happened to me, but if it does happen to you, which is occasionally reported, you can turn SELinux or auditd off. With regards to Cassandra 1.2, which was released sometime in February: column families without the COMPACT STORAGE attribute are generally not supported, and you'll get an error, because this engine is pre-CQL 3. You need to use COMPACT STORAGE; such tables are currently referred to as legacy tables in the documentation. So my suggestion is to continue making legacy tables with compact storage.

We will fix this in future releases, but the time since February is still pretty short. We also noticed that Thrift-based clients can no longer work with such tables. It also broke Pig in 1.2, and we're looking forward to the patch: the issue is CASSANDRA-5234, and it should probably be fixed in the next release. Pig 0.11, I believe, is also broken against Cassandra now.

So, for all intents and purposes, I have used Cassandra 1.1 from DataStax for this demo, and the best part is that you should now be able to access data. You can get data from Cassandra just by doing a SELECT, and you actually get the data pulled out of Cassandra. You can insert data into Cassandra and then double-check with cqlsh that the data has been inserted. So you now have a complete window into Cassandra. All these commands work in the examples.

All tables must have a primary key, and it must match Cassandra's row key. The other columns map to Cassandra static columns, as highlighted up there. So don't forget: the column names must be the same as in Cassandra, and the data types must match. The table can also cover just a subset of a column family's columns; that works too.

We support pretty much every data type, including timestamps, and we support microseconds in timestamps. MySQL doesn't support microseconds; we do, so that's one additional little feature there.
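
Because names and types must match, a quick compatibility check can be sketched in Python. The mapping table below is partial and assumed from memory of the MariaDB documentation, not taken from the talk, so verify it against the current documentation before relying on it. TIMESTAMP(6) is the microsecond-precision form referred to above.

```python
import re

# Partial, assumed mapping from CQL types to MariaDB column types that
# Cassandra SE accepts; check the MariaDB documentation for the full table.
CQL_TO_MARIADB = {
    "blob": {"BLOB", "VARBINARY"},
    "ascii": {"BLOB", "VARCHAR"},
    "text": {"BLOB", "VARCHAR"},
    "varchar": {"BLOB", "VARCHAR"},
    "int": {"INT", "INTEGER"},
    "bigint": {"BIGINT"},
    "uuid": {"CHAR"},
    "timestamp": {"TIMESTAMP", "BIGINT"},
    "boolean": {"BOOL", "BOOLEAN"},
    "float": {"FLOAT"},
    "double": {"DOUBLE"},
    "decimal": {"VARBINARY"},
    "counter": {"BIGINT"},
}


def base_type(sql_type):
    """'VARCHAR(60)' -> 'VARCHAR', 'timestamp(6)' -> 'TIMESTAMP'."""
    match = re.match(r"\s*([A-Za-z]+)", sql_type)
    if match is None:
        raise ValueError(f"unrecognised SQL type: {sql_type!r}")
    return match.group(1).upper()


def is_compatible(cql_type, sql_type):
    """True if sql_type is listed for cql_type in the (partial) mapping above."""
    allowed = CQL_TO_MARIADB.get(cql_type.lower())
    return allowed is not None and base_type(sql_type) in allowed


if __name__ == "__main__":
    schema = [("rowkey", "uuid", "CHAR(36)"),
              ("seen_at", "timestamp", "TIMESTAMP(6)"),
              ("hits", "counter", "INT")]
    for name, cql, sql in schema:
        print(name, "ok" if is_compatible(cql, sql) else "MISMATCH")
```

Unknown CQL types, such as collections, simply report no match rather than guessing.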

So why come back to dynamic columns? Mainly because the reason we have dynamic columns is so that we can access Cassandra's dynamic column families and all their columns. This is also how you use dynamic columns inside MariaDB itself; you don't have to use them with Cassandra.

If the mappings are wrong, it will actually spit errors out at you, and these are the common errors you can get. We've mapped most commands. Cassandra has put, get and delete, and then there are the SQL commands: SELECT is basically equivalent to a get or a scan, and INSERT is basically a put, which is an upsert, an update or an insert. "Upsert" seems to be a valid term nowadays. With regards to SELECT command mapping, MariaDB has the SQL interpreter, and Cassandra SE will obviously support the lookups.
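
The command mapping can be illustrated with a purely in-memory toy, where each SQL statement corresponds to one Cassandra-style operation. ToyColumnFamily is hypothetical and says nothing about how Cassandra SE is implemented; it only shows the upsert behaviour of INSERT-as-put.

```python
class ToyColumnFamily:
    """In-memory stand-in for a column family: row key -> {column: value}.

    Illustrates the SQL-to-Cassandra command mapping only; it is not Cassandra SE.
    """

    def __init__(self):
        self.rows = {}

    def insert(self, key, **columns):
        """INSERT -> put. An upsert: supplied columns overwrite old values and
        other columns are kept; there is no duplicate-key error."""
        self.rows.setdefault(key, {}).update(columns)

    def select_by_key(self, key):
        """SELECT ... WHERE primary_key = ? -> get (a single-row lookup)."""
        return self.rows.get(key)

    def select_all(self):
        """SELECT without a key condition -> a scan over every row."""
        return list(self.rows.items())

    def delete(self, key):
        """DELETE -> delete; removing a missing key is not an error here."""
        self.rows.pop(key, None)


if __name__ == "__main__":
    cf = ToyColumnFamily()
    cf.insert("user1", name="Ann", city="Oslo")
    cf.insert("user1", city="Lima")      # no duplicate-key error: acts as an update
    print(cf.select_by_key("user1"))     # {'name': 'Ann', 'city': 'Lima'}
```

The second INSERT would fail with a duplicate-key error on an ordinary MariaDB table; under put semantics it silently becomes an update.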

You can now join between Cassandra tables and MariaDB tables, and we have something called batched key access joins available. Batched key access accumulates rows in the join buffer and sends the interesting keys to the storage engine in one batch, instead of one lookup per row. Turning on batched key access gives you great performance; compared with regular joins, batched key access joins are amazing, so I'd always turn this on, especially with Cassandra SE, because it's accessing data over the network.
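
Why batching matters for a networked engine can be shown by counting round trips. RemoteStore, nested_loop_join and batched_key_access_join are hypothetical names for this sketch. In MariaDB itself, batched key access is enabled through the join_cache_level variable and optimizer_switch flags such as join_cache_bka; check the documentation for your version for the exact settings.

```python
class RemoteStore:
    """Hypothetical remote key-value store that counts network round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def multi_get(self, keys):
        # One request fetches every key in the batch.
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


def nested_loop_join(outer_rows, store):
    """Regular index nested-loop join: one remote lookup per outer row."""
    result = []
    for row in outer_rows:
        match = store.get(row["user_id"])
        if match is not None:
            result.append((row, match))
    return result


def batched_key_access_join(outer_rows, store, join_buffer_size=100):
    """Fill a join buffer with outer rows, then fetch all their keys in one batch."""
    result = []
    for start in range(0, len(outer_rows), join_buffer_size):
        buffer = outer_rows[start:start + join_buffer_size]
        matches = store.multi_get({row["user_id"] for row in buffer})
        for row in buffer:
            if row["user_id"] in matches:
                result.append((row, matches[row["user_id"]]))
    return result


if __name__ == "__main__":
    orders = [{"order_id": i, "user_id": i % 50} for i in range(250)]
    users = {u: {"name": f"user{u}"} for u in range(50)}
    plain, batched = RemoteStore(users), RemoteStore(users)
    assert nested_loop_join(orders, plain) == batched_key_access_join(orders, batched)
    print(plain.round_trips, batched.round_trips)  # 250 3
```

Both joins return the same rows; the batched version needs one round trip per filled join buffer instead of one per outer row, which is why a larger join buffer helps most when every lookup crosses the network.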

With regards to DML: INSERT overwrites rows, and UPDATE reads, then writes. So let's make it clear: Cassandra SE doesn't make Cassandra an SQL database. It's SQL-like, but it's not SQL per se.

So, a few use cases for Cassandra SE. Log collection and analysis is amazing. In the old days, maybe in 2007 or so, you'd say: hey, grab log data and keep it inside MyISAM or the ARCHIVE storage engine. Cassandra is better at it. You want to collect web page hits; you want to collect data from sensors, like the sensor data in the previous talk. Cassandra is awesome for this. So collecting time-series data in Cassandra and then querying it using MariaDB is fine.

If you're getting time-series data or doing user activity tracking (this web page was last viewed by foo; the last known position of this user inside this web page was this), and you're an e-commerce shop for which knowing the last known position of the user and keeping data about the user is very important, you want to keep all this data in Cassandra. It's not so good kept inside a relational database like MySQL or MariaDB.

Think of "you are user 5 out of 1,000,000": the old adage was that people did SELECT COUNT inside MySQL in forum software, and that made forum software notoriously slow, so nobody does that. That's the kind of stuff you can do with Cassandra. If you're coming from MariaDB and you want a table that is auto-replicated (MySQL and MariaDB do not do auto-replication), you want fault tolerance, and you want something really, really fast, get Cassandra with a Cassandra SE table. The other thing that's pretty unique is that you can get a globally replicated table.

Cassandra allows this, and you can't do it even with something like Galera Cluster, which is another product line we have inside MariaDB. Another possible unique use case: we have the CONNECT storage engine, so you can now connect and join data between an Oracle database via ODBC and your Cassandra cluster, and use InnoDB as an intermediary to store data inside MariaDB.

You want to turn on something called engine condition pushdown, which basically avoids sending non-matching rows from the storage engine to the SQL layer, so it avoids round trips over the network: the filtering is done on the remote data node. This is kind of useful; I've heard of people wanting to migrate off SQL Server, which supports ODBC as well, and this is a great use of MariaDB as middleware that helps you migrate.
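
The saving from condition pushdown can be sketched by counting rows that cross the network, with the WHERE condition evaluated either in the SQL layer or next to the data. The function names and the status column are hypothetical. In MariaDB of this era, pushdown is an optimizer_switch flag (engine_condition_pushdown); check the exact name and default for your version.

```python
def scan_without_pushdown(remote_rows, predicate):
    """The engine ships every row; the SQL layer filters afterwards."""
    shipped = list(remote_rows)                    # everything crosses the network
    matches = [row for row in shipped if predicate(row)]
    return matches, len(shipped)


def scan_with_pushdown(remote_rows, predicate):
    """The condition is evaluated next to the data; only matching rows are shipped."""
    shipped = [row for row in remote_rows if predicate(row)]
    return shipped, len(shipped)


def is_error(row):
    """Hypothetical WHERE status = 'error' condition."""
    return row["status"] == "error"


if __name__ == "__main__":
    rows = [{"id": i, "status": "error" if i % 10 == 0 else "ok"} for i in range(1000)]
    _, sent_plain = scan_without_pushdown(rows, is_error)
    _, sent_pushed = scan_with_pushdown(rows, is_error)
    print(sent_plain, sent_pushed)  # 1000 100
```

The query result is identical either way; only the number of rows sent to the SQL layer changes, which is what matters when the engine sits on the other side of a network link.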

Non-use cases: for huge sift-through-the-data joins, Pig is better. If you want to do a bulk data transfer, Sqoop is better. If you want a replacement for InnoDB, Cassandra SE is not quite that; it would mean rethinking the data model entirely to make it happen for you.

Here's a quick, tiny benchmark. We did this on Amazon EC2 with m1.large nodes. The bottom line, in blue, is InnoDB. You can see that the moment we add even two Cassandra nodes, you start seeing amazingly good throughput, which is the black line. There's amazingly good throughput with next to no tuning once you have a Cassandra cluster in the back end; Cassandra is really, really fast. Same setup as before, with some tuning done on InnoDB, and again with one Cassandra node: the red beats the blue in terms of transactions, so Cassandra is fast, and Cassandra SE as an interface is really, really fast.
Oh
this,
this
is
for
single
line
insert
both
are
for
single
line
inserts
using
this
bench.
So
you
can.
You
can
basically
pick
a
data
from
maury
DB
into
Cassandra.
It's
really
really
easy
to
set
up
and
use.

We want to see whether you want other features like table discovery: we could have assisted table discovery for a Cassandra cluster, like we do for FederatedX and CONNECT, and we're definitely looking into doing that. If that's something that could be useful for you, if you want to access a Cassandra cluster automatically, we'd be happy to start looking at that, and possibly at secondary indexes as well. There's a huge chunk of resources, and these slides will be online.

Thank you for listening. We won't go through the internals, but if you want to try this on your machine, you can download this VirtualBox image using Vagrant.