Description
Speaker: Mohammed Guller, Application Architect & Lead Developer at Glassbeam
Learn how Cassandra can be used to build a multi-tenant solution for analyzing operational data from Internet of Complex Things (IoCT). IoCT includes complex systems such as computing, storage, networking and medical devices. In this session, we will discuss why Glassbeam migrated from a traditional RDBMS-based architecture to a Cassandra-based architecture. We will discuss the challenges with our first-generation architecture and how Cassandra helped us overcome those challenges. In addition, we will share our next-gen architecture and lessons learned.
So, a couple of things before I get started. As I'm going through the presentation, if you have any questions, feel free to ask right away; you don't have to wait till the end. The other thing is, sometimes I talk very fast, and once you combine that with my heavy Indian accent, it could be hard to understand me. So if that's happening, just raise your hand and ask me to slow down, okay?
So let me first introduce myself: I'm an application architect and a lead developer at Glassbeam. I'm lucky to have that role because I'm passionate about both things; I enjoy designing new things and then building them out, so it's fun to get to do both. Before I joined Glassbeam I was working on my own startup. We built two products. The first one was an idea-discussion platform that allowed people to discuss new ideas, new product ideas, new business ideas, rate them, and have qualitative discussions on those ideas. The other product that we built was TrustStrikes, which was a social recommendation engine solving the same problem that Yelp is solving, but leveraging your social network.
Okay, so it looks like maybe around twenty to thirty percent. How many of you have just started learning Cassandra? Okay, so the majority of the people are in that other segment. Okay, what about IoT? It's a pretty hot word; a lot of people talk about it. How many of you are actually working on IoT? Okay, so roughly twenty percent. And how many of you have read about it? Okay, and what about the rest of the folks? So it looks like some people are lazy and probably have not been reading much about IoT. It's a really hot word; a lot of companies are talking about it, you hear about it in the news, and we'll go through some of that during my presentation.
Okay, the other thing is, in terms of your background, how many of you are on the technical side? And by technical I mean development or operations. Okay, so it looks like a big chunk of the audience. And how many of you are on the business side, product management, marketing? A few, okay. So it looks like most of the crowd is technical; that's great. So before I get into the meat of my presentation, I want to set the stage by defining the problem. For those of you who are familiar with IoT, you know that it's in the news a lot these days, and the data from IoT is exploding.
There are devices that are generating huge amounts of data. According to a study done by Cisco and IDC, it is estimated that by the year 2020 there will be tens of billions of connected devices, roughly ten for each individual. That basically means we'll have smartphones, smart glasses, smart shoes; everything is going to be smart. That's essentially what it means.
The same study also points out that forty-two percent of the data is going to be generated by machines. To put that into perspective, the right-hand side of this slide shows how the size of the data being generated has grown over a period of time. Until the 1980s, most of the data was getting generated by apps; it was structured data, and not a huge volume.
It would take years before it got to a stage where there were terabytes of data. Then the internet took off in the 1990s, and suddenly you had people sending emails, sharing pictures, posting videos, and that data eclipsed all the data that was generated previously. But the next big wave is the data that's going to come out of IoT. That's going to be huge, a lot more than what people have been generating so far, and this data presents new challenges.
The three main ones, and some of you may have heard about these from other places too, are volume, variety, and velocity. Volume is in terms of the amount of data: there are instrumented devices that can generate terabytes of data on a daily basis. By variety, I mean that earlier, going back to the older days, apps generated structured data.
Now you have this machine data, which is not just structured but can also be unstructured or multi-structured; later in my presentation I will describe what I mean by multi-structured. And then the third key attribute of IoT is velocity. When humans are generating data, it's at human speed; when machines, or the Internet of Things, are generating data, it's at machine speed. It's going to come at a much faster pace, which means the kind of technology needed for consuming that data has to be totally different.
But IoT also presents new opportunities, so it's not just that you have new challenges. Multiple groups across an organization can benefit from that data, and I've listed some of the key groups that do. The first one is remote support. By leveraging the machine data, a support organization can be proactive instead of reactive. So instead of waiting for the customer to call and say, hey, my system is not working now...
Support can actually analyze the data that they are getting from the machines that are out in the field, see if something is not working, and then proactively take steps to fix it. Imagine how happy a customer would be if he gets a call from support and the support guy says: Mr. Customer, I see that one of the components in your system is having some problem that's going to crash the system in a few days, so we are sending you a replacement part.
Imagine what the reaction would be when the customer hears that. The second benefit that support gets from the machine data is that they can actually lower the mean time to resolution. Those of you who have tried to solve problems remotely know that you need a lot of information when you're trying to fix something, and with machine data, all that information is there. That means support can be a lot quicker in resolving whatever issues the customer is facing. As a result of being proactive and being able to resolve issues much faster, you make the customer happy, customer satisfaction goes up, and your support costs go down, right? Because instead of taking thirty minutes, if you can do the same thing in ten minutes, the cost goes down significantly and productivity goes up. The second group that benefits from machine data is marketing.
They can actually see how the product is getting used in the field and what the adoption curve is. I'll give you an example. Let's say you have released a product that has 20 features. How do you know which features are getting used and which are not? If you're getting data from the machine itself, then it's pretty easy to analyze that and see what's going on. Similarly, let's say you've released multiple products over a period of time. How do you know which products have been deployed out in the field? A lot of times customers will buy something but not really deploy it. If you have access to the machine data, then you can see, over a period of time, what the adoption curve is for different models.
Similarly, engineering gets insights from how the product is getting used, and they can build better products for their customers. And then the last group to benefit is sales: they can discover upsell and cross-sell opportunities by analyzing the machine data. A good example would be, let's say you are a storage vendor selling storage to your customers. If you can see in the machine data that some of your customers are already at eighty percent capacity, you know that those customers are going to need more storage, right?
So there's one more thing I need to define before we continue: the Internet of Complex Things. So far we have talked about IoT; the Internet of Complex Things is a subset of IoT, and it includes systems that provide complex functionality. I've shown some examples here on the slide. In a data center, this would include your legacy servers, storage systems, routers, switches, and security devices. In a hospital setting, this would be the equipment that surgeons use for surgery, for example. In a lab environment, it could be whatever the lab technician is using. These are all getting internet-connected now. Similarly, there are industrial devices that companies like GE are making that are all connected and sending tons of data back to the product manufacturer. In the automobile sector you have these new cars, for example the cars that Tesla makes; they're heavily instrumented.
So what does Glassbeam do? We offer a SaaS-based product that allows our customers to do analysis of their structured or unstructured machine data. As input, we take whatever operational data the devices are generating in the field; the customer sends it to us, and on the other side we basically provide a bunch of apps that allow them to do whatever analytics they want to do on that data.
If you look closely at what's shown here, you'll notice that the data actually has multiple sections, and each section has a different layout. There's another thing that's very different in each section, and that's the frequency at which the data changes. If you look at the first section, which I've marked as static, it has key-value pairs, and that data very rarely changes across the lifetime of a product. In some cases it might be every year; in some cases it might be never.
If you look at the next section, the config section, it has a different layout, and that information is going to change more frequently. The third one is the statistical information; again, it has a totally different layout than the config section, and that one is going to change a lot more frequently. In some cases it might be 10 times a day, in some cases it might be every minute, in some cases every second. And then the last one is the logs.
To do this, we created our own language, which we call SPL; that's our core IP. Even though the name says Semiotic Parsing Language, it doesn't just let us do parsing, but a lot of other functionality as well. It allows us to specify the parsing rules: how exactly to parse multi-structured or unstructured data, where to store it, how to store it, what kind of search capabilities we want to provide on that data, and what kind of analytics transformations to apply.
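SPL itself is proprietary and the talk doesn't show its syntax, so as a rough illustration of the kind of per-section parsing rules such a language encodes, here is a minimal Python sketch. The section names and file format are hypothetical, not Glassbeam's actual format:

```python
# Hypothetical multi-structured machine log: each section has its own layout.
SAMPLE = """\
[static]
serial=X100
model=Array-9
[config]
disk1,500GB,online
disk2,500GB,online
[stats]
2015-09-22T10:00:00 cpu=12 mem=40
2015-09-22T10:01:00 cpu=15 mem=41
"""

def parse(text):
    """Split the input into sections, then apply a per-section parsing rule."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    out = {}
    # static: key=value pairs that rarely change
    out['static'] = dict(l.split('=', 1) for l in sections.get('static', []))
    # config: CSV-like rows that change occasionally
    out['config'] = [l.split(',') for l in sections.get('config', [])]
    # stats: timestamped key=value samples that change constantly
    out['stats'] = [
        {'ts': parts[0], **dict(kv.split('=') for kv in parts[1:])}
        for parts in (l.split() for l in sections.get('stats', []))
    ]
    return out
```

Calling `parse(SAMPLE)` yields one structure per section, each parsed with a different rule, which is the essence of what a parsing-rule language has to express.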
So this is a sixty-thousand-foot view of our solution. Customers send us their unstructured machine data. We have an SPL defined for each customer, and using that SPL we are able to extract meaning out of that data, and then on the other side we have the apps that our customers use. And this is what the first-generation architecture looked like for our product.
The input is the same as you saw in the previous slide, and then we have a parser that applies the SPL rules to the incoming data, extracts the data, and puts it into an SQLite database. Once that part is done, an ETL process kicks off, takes that data, and puts it into a data warehousing platform, which in this case is Vertica. Then we have another ETL program that kicks off at that point, takes a subset of the data from Vertica, and puts it into MariaDB, and then you have a web app that the users can use. This worked; it's actually not a bad architecture.
This is per product. So the question was: do you have to define a different SPL for each vendor? Yes, the SPL is per product. So the first challenge that we ran into was that ingestion speed was slow, because we were using a traditional RDBMS, which is read-optimized, so the writes were not as fast as we would have liked. Ingestion speed was the number one problem.
The second problem was that it was difficult to make schema changes, and it was happening quite often. Sometimes we would parse the data, and then later on the customer would say: there's some important stuff that you guys missed, can we start parsing this too? So we had to reparse the data, and when we did that, we also had to make schema changes.
Now, schema changes are okay if you have a small amount of data, but once you have a few terabytes of data, it's not that easy; it was hard. And then reloading the data was painful too. Let's say you have been parsing data for six months, and then the customer comes in and says: we need you guys to parse some additional files that we were not sending earlier. So we would have to go back, reparse everything, and reload that data, and that would take weeks and weeks, which was not acceptable.
The other thing was that it was costly to scale this infrastructure. What I showed on the previous slide is not a multi-tenant solution; we were building this for each customer, so every customer had an instance of what's shown here. That meant every time we got a new customer, we had to deploy new infrastructure with the same set of tools, and it was painful to deploy it again and again; managing multiple instances was an operational headache. So we decided to rewrite everything from scratch.
Okay, let's redesign everything. This is what the next-generation architecture looks like. There's a lot of information here, so let me take a few minutes to go over each item. The input is still the same: we get streaming data and files in unstructured or multi-structured format, and then we have SPL.
The first change was that we rewrote the program that did the parsing. It's written in Scala now, and it's an order of magnitude faster than the previous parser that we had. But what that also meant is that now we needed a data store that could keep up with the parser, so we had to replace the data store layer. We have Cassandra, we have S3, we have SolrCloud, and we have Postgres.
Typically what happens is, as we are getting data, once it gets parsed, the parsed data goes both into Cassandra as well as Solr; it gets written to both locations. The raw data, in its original format, gets stored on S3, and a subset of the data that goes into Cassandra also gets extracted into Postgres. As I go through the apps, I'll explain why we do that.
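As a minimal sketch of that fan-out write path, with the real Cassandra, Solr, and Postgres clients stubbed out as in-memory sinks and the subset predicate made up for illustration, the ingestion step looks roughly like this:

```python
class Sink:
    """Stand-in for a real store client (Cassandra, Solr, Postgres)."""
    def __init__(self, name):
        self.name, self.rows = name, []
    def write(self, record):
        self.rows.append(record)

def ingest(record, cassandra, solr, postgres, subset_pred):
    # The parsed data goes to both Cassandra and Solr.
    cassandra.write(record)
    solr.write(record)
    # Only a subset is extracted into Postgres (used for ad hoc BI queries).
    if subset_pred(record):
        postgres.write(record)

cass, solr, pg = Sink('cassandra'), Sink('solr'), Sink('postgres')
for rec in [{'type': 'stats', 'cpu': 90}, {'type': 'log', 'msg': 'ok'}]:
    ingest(rec, cass, solr, pg, subset_pred=lambda r: r['type'] == 'stats')
```

Both primary stores receive every parsed record, while Postgres only gets the rows matching the extraction predicate; the raw file would additionally be archived to S3 before parsing.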
The data layer is fronted by a middleware, so customers never get access directly to the database. Even our own apps never access the database directly; they all go through this middleware layer, which we call the Info Server, and it provides a set of REST APIs. All the data is accessed through those APIs. So let me quickly go over the apps. The first app is the Log Vault.
That's what allows customers to get access to the raw data. If they want to go back and look at the data that they've been sending to us, and be able to filter by date or time or whatever else, they can do that, and that's all backed by S3. The Explorer app provides search engine functionality.
If a customer wants to do full-text search on some data, they use our Explorer app, and that's where SolrCloud comes into the picture; we're using the Solr and Lucene engines on the back end to do that work. Workbench, our BI tool, allows customers to do ad hoc analytics, and that's the reason why some of the data gets extracted into Postgres; I'll go into more detail on why we do that later on. Standard apps are the out-of-the-box analytics that we provide to our customers, so they don't have to create anything.
The rules and alert engine is an interesting one; it allows our users to create complex rules. To give you an example, let's say you are a storage vendor again, and you want to know when a certain customer has reached eighty percent capacity utilization. You can create a rule saying: okay, if a customer crosses such-and-such threshold, generate an alert for me. So you don't have to be constantly monitoring the system.
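The actual rule syntax isn't shown in the talk; as a toy illustration of the threshold-alert idea from the storage example (the field names and record shape are invented), a rule can be sketched as a small predicate-plus-message pair:

```python
import operator

def make_rule(field, op, threshold, message):
    """Build a rule that fires an alert when a metric crosses a threshold."""
    ops = {'>=': operator.ge, '>': operator.gt, '<': operator.lt}
    def evaluate(record):
        if field in record and ops[op](record[field], threshold):
            return message.format(**record)
        return None  # rule did not fire
    return evaluate

# Example: alert when a customer reaches 80% capacity utilization.
capacity_rule = make_rule('capacity_pct', '>=', 80,
                          'ALERT: {customer} is at {capacity_pct}% capacity')

records = [{'customer': 'Acme', 'capacity_pct': 85},
           {'customer': 'Initech', 'capacity_pct': 40}]
alerts = [a for a in map(capacity_rule, records) if a]
```

Evaluating the rule over incoming records replaces constant manual monitoring: only the record that crosses the threshold produces an alert.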
So one of the questions I get asked is: why did you guys choose Cassandra? There are so many options; last I heard, there were 150 NoSQL databases, but I met a friend today and he said there are a lot more than that. So why Cassandra, why not something else? This goes back to the challenges that I mentioned earlier with IoT data: the three key attributes were volume, variety, and velocity, and Cassandra let us handle all three of those challenges very elegantly.
First, the capability we liked in Cassandra was the linear scalability. It allows you to easily scale from gigabytes to terabytes, so you don't have to build an infrastructure up front for handling terabytes of data. You can start with a very small cluster and, as you start getting more and more data, you can keep adding nodes and easily scale. That addresses the volume challenge. The other one was variety: as I showed you, the multi-structured document had different characteristics, different layouts, and different change frequencies.
A
Not
a
scientist
supports
dynamic
schema,
which
makes
it
really
easy
to
consume
that
kind
of
data.
So
we
compared
to
what
we
were
doing
earlier
in
the
are
DBMS
life
became
lot
more
easier,
it's
much
easier
to
model
that
multi
structured,
it
and
casaya,
and
then
the
last
one
is
velocity
again.
Cassandra
is
right,
optimized,
so
your
rights
are
extremely
extremely
fast
and
that's
what
we
need
it.
So these were the three main reasons why we chose Cassandra, but it actually provided one more benefit that helped us on the operational side: it allowed us to build a multi-tenant architecture. Earlier, we had separate infrastructure for each customer; now we have one infrastructure for all the customers. There's one Cassandra cluster, one keyspace, and one set of column families for all the customers, and I don't think it would have been possible to build something like this using any other technology.
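The talk doesn't show the actual schema, but a common way to get one set of column families serving every tenant is to lead the partition key with a customer id, so each tenant's rows live in their own partitions and every query is tenant-scoped. A sketch that just builds the CQL as plain strings (the table and column names are made up, not Glassbeam's):

```python
def stats_table_ddl():
    # One table for all tenants: customer_id leads the partition key,
    # so each tenant's data is physically isolated by partition.
    return (
        "CREATE TABLE device_stats ("
        " customer_id text, device_id text, ts timestamp,"
        " metrics map<text, double>,"
        " PRIMARY KEY ((customer_id, device_id), ts))"
        " WITH CLUSTERING ORDER BY (ts DESC)"
    )

def stats_query(customer_id, device_id):
    # Every read includes the tenant id, so cross-tenant scans never happen.
    return (
        "SELECT ts, metrics FROM device_stats "
        f"WHERE customer_id = '{customer_id}' AND device_id = '{device_id}'"
    )
```

With this shape, adding a tenant is just new rows, not new infrastructure, which is the operational win the talk describes.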
So what do we store in Cassandra? It's our main data store: all the data that comes in, once it's parsed and we've extracted meaning out of it, goes into different column families within Cassandra. We also store the metadata, so if somebody else wants to see what's in the data, they can get a lot of information out of that metadata column family. Our apps are pretty flexible.
For example, what color scheme to use, or whether certain data should be shown in the form of a bar chart or pie chart or some other kind of graph, all of that is driven through configuration, and that configuration is stored in Cassandra as well. So it makes it really easy to change the layout of the apps.
We also keep statistics about the usage of the apps. Let's say a customer bought a license for 100 users; by looking at the stats, we know exactly whether a hundred people are using it or only 50. And if only 50 people are using it, that means there's some problem, so then you try to figure out what you can do to make sure that everybody is using it, right?
You can also see how exactly your apps are getting used, where people are spending time, and what the flow is, so that you can optimize things and make the user experience a lot better. And then the last thing stored in Cassandra is the journal. For all the data that comes in, as we are processing it, we keep a journal; that way it's easy to go back, audit, and see what happened as the data came to us.
So, a few words of wisdom here. It might not be anything new for those of you who have been working with Cassandra, but for people who are just beginning to learn Cassandra, just starting their journey, I thought this might be useful. The first thing is that the data model is important, so you need to be really, really careful about how you're going to model your data; of course, the RDBMS model is not going to work.
You need to understand how Cassandra stores data and what kind of queries you are going to run. Everything is driven by your queries, basically, since there are no joins; you can't really do joins in Cassandra. Sometimes you do have to, and then you'll have to do it in the application, which is painful. So spend time understanding what the query patterns are, and use that to create all the column families and design your data model.
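A minimal sketch of that query-first approach, using plain dicts in place of column families (the table and field names are invented for illustration): the same event is written once per query pattern, so each read is a single lookup with no join.

```python
# Two query patterns -> two "tables" over the same events (denormalized).
events_by_device = {}   # pattern 1: "all events for a given device"
events_by_type = {}     # pattern 2: "all events of a given type"

def record_event(device, etype, ts, payload):
    # Write the event into every table that serves a query pattern.
    events_by_device.setdefault(device, []).append((ts, etype, payload))
    events_by_type.setdefault(etype, []).append((ts, device, payload))

record_event('dev1', 'error', 1, 'disk fault')
record_event('dev1', 'info', 2, 'rebuild started')
record_event('dev2', 'error', 3, 'fan failure')

errors = events_by_type['error']          # pattern 2: one lookup, no join
dev1_events = events_by_device['dev1']    # pattern 1: one lookup, no join
```

The trade-off is extra writes and storage in exchange for reads that never have to combine tables, which is exactly the trade Cassandra's write-optimized design favors.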
The other thing is: avoid queries that return large amounts of data.
One reason not to do that is that it's slow. Cassandra is really fast, extremely fast, if you're doing point reads, but if you try to read, let's say, millions of rows, it's not a good use case. The other thing is, if you're reading large amounts of data, it can cause a lot of problems, garbage collection and things like that, and if you don't have enough memory, you can also get out-of-memory errors.
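One common mitigation, when a large result genuinely has to be read, is to pull it in bounded pages rather than one huge read; the real Cassandra drivers do this with a fetch size and paging state, but the shape of the pattern can be shown over a plain in-memory list:

```python
def paged_read(rows, page_size):
    """Yield results one bounded page at a time instead of one huge read."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

rows = list(range(10))
pages = list(paged_read(rows, page_size=4))
```

Each page is small enough to hold in memory, which sidesteps the garbage-collection pressure and out-of-memory errors that a single giant read can cause.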
Okay, so what are the other lessons that we learned as we went through that experience? One thing was that ad hoc queries are difficult, and this is kind of by design, I guess. In Cassandra, everything is driven by query patterns: you know your queries, you design the column families; and by definition, in the case of ad hoc queries, you don't know what the queries are.
The other thing: you'll see a lot of BI tool vendors here, like Tableau, Pentaho, and everybody else; they have announced support for Cassandra. If you go on their website, you'll see a sign that it's supported, but the amount of capability that they provide is nowhere close to the capabilities that they provide for a traditional RDBMS. And then the last thing is about performance. Some of the performance-related stuff is pretty obvious, right? Your performance depends on your cluster size.
Obviously, if you say, I'm going to store two terabytes of data in just two nodes, that's probably not a good thing; you need to distribute the data. So it depends on your cluster size, and on the node characteristics: how much memory you have and what kind of disk and memory you have on each node. Those are some of the basic, hardware-related things, but there are a few additional things, and those are related to your data model and data.
So whatever numbers you see published by somebody may not apply to you. Performance also depends on how exactly your column family looks: how many clustering keys you have and what kind of data you're storing in those keys. That can impact performance by a huge degree. So as you start deploying or building this, make sure you test it with your own column families and your own schema; put in some data that's reflective of the kind of data that you'll be storing there, and then do your own benchmark.