From YouTube: C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting and Interactive Analysis
Description
Speaker: Ernesto Ongaro, Senior Sales Engineer at Jaspersoft
Slides: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-leveraging-the-power-of-cassandra-operational-reporting-and-interactive-analysis
A
So: leveraging the power of Cassandra through operational reporting and analytics with Jaspersoft. A little bit about me: I've worked for Jaspersoft for about three and a half years now. I started in San Francisco on our tech support team, and I work in Dublin now, in our sales engineering and consulting group. To kick off with the agenda: we're going to talk through requirements for Cassandra reporting and analytics, so what I've heard out in the field and what customers have asked us to build for them.
A
We'll look at the current state of reporting and analytics, what's out there. Then the third part, probably the meat of the conversation, will be around the architectural approaches: how do you pull data out, and how do you provide that to the consumers of the data? Then we'll go into a demo of some stuff I've built using our Cassandra connector and show you what that's about, with hopefully some time for questions and answers.
A
A
All right, cool. I don't know how many more we can actually take. Okay, so I'll go back to the agenda, sorry. So we're gonna...
B
A
Requirements for reporting, then we're going to look at some architectural approaches, we're going to look at a demo, and we'll have the Q&A at the end. So, people want access to the data in Cassandra and, as far as I see it, most consumers of that data are not going to be technical. You have data scientists at the top of a pyramid of users. At the base of that pyramid are folks who are not technical: people who work in marketing, who work in areas where they don't need to know.
A
Traditional reporting and analytics tools don't work with Cassandra. Cassandra's fairly new and it's evolving, as we've learned through having to develop our connector various times now as the technology matures and changes. Building reports isn't easy, and it's not fun. It's not something developers want to spend a lot of their time on, especially if we're talking about a lot of reports. Say you're talking about building one little dashboard in your application.
A
That's fine. But if you're talking about building 50 of them, or about an application where your customers are asking you to build custom reports for them, that's not easy or fun. And then providing ad hoc analytics is very complicated. It's not easy, even for a developer, to provide a tool for others, those non-technical people, to get at data.
A
So, the current state of things: connectors tend to be for relational databases only. Your traditional business intelligence vendors came about in the 70s and 80s, assuming that data was always going to come from a relational database, usually their own relational database: your Oracle, your IBM DB2, etc.
A
C
B
C
A
In your applications or as a standalone tool, should I build this myself, or should I look for an existing framework that does this? For me, you have to think about security and scheduling and APIs, and maybe a metadata layer, what charting libraries you're going to use, flexibility, potentially. It actually quickly becomes its own project if you're building reporting and analytics into your platform. In fact, that's actually how Jaspersoft got started.
A
We started as an open source project, by this guy Teodor Danciu in Romania in the early 2000s. He had an idea for a company, a kind of banking ERP type of system that he was building, and when he came to reporting and analytics he said, okay, I'm going to go and get a tool called Crystal Reports. It's still around in one shape or form. So he goes to them and they tell him, well, it's going to cost you this much.
A
It actually exceeded the budget for his entire project. So he said, I'm going to write my own library. He called that JasperReports and made it open source, and it turns out that his original company completely failed, but the side project of reporting turned out pretty cool. He published that and people, you know, gathered around it eventually.
A
Some entrepreneurs in California approached Teodor, got his intellectual property from him and built a company around it, and that's who Jaspersoft is. Okay, so we'll look at architectural approaches to getting your data in and out of Jaspersoft. That was the salesy part of the presentation, but that's what pays for this conference. The fun stuff is the architectural approaches: four methods to visualize your Cassandra data. The first one is an ETL approach: extract, transform and load your data into another system.
A
A
Which one is the winning one? What's the perfect combination for me? There is no, you know, silver bullet answer for this. You have to make that choice architecturally. So let's explore the four options that we give you. I'm trying to make this as generic as possible as well, because there are other vendors and there are approaches that you can take yourself. So don't take this as just a Jaspersoft thing. These are general methods to get at that data.
A
The ETL approach. Hopefully some of you can see the slides there, but the way this one works: you have Cassandra and you have a batch ETL process that pulls that data into, probably, a relational database, and then you can use whatever BI tool you want, because those are all going to work with whatever RDBMS you choose. What's nice about this is you can combine it with other data: our ETL tool, for example, has lots of connectors that'll let you combine this data.
A
This would probably be best suited for an internal, traditional business intelligence use case. I'm talking about some system that sits inside your firewall, that marketing uses to make decisions, that the CEO uses to, you know, steer the ship in the right direction. That type of reporting and analytics. It's the most traditional approach.
A
There are some transformations: extract, transform, load into another system. We're extracting the data via a batch method, so you're getting some latency in the data, like what we're seeing at the top there. Our internal ETL jobs, for example, run every 15 minutes, so it's a 15-minute latency on the data, which for BI is actually pretty good. We'll call that near time. But, you know, some of those jobs could take 10 to 12 hours to run, that kind of thing.
A
So sometimes that could be a problem in itself, that you're not offering live data, especially when we're talking about an application. The expectation when somebody clicks into an application is that everything's happening right there, and when you switch over to the reporting and analytics, if it's like, oh yeah, this data is 12 hours old, then people get upset. They want more real-time data in a real-time application.
A
This is the option with the most connectors out of the box. From any vendor you would probably get, you know, faster ways to connect to more types of data, and it's the most robust option. Pulling data out of the source systems, that's the E part of ETL. The T, the transform part of ETL, is probably the most important. It's an exercise you're going through where you're cleaning the data, filtering the data.
A
You can do pre-aggregations of data. You're taking the operational data and making it make sense for business users, so it's a very popular option, a very robust option. I won't show you a demo of Jaspersoft ETL. Those tend to be kind of dry, in that, you know, "and now the data flows through here". This is just a screenshot of the designer. It's a designer based on Eclipse that ships with 450 connectors. My example job here grabs some data out of Postgres, grabs some data out of Cassandra and some data out of Salesforce.
A
Also out of a web service. It did some basic things: on the web service one we have a component called unique row, on the Cassandra one we did some aggregations, and Salesforce just goes right into it. Then it does some schema compliance checks and spits it out to a relational database. So that's the kind of thing you build. I realize most of you guys are developers. You can just write the code to do this yourself. That's fine!
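For readers who want to picture the hand-written version of such a job, here is a minimal Python sketch. It is not how Jaspersoft ETL works internally. The source rows, the column names and the `store_sales` table are all hypothetical, the extract step is faked with in-memory lists, and SQLite stands in for whatever relational database the BI tool would query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical extract results: in a real job these would come from a web
# service call and from a query against Cassandra.
WEB_SERVICE_ROWS = [
    {"store_id": 1, "city": "Salem"},
    {"store_id": 1, "city": "Salem"},  # duplicate, dropped by unique_rows
    {"store_id": 2, "city": "Bellingham"},
]
CASSANDRA_ROWS = [
    {"store_id": 1, "units": 10, "sales": 50.0},
    {"store_id": 1, "units": 5, "sales": 20.0},
    {"store_id": 2, "units": 4, "sales": 9.0},
]


def unique_rows(rows, key):
    """Keep the first row seen for each key value (a 'unique row' step)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())


def aggregate_sales(rows):
    """Pre-aggregate units and sales per store."""
    totals = defaultdict(lambda: {"units": 0, "sales": 0.0})
    for row in rows:
        totals[row["store_id"]]["units"] += row["units"]
        totals[row["store_id"]]["sales"] += row["sales"]
    return dict(totals)


def load(conn, stores, totals):
    """Write the combined rows into a relational table for BI tools."""
    conn.execute(
        "CREATE TABLE store_sales ("
        "store_id INTEGER PRIMARY KEY, city TEXT NOT NULL, "
        "units INTEGER NOT NULL, sales REAL NOT NULL)"
    )
    for store in stores:
        t = totals.get(store["store_id"])
        if t is None:
            continue  # simple completeness check: skip stores without sales
        conn.execute(
            "INSERT INTO store_sales VALUES (?, ?, ?, ?)",
            (store["store_id"], store["city"], t["units"], t["sales"]),
        )
    conn.commit()


def run_etl():
    """Run extract (faked), transform and load; return the target connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, unique_rows(WEB_SERVICE_ROWS, "store_id"),
         aggregate_sales(CASSANDRA_ROWS))
    return conn


if __name__ == "__main__":
    for row in run_etl().execute("SELECT * FROM store_sales ORDER BY store_id"):
        print(row)
```

The script covers the transform logic only. Scheduling, retries and error handling are the parts that have to be built around it before it can be handed to an operations team.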
A
What an ETL tool does is help you operationalize that. When you're done writing that code, you publish it out to a web management place where you basically give it to your operations team and say, I need this to run this often, and here's what happens when errors occur. It's a way to hand that off to operations a lot more easily than the traditional...
A
Okay, so we get into the second approach. The second approach is where we built a native connector, using CQL3, into your Cassandra cluster, and then you can build reports and dashboards that show the data that's directly in there. So no ETL process, no in-memory stuff. It basically shoots it right out to the screen, and so that one's going to give you a lot less latency: the user clicks and gets the data directly out of Cassandra. You don't need to go through an ETL job.
A
A
You basically design them in this Eclipse-based thing, and then you publish them out to JasperReports Server. Lowest latency, like I said, and it's a good supplement to ETL when real time is required. So maybe most of your stuff comes from an ETL job, but for some of it the business would get some value from near time or real time. You can combine these two methods; they're not mutually exclusive.
A
Okay, so that brings us to... oh, right, the connector we built is based on... I can never say that word. Yes. And I believe that one's based on the official Java driver as well, so we kind of wrapped around that, and that's how we access it. You'll see that in my demo. This is a screenshot of an example dashboard. We'll actually play around with this in a few minutes.
A
So then there's the third method, which is direct access exploration. It's very similar to the previous one in that we're using our native connector, but it's bringing the data into an in-memory OLAP engine, and then it allows the user to drag and drop and look at the data that way. We can parametrize the queries into your Cassandra cluster, so it's not like just doing a, you know, dump of everything into memory. You can selectively grab stuff and then allow the user to deal with it.
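To make the idea of a parametrized, selective fetch concrete, here is a small Python sketch that builds a CQL statement with bind markers. It does not reflect the Jaspersoft connector's actual code. The `sales` table, its column names and the allow-list are hypothetical, and the sketch assumes the filtered columns can actually be queried in Cassandra (for example, because they are part of the primary key or indexed).

```python
# Hypothetical columns that a report is allowed to filter on.
ALLOWED_FILTERS = {"product_family", "product_department", "city"}


def build_sales_query(filters, columns=("city", "units_sold", "sales", "cost")):
    """Return (cql, params) for a selective fetch from a 'sales' table.

    Column names come from an allow-list and are interpolated into the
    statement; user-supplied values are only ever passed as bind parameters.
    """
    clauses, params = [], []
    for name, value in filters.items():
        if name not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {name}")
        clauses.append(f"{name} = ?")
        params.append(value)
    cql = f"SELECT {', '.join(columns)} FROM sales"
    if clauses:
        cql += " WHERE " + " AND ".join(clauses)
    return cql, tuple(params)


if __name__ == "__main__":
    cql, params = build_sales_query(
        {"product_family": "Non-Consumable", "product_department": "Household"}
    )
    print(cql)
    print(params)
```

The returned statement and parameter tuple would then be handed to whichever driver is in use. Only the rows matching the user's choices come back into memory.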
A
This is for people that want access to the data who are not technical. You guys are developers; you have a million ways to get to this data. Most regular humans don't have those options. Okay, this is an example ad hoc view. We'll create something like this in the demo as well, where we're dragging a few fields over to do a visualization.
A
A
Again, it's going to be a bit of a batch process, if I understand Hive correctly and how MapReduce works.
A
Okay, so a little bit of a demonstration. We're going to look at an example dashboard and report, look at Jaspersoft Studio, and do some ad hoc exploration. As for the environment I'm running on: I'm running Jaspersoft 5.5, which is a web application, a J2EE application deployed to Tomcat, and I'm running DataStax Enterprise 3.1 with Cassandra 1.2.
A
Okay, logging into this web app here. The first thing I want to do is show you guys a dashboard, probably the simplest one to understand. This is using that direct connect method to write some queries, bring the results into the server and display them.
A
So I'm running a VM with all this stuff. It's a bit scary to do a demo in a room full of people, but it worked. Okay, cool. So this is just a little dashboard with two little widgets showing profit for the last 12 months. It's just some example data I generated. I have an input control that's passed back to the query, so if I change this to Non-Consumable, it re-executes the queries, re-renders the reports and shows them to me here. These are good for a high-level overview of data.
A
We see the cost is in the red now: it's costing more than some target number that I had set. We use nice, modern JavaScript charting libraries in our product that you can easily build stuff with. So you don't have to be a developer to build this stuff. You have to be fairly technical, but you don't have to get down into the nitty-gritty of the charting library; we kind of wrap around it. The next thing I want to show...
A
So what it's doing now is running some queries on the left-hand side to produce the choices that I have. I basically did a select of product family from my sales table, and I select, let's say, Non-Consumable. From that it runs another query to give me the possible departments that I have in my database, and I'm going to choose Household Goods. So now, when I hit Apply, it sends the full query over and brings back the data.
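The cascading behaviour described here (one query fills the first list, the selection drives a second query, and Apply runs the full query) can be sketched in a few lines of Python. This only illustrates the pattern. The rows, field names and values are made up, and a plain list stands in for the sales table, so each function plays the role of one query.

```python
# Hypothetical rows standing in for the sales table.
SALES = [
    {"product_family": "Non-Consumable", "product_department": "Household",
     "city": "Salem", "units_sold": 3},
    {"product_family": "Non-Consumable", "product_department": "Periodicals",
     "city": "Bellingham", "units_sold": 1},
    {"product_family": "Food", "product_department": "Snack Foods",
     "city": "Salem", "units_sold": 5},
]


def family_choices(rows):
    """First control: the distinct product families."""
    return sorted({r["product_family"] for r in rows})


def department_choices(rows, family):
    """Second control: departments available within the chosen family."""
    return sorted(
        {r["product_department"] for r in rows if r["product_family"] == family}
    )


def run_report(rows, family, department):
    """Apply: the full query with both selections filled in."""
    return [
        r for r in rows
        if r["product_family"] == family and r["product_department"] == department
    ]


if __name__ == "__main__":
    print(family_choices(SALES))
    print(department_choices(SALES, "Non-Consumable"))
    print(run_report(SALES, "Non-Consumable", "Household"))
```

Because the second list depends on the first selection, the user can only pick combinations that exist in the data.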
A
Okay, so it's still kind of loading the data. It's just showing me the first page of a 41-page report. Not too exciting, but I think what's really not exciting to developers is having to make this for each user that asks for it, and having to do a custom, bespoke job for it. So what's nice about it is that it lets the end users then further manipulate the report to their liking.
A
So what I'll do here is a little filtering on the units sold. So now I have a result set from Cassandra, and the rest of it I'm going to do in memory. I'll do some filters on units sold: I'm going to set this to show me only the ones that are less than or equal to 2. I'm also going to change the city to only San Francisco. Maybe that's the city that I'm...
A
A
A
A
And then I can schedule it. I can just click on Schedule and it'll take me through a few scheduling screens: start it immediately, send it to me every week, and choose any parameters. I might have a time-based parameter that says, you know, show me the last seven days, something like that. And then I can get notified via email: send me the report with the HTML rendering. Again, it's about making these available to people that wouldn't be able to have direct access to Cassandra.
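A time-based parameter such as "the last seven days" has to be resolved to concrete dates each time the scheduled report runs. The Python sketch below shows one way to compute that window. The function name and the half-open (start inclusive, end exclusive) convention are assumptions made for illustration, not Jaspersoft's scheduler API.

```python
from datetime import date, timedelta


def last_n_days(run_date, days=7):
    """Return (start, end) for the `days` full days before `run_date`.

    `start` is inclusive and `end` is exclusive, so weekly runs produce
    windows that neither overlap nor leave gaps.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return run_date - timedelta(days=days), run_date


if __name__ == "__main__":
    start, end = last_n_days(date.today())
    print(f"report window: {start} <= day < {end}")
```

With a weekly schedule, the window returned for one run ends exactly where the window for the next run begins.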
A
A
Okay, and so that was the part that was built by somebody technical. So I'm going to show you the designer for that. It'll take just a second.
A
So this is that tool, Jaspersoft Studio, built on Eclipse. You can add it to your Eclipse environment as a plugin, or you can download the standalone version. What it does is plug right into my Jaspersoft repository, and so what I've done is opened up the KPI report, and I'm just going to pull down the definition for that report. So this is basically a skeleton of a report that I've created here.
A
B
A
When I hit Read Fields, it's going to go in, bring back the fields and assign each one a Java type, which is what the Java driver does. Then I have those as just Java objects, so these fields are now listed as Java objects. I can create variables: in this report, for example, I had ones called profit and profit per unit sold, which are calculations on top of the data.
A
So profit is just something where I've taken a field, sales, minus cost, and that's how I've come up with the profit. Profit per unit is just a calculation on profit and units. And then for the other stuff I'm doing aggregations in memory, so the sum, things like that, is how we're getting...
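The calculated fields and in-memory aggregation described here can be written out as a short Python sketch. The rows and field names are hypothetical, and defining profit per unit as profit divided by units is an assumption, since the talk only says it is a calculation on profit and units.

```python
# Hypothetical result set already fetched from Cassandra.
ROWS = [
    {"sales": 20.0, "cost": 8.0, "units": 4},
    {"sales": 10.0, "cost": 5.0, "units": 0},
]


def profit(row):
    """Calculated field: sales minus cost."""
    return row["sales"] - row["cost"]


def profit_per_unit(row):
    """Calculated field (assumed formula): profit / units, None if no units."""
    if row["units"] == 0:
        return None
    return profit(row) / row["units"]


def total(rows, expression):
    """In-memory aggregation: sum an expression over every row."""
    return sum(expression(row) for row in rows)


if __name__ == "__main__":
    for row in ROWS:
        print(profit(row), profit_per_unit(row))
    print("total profit:", total(ROWS, profit))
```

The database only has to return the raw `sales`, `cost` and `units` columns. The derived values and sums are computed on the reporting side.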
A
There's a whole palette of stuff that I can use, like you would expect in Eclipse. We've got the charts and all that stuff, maps, crosstabs, that you can just drag in and configure. So it's fairly easy to get something started.
A
What's nice, too, is that this is all free. You can go download it and use it. The part that I'll show you next is not free; it's part of our commercial product. But the reporting part of it is free. You can go and check it out, build stuff like this, and people can stop bugging you to, you know, update this report or schedule a report.
A
So we'll go back into the server bit, and what I want to do now is create an ad hoc view. This is that piece where I can do data exploration: a query is run, the results come back into memory, and then I can further manipulate the objects there, filter and all that stuff. I don't need to know the query language, and I don't need to use a technical tool like Jaspersoft Studio. So I'll go to Create here.
A
It's going to ask me to choose from a list of topics. We have different ways to get at data in Jaspersoft. There's our metadata layer, called Domains; there are different ways to get into OLAP cubes built using Mondrian and Microsoft SQL Server Analysis Services; and then you can build what are called Topics, and that's what I've done here. So I've built sales.
A
It's setting up the environment for me now, where I can ask the questions that I want to ask. The first thing is that it's still parametrized. They can still pass in any number of parameters. We might, you know, want to put something in the WHERE clause that's going to limit it to the last hour's data, or to the last, you know, whatever. In this case,
A
It's the same filters I had before, product family and product department. I'm going to leave those alone. Then I've got a list of fields and measures that I can use to build my chart. So if I bring in sales and units, it's going to show me right away all the sales and all the unit sales that I have.
A
It breaks it down by product subcategory, so these are snack foods: chocolate candy, gum and hard candy. We can break it down even further. Let's say we want to do it by...
B
A
A
We might actually do better with a scatter chart, where on one axis I have the unit sales and on the other I have store sales. You can even do simple calculations. Say I want to calculate profit: I have sales and cost, so I create a custom field where we take sales minus cost, which basically calculates profit. I'm going to take out, let's say, store sales and bring in profit. So I want to see which items are my most profitable and which are selling the most.
A
I see I have a big cluster down here. This is probably my least popular item: the item I'm making the least from is gum in the city of Bellingham. I only have four units sold there and I've made five bucks off it. Not worthwhile. At the top here,
A
The best one would be selling chocolate candy in Salem. That seems to be a popular thing to do, so we're making a lot of money off that, selling a lot of it. So those are the kind of questions you can...
A
No, no more slides, so that's it. Any questions about what you guys saw, how it works? Okay, to start with, you.
D
So could you briefly remind us about the reporting product? Is that the one that is going to hit Cassandra, versus the commercial product, which is maybe somehow collecting that data, storing it somewhere else, and then you can swipe...
A
Both of them are actually going to bring the result set into memory. So when I showed you the report where I was doing the conditional formatting, that kind of tabular one, I didn't have to go and re-query every time I wanted to filter down some more. There were the filters on the left.
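The "fetch once, then filter in memory" behaviour in this answer can be illustrated with a minimal Python sketch. It does not show the product's internals. The result set, field names and the small set of supported operators are hypothetical. The filter values mirror the ones used in the demo (units sold less than or equal to 2, city equal to San Francisco).

```python
import operator

# Operators a user could pick in a filter dialog (illustrative subset).
OPERATORS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}

# Hypothetical result set, fetched from Cassandra once and kept in memory.
RESULT_SET = [
    {"city": "San Francisco", "units_sold": 2, "sales": 6.0},
    {"city": "San Francisco", "units_sold": 5, "sales": 14.0},
    {"city": "Salem", "units_sold": 1, "sales": 3.0},
]


def apply_filters(rows, filters):
    """Return rows matching every (field, operator, value) filter.

    Works only on the rows already in memory; no new query is issued.
    """
    return [
        row for row in rows
        if all(OPERATORS[op](row[field], value) for field, op, value in filters)
    ]


if __name__ == "__main__":
    filtered = apply_filters(
        RESULT_SET,
        [("units_sold", "<=", 2), ("city", "==", "San Francisco")],
    )
    print(filtered)
```

Changing a filter only re-runs `apply_filters` over the cached rows, which is why the report does not need to go back to the cluster each time.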
D
To hold all the nodes to type that data? Well, I guess it's using CQL3, which is...
A
A
I think there's no single use case that covers everything. I would say it depends. If you're wanting to combine a lot of data, like you're wanting to blend...
A
And, you know, Cassandra's just one of the many things, then go with the ETL approach. If all of your data is in Cassandra and it's a lot of data, go with the Hadoop approach. And if there's some data that you need in real time, then go with either of those two approaches, with some data being connected directly. I actually asked this morning during the keynote session.
A
I asked, what are the plans to add more aggregation functions into the CQL language? And he said, well, I don't know. So I don't think they're going to come; they're certainly not going to come within a year. Until that happens, your options are to pull the data out, either to Hadoop or with your own ETL. Question at the back?
C
I just wanted to ask about the data model. I assume, because you can't join or anything, that you have to model your data for every topic that you pull into the system. Does that work? Yeah.
C
C
A
Right. Now, with ETL, of course, you could bring in various tables and join them together later. That would mean you're not getting it live, but you are adding join capabilities to it, so you don't have to worry so much about the data model. It's a good question, though, because, yeah, the join limitation. Okay, cool, I think...