From YouTube: User Defined Functions in Apache Cassandra 3.0
Description
Speaker: Robert Stupp, Consultant
User Defined Functions (UDFs) allow users to code their own functions in Java or a JSR-223 scripting language. The presentation describes the current status of UDFs and their use.
Hello, everybody. I want to present some new functionality that will be introduced in Cassandra 3.0. It's called user-defined functions, and I think it's a great feature, especially since we usually tell people not to run any code inside Cassandra.

Some words about me: I'm a contributor to Apache Cassandra. I built the UDF stuff, and I'm currently working on the row cache for 3.0. Basically, I'm working as a freelancer on my own, helping customers build good Cassandra solutions.
Cassandra 3.0 is brand new and actively developed, so everything is still under development and everything may change. Please don't blame me if anything I tell you now turns out to have changed. Cassandra 3.0 will bring a lot of new features, as Jonathan already said today, a lot of great improvements, and user-defined functions is just one of them.
So what does it basically mean? With user-defined functions you can go ahead and start writing your own code and let it execute on the Cassandra nodes. And one thing I'm really proud of is that your own code gets automatically distributed to the whole cluster. But please don't take that last sentence word-for-word.
As I said, it's basically simple to set up a user-defined function: you have a bunch of arguments (one, two, or more), you have a return type, you specify the language you're using, and you specify the source code you're using. The source code will be injected into a class; this happens transparently, so you don't have to bother about it. So: just arguments, a return type, and some language.
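As a minimal sketch of such a statement (the keyspace, function name, and behavior here are illustrative, not taken from the talk), it might look like this:

```cql
-- Hypothetical example: a Java UDF that doubles an int value.
-- Arguments, return type, language, and source code are all declared inline.
CREATE FUNCTION IF NOT EXISTS myks.double_it (input int)
    RETURNS NULL ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'return input * 2;';
```

The body between the quotes is plain Java; Cassandra wraps it in a generated class for you.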
Behind the scenes, Cassandra takes the code you have written in the CREATE FUNCTION statement, builds the Java class or the script file, compiles it, and loads the code. The function is transparently propagated to every other node in the cluster, similar to a CREATE TABLE or CREATE INDEX statement, and then it's executable in the whole cluster, on every node.
So yeah, what is it for?
This is basically the syntax to create an aggregate. If you want to build, say, a minimum function, you just give the aggregate a name, an argument type, a state type it has to maintain while it scans rows, and the name of the UDF that is used to calculate the minimum. Internally it works like this: the aggregate has a state, and for each row that is scanned it takes the current state plus the value from the row, calculates the new state, and returns it. The last state is the return value. What you previously had to do was scan all rows, return them to your client, and calculate the minimum on the client side. But that alone wouldn't work for something like an average.
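A minimal sketch of such a minimum aggregate (all names here are illustrative) could look like this, with the state function folding each row's value into the running state:

```cql
-- Hypothetical example: a user-defined minimum aggregate over int values.
-- The state function keeps the smaller of the current state and the row value.
CREATE FUNCTION IF NOT EXISTS myks.min_state (state int, val int)
    CALLED ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'if (val == null) return state;
        if (state == null || val < state) return val;
        return state;';

CREATE AGGREGATE IF NOT EXISTS myks.my_min (int)
    SFUNC min_state
    STYPE int;
```

The state starts as null (no INITCOND is given), and the last state after the final row is what the query returns.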
Well, for an average you have to build a sum and divide it by the number of rows, so you need something that runs after the last row to calculate the final result. That's called the final function. And you can even use tuple types for the state; a tuple is, as you can see in the syntax, still just one type.
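An average aggregate along those lines (again, the names are illustrative) might keep a (sum, count) pair in a tuple-typed state and divide the two in the final function:

```cql
-- Hypothetical example: an average aggregate with a tuple<bigint, bigint>
-- state holding (sum, count) and a final function that divides them.
CREATE FUNCTION IF NOT EXISTS myks.avg_state (state tuple<bigint, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<bigint, bigint>
    LANGUAGE java
    AS 'if (val != null) {
          state.setLong(0, state.getLong(0) + val.intValue());
          state.setLong(1, state.getLong(1) + 1);
        }
        return state;';

CREATE FUNCTION IF NOT EXISTS myks.avg_final (state tuple<bigint, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'long count = state.getLong(1);
        return count == 0 ? 0d : ((double) state.getLong(0)) / count;';

CREATE AGGREGATE IF NOT EXISTS myks.my_avg (int)
    SFUNC avg_state
    STYPE tuple<bigint, bigint>
    FINALFUNC avg_final
    INITCOND (0, 0);
```

Note that the tuple state is still declared as a single STYPE, which is the point made above.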
It's not the intention to execute arbitrary heavyweight code there. What I mean is: don't pull in any evil dependencies, costly dependencies, things that have to wait for something else, because that would slow down your whole cluster, and I think you don't want that. What we also want to do in 3.0 is add some permissions, for the DDL statements and for DML, so you can control whether someone can create or execute a function or not.
The functions you are already used to, like now(), count, or the timestamp conversion functions, are called native functions. There's a reason they belong to the system keyspace: you can't modify them, and you can't even drop them. Each user-defined function or aggregate, on the other hand, belongs to a keyspace, just like a table or a user-defined type.
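Because a UDF belongs to a keyspace, you can qualify it with that keyspace when calling it, even from a query against a table in another keyspace (all names here are hypothetical):

```cql
-- Hypothetical usage: calling a keyspace-qualified UDF in a SELECT
-- against a table in a different keyspace.
SELECT myks.double_it(amount) FROM otherks.orders;
```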
Yes, some words about scripting. As I said, scripting is nice, it's really nice: you can code Scala or Groovy or something like that in your UDF and let it run on your cluster. But you have to keep in mind that scripting has a lot of overhead. During some tests I found out that the overhead just to execute, for example, JavaScript is about a thousand times slower than Java. Keep that in mind; I would strongly recommend just using plain Java.
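For contrast, a scripted version of a trivial UDF (hypothetical names again) looks almost the same; only the LANGUAGE and the body's syntax change, but every call pays the scripting overhead described above:

```cql
-- Hypothetical example: the same kind of doubling function as a JavaScript UDF.
-- Functionally equivalent to a Java version, but much slower per call.
CREATE FUNCTION IF NOT EXISTS myks.double_it_js (input int)
    RETURNS NULL ON NULL INPUT
    RETURNS int
    LANGUAGE javascript
    AS 'input * 2;';
```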
The aggregate stuff was built by Benjamin Lerer. As I said, I'm a geek, and I just wanted to know a bit more: why does it execute on the coordinator? Why isn't the aggregation, for example, distributed to the whole cluster, so that the whole cluster does the work and returns the results back?
What could you use them for? You could use a UDF for indexing, as Jonathan said in his presentation, for partial indexing, for more advanced filtering, and even, though it might be a bit complex to implement, for something like a distributed GROUP BY, where the individual nodes aggregate data on the partitions they own and return these partial results back to the coordinator, which does a final merge.
[Inaudible audience question]
Not really much, if you stick to Java, because the code will be JIT-compiled. Maybe that is putting it a bit strongly, but for a single execution we're talking about nanoseconds, maybe some microseconds, per row. It will grow if you use JavaScript, because then you easily get into the millisecond range.
[Inaudible audience question]
What's happening in Cassandra is that the coordinator builds your result set and then sends that result set to the user. When you apply a function, you build your result set, then you apply the function to the data, and the result set stores the result of your function instead of the original value.
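As an illustration (table and function names are hypothetical), the result set of a query like this contains the function's output in place of the raw column value:

```cql
-- Hypothetical usage: each returned row holds double_it(amount),
-- not the original amount column.
SELECT id, double_it(amount) FROM myks.orders;
```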