Description
Libon is a messaging service designed to improve mobile communications through free calls, chat and voicemail services, regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contacts. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latency below 10 milliseconds.
A: He is a Technical Evangelist for Apache Cassandra at DataStax, and he spends a ton of time talking about Cassandra and educating and helping people in the community. So we are delighted to have DuyHai with us today. Before I hand over to him, I would like to explain how the Q&A session will work, in case this is your first webinar with us. Inside WebEx there is a Q&A panel; please type your questions in there at any time. Whatever time we have left after the presentation, until the top of the hour, we will spend going through as many questions as possible. We are recording today's presentation as well, and we will make the slides available; you will be notified at the email address you registered with when those are ready. So, DuyHai, thank you very much for joining us today. I am going to hand over to you.
B: Okay, thank you very much, Christian, for introducing me. So let me introduce myself: my name is DuyHai. I am working as a Cassandra technical advocate for DataStax, but before that I was a Java developer working on the Libon project, and in my free time I am also developing Achilles, which is an open-source object mapper for Cassandra. We will talk about it later during this webinar. So, for today's agenda, I will start by introducing the Libon context: what is the Libon business?
B: So first, the context. Libon is a messaging app with a bunch of features, among others: voice over IP, a custom voice mailbox with greetings, and a messaging system which allows you to send text messages and chat. And the last important feature is contact matching. We will focus on that right now, because it is the main topic of the data migration.
B: Okay, so I am a Libon user. When one of my friends subscribes to Libon, his contacts are uploaded into Libon, and through the contact matching mechanism Libon can see that I am in his address book, so we are both asked to accept being linked to each other. That is basically the contact matching feature. Quite simple.
B: Now, to give you some context on the project: it is over four years old, and we are already using Cassandra in production, but only for the messaging part. The Cassandra part gives us read and write latency well under a millisecond, and the overall server response time when requesting Cassandra is under ten milliseconds, so we are quite happy with Cassandra. About the context: the contacts are still stored in an RDBMS, which is Oracle, but in fact the problem would remain the same even if it were MySQL, Postgres or SQL Server.
B: In fact, the problem right now is that we have, in the backend application, two different data stores with very different paradigms: the RDBMS and Cassandra. An RDBMS is not better or worse than Cassandra, or the other way around; it's like comparing apples and oranges, it makes no sense. In fact, RDBMSs are very well designed pieces of software. They have been designed to solve a class of problems, but not the one we are facing nowadays.
B: I like this metaphor: RDBMSs are like helicopters. They are very flexible and maneuverable, but the trade-off is that they cannot go very fast or very high. On the contrary, Cassandra has been designed for performance and scalability from the ground up. It's like a jet fighter: it can fly very, very fast and very high, but it gives up in terms of maneuverability, meaning that it is not as easy to handle.
B
So
since
it
is
a
really
button
for
the
developer
to
keep
both
later
star
in
the
center,
can
we
need
to
choose
okay
and
to
help
earth
choosing
the
right
solution.
We
should
take
into
account
the
neck
challenges,
the
project
we
need
to
face.
So,
first
of
all,
we
want
to
be
highly
available
in
the
face
of
database
on
site
failure.
When
you
have
millions
of
users,
you
cannot
afford
having
downtime
by
and
certainly
we
want
to
have
addictive
on
performance
when
the
number
of
users
will
increase
in
the
future.
B: The objective of this migration is, of course, to have no production downtime: customers first, always. And since we have multiple backend servers in production, we may have concurrent updates of the same piece of data, and we do not want to handle all the possible corner cases by hand, so the migration should be concurrency-proof. That is a very important point.
B: The ops also asked us to make this migration as safe as possible, meaning that we should be able to roll back the process at any time. In terms of design, it means that we should be able to stop, to resume and to replay the migration batch. Those objectives are mandatory. So the strategy consists of four stages.
B: So let's now dig into the details of the migration. For the first phase, all the contact data coming in to the backend servers are saved both into Oracle and into Cassandra. In Oracle we are using a sequence to generate the contact ID, which is a long; for Cassandra we generate a timeuuid. We created an extra column in the contact table in Oracle to store this contact UUID, and when we read data we only use Oracle. So from the point of view of external clients, nothing has changed.
B: The interface remains the same. For the second phase, on live production, we will run the migration batch. For a set of users, we will fetch all their contacts in Oracle whose contact UUID is null; it means that those contacts were inserted before stage one, so they do not exist in Cassandra. We will insert them into Cassandra using a special timestamp, which is now minus one week. In fact, it could be now minus one day or one hour, whatever; the main idea is just to use a timestamp in the past.
B: Why should we do that? During the data migration we can have concurrent writes from the batch and concurrent updates from production for the same contact, of course. Let's take an example to illustrate. Here we have a contact whose name is "Jonny", with a typo. The migration batch is copying it from Oracle to Cassandra.
B: The batch writes with a timestamp in the past, and at the same time the user is updating the contact to fix the typo. The production update will be written using the current timestamp, so that future reads of this contact will always return the value updated from production, not the one from the migration. If you look at this diagram, you can see that, whatever the scenario of concurrent updates, we will always end up with the same value in the end. That is the point of writing to the past with the migration batch.
B: We want to give higher priority to updates coming from production, and we are leveraging Cassandra's conflict resolution mechanism by playing with timestamps.
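The write-to-the-past trick can be sketched with a small simulation of Cassandra's cell-level last-write-wins resolution. This is a minimal sketch, not Libon's actual code; in real CQL the batch would append `USING TIMESTAMP <microseconds in the past>` to its INSERT statements, while production writes use the current time.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal simulation of Cassandra last-write-wins: each cell keeps the value
// with the highest write timestamp, regardless of arrival order. (Real
// Cassandra breaks exact-timestamp ties by comparing values; omitted here.)
public class LastWriteWins {

    static final class Cell {
        final long ts;
        final String value;
        Cell(long ts, String value) { this.ts = ts; this.value = value; }
    }

    private final Map<String, Cell> row = new HashMap<>();

    // A write only replaces the stored cell if its timestamp is higher.
    void write(String column, String value, long ts) {
        Cell current = row.get(column);
        if (current == null || ts > current.ts) {
            row.put(column, new Cell(ts, value));
        }
    }

    String read(String column) { return row.get(column).value; }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() * 1000;        // microseconds
        long oneWeekAgo = now - 7L * 24 * 3600 * 1_000_000;

        // Scenario 1: batch copies the typo first, user fixes it after.
        LastWriteWins r1 = new LastWriteWins();
        r1.write("name", "Jonny", oneWeekAgo);   // migration batch (typo)
        r1.write("name", "Johnny", now);         // production update
        System.out.println(r1.read("name"));     // Johnny

        // Scenario 2: same two writes, reversed arrival order, same outcome.
        LastWriteWins r2 = new LastWriteWins();
        r2.write("name", "Johnny", now);
        r2.write("name", "Jonny", oneWeekAgo);
        System.out.println(r2.read("name"));     // Johnny
    }
}
```

Whichever order the writes land in, the production value wins, which is exactly why the batch can run concurrently with live traffic.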
So this is the second phase of the migration. The third phase is quite simple: we switch the reads from Oracle to Cassandra. This phase allows us a safe rollback to Oracle in case we discover any issue; I hope not, fingers crossed. And for the last phase, we stop writing into Oracle and only operate with Cassandra. Right now, let me show you how we refactored all the business code.
B: So first we did a code inventory, and it looks like code that has been written for an RDBMS, of course. There are a lot of joins, and the code has been designed around transactions; we can see the Spring @Transactional annotation everywhere in the code base. Very classical. In fact, we realized that the contact entity is leaking from the repository layer up into many business services.
B: In addition, we rely heavily on Hibernate and on its magic: lazy loading, the first-level cache and automatic query generation. You all know the famous N+1 select issue. In fact, Hibernate is great, but it is kind of magic; we do not totally control what happens underneath. So we were facing a tough choice: should we throw everything away and redesign from scratch for Cassandra? Of course not, since the target is only to get rid of Oracle. Indeed, our existing code base has very, very good code coverage.
B: We extended this entity to create a new ContactNoSQL entity, and this entity combines many existing joins into denormalized tables in Cassandra. In fact, the denormalized tables can be seen as materialized views. In addition, we also applied the CQRS pattern, which stands for Command Query Responsibility Segregation: instead of having one big giant repository to handle the contacts, we split it into four distinct repositories. Contact reading is very straightforward: it implies sequential reads on disk without any join, of course, and each read translates directly into one SELECT.
B: Updating a contact is the pain point here, because in most cases it implies a read before write. The reason is that, since we keep the same business layer as before, we are using the common read-before-write pattern used by JPA: first you load an entity into memory, then you update it, and then you save it back.
B: In fact, the update scenario is quite rare, so we accept this trade-off. And last but not least, deleting a contact results in deleting the entire partition most of the time. So, the outcome of this migration: it was five months of two-man work. We spent most of the time iterating to fix bugs; a big thanks to our integration tests for helping us there. We did a lot of performance benchmarks using Gatling. If you don't know that tool, I strongly recommend you have a look at it; it is very nice.
B: It has a Scala DSL to write your benchmark scenarios, which is very nice, and what is also nice with Gatling is that it will automatically generate the result graphs for you, with the response times in milliseconds, the percentiles, the 99th percentile, and so on. The benchmarks also helped us validate that our data model will scale with the new traffic. And we are almost there: the migration is going to production soon.
B: Now let's have a look into the data model details, which is the most exciting chapter of this webinar. So, a little summary about denormalization. The good part is that it will give you very fast reads; it is well suited if you have mostly-read, few-updates scenarios. The bad part is that if you have mutable data to be updated, it can be very difficult, as in our case, since we are bound to an existing API and existing business code.
B: Here is the data model in detail. As you can see, most of those denormalized tables are created to support searching and fast lookups: contacts by ID, contacts by identifier, contact search by first name and last name, and so on. And every time, we take care to add the user ID as a component of the partition key. Why? Because we want the design to scale with the number of users.
B: In some tables we also add the contact ID to the partition key, to leverage the Bloom filter for such lookups. If the contact does not exist, the Bloom filter will return a true negative, so in that case Cassandra will never hit disk; and if there is a positive hit, Cassandra will touch one SSTable most of the time. We also have some wide partitions, for example this table, contacts by modification date. It is indeed a queue-like table, which is clearly an anti-pattern to watch out for.
B: Some remarks about the contact ID. As I said, in Oracle we use a sequence to generate a long, and in Cassandra we generate a timeuuid ourselves on the client side. The question is how to store both data types in one Cassandra column. An obvious answer is to store the values as text. It is a quick and dirty solution, but by doing this we waste a lot of space, because we encode every character as text using UTF-8 or ASCII, whereas the long type is normally encoded using eight bytes.
B: With text encoding, the number of bytes used corresponds to the digit count. The worst scenario is with UUIDs, because a UUID is normally encoded using sixteen bytes, but it is composed of 32 hexadecimal characters and four hyphens, so 36 characters in total. If you encode them as text, it gives you 20 bytes of overhead. So: 20 bytes of overhead per contact UUID, times 7 denormalized tables, equals 140 bytes for each contact UUID, and for a billion contacts that is 140 gigabytes of waste, not even counting the replication factor in the equation. If you replicate three times, you can do the math.
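The arithmetic above can be checked in a few lines; the figures of 7 denormalized tables and one billion contacts are the ones quoted in the talk.

```java
// Back-of-the-envelope check of the UUID-as-text storage overhead.
public class UuidOverhead {
    public static void main(String[] args) {
        int uuidAsBlob = 16;            // a UUID is 16 raw bytes
        int uuidAsText = 32 + 4;        // 32 hex characters + 4 hyphens = 36

        int overheadPerId = uuidAsText - uuidAsBlob;            // 20 bytes
        int overheadPerContact = overheadPerId * 7;             // 7 tables
        long wastedBytes = overheadPerContact * 1_000_000_000L; // 1e9 contacts

        System.out.println(overheadPerId);                      // 20
        System.out.println(overheadPerContact);                 // 140
        System.out.println(wastedBytes / 1_000_000_000L);       // 140 (GB)
        System.out.println(3 * wastedBytes / 1_000_000_000L);   // 420 (GB at RF=3)
    }
}
```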
So the solution is to save the contact IDs just as byte arrays, and to use Achilles codecs for automatic conversion; we will look into the details very soon, bear with me. If you want to investigate data in production, you can use the native CQL functions, blobAsBigint and blobAsUuid, to convert the byte arrays back into the original types.
B: Now, as promised since the beginning, I will show you some features of Achilles that we are using very heavily in this migration project. But first, a very quick introduction to Achilles: it is an advanced object mapper that sits on top of the DataStax Java driver. It provides a lot of features and a very friendly environment for TDD.
B: So, with Achilles, we are using dirty checking. What is it? Let's take a contact again: a contact has eight mutable fields, times seven denormalizations, which gives us 56 single-field update combinations, not even counting all the multiple-field updates. So it is clearly not feasible to manually prepare statements for all those update combinations, and of course we could use dynamic plain-text queries at runtime, but in that case we would pay a performance penalty.
B: The idea with Achilles is: from the manager, you provide the contact entity class and a contact ID as the primary key, and Achilles returns an empty proxy for this contact entity. I want to make it clear: there is no read-before-write here, this is very important. Achilles does not read any data from Cassandra; the proxy is completely empty. From this proxy you can call any setter to update the mutable fields. What happens in the background is that the proxy will intercept your setter calls.
B: It will determine the columns to be updated and the new values, put them into a map, and after that Achilles will use this information to generate, dynamically at runtime, a prepared statement with just the columns to be updated, no more, no less. Of course, this prepared statement will be put into a cache to be reused.
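The dirty-checking idea can be sketched as follows. This is an illustrative sketch, not the real Achilles internals (Achilles intercepts the entity's setters via a bytecode proxy; here the changed column is recorded explicitly), and the table and column names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of dirty checking: a proxy-like wrapper records which columns were
// set, then an UPDATE is generated covering exactly those columns. Note that
// no read from the database is involved at any point.
public class DirtyCheckingSketch {

    static final class ContactProxy {
        private final Map<String, Object> dirty = new LinkedHashMap<>();

        // Stand-in for an intercepted setter call.
        void set(String column, Object value) { dirty.put(column, value); }

        // Build the CQL text of a prepared statement with only the
        // columns that were actually modified.
        String toPreparedUpdate(String table) {
            StringBuilder sb = new StringBuilder("UPDATE " + table + " SET ");
            int i = 0;
            for (String col : dirty.keySet()) {
                if (i++ > 0) sb.append(", ");
                sb.append(col).append(" = ?");
            }
            return sb.append(" WHERE user_id = ? AND contact_id = ?").toString();
        }
    }

    public static void main(String[] args) {
        ContactProxy proxy = new ContactProxy();
        proxy.set("first_name", "Johnny");   // the only touched field
        System.out.println(proxy.toPreparedUpdate("contacts"));
        // UPDATE contacts SET first_name = ? WHERE user_id = ? AND contact_id = ?
    }
}
```

Because the statement text depends only on the set of touched columns, it can be cached and reused across entities, which is what makes the approach cheap at runtime.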
Now, let's see another important feature: the insert strategy. Suppose you want to generate the prepared statement to create a new contact in Cassandra.
B: It is a very simple statement, but since some columns of the contacts table are optional, you would have to bind a lot of null values to the prepared statement, and the problem is precisely those null values. Inserting null into a column with CQL means deleting it, which means creating a tombstone; this is the official semantics of CQL. So: creating tombstones, times seven denormalizations, times billions of contacts? Seriously, I really do not want to create so many tombstones in my Cassandra cluster. It will hurt your compactions and your repairs, and it will eat a lot of disk space.
B: So, to tackle the issue, we use a simple annotation from Achilles: an insert strategy annotation set to the "not null fields" strategy. At runtime, by introspection, Achilles will check all the not-null columns and generate the appropriate prepared statement. Of course, again, the generated prepared statement is put into a cache to be reused.
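The "not null fields" idea boils down to generating the INSERT from the non-null fields only, so that the optional columns are simply absent rather than written as null tombstones. This is an illustrative sketch, not the Achilles annotation machinery itself; the table and column names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a not-null insert strategy: the INSERT lists only the columns
// that carry a value, so no tombstone is created for empty optional columns.
public class NotNullInsertSketch {

    static String toInsert(String table, Map<String, Object> fields) {
        StringBuilder cols = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (e.getValue() == null) continue;  // skip: null would mean "delete"
            if (cols.length() > 0) { cols.append(", "); marks.append(", "); }
            cols.append(e.getKey());
            marks.append("?");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("contact_id", new byte[16]);
        contact.put("first_name", "Johnny");
        contact.put("nickname", null);           // optional field left empty
        System.out.println(toInsert("contacts", contact));
        // INSERT INTO contacts (contact_id, first_name) VALUES (?, ?)
    }
}
```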
B: Do you remember when we were discussing type conversion? Well, here Achilles provides you with a pluggable codec system. All you need to do is declare your own codec by annotation; the codec just has to implement a very simple interface, and at runtime Achilles performs the encoding and decoding on the fly, transparently for you.
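The kind of conversion such a codec performs for the contact ID (a long from the Oracle sequence, or a timeuuid on the Cassandra side, stored in a single blob column) can be sketched like this. The Achilles codec interface itself is not reproduced here; only the encode/decode logic is shown.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Sketch of the encode/decode logic behind a contact-ID codec:
// 8 bytes for a long, 16 bytes for a UUID, instead of up to 36 bytes as text.
public class IdCodecSketch {

    static byte[] encodeLong(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static byte[] encodeUuid(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static UUID decodeUuid(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        long oracleId = 123456789L;
        UUID cassandraId = UUID.randomUUID();

        // Round-trips through the blob representation are lossless.
        System.out.println(decodeLong(encodeLong(oracleId)) == oracleId);            // true
        System.out.println(decodeUuid(encodeUuid(cassandraId)).equals(cassandraId)); // true
        System.out.println(encodeUuid(cassandraId).length);                          // 16
    }
}
```

The blobAsBigint and blobAsUuid CQL functions mentioned earlier perform the reverse mapping on the server side when you inspect the data by hand.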
The last feature of Achilles, probably the most useful one for the team, is dynamic logging. In fact, when I designed this framework, I did not want a black box,
B: where nobody knows what's happening inside, like Hibernate. To understand what Achilles is doing behind the scenes, you can just turn on some loggers, and Achilles will print out all the prepared statements and all the bound values for you. The nicest thing with this logging is that it is dynamic: you can activate it at runtime, no need to recompile and redeploy your code. This feature helped us a lot and saved us hours of debugging.
B: You can also request query tracing by setting the same logger at TRACE log level, so it's quite nice. And that's it. So, in a nutshell: to take on such a data migration project, you should know that data modeling is the key to your success, or to your failure if you mess it up. To guarantee a smooth and safe migration, you should use the double-run strategy with timestamps in the past, to avoid any concurrency issues.
B
Please
pay
some
attention
to
data
type
conversion,
as
we
can
see.
Small
optimization
can
result
in
a
huge
gain
when
you
are
dealing
with
millions
of
become
and
do
not
forget
to
benchmark
your
data
order.
I
know
it
seems
obvious,
but
a
lot
of
people
miss
it
and
yes,
please
do
not
benchmark
with
ridiculous
data
set,
because
there
is
no
point
doing
that,
because
you
will
not
stress
enough
Cassandra
to
see
the
limits
and
a
ton
of
cases
of
your
data.
B: Okay, and the last important thing is to get the whole team involved in the project: not only the developers but also the ops, because they are going to be responsible for your code in production, and also the support team, because they may be impacted by your design. I think that's pretty much it. Thank you for listening; if you have any questions, please feel free to ask.
A: Thank you very much indeed, DuyHai; we really appreciate you taking the time today to go through your experience of migrating from relational databases to Apache Cassandra. We will now open up the line for questions and answers. Just a quick reminder: please use the Q&A tab inside WebEx, type your question in there, and we will get through just as many as we can.
B: Yeah, we do not handle that failure scenario, because there is nothing like an ACID transaction with Cassandra. There are some scenarios for which we can guarantee strong consistency, like creating a new account or a new contact that did not exist before: there we use lightweight transactions with Cassandra. But that's pretty much it. We tried to make a design that avoids relying on transactions and all the ACID properties of a relational database, because we know that it would not scale.
B: As I said, we did not refactor the business layer; we only changed the database code, so we inherit the existing code from the business layer. And there is a very, very big anti-pattern in the business code, which is that every time you need to update something, you need to read the whole contact first. And since we also denormalized a lot, we denormalize into several tables, as you can see.
B
As
you
can
see
so
every
time
you
need
to
update
a
mutable
field,
you
need
to
update
or
each
of
those
seven
tables,
but
the
thing
is
that
each
of
those
seven
tables
have
a
different
partition
and
cluttering
columns,
or
they
have
a
different
primary
piece.
Okay,
so
to
stew,
to
keep
track
of
all
the
updates
you
can.
B
You
can
take
a
lot
of
bugs
it's
normal,
that
updating
seven
tables
is
much
more
harder
than
just
updating
one
table
and
in
our
case,
what
helped
us
a
lot
is
unit
tests
and
intervention
that
what
we
did
is
we
have
a
600
integration
test
and
every
time
we
refactor
one
one
class
of
what
repository
we
relaunched
all
the
tests
until
all
of
them
are
green.
Every
time
they
did
that
in
red.
We
try
to
find
the
issue
and
we
switch
them
and
I
think
this
is
the
best
way
to
handle
this.
B: Batches? Ah, I see, he wanted to talk about lightweight transactions. In fact, we never use lightweight transactions in the updates: we use logged batches, which give us eventual atomicity for the updates. But for the inserts we use lightweight transactions, exactly.
A: Okay, great. Tony is asking, and I can answer this one: where can you get performance benchmarks for Cassandra when handling large data sets? The most complete benchmarking information on Cassandra is on Planet Cassandra; there is a benchmark page there, where we include benchmarks from the University of Toronto, which did a very large database benchmarking project.
B: As for the migration batch, it's just a plain Java program. We are reading data chunk by chunk and putting all those chunks into Cassandra, and we are keeping track of the progression: from time to time we save the state of the progression into a simple table in Oracle. No more, no less, a very simple design.
B: For example, if we take the Libon project, what is scaling is the number of users, so somehow we should put the user ID in the partition key. I have some slides in my presentation showing that if you put the user ID in the partition key, the design will scale with the number of nodes in the cluster. That is the key point; really, if you need to remember one thing, remember that.
B: In fact, if you think about the contacts: each user has a list of contacts, which is his address book, right? So the only scenario I can think about for contention is when you are updating multiple tables. And in fact here, since we denormalize, every time we update data we use CQL logged batches, which provide eventual atomicity. So we accept that we will not have immediate atomicity in case of failure; we accept this trade-off.
B: The business side is not impacted. The most important thing for the business is not the number of contacts, it is the number of users, because in terms of business, the more users you have, the bigger your business is. So right now they are only focusing on the number of users and not really on contacts; contact matching is one feature among all the features.
B: It depends, well, on many factors: the size of your Cassandra cluster (the bigger it is, the faster you can write into it), and the speed at which you can read from Oracle, because you are also bound by the read throughput of Oracle. And we also have some throttling, because we want to limit the traffic on the internal network, to protect our production. Do not forget that during this migration, our production is live.
A: Yeah, and by the way, just to reiterate what you were talking about there, because we see this very commonly with Cassandra, obviously, because of the architecture: people are able to do these rolling migrations on a live production system, which obviously in a relational database is not advised. Someone is asking: how many Cassandra nodes did you have in production, and did you use OpsCenter for managing and monitoring those nodes?

B: Yes.
B: We had some bursts coming from the Android devices, because the application is available on iPhone, Android and the web, and there was some bug in the Android app which created contacts continuously, uploaded them and removed them afterwards. Such a bug will make your partitions grow very fast, and in some of these cases we decided to switch to leveled compaction for the tables, to cope with the high update rate.
A: I don't know where we're going to get clarification... oh, here is the clarification from the attendee: the primary keys in a relational database are typically serially increasing numbers. So how does Cassandra automatically take care of inserting records on all the nodes uniformly, without hot spots?
B: Putting the hot-spot question apart for a moment: Cassandra has a partitioner, which is basically a hash function, and the hash function has been chosen to be as uniform as possible, so that every time you insert a partition key, a row, into your cluster, it will be distributed randomly over the nodes. So with a good hash function you get a very good distribution over the whole cluster and no hot spots. Now, for generated primary keys, we use a client-side timeuuid generator, basically.
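The effect can be sketched with a toy partitioner. Cassandra actually uses Murmur3; `String.hashCode` is only a stand-in here, and the node count and key format are made up, but the point is the same: strictly sequential ids still spread across all nodes because placement follows the hash, not the key order.

```java
// Sketch of why sequential ids do not create hot spots: rows are placed by
// the hash of the partition key, so consecutive keys land on different nodes.
public class PartitionerSketch {

    static int ownerNode(String partitionKey, int nodeCount) {
        // Cassandra uses Murmur3; String.hashCode is only a stand-in.
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 4;
        int[] perNode = new int[nodes];

        // Insert 10,000 strictly increasing ids, like an RDBMS sequence.
        for (int id = 0; id < 10_000; id++) {
            perNode[ownerNode("user:" + id, nodes)]++;
        }

        // Despite the sequential ids, every node receives a share of rows.
        for (int n = 0; n < nodes; n++) {
            System.out.println("node " + n + ": " + perNode[n] + " rows");
        }
    }
}
```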
A: Okay, great. Pam is asking: are there any best practices published for migrating from MySQL to Cassandra? I am actually going to put the link in the window here, and I can send it to everyone. On Planet Cassandra we have a page of all of the best ways of migrating from MySQL to Cassandra.
B: So I think the person is asking about schema changes. Schema change is a big topic in Cassandra. We do have ALTER TABLE, so if you want to change your tables, you can modify them later in time; but if the question is about a mechanism to track changes, I think it is beyond the scope of just the database. I think you are thinking about some system like Liquibase, if I get it right.
B: Yeah, yes, there is still some data staying in Oracle, because we are migrating data continuously. First, two years ago, we migrated all the messaging data to Cassandra; now we are migrating all the contacts data; and what remains in Oracle is the user accounts.
A: This recording will be available within 24 hours on Planet Cassandra, and we will also send out the slides as well. Sameera is asking: as a step beyond migrating from Oracle, once in Cassandra, could you briefly discuss what is available to support continuous refactoring affecting the data model, keeping zero downtime in mind?
B: We can have continuous refactoring, and the strategy is always the same. For example, if you realize, oh my god, I missed this feature, I want to add a new denormalized table to support a new query: in that case, you need some batch to reprocess the data. And if you want to change the schema drastically, you can use the double-run strategy. It means that you are writing with the new code.
B
You
are
writing
to
both
table
and,
after
migrating
data
to
the
new
table,
you
are
duplicating
the
old
table
with
your
own
code
and
with
this
double
run
strategy
you
will,
you
will
not
have
production
downtime.
Of
course
it
implies
that
there
is
a
lot
of
work
beforehand,
but
there
is
no
magic
new.
Do
not
want
downtime.
You
need
to
work
beforehand.
Kareha
then.
B: Remember, we added a new column in the Oracle contact table, which is the contact UUID, and every time we copy data to Cassandra and the copy is acknowledged, we fill in this column with a value. So in the end, if we scan sequentially through all the contacts and we see a contact with this column being null, it means that it has not been migrated. So it is a very simple way to double-check.
B: We have an Oracle RAC system, we use RAC, and for provisioning the data in Cassandra, well, the best practices recommend not to use shared storage or network shared storage. So for Cassandra we purchased physical machines, and each Cassandra machine has its own disks. This is the best practice: never use shared storage with Cassandra, because by using it you will deactivate many low-level optimizations Cassandra provides you.
B: A classical question. In fact, the timeuuid generator is a combination of the timestamp, the MAC address of the machine and a random clock sequence. So even if you are trying to generate two distinct values at the same millisecond on the same machine, so you have the same MAC address and the same millisecond, there is still the random clock sequence to ensure that every generated value is unique.
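The three ingredients (timestamp, clock sequence, node/MAC) map onto the RFC 4122 version 1 layout; composing one by hand shows why the clock sequence disambiguates two values generated at the same instant on the same machine. This is an illustrative sketch only; production code should use a library generator such as the DataStax driver's time-based UUID utility, and the MAC address below is made up.

```java
import java.util.UUID;

// Illustrative composition of a version-1 (time-based) UUID from a
// timestamp, a clock sequence, and a node identifier (the MAC address).
public class TimeUuidSketch {

    static UUID timeUuid(long timestamp100ns, int clockSeq, long node) {
        // Layout per RFC 4122: time_low | time_mid | version + time_hi
        long msb = (timestamp100ns << 32)                         // time_low
                 | ((timestamp100ns >>> 32) & 0xFFFFL) << 16      // time_mid
                 | 0x1000L                                        // version 1
                 | ((timestamp100ns >>> 48) & 0x0FFFL);           // time_hi
        // variant bits, 14-bit clock sequence, 48-bit node
        long lsb = 0x8000000000000000L
                 | ((long) (clockSeq & 0x3FFF) << 48)
                 | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        long ts = System.currentTimeMillis() * 10_000;  // approx. 100ns units
        long mac = 0x001122334455L;                     // made-up MAC address

        // Same timestamp, same MAC: the clock sequence still disambiguates.
        UUID a = timeUuid(ts, 1, mac);
        UUID b = timeUuid(ts, 2, mac);
        System.out.println(a.version());   // 1
        System.out.println(!a.equals(b));  // true
    }
}
```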
A: Okay, one of our audience members is actually having OpsCenter issues. So, to that attendee: I'll give you DuyHai's email address; if you could just email him directly, we'll get you a good resource to help you out. Okay, last question, and it's going to be from Scott: if a delete fails on one or more nodes and succeeds on others, is there a chance that the delete will be lost?
B: Yeah, this is the classical tombstone issue. Well, a delete in Cassandra is just another write: we are writing a special column with a deletion marker that we call a tombstone. So, for example, you have three replicas and you are deleting, and the last replica has missed this delete. We are repairing weekly, we have a weekly repair process, to synchronize the data on all the replicas; and in fact we keep the tombstones long enough for the repair to propagate the deletion to all replicas.
A: Okay, DuyHai, thank you so much for taking the time today; I know it's getting late in the evening your time, and we really appreciate it. A great presentation today, and lots of good questions. For those of you still on the call: we have a ton of resources on Planet Cassandra, we have a ton of resources on datastax.com, and look out for the Cassandra Day tour that we are doing this year, coming to a city hopefully near you, so you can keep your Cassandra learning going with DataStax. Thanks to everyone, we really appreciate it. Thank you.