From YouTube: Webinar | Oracle to Cassandra Core Concepts Pt 2
Description
Third normal form? That’s so 20th century. Learn the newest techniques to make your Cassandra database sing from the rafters in performance and scalability. AND it uses concepts that you already know and apply every day. You can do this. This is the must-see half hour of your professional life! These developers found a new way to work with databases. First you will be shocked, then you will be inspired!
A: Well, you use it effectively, so we're going to just launch right into this. So what do we do? The last episode was about the transition from Oracle to Cassandra, and most of it was more of a conceptual problem that we're dealing with: how does it work? I highlighted some of the differences just to ease you into the concepts. So, if you're coming from an Oracle background, again, in case you didn't hear this, I was an Oracle DBA forever, since the 1990s, and then that happened to me.
A: This is going to be gory details, so much more of a technical discussion this week. By far, I'd say this is probably the meatiest of the three, and we'll be digging right into the Cassandra Query Language, which is CQL. It's going to be the beef. Where's the beef? It's going to be part two, and the next episode is going to be about building an application. We'll preview a little bit more at the end of this talk, but it's really going to be about top-down: how do we build an application?
B: Last week we ended it like, well, where's the catch? You know, everything about Cassandra seems pretty cool, right? It was designed for the new world of applications: multi-data-center replication, always on. I mean, it kind of seems like a rainbow-farting unicorn, it can do everything. Oh well, let me talk about the fact that it was meant for transactional purposes.
B: You can choose to have a replication factor. In this case we're doing a replication factor of three, meaning that each piece of data will live on three nodes on the ring. It was pretty straightforward, but this is how we got replication to work in Cassandra. Go ahead, right? Perfect.
B: We're going to talk about a static table, so let's start with something that everybody's very familiar with. This is a CREATE TABLE statement. It should look vaguely familiar even to those who are completely uninitiated to the world of Cassandra. You have things like a table name and column names.
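As a concrete sketch of what the speakers mean by a static table (the column names here are illustrative, not the exact ones on the slide), the CQL looks almost exactly like its Oracle counterpart:

```sql
-- A "static" Cassandra table: a fixed set of typed columns,
-- one row per primary key, just like a relational table.
CREATE TABLE users (
    userid       uuid,       -- unique id for the user
    firstname    text,
    lastname     text,
    email        text,
    created_date timestamp,
    PRIMARY KEY (userid)     -- userid is also the partition key
);
```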
A: And that's really because of the way that Cassandra works: it's sparse data. The data that you store is the data that you have on disk, whereas with Oracle, when you create a database and you create a data file, it will allocate that space in the blocks and everything before you even write data to it. That's not the case with Cassandra.
A: Yes, and a lot of people ask, what's the limit? Well, on a varchar, or any of these types actually, the size is two gigabytes. So I like to call out a practical limit: don't ever try to write two gigabytes into a column. Your application will barf before you finish writing it. So it is a big number, and it is doable, I've seen people try, but that's very interesting, right? That means that every type in Cassandra has a pretty huge limit.
B: The question is, why would you even have types, then? Well, a type in the CREATE TABLE statement actually tells Cassandra to make sure the data being put there really is an integer or text or a varchar. So it does have a reason to live, and its reason is to make sure that the data type is correct when you use your data.
A: Yeah, that's a huge thing. A lot of folks think NoSQL means no schema, or that it's just whatever you throw in the database. Cassandra has a very strongly typed schema, and you're right, it enforces the types. So if you declare an int and try to put in some string of some sort, it will throw a violation. You can't do that.
B: It's a uniqueness thing, but you know, we like to be multitaskers here, right? It's not just for uniqueness; it also becomes our partition key. Whatever that first value of the primary key designation is, that is what's going to get hashed, if you remember back to that previous slide, to tell where that data lives in the ring. And that hash value is going to be between, well, we'll talk about that in a little bit.
B: So then we have the concept of a keyspace, which is a collection of tables, and you may think of this also as a schema or a database. It is pretty much exactly the same thing, so the concepts of tables and keyspaces are probably very familiar to you if you lived in the relational database world: a table is a collection of rows, and a keyspace, or a schema, is a collection of tables.
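A minimal sketch of creating a keyspace, tying in the replication factor of three discussed earlier (the keyspace name and strategy are illustrative, not from the slides):

```sql
-- A keyspace groups tables, like a schema or database in Oracle.
-- Replication is configured per keyspace.
CREATE KEYSPACE killrvideo
  WITH replication = {
    'class': 'SimpleStrategy',   -- simple single-data-center placement, fine for a demo
    'replication_factor': 3      -- each piece of data lives on three nodes of the ring
  };
```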
A: Let's talk about how this works. Here's your first shot at DML in the Cassandra Query Language, CQL. This is just an INSERT statement, and it should look really familiar at this point. If you came from Oracle, I don't know if there's a whole lot of difference, maybe other than some very small syntax issues like single tick versus double tick. But this is an insert. So we have a table called videos, and this videos table is going to store our videos.
A: We have this data model online, and we'll be talking a lot about it next week especially. There's a data model online called KillrVideo, and it actually exists as a web application; you can look at that online. We will provide the link in the show notes later. But this videos table is an entity: it's just a video, and a video has users, etc. This is an insert statement for that, and what I want to look at is how does this work?
A: When I write an insert statement, how does it work in Cassandra? We have a table name, we have some fields, and then we have our partition key. Now, the partition key is required when we do an insert, just like a primary key is required on an insert with Oracle.
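A trimmed-down sketch of the kind of insert being discussed (the columns are a simplified take on the KillrVideo videos table, not the exact statement on the slide):

```sql
-- Inserting into the videos table. The partition key (videoid) is required,
-- and strings use single quotes, as in Oracle.
INSERT INTO videos (videoid, userid, name, added_date)
VALUES (6fbe7f30-ae1c-11e5-a837-0800200c9a66,   -- a type 1 (time-based) UUID
        70bc6b10-ae1c-11e5-a837-0800200c9a66,
        'Intro to Cassandra',
        '2015-11-02 00:00:00');
```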
A: Yeah, that is an odd number. If you are familiar with a UUID or GUID, that's exactly what that is, and we use this quite a bit in Cassandra. When we get into more of an application design we can talk about when to use it, but what I can tell you is: this is a really good way to create ID numbers with Cassandra, especially in any distributed system, mainly because what you're getting here is a guaranteed unique number in the universe.
A: Like I said, a GUID. If you're a Microsoft fan, it's the GUID: they call it globally unique, not universally unique. I like that; they got burned on that 640K thing, so they're pulling back a little bit. The UUID is in, I think, about every driver I can think of, in every language: Java, Python, Rust, you name it. They all have a way to create a UUID, and there are different types of UUIDs.
A: But in this case it's a type 1, time-based UUID, a TIMEUUID. So the partition key is required, and we're going to throw in some values. I want to talk about the partition key, though; let's dig into that right now. We have a couple of inserts here. I've created a much smaller version of that table so we didn't have to get too big. So I have two inserts here, and if you'll notice the first two partition keys, yeah, it may take a minute to parse that, but those are different.
A: So what will happen with those two inserts? Now, you mentioned this before about the hashing; let's look at the specifics. When I pass a partition key to Cassandra, what it does with it is a hash, and we use Murmur3. Now, we have used MD5 in the past, and that may still be what you're using today. But this is a consistent hashing algorithm, and consistent hashing algorithms have the guarantee that if you have the same input, you will always get the same output.
A: So if I hash these UUIDs, I will always get a number out, and that is assigned to another 128-bit number, which we call a token. Notice where this might come in, right? So tokens, again, are that consistent hash. This is going to show its position in the Cassandra ring. Because Murmur3 is a random partitioner, it will randomly partition this value across your ring, and so you have the data spread out around the nodes.
A: If you have a thousand nodes, each gets one one-thousandth, and if you have two nodes, each is going to have one half. This token value is always assigned to that value of the partition key, so this is how we can always get it back, and this is how we always know where it will go, and it will be evenly distributed throughout the system, which is great.
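CQL actually lets you see this hash directly, which may help make the idea concrete (a small aside, not from the slides; the table and column names follow the earlier videos example):

```sql
-- token() returns the Murmur3 token that places each partition on the ring.
-- Consistent hashing: the same videoid always yields the same token.
SELECT videoid, token(videoid) FROM videos;
```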
B: So this is a NoSQL database, right? That static table that we just described to you is where relational databases end, right there. It has a number of columns and a primary key, and it's stored on disk as rows, and that's it, because that is what could be done. We now have a new concept: this is a dynamic table, and there's only one difference in this particular table.
B: If you notice, it's still got all the UUIDs and timestamps and text, nothing new there. But when you look at the primary key designation, it has two values, which again is very typical in the relational database world. In the Cassandra world, though, that second value has special meaning, and what that special meaning is, is what's called a clustering column. So video ID is actually metadata.
B: Video ID is not actually stored in the table the way you'd expect; the actual video ID, a UUID, is stored as the column name. That UUID is the new column name. Yeah, let's talk about this, because this is definitely jumping down the rabbit hole, right? We have completely left the realm of traditional data modelling and we're in someplace else.
A: This is very specific to Cassandra. You and I talk to users all the time, and I think this is probably the concept that you have to get over the fastest, because it will make or break your data model. So pay attention, class, this will be on the test: the primary key relationship. How does this work? If we look at the primary key that we took from the last table, we have a tag and we have a video ID. So we know that the tag is the partition key.
A: We know that the video ID is the clustering column. This is just terminology, right? It's what we designate these as when you and I are talking about it. But what do they actually mean? We know that the partition key, in this data model that would be one of the tags, we know that that's hashed, so we're going to use that. And with a partition key and clustering column together, we'll put these things into a logical row by themselves.
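A sketch of the tag-plus-video-ID table under discussion (a minimal version; the real KillrVideo schema has more columns):

```sql
-- A "dynamic" table: tag is the partition key, videoid the clustering column.
-- All videos for one tag are co-located and ordered by videoid.
CREATE TABLE videos_by_tag (
    tag     text,
    videoid uuid,
    name    text,
    PRIMARY KEY ((tag), videoid)   -- (partition key, clustering column)
);
```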
A: Sorry, into a physical row by themselves. The partition key says: hey, all this data is co-located. So everything that has to do with the tag "datamodel", all the video IDs associated with that word, I want to be physically next to each other on the disk. I want to cluster those together on the disk.
A: Hence the term clustering column. And there is an order to this. In this case the order is not as easy to figure out, and I do have a better example in a minute here, but it's the UUID. If you'll notice, for "datamodel" we have three videos associated with it. This is a one-to-many relationship.
A: I don't do 128-bit sorting in my head all the time, so I'll just say that's correct. When we look at something that's a little more human, like an integer or a string, it will make a little more sense. But this is what we mean by clustering: it's actually clustering these values together on the physical disk. So back to the select. I want to talk about how a select works.
A: So we, as programmers, need more control over where the data is, and that's about data locality, and that's what the partition key is going to give us with the hash. But we also have some control over the order that we put the data on disk. What we're trying to build here is a high-performance data model, right? And I feel like that's always been the case. If you remember last week, we talked about the problem we're trying to solve: the database is slow, right?
A: So yes, here's a more complicated example. This is another example that I use quite a bit: KillrWeather. We have a lot of killers out there. This is storing time-series data, and raw_weather_data is just a table that does that. There's a weather station ID, and then I've broken the year, month, day and hour down into individual parts, and then we're just storing a temperature. So look at the inserts here.
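The table being described looks roughly like this (a sketch of the KillrWeather example; the real table stores more measurements than just temperature):

```sql
-- Time-series weather data: one partition per weather station,
-- clustered by year, month, day and hour.
CREATE TABLE raw_weather_data (
    wsid        text,    -- weather station id
    year        int,
    month       int,
    day         int,
    hour        int,
    temperature double,
    PRIMARY KEY ((wsid), year, month, day, hour)
);

-- The inserts that only vary by hour (7, 8, 9, 10 ...):
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('727930:24233', 2015, 11, 2, 7, 54.0);
```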
A: If you're good for questions: the primary key, if you notice, has a lot more going on than just one partition key and one clustering column. Now, I put the parentheses, or the brackets if you're in Europe, around wsid because I wanted to make sure that you know that this is the partition key; this is being very correct with my syntax. Now, if I wanted to add more columns into those parentheses, I could group together columns into one partition key.
A: Let's say I wanted my weather station ID and my year as one partition. This is a design decision, again something we'll talk about in the next episode, but this is what that means. Now, when you have multiple clustering columns, it creates a new thing for us; it gives us some control. If you look at the inserts here, Rachel, look at this: it's inserting data, and the only thing that's really changing is seven, eight, nine and ten. Those are the hours, so one row every hour.
A: So this is what it would look like. Now let me go back one and look at that last statement here: the CLUSTERING ORDER BY. This is when you start leveling up on your usage of Cassandra, and it's one feature that I feel is going to make your data model sing. This is a performance thing. On insert, I want to control the order that the data is sorted on disk. So if you look, the natural sort order for an integer is ascending.
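The statement being referred to would look something like this: the same weather table, but with the on-disk sort order reversed so the newest readings come first (a sketch based on the KillrWeather schema above):

```sql
-- Reversing the natural ascending order at table-creation time,
-- so the most recent hour is physically first in each partition.
CREATE TABLE raw_weather_data (
    wsid        text,
    year        int,
    month       int,
    day         int,
    hour        int,
    temperature double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
```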
B: So here's a question, right? I said earlier that sorting is expensive. Before I came to DataStax I had 15 years of data warehousing experience, and I spent a lot of that time sorting data, and a lot of time trying to push your ORDER BYs into memory and all that stuff in order to actually make it perform. The amount of twisting and turning that I had to do...
B: We say that we're doing this for speed, speed of retrieval, yeah. But where do we pay for that speed of retrieval? And here's where I think it's really cool, the way that Cassandra amortizes the cost of a write over time. So we have the client, and it's inserting data. We recognize our insert statements. It finds the node: it hashes that partition key, determines which node it needs to go to, and then it writes to two places.
B: The first is the commit log, and that commit log is a sequential file; it just keeps writing. When a new file needs to happen, a new file comes up. There's nothing really managed, except for just sequential writes. The second place it writes is to memory, and it's going to put in the partition key and all of our clustering keys and our actual values, which in this case is the temperature, into memory. Memory is very fast to retrieve from, so somebody reading gets it from memory.
B: But here's the thing: eventually memory fills up. We're not to the point yet where we have unlimited memory on our machines; it'll be great when that day comes, but for now, no, RAM is limited. Eventually, data has to go to disk, and this is where the sorting happens. This smaller amount of data that lives in your memory is actually sorted at the time it's flushed to disk, and it's flushed to something called an SSTable. SSTable stands for sorted string table.
B: I was saying the data is sorted into these sorted string tables, and it's sorted by that CLUSTERING ORDER BY statement that we saw earlier. These SSTables are immutable, and they are written every time the memtable flushes. Eventually a number of SSTables build up, so a process comes through called compaction, which will merge various sorted string tables together, put the data correctly in the right order, and create a new SSTable.
B: That's called compaction, and compaction is something that you do need to be aware of, because it probably will run all the time in a well-tuned system, and it's something you do want to tune for. Again, you're amortizing the cost of your writes over time, and this compaction is a cost, particularly in CPU and I/O.
A: I get asked this question a lot, Rachel: what happens if I have out-of-order data? The compaction process is what covers that. If you have that weather station, let's say it's kicking out its weather data and it goes offline for a period of time, or you get some error and it misses a few hours, and then later it comes online and re-establishes its connection. The out-of-order data will be recombined during the compaction process. Yeah.
B: Compaction also gets rid of data that's been deleted. We'll be talking about deletes later on, but deletes happen in two phases. First, data that is going to be deleted is tombstoned, and then, after an amount of time, compaction will remove those tombstones from disk and give you back that space, right?
A: So here we are, back to the storage model; now we'll just wrap this concept up. We're back to selecting some data from the disk, and just to reinforce what we've talked about before: when I do a select from the database, I give it a particular partition key, this weather station ID, but then I give it the clustering columns: I give it a year, a month and a day.
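A sketch of the kind of select being described, against the weather table above (the station ID and values are illustrative):

```sql
-- Partition key (wsid) plus clustering columns narrow the read to
-- one partition and a contiguous slice of it: a single disk seek.
SELECT temperature
FROM   raw_weather_data
WHERE  wsid  = '727930:24233'
  AND  year  = 2015
  AND  month = 11
  AND  day   = 2
  AND  hour >= 7 AND hour <= 10;  -- range scan on the last clustering column
```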
A: This is really what you need to understand about the partition key and the ordering. Now, if I do a CLUSTERING ORDER BY and reverse it, whenever I insert new data it's actually going to the beginning of that record. That means that as new data appears, it's becoming the first record there, which helps a lot, because it's merged and it's sorted and it's stored sequentially. So now, when we get back to the read path, this is where we actually get the payoff, right?
B: Yeah, exactly. So we now have our select statement, and notice that our partition key is there, as required, and because we want to go all the way down to the hour, and we're doing a range scan on the hour, we also include the year, month and day. We could include just the year; the year and month; the year, month and day; or the year, month, day and hour. So the client sends out the select and again chooses the node.
A: It checks the memtable, and that's a really important point, I think: a lot of people will ask me the question, hey, if it inserts into the memtable, why not just read from the memtable? Sort of, but it needs to go to disk. This is a persistent system. It has to go to the disk to get that data, and there's no two ways about that.
A: This is why it will be fast: when we ask for the data, we're going to ask for the partition key, and the partition key goes to a single node, no matter how many we have in the universe. You could have a small cluster or a big cluster; it doesn't matter, it's still going to go to a particular node because of the partition key. And we're going to do that single seek on disk, because disk seeks are the worst part of your system. Do not forget:
A: Disk seeks are measured in milliseconds, microseconds if you're lucky and you have really fast flash storage or maybe even SSD, but that will be the slowest part of your operation. So doing one seek: awesome. And once it does that single seek, it reads it all in one sequential read, all into your CQL table, and you get rows and columns sorted by event time. Everyone is happy, and programmers love it, right?
A: Because programmatically, working with rows is a lot easier. When you ask for a set of data, you can iterate over rows and grab the columns. It is just an easy way to work with data. Rows and columns are a preferred format. Now, there may be better ways to do it, but I'll tell you, this has always worked well for me and I liked it.
A: Going back to our first discussion: we've always tabulated data as humans, and creating tabular data is not a bad plan, right? Because we seem to like it; we've done it for a thousand years. Now for something completely different... except we're not really doing it that differently, right?
A: So these are really the Cassandra specifics, and these will help quite a bit. This is where we move on from the traditional Oracle-style data model, and where we have some really cool features, some of which I use all the time in my data models. You saw it already; let's start out with collections. Here's a very simple one: tags is now a set of tags. That means that we're denormalizing our data here, and when we get into application design...
A: We'll talk more about that strategy, but by denormalizing we're grouping our data together and we're making it faster, and this is because of how Cassandra works: we have a partition key and all our data. Well, a set allows us to have a very dynamic part of that data model inside of a dynamic table.
A: So a set of tags would be, like, think about tags on YouTube or something. A set has a column name and then the CQL type inside the angle brackets, and that type is used for ordering. So whenever I create a set, because the set is ordered by the type, it's going to be collated by, in this case, the UTF-8, like a string ordering.
A: Yep, so if I did a set of timestamps, it would order by time, and a set of integers would be ordered by ints. A list is very similar: a column name and a CQL type. But the CQL type in a list is not used for ordering, because the list is the order you put it in: wherever you put it in is the order you get it out. Not my favorite collection.
A: It's a little heavier and requires more overhead, because it has to sort every time you do an insert, so I stick with set as much as possible in these cases. List is one that I'm not so hot on, but it's there, and sometimes a list is important. And if I wanted to change my list's CQL type, or if I wanted to make my set a list... I'm sorry, I put a set in there, didn't I?
A: So if I had my list of tags instead of my set of tags, then I would be saying that the order is somehow important, which it isn't really. My last one here is the map, and this is probably one of my favorite data types of all, not just in CQL. It's very dynamic, because you get a key and you get a value, and so it creates a nice...
A: I don't want to say it, because someone's going to hold me to this: it's like a database inside of a database, right? It's a key/value database inside of your table. But here's the warning: do not go nuts with this thing. The most you can put in there is 65,000 entries, and for a reason: there's a lot of serialization cost you're going to start incurring. It's really meant for small, dynamic key-value type sets, and if you need more than 65,000, rethink the model.
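The three collection types side by side, as a sketch (table and column names are illustrative, continuing the videos example):

```sql
CREATE TABLE videos_with_extras (
    videoid  uuid PRIMARY KEY,
    tags     set<text>,        -- ordered by the type's collation (UTF-8 here)
    credits  list<text>,       -- keeps the order you inserted
    metadata map<text, text>   -- small key/value "database in a database"
);

-- Adding to a set uses collection syntax rather than a full rewrite:
UPDATE videos_with_extras
SET    tags = tags + {'cassandra', 'tutorial'}
WHERE  videoid = 6fbe7f30-ae1c-11e5-a837-0800200c9a66;
```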
A: It's interesting, because what we're doing here is smoke and mirrors in a way, because it's still being stored as a clustering column down on the disk. So all of these values are being clustered together in the same partition, and that's pretty critical, right? We know that if we have a partition key and then we have them all clustered together on the disk, you will get a faster data model. Well, the collections abstract that for the user and then put those together on disk as well.
B: There are currently four built-in aggregates, so no, it's not all of the things yet. We've got four; we're very happy about four. They just came out recently, but they are built in, and again, you need to use them with the partition key: they don't take away the requirement to have a partition key in your query.
B: They act just like any of these other aggregates; I'm not going to go into the details of what a max does. But what I want to point out, down at the bottom here in our CQL shell, is that there's something missing, and it bothers me, because I write aggregates with GROUP BY and HAVING, and that is typically how you would write them.
A: But that is the partition key, no? What we've known already, right, is that partition keys create their own grouping of data, and that's how you gather data together, and you use the clustering column for ordering. So you get a GROUP BY and an ORDER BY by using a partition key and clustering column.
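In other words, the partition plays the role of the GROUP BY. As a sketch against the weather table from earlier:

```sql
-- The aggregate runs over one partition: the wsid predicate is the
-- "grouping", and the clustering columns already give the ordering.
SELECT MAX(temperature)
FROM   raw_weather_data
WHERE  wsid = '727930:24233';
```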
A: Full table scans: we try to avoid those because they sucked. Now, if we do a full cluster scan, this will suck even more, right? Let's not do that. And that opens up the whole conversation of using something like Apache Spark with Cassandra, which is definitely built to do full cluster scans, but that's out of scope for this discussion.
A: They're actually just built-in user-defined functions. So this is another thing that's now in Cassandra 2.2, and it is not quite the trigger that you expect. I say that because, if you're used to using PL/SQL, or you're doing any kind of trigger work in any other language or any other database, you know that there's a lot of internal introspection you can get: you can do things inside the database.
A: User-defined functions are built in as pure functions. So you have a function that takes a CQL type, usually part of a column, and that pure function doesn't rely on any outside data at all; that's why it's pure. The input parameters are manipulated by something pure inside of Java; you can also use JavaScript, you can use Ruby, you can use Python. That function will somehow manipulate that data, and the functions can then be used to create aggregations, and that means, hey...
A: You can do those aggregations over a certain range of data that I provide, like a partition, and then you can create your own user-defined functions. Now, max, min, average, count: those are all included, because those are kind of the easy button. I've seen some pretty interesting things already with these, and I'm kind of excited to see what users come up with for user-defined functions. But this is definitely an interesting and useful change to Cassandra.
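A minimal sketch of such a pure function (the function name and body are illustrative, and UDFs have to be enabled in cassandra.yaml before this will run):

```sql
-- A pure function over a CQL type: it sees only its input parameter.
CREATE FUNCTION fahrenheit_to_celsius (temp double)
  RETURNS NULL ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'return (temp - 32) * 5.0 / 9.0;';

-- Applied per row over a partition we provide:
SELECT wsid, fahrenheit_to_celsius(temperature)
FROM   raw_weather_data
WHERE  wsid = '727930:24233';
```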
B: Cassandra is not an ACID-compliant database; specifically, the consistency portion of that is a little bit different, and we're going to talk more about that next week, about how consistency works in Cassandra. But that doesn't stop us from sometimes requiring a little bit of locking. Just a little bit goes a long way. Not every single transaction requires a heavy lock, but there are some transactions that do, for example, race conditions.
B
So
when
you
are
in
a
distributed
system
and
you've
got
somebody
signing
up
with
the
same
username
in
different
parts
of
the
world,
there
needs
to
be
a
way
to
make
sure
that
those
don't
end
up
colliding
with
each
other,
because
the
the
way
that
Cassandra
works
is
the
last
one.
The
last
right
wins
and
that's
not
very
good
for
application
consistency.
So
there
is
a
concept
of
called
lightweight
transactions
with
Cassandra
and
they're
fairly
easy
to
implement.
I
mean
you
get
a
you,
get
a
pretty
big
hammer
with
just
a
couple
of
words.
B
So
right,
there's
and
if
not
exists,
you
add
that
into
your
insert
statement,
that's
pretty
much
it
that
will
actually
initiate
the
process
internally
to
make
sure
that
this
data
is
going
to
be
written
and
it's
going.
It's
not
going
to
be
overwritten
by
a
another
transaction
coming
very
quickly
after
it
yeah.
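Those couple of words look like this in practice (the users table here is an illustrative sketch for the username race described above):

```sql
-- A lightweight transaction: the insert only applies if no row with this
-- primary key exists yet, so two sign-ups can't both grab 'patrick'.
INSERT INTO users (username, email)
VALUES ('patrick', 'patrick@example.com')
IF NOT EXISTS;
```

The result set includes an `[applied]` column telling you whether the insert won.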
A: However you could possibly mimic this yourself, because this is built into Cassandra, we're looking at Paxos under the covers, and that's a pretty established consensus protocol. It lives inside the cluster, and I think that's the key: things that happen outside of the cluster are independent, bad things can happen, you don't know it, and you get a really inconsistent state of your data. Dogs and cats living together, total chaos.
A
Yeah,
it's
not
the
cheap
option,
but
it
I
will
save
you
and,
if
so,
like
anything
else,
and
with
Cassandra
nets
are
just
many
times
before
is
you
know,
understand
the
internals?
Well,
this
is
an
internal
you
should
understand.
Is
that
there
is
round
trip,
and
so
it
will
take
longer,
it
does
I,
don't
use
it
for
all.
The
things
tells.
A: So what about updates, though? This is a standard update, and there's some danger here, because an update can overwrite data. And how is that true? Because there are no constraints; that is not something that's built into Cassandra. A constraint violation would happen if this data existed. So what's it going to do, Rachel?
A: Exactly. It's like my seven-year-old on a skateboard: just don't get in its way, because it'll mow you down. Yeah, bad. So updates are perfectly fine in the case where you know that you have something somewhat idempotent, or you're working on a progressive data model. Updates are rarely used, I think, in Cassandra. It's interesting: I see more data models with inserts than anything; updates are not as common.
A: So this, right here at the end, if you'll notice, is a conditional; it's a conditional update. And what I'm saying here in this conditional update is: hey, update the video with this ID and give it this name, but only if it's from this user, if this user ID equals this user ID. So I've created a condition where I'm going to make sure I'm not overwriting someone else's video. Now, this is a very simple example.
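A sketch of that conditional update, using the videos columns from earlier (the UUIDs are placeholders):

```sql
-- Conditional update: only rename the video if it belongs to this user.
-- Like IF NOT EXISTS, the IF clause makes this a lightweight transaction.
UPDATE videos
SET    name = 'A better title'
WHERE  videoid = 6fbe7f30-ae1c-11e5-a837-0800200c9a66
IF     userid  = 70bc6b10-ae1c-11e5-a837-0800200c9a66;
```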
A: It's okay, it's all right, it's all valid. Tombstones are a marker in the system; we won't go into it here. If you want to learn more about tombstones, there are much better, longer discussions about them. The thing you need to know is that a tombstone is a marking in your database that that data is no longer valid. It doesn't actually physically delete it off of the disk.
A: I think it would be pretty cool, yeah, and I think that's the biggest fear: your data returning, like somebody saying, yeah, I got that data... you missed that... oh no. So here's my favorite feature of all time, and I say that because I use it all the time. ORDER BY is like my second; my favorite is TTLs. And why?
A
What we're looking at here is the time to live, the TTL, and expiring data, especially in time series data, or in a very transient data model where you have data coming and going. I've seen some crazy use cases with this. Essentially, what you're doing is marking your data whenever you insert it: you say, expire that data after 30 days. And the way I explain a TTL is this: it's a free delete. It's actually less than free, and the reason I say it's less than free...
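The 30-day expiry described above can be sketched in CQL like this (hypothetical table; 30 days = 2592000 seconds):

```cql
-- Assumes the hypothetical videos table keyed by videoid.
-- USING TTL expires the inserted values automatically after 30 days.
INSERT INTO videos (videoid, userid, name)
VALUES (uuid(), uuid(), 'transient clip')
USING TTL 2592000;

-- TTL() shows the remaining seconds of life for a column's value.
SELECT TTL(name) FROM videos;
```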
A
A
There was like a whole routine: I'm going to delete data. Okay, let me turn off all this stuff, right? Oh, and then we'll do it at night, because god knows what it'll do to our users. The beauty of the TTL is that once the data has expired, it just disappears. And what I mean by disappears is the users no longer get to see it, and the data will eventually just not get copied as compactions happen. That's why it's less than a free delete.
A
It just doesn't move the data anymore, so it's a very, very low-overhead process, and it will not create a whole lot of problems for you. I've seen some crazy stuff with this. I had one user that was processing 250 terabytes a month and expiring it after 30 days, so they had like a 30-day ring buffer of 250 terabytes. It's pretty cool!
B
A
B
A
B
Next week we're going to talk about: why aren't there any joins, and what do you do about that? And then, how do you design your application, both from the logical side and the organizational side, on to the physical side? Like, how do you actually take these concepts we've been following for the last two weeks and make your application work with Cassandra? Yeah.
A
That's... and this is a top-down discussion. We hope that we can wrap everything up and make you a little more successful in your next application with Cassandra, because you've learned a lot about the parts, the bits and the pieces; let's just put them together. Our intent here is to make you really good at doing this, so that everyone thinks you're an ultimate badass. We want that. We want you to be the badass. Show everyone how cool you are with this really cool app.
A
You built this app that doesn't ever seem to go down. That's our job, and we would love to see you at Cassandra Summit. We will be there live, September 22nd through the 24th. We are going to do training with O'Reilly. O'Reilly is going to do certification for Certified Cassandra Developer, which is really cool; that'll be the first one.
A
Ever. There are tons of talks: 130 tracks, lots of big companies, small companies, interesting people talking about interesting things. Just be there. And if you want to get priority, here's a priority pass: use Rachel's code, or use mine, and there's a priority pass code for getting 25% off certifications. So we hope to see you there. Alright, we have a few minutes left, and Devin, I know you're on the line out there somewhere. Can you maybe pass us a couple of questions here? And just keep in mind...
C
A
Boy, that's a good, tough question. Possibly.
A
It could be, but I think actually what you're asking is: there wasn't the same number of columns that I inserted. You can have sparse amounts of columns. The most important thing is you have to include the partition key and the clustering columns when you insert data, so those have to be there, but...
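A sketch of what a sparse insert looks like, assuming a hypothetical time-series table:

```cql
-- Hypothetical sensor-readings table: partition key + clustering column.
CREATE TABLE readings (
    sensor_id    text,
    reading_time timestamp,
    temperature  double,
    humidity     double,
    PRIMARY KEY (sensor_id, reading_time)
);

-- A sparse insert is fine: humidity is simply omitted here.
INSERT INTO readings (sensor_id, reading_time, temperature)
VALUES ('sensor-42', '2015-09-22 12:00:00', 21.5);

-- But every insert must carry the partition key and clustering columns.
-- This would be rejected, because the clustering column is missing:
-- INSERT INTO readings (sensor_id, temperature) VALUES ('sensor-42', 21.5);
```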
B
Actually, it's very common to have a data model, for certain types of tables that we'll go into a little bit more next week, where you just have the partition key and a clustering column, and you actually don't have a value. The actual cell can be null, but the column name, the clustering column, needs to be populated. Yeah.
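A sketch of that kind of key-only table, where the row's existence is the data (names hypothetical):

```cql
-- Nothing but a partition key and a clustering column: useful as a
-- membership or lookup table, where having the row at all is the value.
CREATE TABLE user_videos (
    userid  uuid,
    videoid uuid,
    PRIMARY KEY (userid, videoid)
);

INSERT INTO user_videos (userid, videoid)
VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0,
        06049cbb-dfed-421f-b889-5f649a0de1ed);
```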
A
A
That's a really important one. So, we're at the top of the hour. I'm sorry we only got two questions; again, we will answer your questions. That last question was obviously a thinker; you're paying attention. Good, good class. And we will be back next week for our final installment of this Oracle-to-Cassandra business, and we'll be talking applications, so hopefully you'll be there. Thank you very much. Thank you.