From YouTube: Cassandra Community Webinar | Intro to Apache Cassandra
Description
Speaker | Aaron Morton (Apache Cassandra Committer)
Date | Wednesday, October 10 @ 11AM PST
Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This webinar is 101-level and will examine C*'s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to concepts such as Cassandra's data model, multi-datacenter replication, and tunable consistency.
Christian: This webinar is part of a series, and the recording will be archived on datastax.com. Just a little housekeeping: at the end of the presentation, Aaron will be taking Q&A, and you can either submit your questions on Twitter via the Cassandra QA hashtag, or you can use the WebEx Q&A panel; either way, we will be monitoring both. So, just a little bit about Aaron: he's very well known in the Apache Cassandra community.
Aaron: Sorry, there we go, now I'm the presenter. Good morning, good afternoon, everyone. As Christian said, my name's Aaron. I've probably spoken to some of you on the cassandra-user list; that's a great resource for questions, run from the Apache site. If you have any afterwards, I'd also encourage you to ask questions on the Twitter hashtag, as Christian said, or through WebEx.
Cassandra was famously started by Facebook, and they used it to provide search for their inbox feature. They donated it to the Apache Software Foundation in 2008, and in 2010 it became a top-level project at Apache, which means it came out of incubation: it was deemed to be a mature, stable project. Since then we've had a number of major releases, and the project has really taken on a life of its own, with a great number of people who are very vocal about how they use Cassandra.
Let's just take a step back to the foundations of Cassandra. It is rooted in some papers which I think are accessible for anyone to read, and if you've got a deeper interest in why Cassandra made some of the design choices it did, I encourage you to look these up. In 2006, Google released a paper called Bigtable, which described a data system they had at the time, and Cassandra borrows the column family data model from Bigtable.
You might have a write master and three read slaves on a Postgres or MySQL installation; in a case like that, what you're really doing is trying to get lower latency on your reads, and on your writes as well, by separating the writes off from the reads, and to get higher throughput. That's something Cassandra can often help with, simplifying that installation down to one canonical datastore. The next point is operations.
Cassandra helps us move to different hardware: we can take nodes down and bring them back up again on new hardware, all the time keeping the system operational. And lastly, the data model: the column family data model is a very flexible system. It's not as rigid as the tabular data model that you get in a relational database, and people often see that as a plus.
If you'd like to know more about where Cassandra is a good fit for what you do, one of my MVP colleagues will be talking about that in a couple of weeks, and I encourage you to sign up for that webinar as well. Let's move on to talk about Cassandra as a clustered system. In this example here I've got four nodes in the cluster, and I'm just drawing them in a ring. This line does not represent network connectivity; it just represents the way we normally think of the cluster, as a ring.
Say we want to store a row of data. In Cassandra we have rows, like you have them in a relational database, and we have what we call a row key, which is analogous to a primary key in a relational database. Because we want high availability and fault tolerance, we want to store the row with key 'foo' three times; we call this the replication factor in Cassandra. So we could just go ahead and store this row on nodes one, two and three, but we really want to understand how we arrive at that placement.
So if we have four nodes, each will have twenty-five percent of the load, and we can then very easily understand what's going to happen to all the computers in our cluster. Consistent hashing also allows us to minimize the key movement when nodes join or leave the cluster. As I said, we want Cassandra to be continuously available, continuously handling requests; we want to be able to scale it up and scale it down, and consistent hashing helps with this.
So that's how we've got some randomization in there, and that randomization helps us keep a consistent load on all the nodes. And instead of thinking about the range of tokens as a number line, if we think about it as a token ring we get to do a couple of interesting things. You can think of this token ring like a clock: on a clock, when you count from 58 to 59, the next value is 0. It's a similar idea in our token ring here.
If we count around from 0 up through 98, 99, the next value is zero, so we have a continuous range of tokens, and that allows us to apply operations onto that continuous range. In my example here I've used tokens from the range 0 to 99; in real life the token range is a 128-bit integer, with values up to 170 billion billion billion or something like that. So we have a very large, essentially infinite, range of tokens.
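To make that concrete, here is a minimal Python sketch of hashing a row key onto a toy 0-to-99 ring; the real partitioner works over the huge integer range just described, but the idea is the same:

    import hashlib

    RING_SIZE = 100  # toy range 0..99; the real token space is ~128 bits

    def token_for(row_key):
        """Hash a row key onto the toy token ring (deterministic, evenly spread)."""
        digest = hashlib.md5(row_key.encode()).hexdigest()
        return int(digest, 16) % RING_SIZE

    print(token_for('foo'))  # the same key always lands on the same token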
A token places each node on the ring. If you look at this model here, node one has a token of 0, node two has a token of 25, and so forth. We can then give each node what we call a token range. Now, it's hard to get across the idea that this token range is not permanently owned by a node; it just helps us when it comes time to find out which nodes should store the data.
Node two has a token range of 1 to 25, because a token range starts one after the previous node's token value, which is 0, and goes around to include the node's own token value. If we look at node 1, its token range starts at 76, which is one past the previous node's token value, and here we go with the clock counting: it wraps around through 99 to include node 1's own token of 0.
We put the row key's token onto the token ring, and we can work out which node has a range which covers that token, and so we can find the first replica for our row key. Again, let's emphasize here that this is a peer-to-peer system: node one is not in any sense a master, and its replica is in no way more authoritative for 'foo' than any other node's. It is simply the first one that we found when working out the replicas. But we want more than one replica, so we have this idea of a replication strategy.
The simple strategy takes the nodes, orders them by their token, and then simply counts around until it gets to the replication factor's worth of nodes. So in this case we took the row key, we created the token from it, and we mapped that onto the token range; that token range is owned by node one, so that's our first replica.
We then count around and we get to node 2 and node 3, and again, there is no difference here in terms of one being a master and one being a slave, or one somehow being more important than the others. All these replicas are the same, and it's simply by convention that we've gone clockwise around the ring, or that the token ranges start to the left of the node.
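Here is a sketch of that counting, continuing the toy ring from before (node tokens 0, 25, 50, 75); it illustrates the idea rather than Cassandra's actual code:

    from bisect import bisect_left

    NODE_TOKENS = [0, 25, 50, 75]  # where the four nodes sit on the toy ring

    def replicas(token, rf=3):
        """Find the node whose range covers `token`, then walk clockwise
        until we have `rf` distinct nodes (the simple strategy's counting)."""
        tokens = sorted(NODE_TOKENS)
        # the owner is the first node token >= our token, wrapping like a clock
        start = bisect_left(tokens, token) % len(tokens)
        return [tokens[(start + i) % len(tokens)] for i in range(rf)]

    print(replicas(80))  # -> [0, 25, 50]: past 99 we wrap to the node at token 0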
Wherever there's a simple way of doing things there's normally a more complicated way too, and here we call that the network topology strategy. In Cassandra, that allows us to use a replication factor per data center. A data center, to Cassandra, is not only a physical building; it is a collection of nodes inside your cluster. So it could be that you're running a Cassandra cluster that has some nodes in an AWS region on the East Coast and some in an AWS region on the West Coast. Or it could be that you have one building, and inside that building
you have all of your Cassandra nodes, but you split them into two data centers: one of those can handle the public-facing web transactional load, and the other can handle an internal-facing analytical load. This allows you to separate the workloads. The analytical load could be powered by Hadoop, which we connect with easily, and that lets your transactional data be available on the analytical side almost instantaneously, so your internal people can analyze it without having any impact on the public-facing side.
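Extending the toy sketch above for the per-data-center idea; the data center labels attached to the nodes here are invented for illustration:

    from bisect import bisect_left

    NODE_DC = {0: 'east', 25: 'west', 50: 'east', 75: 'west'}  # token -> DC label

    def replicas_per_dc(token, rf_by_dc=None):
        """Walk the ring clockwise, keeping nodes until every data center
        has its own replication factor's worth of replicas."""
        rf_by_dc = rf_by_dc or {'east': 1, 'west': 1}
        tokens = sorted(NODE_DC)
        need = dict(rf_by_dc)
        start = bisect_left(tokens, token) % len(tokens)
        chosen = []
        for i in range(len(tokens)):
            node = tokens[(start + i) % len(tokens)]
            if need.get(NODE_DC[node], 0) > 0:
                chosen.append(node)
                need[NODE_DC[node]] -= 1
        return chosen

    print(replicas_per_dc(80))  # one replica in 'east' and one in 'west'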
I have to apologize here: you'll notice that the nodes in the East Coast data center have tokens that are one off from the ones in the West Coast data center. This is just a little bit of Cassandra implementation leaking through. We do think of this as one entire cluster, and every node in the cluster has to have a unique token, so the way we normally deal with this is to just add one when we add a node in another data center. If we had a third data center here, its first node might have a token of two.
So we have the simple snitch, and again, the idea of the simple snitch is to do things as simply as possible: it places all the nodes into the same data center and the same rack. We have a property file snitch, which allows you to say, through configuration, that this node is in this data center and this rack, and there are also some other snitches which infer the data center and the rack from the IP address, and the like.
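For the property file snitch, that configuration lives in a plain properties file shipped with Cassandra (conf/cassandra-topology.properties); a minimal sketch, with made-up addresses and names:

    # conf/cassandra-topology.properties (used by the property file snitch)
    # node IP = data center : rack
    192.168.1.101=DC1:RAC1
    192.168.1.102=DC1:RAC2
    192.168.2.101=DC2:RAC1
    default=DC1:RAC1    # anything not listed falls back to this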
The network topology strategy, in a case like this where nodes are in different data centers and different racks, does its best to replicate your data onto each rack in each data center. This means that if you deploy Cassandra into AWS with nodes in, say, three availability zones, and you lose an availability zone, meaning you lose a whole physical building, as has happened a couple of times in the last few years,
your Cassandra installation can continue working and can continue serving requests. This is how, in the past, companies like Netflix and SimpleGeo have been able to handle these large-scale failures in AWS and keep serving requests. On top of these snitches, which give us a static understanding of your cluster, we have this thing called the dynamic snitch, which is a little bit inside baseball, but I think it illustrates what it means to be a peer-to-peer system.
One of the things we use the snitch for is to understand which nodes are physically close to each other, so we can direct traffic in the most efficient way. But just because two nodes are in the same rack doesn't always mean that one is the closest; we want to know which one is actually the fastest.
The dynamic snitch runs on each node, and it watches the request and reply traffic between nodes and develops an understanding of which nodes have the best performance, from the point of view of the node that it's running on. We don't have a global server that looks at your entire cluster and develops a view of things from 30,000 feet. The dynamic snitch gives an idea of what it means to be a peer-to-peer system.
The node is developing a view of the rest of the cluster on its own, from its own perspective, and it's using that information to make decisions without consulting other nodes or some sort of master server. More on that later: we also have this idea of gossip, which takes this to the next level.
Again, apologies here, this slide's been mangled a little by WebEx. When it comes time to store data, because we have a peer-to-peer system, every node in Cassandra is the same and they can all perform the same actions. The client doesn't have to be aware of where its data is going to be stored: it can connect to any node in the cluster and ask it to store or read data, and we call the node that it connects to the coordinator for the request.
We don't have to go to a master server, and we don't go to one node, ask it to do some work, and then have it pass the message on to the next node in a chain. In this example, the client connects to node 4 and asks it to store our 'foo' row; node 4 uses its view of the cluster to make one hop to node 1, and at the same time it sends the message to node 2 and node 3.
So this gossip thing that I've mentioned previously is a really neat and simple idea that allows the nodes to share information. Every second, each node talks to between one and three other nodes and says: this is what I know about me, and this is what I know about everyone else. A node doesn't have to connect to every node in the cluster; it just connects to between one and three, sends out these little bits of information, and within a short time the information has spread to everyone.
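A toy sketch of one gossip round; the data structure is invented just to show the shape of the exchange, where fresher information (a higher version number here) wins on merge:

    import random

    def gossip_round(states):
        """Each node shares everything it knows with 1-3 random peers;
        the newer version of each fact wins on the receiving side."""
        nodes = list(states)
        for node in nodes:
            peers = random.sample([n for n in nodes if n != node],
                                  k=random.randint(1, 3))
            for peer in peers:
                for key, version in states[node].items():
                    if version > states[peer].get(key, -1):
                        states[peer][key] = version

    # what each node believes about every node, keyed by node name
    states = {n: {n: i * 10} for i, n in enumerate(['n1', 'n2', 'n3', 'n4'])}
    gossip_round(states)
    print(states['n1'])  # n1 will usually have learned entries from its peers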
So again, if we had a client and a coordinator in a multi-data-center setup, the coordinator in the left-hand-side data center knows about the nodes in the right-hand-side data center. It doesn't have to ask someone else when it wants to send data over to the other data center, and it has options for reducing the network bandwidth used between the two.
There are lots of machines involved here, and you start to wonder what happens when they fail. What happens when a node is unavailable? How does that impact the client? Each request from a client specifies this thing we call the consistency level, which tells the coordinator how many nodes to wait for. The client says: I want you to store this value for me, and when this many nodes have completed the request,
B
Let's
consider
that
request
complete
success,
and
please
come
back
to
me
if
the
nodes,
don't
repeat
that,
if
those
don't
return
back
to
you
in
time,
consider
that
a
failure
consistency
levels
are
built
in
Pakistan
grill.
We
have
some
basic
ones
of
any,
which
means
which
is
only
available
when
you're
doing
all
right,
one
two
and
three
quorum,
which
is
the
most
common
news.
We also use, at consistency level, LOCAL_QUORUM, which is a quorum in the data center where your request started, and EACH_QUORUM, which is only available for writes and waits for a quorum in each data center that we're storing the data in. The idea of a quorum is a very simple and important one. The quorum is the replication factor divided by two (take the floor of that) plus one. So for a replication factor of 3 the quorum is two; if the replication factor goes up to four or five,
the quorum is three, and so forth. As you increase your replication factor, you get more redundancy. Most of the time we see people using a replication factor of three and working at quorum, which means the data is on three nodes, and as long as two of those nodes participate in a request and agree, we consider that request to be successful.
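The arithmetic is tiny; a trivial sketch:

    def quorum(replication_factor):
        """A quorum is floor(RF / 2) + 1: a strict majority of the replicas."""
        return replication_factor // 2 + 1

    for rf in (2, 3, 4, 5):
        print(rf, quorum(rf))  # -> 2:2, 3:2, 4:3, 5:3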
When we look at this on the cluster diagram, we can see the client connects to node 4; nodes 1 and 2 are up and running, and in this example let's say node 3 is down before the request starts, and node 4 knows it's down. There are other situations where node 3 might become unavailable after the request has started, and we'll address those shortly.
So let's say the client asks node 4 to write some data. Node 4 looks at its view of the cluster and says: well, nodes 1 and 2 are available, but node 3 is offline. That's okay: I've got two replicas I can store this on, so I can meet the consistency level that the client has asked me to use. It goes ahead and does the request, returns to the client, and simply says: I stored this
on at least the minimum number of nodes that you asked me to. While it was running this request, node 4 knew that node 3 was down, so it stored this thing we call a hint, to hand off later: when node 4 notices node 3 come back up again, it will send the requests that node 3 missed over to it, and all the other nodes in the cluster will be doing this as well.
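A heavily simplified sketch of that idea; the class and method names are invented for illustration, and this is nothing like Cassandra's real implementation:

    class Coordinator:
        """Sketch of hinted handoff: remember writes a dead replica missed."""

        def __init__(self, send):
            self.send = send   # function(node, mutation) that delivers a write
            self.hints = {}    # down replica -> the writes it missed

        def write(self, replicas, live, mutation):
            for node in replicas:
                if node in live:
                    self.send(node, mutation)  # normal delivery
                else:
                    # node is down: store a hint to replay later
                    self.hints.setdefault(node, []).append(mutation)

        def node_back_up(self, node):
            for mutation in self.hints.pop(node, []):
                self.send(node, mutation)      # replay what it missed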
If we look at the table here (WebEx has conveniently put the headline over it): in the left-hand column we have some different columns, 'purple', 'monkey' and 'dishwasher', and across the top we have nodes 1, 2 and 3. The red text is the value that wins according to the timestamp. In the first column, 'purple', node 3 doesn't have a value, while nodes one and two do, and their values agree because they have the same timestamp.
So we use that. For the 'monkey' column, node 3 has a value, but it has a lower timestamp, and so we'll use the 'biggins' value, which carries the higher timestamp. And for the last column, node 3 has a value with a higher timestamp than the other two, so in that case we'll use the value from node 3.
The important point there is that when the values on the nodes are different, we have to handle that to get a consistent read. So if, for example, we do a read, and nodes one and two return the value 'cromulent' while node 3 returns the value 'biggins', node 4 has to decide which is the correct value. We've seen that it does that with timestamps: it takes the winner, and it also makes sure that, later on, node 3 will actually give us back the right value.
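The reconciliation rule itself is tiny; a sketch using the slide's made-up values, where each replica response is a (value, timestamp) pair:

    def reconcile(replica_responses):
        """Last write wins: keep the value carrying the highest timestamp.
        That winner is also what read repair pushes back to stale replicas."""
        return max(replica_responses, key=lambda pair: pair[1])

    responses = [('cromulent', 10), ('cromulent', 10), ('biggins', 7)]
    print(reconcile(responses))  # -> ('cromulent', 10)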
This idea of having a quorum of nodes feeds into this thing we call strong consistency. If the number of nodes involved in the write of a piece of data, plus the number of nodes involved in a read, is greater than the number of replicas, then Cassandra will be strongly consistent, which means that after every write we will read back the value that was written. The normal way we go about achieving that is using a quorum for the read and a quorum for the write.
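That condition is easy to state in code; a trivial sketch:

    def strongly_consistent(write_nodes, read_nodes, replication_factor):
        """R + W > N guarantees the read set and the write set overlap in at
        least one replica, so every read sees the latest write."""
        return write_nodes + read_nodes > replication_factor

    print(strongly_consistent(2, 2, 3))  # QUORUM write + QUORUM read on RF 3 -> True
    print(strongly_consistent(1, 1, 3))  # ONE + ONE on RF 3 -> False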
B
There
are
times
when
people
reduce
their
consistency
because
they
want
to
get
more
availability
and,
in
that
case,
they're
working
in
this
eventually
consistent
world,
and
that
means
that
Cassandra
take
actions
behind
the
scenes
so
that
all
the
reads
eventually
return.
The
same
results
we've
seen,
one
of
those
things
called
hinted:
handoffs,
there's
another
idea
of
read
repaired
and
we
also
have
some
scheduled
remote
repairs
that
you
can
run
on
the
command
line.
That's a very quick introduction to the cluster. You can see that by having a number of nodes, having a replication factor of three or more, and using a quorum consistency level, you get the best of both worlds: you get your data replicated, you get high availability, because you can afford to have one node down in that scenario, and you get a strongly consistent system.
Let's have a brief look at the data model. What I've said so far is pretty incomplete: I said we have rows, they have a key, and we have some columns in them. Let's dig into that a bit more. Cassandra has the idea of a keyspace, which is analogous to a database, and we put our columns in these things called column families; more and more we're also calling them tables now, and you'll see both terms out there.
The column family is, as the naming implies, a collection of columns, and different rows can have different columns in different column families. Inside a column family, all the columns are ordered and addressable by their name. It's a very flexible model: it allows us to store only the data that we need, because we don't prescribe what the columns are. We don't have a schema that says every row has the columns user name, first name,
B
Last
name:
when
it
comes
time
to
store
the
user,
you
can
just
saw
the
user
name.
If
that's
all,
you've
got
if
you've
got
their
first
and
last
name
as
well.
You
can
store
those
as
we've
seen
before
the
rose
around
unit
of
replication.
We
take
a
road
entire
copy
of
that
is
on
each
of
our
replicas
and
that
the
column
family
is
our
unit
of
storage.
When
you
get
into
central
you'll
see
that
a
column
family
equates
to
some
files
on
disk
Colin
families
are
also
our
unit
of
querying.
The interesting thing is that the value is optional, but the name is not. So you can often get to a situation where the mere presence of a column in a column family imparts some information, and we see that sometimes in the sorts of data models people create: columns exist that don't have a value, but the name is important and carries some information in itself.
So we have ASCII and UTF-8, and integers and longs, and things like UUIDs, and these really nice things called counters, which are our distributed counter. Counters can only be used for column values, but they allow you to do things like counting web hits: count the number of people who visited a product page, something like that. On top of this basic data model we have composite data types, which allow you to combine two or more basic types into one structure. For example, you might have a column name which is a timestamp plus a user name.
B
I'll
be
discussing
data
modeling
further
ones,
that
november
seventh
and
next
month
in
the
next
webinar
that
I'll
be
doing
on
data
modeling
again,
we've
blasted
through
the
data
model
very
quickly
there.
Just
to
recap:
cassandra
has
column
families
which
are
also
becoming
more
knowing
there's
tables
in
a
Cassandra
column.
Family
columns
are
all
optional
and
the
values
optional,
and
you
can
store
a
different
number
of
columns
in
different
rows.
So in this example here, you can see I get a connection to a column family just by name, and when I want to insert some data I just supply the row key and a dictionary of column names and column values. If I want to do a get, I can get one row just by saying "get me everything on that row", providing the row key, or I can say "here's the row key, and I just want this column name". I can also do a multiget there.
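In code, that looks roughly like the sketch below, using pycassa, one of the Python clients from this era; the keyspace, column family and column names are made up:

    import pycassa

    # connect to the cluster and name the column family we want to work with
    pool = pycassa.ConnectionPool('Keyspace1', server_list=['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')

    # insert: the row key plus a dictionary of column name -> column value
    users.insert('foo', {'username': 'foo', 'first_name': 'Aaron'})

    # get everything on one row by its row key
    print(users.get('foo'))

    # or ask for just one column name from that row
    print(users.get('foo', columns=['username']))

    # multiget: several rows at once
    print(users.multiget(['foo', 'bar']))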
When we look at CQL, things are a little bit different, but they're also the same, because they're the same across whatever language you're using them in. So this would work the same whether you're using CQL through Python or CQL through Java: you would see the same things. When we do an insert here, it's very familiar to people coming from SQL: we insert into the column family the key and the column names, and that creates a row for us.
I can get back all the columns in a row by doing a select star, or I can get back named columns from a row by doing a select and specifying the column names.
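A sketch of these statements, plus the multiget and delete mentioned next, through the early Python CQL driver; the table and column names are made up, and the syntax is the CQL 2 style of this period:

    import cql  # the early DB-API style driver for CQL

    conn = cql.connect('localhost', 9160, 'Keyspace1')
    cursor = conn.cursor()

    # insert the row key and some column values, creating the row
    cursor.execute("INSERT INTO Users (KEY, username, first_name) "
                   "VALUES ('foo', 'foo', 'Aaron')")

    # select star: all the columns on one row
    cursor.execute("SELECT * FROM Users WHERE KEY = 'foo'")

    # named columns only
    cursor.execute("SELECT username, first_name FROM Users WHERE KEY = 'foo'")

    # multiget: several row keys at once
    cursor.execute("SELECT * FROM Users WHERE KEY IN ('foo', 'bar')")

    # delete one column, or the entire row
    cursor.execute("DELETE first_name FROM Users WHERE KEY = 'foo'")
    cursor.execute("DELETE FROM Users WHERE KEY = 'foo'")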
Again, I can do a multiget, where I specify multiple row keys, and I can do a delete, where I delete the entire row or delete just one column from the column family, as in the sketch above. And that's all the time we've got for now, because we want to leave what's left for questions.
Christian: Actually, I believe we're going to do that one on the Tuesday, so it will not be on the Wednesday. OK, Aaron's first question... sorry, Aaron, I learned how to say that when you introduced yourself. So, the first question: if the write fails, that is, it is not able to write to the quorum, does it roll back the writes that did succeed? And a follow-up to that would be: when writing, does it attempt to write to the quorum, or attempt to write to all and just wait for acks from a quorum?
Aaron: OK, so I'll answer those in reverse order. Writes are sent to all replicas for a row key; we don't trim that down to just the consistency level's worth of nodes. We're actually a little bit cleverer than that: as you're bringing new nodes into the cluster, we know that they can't handle reads yet, but we want to get the new data over to them because they may become replicas for it. So the write goes to all the nodes, including the ones that we know, in the future,
B
You'll
want
to
do
reads
from,
and
we
don't
roll
back
the
right
if
it
fails
to
reach
quorum.
We
fail
and
return
back
a
quite
an
error
to
the
client
and
remember
those
timestamps
that
we
have
on
all
the
columns
and
remember
that
when
we
compare
them
we're
just
you
put
with
putting
on
where
you
select
the
highest
timestamp,
it
means
that
our
write
operations
are
eating
item
at
their
always
get
this
wrong
idiomatic
by
the
potent.
B
So
that
means
that,
if
you
send
that
right
and
it
fails,
you
can
send
your
client
can
send
it
again,
perhaps
sending
it
to
a
different
mode.
In
the
cluster
and
that
right
can
succeed
now,
if
another
client
in
your
application
somewhere
else
has
sent
a
different
right,
but
it
had
a
higher
timestamp
that
will
win
so
I'm
sense.
Allow
us
to
resend
our
requests
that
fail,
and
we
don't
deal
with
rolling
back
a
transaction
when,
when
we
don't
reach
the
consistency
level.
Christian: Thank you very much. Questions are coming fast and furious right now. I can answer one: will the slides be available? Yes, absolutely: we post these slides on datastax.com, along with the archive where you can watch the presentation again. Next question, from Prytan: what does slice by names, or by column, mean?
Aaron: So slicing by name means that you're specifying the names of the columns that you want to get. This is probably a little bit of old-school Cassandra leaking into the modern era through me. Slice by name normally means you say that you want to get columns A and C from the row with key 'foo'. The other approach is to say: I just want to get the first 10 rows... sorry,
B
The
first
10
columns
from
the
row
with
key
food
into
two
different
approaches
to
things
the
first
one
often
use
where
you
might
have
an
orm
us
or
you've
got
a
bit
of
a
data
model
that
your
application
layer
understands,
and
you
know
that
the
user
may
have
these
ten
columns
and
the
next,
the
other
one,
the
odds,
the
one
where
you
don't
know
the
column
names
working
situations.
Where
you
have
time
series
data,
often
you
might
be
storing
events
storing,
tweet,
storing
signals
that
are
coming
in
of
hardware.
B
Aaron: You can deal with that at your application layer with some locking or something like that, or you can deal with it by repairing the inconsistency in your application when you read. So you read something, it comes back, you pivot from that and go and get the real entity, and then discover: oh, that entity doesn't have foo equal to bar anymore, so I'll discard it from the index.
Aaron: Yeah, in a two-node cluster where the replication factor is two, the quorum of two is two, unfortunately, so with an RF of two you cannot afford to lose one node. If you've got a two-node cluster and your replication factor was one, then your quorum would be one. This is why we often suggest people start at a replication factor of three: it gives you the ability to lose one node, which is great.
Aaron: So, in Cassandra, when you're doing a read, we know the nodes to go to and we get there in one network hop. We did put in a bit of a trick there for network efficiency, as I mentioned in the introduction... well, it wasn't in the introduction. We don't send the full read request to all of the replicas for a row: for ninety percent of the requests, we send it to just the number of nodes
B
We
need
to
achieve
the
consistency
level
for
ten
percent
of
the
requests
we
send
it
to
all
of
the
nodes,
and
we
only
ask
one
of
those
nodes
involves
to
send
us
back
for
salt
data
and
we
say
some
network
bandwidth
there
and
the
others
send
back
a
digest
of.
What's
going
on
so
I'm,
not
sure
I've
answered
your
question
there
very
well,
but
we
cassandra
is
made
to
handle
data
being
on
all
on
different
nodes
and
we
have
design
choices
in
there
that
allow
us
to
save
networkers
on
networking
and
with.
B
It
would
be
one
and
reputation
factors.
One
is
might
not
be
as
crazy
as
it
sounds.
I
think
you
could
use
that
in
a
situation
where
you
really
wanted
to
record
you,
some
high
throughput
data
and
you
didn't
have
the
resources
that
has
lots
of
modes,
but
definitely
would
suggest
that
and
you
and
you
add
you
get
to
that
replication
factor
three
and
then
you
get
all
the
benefits
of
having
a
highly
available
system.
Aaron: If you're talking about that sort of query, then it would have to go to every node in the cluster, which is what secondary indexes do, and then, when it got there, it would have to do an inefficient LIKE-type query, maybe with a percent sign in front and one at the back, and you'd have to scan over every value. There isn't anything like that in Cassandra.
B
When
you're
using
g,
two
or
three
to
get
the
advantages
of
being
able
to
have
that
expressive
power
in
a
text-based
API,
you
need
to
tell
t
two
or
three
what
your
schema
is.
So
you
need
to
say:
hey
is
it
has
a
create
table
statement
and
you
say
my
user
has:
please
cut
these
columns
to
Barb
ads
and
then,
when
you
do
a
select,
see
two
or
three
has
some
understanding
of
what
it
does
and
they
can
project
back
to
you.
B
The
data
structure
that
you're
expecting
and
once
they
create
people
I,
don't
want
to
scare
people
off.
We
don't
we.
We
do
have
an
ultra
stable
statement,
but
it
is
not
like
an
ultra
table
statement
in
in
relational
database
on
disk.
We
don't
store
empty
gaps,
we're
not
story
get
in.
Indeed
null
columns
when
you
do
an
altar
table,
we're
not
taking
rocks
and
we
don't
update
the
data
in
place.
The
schema
that
you
tell
cql
is
a
schemer
so
that
there's
select
statements
were
an
insert
statement.
Aaron: Good question. It can be a natural key or a surrogate key, and the same row key can be used in all column families. People often use natural keys, because the key doesn't have to be something like an int, and you don't have to join up and have an identity statement or something like that. So people often use natural keys.
Aaron: So I'm going to assume that you have the data files somewhere. There are a couple of ways you can do that. In Cassandra we have a snapshot backup system, which flushes everything to disk and creates some hard links; we also, I believe coming up in 1.2, have an incremental backup system. And the nice thing, when you look at Cassandra on disk, is that it is just a number of files, and they have names with the keyspace and the column family in the name, laid out sensibly.
If you want to move your keyspace's data files into a different keyspace, you can do that by turning the node off and just moving the files. There's also a nodetool command which will reload data files from disk, I think without you having to shut the node down.
Aaron: Do they have different...? Yes, say they have different timestamps, because the clients are on different machines. The race over who wins starts when they choose their timestamps for the requests, so that part is kind of outside the scope of Cassandra; what we care about is that one of them wins. Each of those requests gets processed on all three nodes, and we can look at any one of those nodes, because they all do the same job.
B
There's
a
couple
of
ways:
it
goes,
it's
a
quiet
ones,
request,
finishes
and
completes
and
inclined
to
tease,
request
finishes
with
starts
and
finishes
after
both
after
that
happened,
the
request
with
the
highest
timestamp
will
be
the
one
whose
values
and
now
the
current
values,
let's
say
that
they
both
get
processed
at
the
same
time,
because
in
Cassandra
we
have
a
thread,
pool
handling
your
right
and
it's
entirely
conceivable
that
they
both
require
get
processed.
At
the
same
time,
we
now
have
the
idea
of
row
level
isolation.
Christian: OK Aaron, thank you so much; I think that was a record for how many questions we got through so quickly. Thank you for a great presentation, and thank you to everyone on the line. See you back here in two weeks. The archive of this webinar will be available in the resources section of datastax.com this Friday, and we'll send you out an email. Thanks, everybody.