From YouTube: Walmart: From bricks to clicks - Cassandra at Walmart
A: The goal today is just to give you an overview of Cassandra at Walmart: how we brought it in, some of the challenges we faced, and some mistakes we made. Then we're going to go into a particular use case that Chad and I spent the last few years working on in a little more detail; that's the dynamic data model. We'll have time for questions at the end, so if you could hold your questions until then, I think we'll have plenty of time to address them.
B: A little bit about Walmart; these are just some interesting facts. Over 260 million customers are in our stores and online each week. We span 28 different countries, employ roughly 2.2 million associates around the globe, and operate over 11,000 retail units, which includes Walmart stores and Sam's Clubs, plus 12 different ecommerce websites.
A: We have an engineering presence in Bentonville, Arkansas, which is actually where Chad and I work from, as well as here in the Bay Area, and we recently opened offices in Reston, Virginia and Bangalore, India. We've got over 35 production clusters, and that's over 500 nodes; most of those are physical hardware, so bare metal. We started looking at Cassandra back in the 0.7 days, around late 2010 or early 2011, and things were really kind of primitive back then: there was no CQL, no transactions, and the drivers were really bad. We had a lot of Hector code, so yeah, it was really bad. There were actually a lot of different groups in Walmart looking at Cassandra. We're a big organization, with a lot of different engineering groups both in Bentonville and out here at walmart.com, and there was just a lot happening; it really was kind of a grassroots thing. Developers were getting it, trying it out, seeing how it worked and where they might use it.
B: Yeah, so you can imagine, with the size and the number of engineers we have, we have a lot of use cases that fit Cassandra really well. In particular, right now we're making a pretty big investment in modernizing all of our systems: we're taking all these nightly batch processes and moving them into more real-time applications, and as we do that, we're seeing a huge uptake of Cassandra in general.
A: That's pretty big. A company our size, with the resources we have (although we do try to keep things low cost), has access to the full menu of databases and technologies out there, and we have a lot of them already implemented, so we're able to try a lot of different things. And so this one, with updates coming in from stores and clubs...
A: I mean, you saw the eleven-and-a-half-thousand retail units. Being able to ingest that data and make it available for reads to everybody in real time is a huge deal for us, and Cassandra is just the only thing we found that can actually do that. But it was hard, and I don't mean technically hard. I mean we had challenges, and we'll go into what some of those challenges were and the mistakes we made, but those are technical problems; we can solve those. Politically, just getting through...
A: Everybody has their bias, right? But most of these people were genuinely worried about us losing data or not keeping data safe, and you kind of have to recognize that they're just looking out for the company and for the company's information. So it took some extra work on our part to put together this plan: what are you going to do about backups and recovery, and how are you going to deal with this eventual consistency?
A: "We've never had to deal with that before." Well, if you actually think about it and go look, we've always had some form of eventual consistency; it's just a little different now. And non-ACID compliance: people's hair catches on fire when you say you're not ACID compliant. At least back then, they would just kind of assume, oh, it's a database, it's just like every other database. But Cassandra was different.
A: Another one is batch workloads. We actually had one instance where there was this team processing a bunch of data. They had all this data in Hadoop, and they thought, wouldn't it be cool if we could just ETL that data into Cassandra, use Cassandra to do some more processing, and then load all that data back out into Hadoop again? Well, that just didn't work at all.
A: That was a complete failure. And sort of along with that, we do get use cases sometimes where developers come up to us and say: okay, we're going to do this on Cassandra, and every night we're going to ETL some data from the data warehouse or somewhere, load it in, and make it available for reads. And that's like their primary use case.
A: One thing you tend to see there, a quick way to evaluate a model that somebody gives you, is whether it has a ton of secondary indexes. We've seen that a lot, where, like, half the columns have a secondary index, and you've got to explain things and work through that: well, what queries are you really going to run, and how do we make this model perform?
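(For illustration, a minimal sketch of that smell and the query-driven alternative; all table and column names here are hypothetical, not Walmart's actual schema:)

```
-- Smell: secondary indexes sprinkled across half the columns of one table.
CREATE TABLE orders (
  order_id uuid PRIMARY KEY,
  store_id int,
  status   text,
  updated  timestamp
);
CREATE INDEX ON orders (store_id);
CREATE INDEX ON orders (status);

-- Query-driven alternative: a table shaped for the actual read
-- ("all orders for a store"), with no secondary indexes needed.
CREATE TABLE orders_by_store (
  store_id int,
  order_id uuid,
  status   text,
  updated  timestamp,
  PRIMARY KEY (store_id, order_id)
);
```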
A: Another one is tombstones. This one actually kind of bit us in the project we were working on. You want to make sure you're using TTL to your advantage; that's a really handy feature if you can use it. But really, you've got to understand what your delete workload is going to be, and one thing we didn't realize is that when you update a column to null, that's actually a tombstone. We were able to go back...
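(A minimal sketch of both points, using a made-up table: setting a column to null is a delete in disguise, while TTL at least makes expiry, and its tombstones, predictable:)

```
CREATE TABLE session_data (
  id   text PRIMARY KEY,
  note text
);

-- Looks like an ordinary update, but writes a tombstone for note:
UPDATE session_data SET note = null WHERE id = 'abc';

-- Inserting an explicit null has the same effect:
INSERT INTO session_data (id, note) VALUES ('def', null);

-- Using TTL instead lets the data expire on a schedule you chose:
INSERT INTO session_data (id, note) VALUES ('ghi', 'temp') USING TTL 86400;
```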
A: So there's one particular data model, or particular use case, that we want to talk about: the dynamic data model. Back in 2011 there was a lot of stuff going on in the industry, and that really triggered us to start looking at Cassandra. We got put on this project; it was really kind of a transformational, greenfield-type project, which is a really cool place to be. That doesn't happen a lot, but this was sort of a perfect storm for being able to go out and look at everything.
A: There was really a lot happening in the industry, and Cassandra was part of that. So we looked at a lot of options; we looked at HBase and Hadoop and other things, and Cassandra ended up coming out on top. After we got through it all and got to play with everything, Cassandra was really kind of a no-brainer for us. So, the problem we had to solve: we had this handful of entities, different types of records that we were dealing with.
A: Each entity would have maybe dozens to even hundreds of attributes, and it was really sparse. By that I mean that record to record you would have a wildly different set of attributes, so very sparse attribution. New attributes were coming up regularly, and we needed full-text search, so DSE with Solr really worked out for us. And then we would also have frequent, intensive maintenance, where customers would be able to go in and initiate some maintenance process.
B: Then, non-functionally, we wanted one logical database. Historically we've sharded our databases by country, so imagine that for all those countries we're in, we'd have a duplicate database in that particular country. We wanted to bring all that together into one database. And of course, the data we were storing needed to be available one hundred percent of the time (no downtime was acceptable), and it had to be fast.
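(One logical database spanning countries is exactly what Cassandra's replication settings express; a sketch, with keyspace and data center names made up for illustration:)

```
-- One keyspace, replicated to every data center that needs local
-- reads and writes; no per-country database copies.
CREATE KEYSPACE global_stuff
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc_us': 3,
  'dc_uk': 3,
  'dc_in': 3
};
```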
A: ...you don't count. So this is in Thrift; the schema looked something like this. There's not a lot here: we've got this column family called stuff, and there are two columns defined, a type and a user key, and then down at the bottom you can see the default validation class is BytesType.
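(For readers who never saw the pre-CQL world, a definition like the one described might have looked roughly like this in the old cassandra-cli; this is a reconstruction from the description, not the actual slide:)

```
create column family stuff
  with comparator = UTF8Type
  and key_validation_class = UTF8Type
  and default_validation_class = BytesType
  and column_metadata = [
    {column_name: type,     validation_class: UTF8Type},
    {column_name: user_key, validation_class: UTF8Type}
  ];
```

Everything outside those two declared columns is just bytes, which is what made the model "dynamic": the application could write any column name it liked, and only the application knew what the bytes meant.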
A: Then we started getting a little bit fancy: okay, let's make it a little dimensional. So we would do it like this; in this case this is a language code, so we've got a different description for another language. We started adding all this stuff, and it got really complex really fast, and we had some significant serialization logic in our code. Everything was locked away in the code, and so it got pretty complex.
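(The "dimensional" trick amounted to packing dimensions such as the language code into the column name itself; something along these lines, in cassandra-cli syntax with entirely hypothetical values. Only the application's serialization code knew how to split the names back apart:)

```
set stuff['item42']['description:en_GB'] = 'A grey widget';
set stuff['item42']['description:en_US'] = 'A gray widget';
```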
B: Then, anybody can guess what happened next: along came CQL. Our first impression of CQL was: hey, you're breaking our dynamic model, you're killing us here, you're putting training wheels on our database that we've had successful implementations with. But we did see the community moving towards CQL, and it immediately had a lot of valuable features that we wanted to take advantage of. So we got on board, but...
A: ...how to get there? We were using Hector and everything, and as part of the migration we did move to the DataStax driver and CQL. When CQL came along, it really pointed out this deficiency we had: we were getting into this world of schemaless, which is a very dangerous place to be. The application defined the schema, so if you really wanted to know how things worked, you'd have to go into the code, and maintainability suffered because of this.
B: One particular instance where that really bit us: as you remember from the use case, we needed real-time, full-text search on all of the entities and all of the attributes. So we got into a huge heap of trouble. We put a ton of data in, and everything was working; all of our unit tests were successful. So we wanted to put a little stress test, a load test, on this data.
B: We quickly identified a huge spike in latency and long GC pauses, and the cluster was dying. We weren't able to restart our Solr nodes; we tried clearing all the commit logs and increasing the heap to 16 gigs, and they would not start; they ran out of heap. At that point we said, hey, let's get DataStax involved. We called a few people, they got engaged, and they uncovered that we had literally billions of fields that were trying to be indexed in Solr.
A: So this is sort of what we did in CQL. It looks an awful lot like what you might define in a relational model to do a dynamic database. The difference is, this will actually work. We actually have a history of trying this type of dynamic model in a relational database, and it just does not perform well.
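(The slide isn't reproduced in this transcript, but based on the description that follows — an entity key, an attribute name, a context column for dimensionality, and a text value — the table was presumably something like this sketch; the column names are assumptions:)

```
CREATE TABLE stuff (
  id        text,   -- entity key
  attribute text,   -- attribute name, defined on the fly
  context   text,   -- encodes dimensions such as language
  value     text,   -- all values serialized to text
  PRIMARY KEY (id, attribute, context)
);
```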
A: Here's what some data in that table might look like. You can see that the context column is what's giving us the dimensionality now. This isn't perfect, right? That context still has to be parsed and generated consistently, and in the same way, the value is just text, so you have to manage that consistently as well. But it's a lot better place to be, and you can actually express meaningful queries.
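(Against a table shaped like the sketch above, "meaningful queries" become plain key lookups; illustrative values only:)

```
-- All attributes of one entity:
SELECT attribute, context, value FROM stuff WHERE id = 'item42';

-- One attribute in one specific context:
SELECT value FROM stuff
WHERE id = 'item42' AND attribute = 'description' AND context = 'lang=en_US';
```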
A: And one thing we're not really showing here is that there's more metadata to this. That's really important when you're doing this type of work, where you need to define attributes on the fly: there's a whole schema for dealing with the attributes themselves. What's their data type? What are they? That's also really critical to get right.
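(That attribute metadata could itself live in a small table; a purely hypothetical sketch of what "a whole schema for the attributes themselves" might contain:)

```
CREATE TABLE attribute_defs (
  attribute  text PRIMARY KEY,
  data_type  text,     -- how to interpret the text value (int, date, ...)
  searchable boolean,  -- whether it should be indexed for full-text search
  added_on   timestamp
);
```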
B: So we started with multiple relational databases; remember, we were sharded by country. Some of the results that we saw (this isn't all of them, but a few high points): when we transitioned to Cassandra, we could consolidate all those different databases into one. They're not completely sunsetted yet, but they're on their way out.
B: We did see service response times drop from 800 milliseconds to 50 milliseconds, so a huge win for the consumers of all of our services; they really appreciated that. And of course, the zero downtime: we've done, I don't know, ten or twelve upgrades, even refreshed the hardware completely in production, with zero downtime, so that was a huge, huge success.
A: And no joke there: actually zero downtime. It's literally never gone down in production. We also kind of learned that Cassandra is not just for time series. Those golden use cases that Cassandra is a really good fit for are good, but there are also more use cases that Cassandra works really well for, and that set of use cases is actually growing as features in the product mature. Eventual consistency is a worthwhile trade-off.
A: It really is. I mean, if you're telling me I can get an always-on, active-active database where I'm always writable and always readable in multiple data centers, and it's fast, I'm going to live with eventual consistency. For most use cases it's just immaterial; it doesn't matter. You just have to know where it does matter and plan appropriately for it. I don't think we've run into a use case yet where we haven't been able to satisfy the requirement because of eventual consistency.
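("Planning appropriately" usually means tuning consistency per operation; in cqlsh, for instance, something like this, reusing the hypothetical stuff table from earlier:)

```
-- Default to fast, eventually consistent local reads...
CONSISTENCY LOCAL_ONE;
SELECT value FROM stuff WHERE id = 'item42';

-- ...and raise the bar only for queries where staleness actually matters.
CONSISTENCY LOCAL_QUORUM;
SELECT value FROM stuff WHERE id = 'item42';
```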
B: One common thing that you're going to hear, and I've already heard it in several sessions, is that data modeling is critical. I can't express that enough to the teams that come up to us and ask for help and assistance. Your data model is so key to your application and to your cluster; if it's not correct, it's more than likely going to fail. So pay special attention to that data model.
A: CQL's expressiveness really does make modeling easier. It is a mind shift if you're coming from a relational background, and you kind of have to carry people along. We're having conversations all the time like: whoa, you can't join these two tables, you just can't. And they're stuck at that point: well, what am I going to do? And then you show them: oh well, there are these collection types.
B: Then, for those use cases that require full-text search: just like the data model, be intentional with your Solr schema. Really put some thought behind it. Only index the fields you really need indexed, and leave the other ones out.
D: Hello, hey, thank you. I have a question. As you mentioned at the beginning, there were a lot of concerns about security and keeping data safe. So I'm wondering, in your stack, if there are certain attributes that are sensitive, how do you prevent them from being globally accessed? How do you solve those problems?
F: Thank you for the presentation; it's really good, because Walmart has the size and scale that we're dealing with in other places. My question is around looking at the data model of an existing application. One of the first questions that gets asked is: "I have to redo the data model? You've got to be kidding." And you go, "well, yeah." So what is the level of effort? I know it's a very general question, but what is the methodology you adopted?
A: Obviously, if somebody's coming to you with a data model, the first thing I say is: okay, thank you for the model; now tell me about your queries. Because that's not documented in the relational model, and I've got to know what your read and write patterns are in order to transition it into a Cassandra model. It can be tedious; I've seen some massive relational models with hundreds of tables and everything, and I don't have a silver bullet.
A: Going one-to-one is almost never the right way to do it, unless it's just a trivial model. But I'll tell you, the data modeling class from DataStax is very helpful. It's very methodical in how you go through and document everything: do your ERD, document your query patterns, and really lay it out. We've gotten a huge benefit from that, but it just takes the time to go in there. And to your point about "now I'm going to have to redo everything"...
B: But then, when you get into the use cases where I have lots of entities that relate to other entities, and all these different queries that I've got to serve, that's where it's a little more in depth, and you're meeting with the customer several times to get it right and working through some prototyping. So it takes some time, depending on how complex the data model is.
C: Yeah, my question is more on the zero downtime that you mentioned. Can you touch a little bit on that? Because I would assume some of the models have to be rewritten: you might have built some tables based on your query patterns, then figured out, hey, this is no longer the right model, and the application is probably expecting the data from that set of tables, and now you have to rip that into something else. Conditions like that would come up in almost every solution.
A: We would check the new table to see if the data was there, and if it wasn't, we'd fall back to the old table. That way we did a slow migration by doing these dual writes, and then you can think of it like a read repair: oh well, it's not in the new table, let me read it from the old table, then update and move that data into the new table. We've done that a few times, and it's actually worked out pretty well for us.
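(In CQL terms, the migration pattern described might look like the following; the table names and layouts are hypothetical, and the fallback logic lives in the application, not in CQL:)

```
-- During the migration window, the application dual-writes every update:
INSERT INTO stuff_v2 (id, attribute, context, value)
VALUES ('item42', 'description', 'lang=en_US', 'A gray widget');
INSERT INTO stuff_v1 (id, attr_ctx, value)
VALUES ('item42', 'description:en_US', 'A gray widget');

-- Reads try the new table first:
SELECT value FROM stuff_v2 WHERE id = 'item42';

-- If that comes back empty, the application reads the old table,
-- serves the result, and writes it into stuff_v2: an application-level
-- "read repair" that migrates rows lazily, with zero downtime.
SELECT value FROM stuff_v1 WHERE id = 'item42';
```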
A: For that particular use case I can't get too specific, but the problem came down to a handful of entities, different kinds of things, and each thing may have dozens to hundreds of attributes, and they're all very different; not sure how else to describe it, but yeah, this sparse attribution of information. And this all goes into one table. We basically built services around the database: just a RESTful interface into it, JSON in, JSON out.
A: So back to that CQL table: that's not the only way to do it; this is just one way. To the keen observer, this really won't work in Solr as-is, because of what a CQL row is: if you search, you're going to get a single CQL row back instead of all the rows, all the attributes, for the ID. So what we actually do there is...
A: ...but what that lets us do is keep it all in one CQL row, which means it's one Solr document, and we can actually put some filtering and transformation in between to make Solr indexing work a little better. For example, we can discard fields that we know we don't need to index; that's an optimization you can do there, and it actually works really well for Solr.
A: No, I wouldn't say we have that. I think the way we would solve that would be further up the stack, at the service layer. We're kind of big, at least Chad and I are, on microservices. So if it was coming from a relational database, it would probably be further up in the stack, and you'd have a service that would combine those.
H
Have
situation
we're
in
a
table?
We
have
to
kind
of
data,
one
is
like
attribute
data
which
we
don't
know.
How
is
going
to
come,
there
won't
be
one
product
might
have
ten
attribute
or
might
have
like
100
and
the
other
data
where
we
want
to
have
the
roll
I
mean
column
level,
access
that
you
know.
Somebody
should
be
able
to
see
only
this
much
data.
D
Okay,
so
one
more
question,
so
in
this
data
model
you
mention
the
using
contacts,
the
text
to
save
extra
dimensions,
and
you
said
it
as
a
a
clustering
key.
So
my
question
is:
if
you
have
multiple
dimension
information,
you
want
to
save
into
this
context,
and
you
want
to
search
on
certain
dimension.
Then,
in
that
case,
when
you
are
acquiring,
are
you
performing
a
key
look
up
or
are
you
performing
a
search
because
you
have
multiple
dimensions
on
a
map
together
in
this
column,
single
column,
right.
A: In this case it sort of works out for us, because those dimensional attributes, everything we're wrapping up in that context, are provided to the service. So that's part of the lookup, part of the key lookup that you're doing. There are kind of a couple of ways to read it: one, you're querying the data and you know the context, so...