Description
Speaker: Adrian Cockcroft, Technology Fellow
The SimianViz microservices simulator contains a model of Cassandra that allows large scale global deployments to be created and exercised by simulating failure modes and connecting the simulation to real monitoring tools to visualize the effects. The simulator is open source Go code at github.com/adrianco/spigo and is developing rapidly.
I wanted to start off by asking who was here: I wasn't at Cassandra Summit last year, but I think I was at most of the ones before that. So I wanted to summarize a few things that, if you were around, you might remember. Back in 2011, while I was at Netflix, we did some testing on scalability. We were running 24 nodes and we weren't sure what would happen when we went to 48.
So we tried scaling up, and it kept scaling all the way to 288. I posted a blog post, which DataStax then turned into an advert that kept popping up around the web for the following year or so, saying that's how it scales. So that was good. Then we managed to persuade AWS to do solid state disks, and then we hoovered up the entire supply of solid-state-disk-based instances for about a year after that, so everyone else said: well, they exist, but you can't get them.
So now I'm at Battery Ventures, a VC firm, and these are all the different things I do now. I obviously do due diligence on deals for the VC firm, on the companies that come in. I advise portfolio companies, giving technical support as a sort of consultant to the CTO: scalability, cloud migrations, Cassandra migrations. I'm responsible now for four or five companies that I think are switching to Cassandra, and they seem pretty happy about it.
Networking with interesting people: that's all of you. If you want to contact me after this, I've got a few cards, but I'm easy to find, so just go to Twitter if you want to talk about something, if you want to start a company, or if you want to talk about using some of our portfolio companies. Besides interacting with people, I also mess around with some technologies, and I'll talk a bit later about some code I've been writing, which is this simulation thing. I also do a lot of conferences.
I'm on the program committee, I present a lot, and I travel around presenting, but I also do quite a few internal talks at companies. So if you're particularly interested in having me come and do an internal talk, to kind of persuade your team that you should be doing microservices better or something, I'm happy to discuss that and see if I can fit something in. All right, so topics for today: I'm actually going to start by talking about microservices, from the point of view of why this is an interesting thing.
It was good to see Sam Newman's microservices book up this morning when Jonathan put up the slide, so you know it's a hot topic. What's going on there, and why is it that a lot of the backend for microservices ends up on Cassandra? I think it's a very good fit, and then I'll talk more about simulating architectures.
It really comes down to the canonical CIO (I guess Canonical's a company, but I mean the generic CIOs that are trying to get stuff done). The kinds of things they're struggling with right now they sometimes call the digital transformation, or the data deluge, or whatever, but what they're really trying to do is align the IT department with the business. What does that mean? How do they get more product focused instead of project focused?
There are threat problems: if you can't innovate as fast as some Bay Area company that's got a big web services thing, and you're an old enterprise that can't go that speed, then you're going to get eaten by them. And then also trying not to get breached: there's a lot of focus on security and on building systems that are much more robust now.
So when we were trying to figure out architectures in 2009 at Netflix, we were drawing on one of the original ideas behind DevOps, which is that if you built something, you run it. That means the developers own things in production, everything is self-service, and you're on call. What that drives is that developers have to own building things and own deploying things, and it's really nice.
If you can deploy an entire global Cassandra cluster in a few minutes, rather than having to go and argue with vendors for a few months, or with ops teams or whatever it is, and instead just push a button and it happens, that is an incredibly powerful thing. This is much more than just SaaS, standing up a webpage on Heroku, which you can do in a few seconds: it's the ability to deploy terabytes, or rather petabytes, of storage
trivially, and that's really important. It means developers are responsible for going faster, and they're also responsible for the efficiency of what they're doing: scaling it down when it's not being used, tidying up after themselves, and how efficiently they build product. And then they're also responsible now for the safety of it, how rugged or how secure systems are. Now, I'm not going to go through all of this; I don't have time to go through this much today.
But if you look up my presentations online, I've got lots of presentations that go into this in a lot more detail. This is what continuous delivery looks like nowadays: you're trying to observe what to do, orient yourself, decide which option to take, and then act on it. Observe is really about innovation: can you figure out what you should be doing? Can you find that customer pain point? Then you use big data analytics to figure out what's going on. The typical thing here is you're trying to answer a question
that's never been asked before. This is why you're grepping through log files, importing and cleaning stuff up, looking in places that no one's looked at before, to find this piece of customer pain, you know, one entry sort of hiding in a lot of log files. Then you're going to figure out what to do, and if you have the right culture you can get stuff done quickly, because it's also self-service: you don't have to go and ask for permission, and you're working on small pieces of work.
So it's easy, and then you use cloud to deploy this automatically. Getting around this loop faster than your competition is what makes you competitive nowadays, and a lot of people are suffering from this. You don't just go around the loop once, you bounce around it. Once you've got that, this works fine as long as there's a handful of people, but what happens when there's a large team of people? What happens is, if you build a monolithic application, you have a hundred people working on it.
They keep stomping on each other's toes. They keep checking in code that breaks somebody else's code, and it's hard to get a build put together. What you tend to find, and maybe some people have heard this phrase from their QA team: can we have more time to test this release? Has anyone ever heard that? If you give them more time, it turns out the release after that will be even bigger, because the developers kept putting more features in.
So you get more and more features in a release, and it's harder and harder to test. The correct thing to do to get around that is actually to shrink the size of releases, do smaller releases more often, and break the releases into multiple pieces, and this is really what microservices is about. What you're trying to do is not have one release plan where a hundred people are trying to coordinate around it; you have multiple release plans, with a handful of people for each, and you can do it.
Then you also get the ability to use different languages, different systems, different development approaches. The front end and the middle tier are all different kinds of things; they can use whatever they want, because they're not all lock-stepped on a single release model. Now, when you've got multiple things, you want to bundle them together onto a common platform, and a common way of doing that now is to put things in Docker containers, so your deployment platform just says: I know how to deploy containers.
At Netflix there's no reason not to release as often as you need to. What you can do then is make each release a single thing, which means it's really easy to tell whether it worked, and it's easy to back it out if it didn't, and you're not disrupting everyone else. The coordination time turns out to be the biggest thing that's slowing everyone down. So what's happening, then, is that as we reduce the cost and the size and the risk of change, we increase the rate of change.
So this is a bit weird: we're not doing the typical six-monthly releases, and we're not doing those ten times a day either; we're releasing tiny fragments of a release, one feature or one update or one bug fix at a time, and you've got a continuously evolving system. That's very different if you're coming from a world where we have a project and we're going to upgrade SAP for the next nine months, and then everyone sort of gets exhausted and runs away from it.
There are other labels for this, but mostly when people talk about microservices they're talking about the architectural patterns that people use to do this, and my definition is here; I'll just pop up some more bits. You want it to be loosely coupled, so you can update different pieces of it independently, and you want to have a bounded context, because that's how you decide what is the right size for a microservice. How big should it be?
How many different things should you put in it? What you really want is to have each microservice do one thing. When this applies to the data tier, you end up building data stores that do one thing, so you have lots and lots of Cassandra clusters, rather than one Cassandra cluster with lots of keyspaces in it, or one big huge schema with everything glommed together. You've got lots of denormalized stores and they all work independently.
They scale independently, they're owned independently, and you typically put a data access layer in front of them. You funnel everything through it, so everything looks more like a REST interface, and callers actually don't care what's behind it. This is one of the common patterns: say you're on MySQL
and you want to get to Cassandra. One of the open-source Netflix projects is called Staash, storage tier as a service over HTTP, spelled with two A's. It's a Java app with an HTTP interface, and it has a Cassandra client library in it, and it has a MySQL client library in it. So you put that in front of MySQL and start using it.
Then you add Cassandra, then you start gradually breaking pieces of your schema off, and gradually you end up all on Cassandra. But you've abstracted it away behind this data access layer, so you actually don't care what's behind it anymore.
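As a rough illustration of that migration pattern, here is a minimal Go sketch of a data access layer that hides which backend owns each keyspace. The names and routing logic are hypothetical, not Staash's actual API, but the idea is the same: callers hit one interface while pieces of the schema are cut over one at a time.

```go
package main

import "fmt"

// Store is the minimal backend interface the data access layer hides.
type Store interface {
	Get(key string) (string, error)
}

// Stub backends standing in for real MySQL and Cassandra clients.
type mysqlStore struct{}
type cassandraStore struct{}

func (mysqlStore) Get(key string) (string, error)     { return "mysql:" + key, nil }
func (cassandraStore) Get(key string) (string, error) { return "cassandra:" + key, nil }

// DAL routes each keyspace to whichever backend currently owns it,
// so business logic never knows (or cares) which database is behind it.
type DAL struct {
	oldStore, newStore Store
	migrated           map[string]bool // keyspaces already moved to Cassandra
}

func (d *DAL) Get(keyspace, key string) (string, error) {
	if d.migrated[keyspace] {
		return d.newStore.Get(key)
	}
	return d.oldStore.Get(key)
}

func main() {
	dal := &DAL{
		oldStore: mysqlStore{},
		newStore: cassandraStore{},
		migrated: map[string]bool{"ratings": true}, // cut over one piece at a time
	}
	v, _ := dal.Get("ratings", "movie42") // already migrated
	fmt.Println(v)
	v, _ = dal.Get("billing", "acct7") // still on MySQL
	fmt.Println(v)
}
```

As more keyspaces migrate, entries are added to the map, and when it's all on Cassandra the old backend is simply dropped; no caller ever changes.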
So that's the kind of pattern. There's a whole lot of books that I think are interesting in this space, in particular the Domain-Driven Design book.
You need to understand the Drift into Failure book, because that explains why these perfectly reliable systems will occasionally fall over: they hide all the brokenness until all the things that are broken gang up on you, and finally it tips over, and you find ten things are broken, because the system was so resilient it was hiding the nine other things you didn't know about. So you have to understand how to get in there and root out the things that are going to bite you later, as you build systems that are more and more highly available.
So I'll talk a bit about these microservices and cloud native and monitoring. What we've got in the cloud native world is a very high rate of change. We have code pushes causing floods of new instances and floods of new metrics, and there's a very short baseline for analysis. The configurations are very ephemeral; they don't live for very long. They could even live for less than a second.
Some analysis of Docker container lifetimes, over a three or four month period of New Relic monitoring all the Docker containers they monitored, showed that the most common lifetime was one minute, and the second most common lifetime was zero minutes, less than a minute, and everything more than one minute was way down. It was a graph with two tall spikes and a whole lot of small spikes, minute by minute. And these microservices are calling each other in complex patterns.
If you just look within Cassandra: Cassandra is sort of a microservice architecture itself. The flow between Cassandra nodes is actually fairly complex, with all the gossip and the hinted handoff and all the replication. So when you look at how you scale microservice patterns and how you monitor them, one problem is managing the scale, and it's not just the number of machines. There's this big complex hierarchy: you have it distributed across continents and regions and lots of zones if you're on AWS, or data centers
if you're on your own systems. There are lots of different versions of things running at once, there are lots of containers, and you can have tens of thousands of machines. So how do you deal with that? One problem, then, is: what does the flow look like? When you have a monolithic app, you hit one side of it, you stay inside it, and then it pops out at the database. That's relatively simple; there's no real flow there.
But when you build these microservice systems, you want to know what's going on, so there are flow visualization tools out there. The left one is a Netflix tool that I don't think they've released. The top right is an AppDynamics flow, which works up to a certain amount of scale, but once you get into huge numbers of microservices it can get more challenging to visualize. And the bottom right is from Twitter.
It's the output from a Twitter tool called Zipkin, which is a flow-based monitoring open source tool currently in the process of being turned into a more standardized thing; there's an OpenZipkin project going on right now, so that looks like a place where we could standardize things. But if you look at the architecture diagrams of a lot of sites, this is kind of what they look like: you have hundreds of services talking to each other, and you get what I call death star diagrams, where everything turns into a big circle.
You can no longer see what the structure is, and that's a problem, because we have interesting failure modes that are part of the structure of the system. When we look at failures, we want to understand that a zone went down, not that it looks like about a third of our machines suddenly disappeared. If you get a power cut or you lose connectivity, you need to understand structurally that everything in that zone broke, rather than treating it like everything else. I mean, you should know this, right?
This is a sort of diagram of a three-zone system. You've got a load balancer at the top, then you hit an API proxy, then a bunch of business logic, then one of these data access layers, and in the back I've got a 12-node Cassandra cluster spread across the three zones. The traffic comes in, and when it writes, it writes to Cassandra, and that's how the data gets to the other zones, so the next request can go to a different zone and the data is already there.
This is typical three-way replication; it should be pretty familiar to anyone that's playing with Cassandra. Now, if you lose a zone, what does your monitoring tool say? If you lose an entire zone, you're probably going to get a massive flood of errors. The system is going to explode your logging, and your monitoring system is going to complain mightily. But what should it really do? It should give you one message saying: you're still up, but don't mess with it.
You've lost all your redundancy. Cassandra is designed to run on two out of three zones; that's part of the point, that's why we were doing this. So the fact that you lost an entire zone's worth of systems isn't actually an outage to the end user, or shouldn't be. Maybe there are a few retries as it glitches a bit, and you want to get into the ELB and have it stop sending traffic to the dead zone.
So the code's on GitHub. There's a front end using d3, a JavaScript front end, and there's a back end written in Go, and they aren't really connected together yet. One of the things I'm gradually working on is getting it so that the front end can actually control the back end in real time and visualize what's happening. What happens now is you run the back end on the command line, it saves lots of files, and you then visualize those to see what's there. So how do I define that architecture?
One of the things I have is a chaos monkey, so one of the nodes gets deleted at some point during the simulation; I didn't need that here, so I didn't give it a name. Then I've got these tiers. In the first one you can see I just call Cassandra; I have a package which implements the behaviors, and I'll show you what that looks like later.
This is a Go package which pretends to be the sort of Priam/Cassandra mix; Priam is Netflix's management wrapper for Cassandra, so I put the two names there. It depends on itself, which means the Cassandra nodes talk to each other, so my dependency list for Cassandra has to list Cassandra. You can use any name, and you can create as many Cassandra clusters as you want doing this. I've got six nodes in one region.
The next one is my Staash data access layer, which is just a REST data access layer, so I just called it restdata; that depends on Cassandra, and I've got six nodes for that too. Then I've got my business logic: Karyon is the name of the Netflix project for generic business logic, and I've got 12 of them. This is a very simple app; it's just straight
B
Through
I've
got
an
API
proxy
I
have
a
load
balancer
and
at
the
top,
I
have
sort
of
the
DNS
entry
denominators,
the
name
of
the
DNS
management
layer.
That
thing
that
Netflix
built.
But
the
point
here
is
that
at
the
top
level,
when
you're
doing
globally
distributed
systems,
you
have
to
have
something
that
isn't
in
a
region
right.
B
So
it's
in
zero
regions
and
they're
0
count
because
it's
just
a
DNS
entry
and
you
can
see
that
the
elb
there
isn't
actually
an
instance
for
the
ALB,
it's
just
a
thing,
but
there's
one
per
region.
So
that's
why
those
numbers
are
set
up
that
way,
All right, so when you've done that, this is what it looks like; hopefully you can see the architecture. The leftmost thing is the purest form of this architecture, just one of each. At the front... I don't have a mouse pointer I can bring over.
I brought this simulation up and said: make it two hundred percent bigger, and then for the third one I made it four hundred percent bigger. So now I've got a 24-node and a 48-node Cassandra cluster across three zones, and I can make an arbitrarily complicated architecture and scale it to an arbitrary size. At times I've had a hundred thousand nodes running on my laptop in a couple of gig of RAM, which would be ridiculous to try and do in real life. So that's the point of the simulation.
I can create architectures that you couldn't really create in real life, either because you couldn't create them at all or because you couldn't create them cost-effectively. So what I'm trying to do is model these architectures. The next thing I can do, from that same file without any changes (this is all I specified), is tell it to have up to six regions. So then I do multi-region Cassandra, and the first one on the left is a two-region one; let's hope you can see it.
The contrast isn't that great. The dot in the middle at the top is the endpoint that splits by DNS to the two sides. You can kind of see the six zones there in two regions, and then it goes into this Cassandra cluster that's all clumped together in the middle, and the one on the right is three regions.
So the system can export to monitoring tools as if it was a real machine, with a real name and a real Amazon IP address, even though it's just faking it all on my laptop. And this is what four, five, and six regions look like, and by that time I get back to the same problem I had previously: it still looks like a Death Star. You can see the four-region one, and then I just gave up trying to stretch everything out and it's all collapsed into a big blob. Now, this is a simple system.
It's got one cluster with just a few things connected in front of it. I tried to create something more realistic: this one has three different Cassandra clusters sort of overlapping each other, and two different endpoints, which is again a fairly simple thing, but you can see it's getting harder and harder to visualize these systems. So there are two things I'm trying to do here. One is to create these architectures, which you can then feed to monitoring tools and say: can you figure out how to visualize this better,
please? So I'm trying to offend people by showing them badly visualized things, to get them to build better visualizations. There are some people out there using my tool to generate architectures, which they're then feeding into their monitoring systems in order to figure out how to display things better. There's one vendor that's actually using my tool now, and I've been talking to other tool vendors about this too. So, you know, I could get this to be so complicated
that I can't see it, so when we can render this better, maybe I'll generate some more complicated ones. So, has anyone here ever written any code in Go? A few Go programmers, OK. The code I have isn't very idiomatic Go code; somebody told me it looks more like Erlang. But this is what the thing looks like, and the entire package that implements all my Cassandra behavior ends up being about 200 lines of code.
I wrote about half of it in a few hours on a plane last weekend, while flying back from the UK, so in four or five hours I was able to do a fairly significant upgrade to it. Basically, every goroutine, which is a thread, basically, is simulating a real machine, the way you'd have an instance or a container or something you want to model.
I have a goroutine that just sits there in memory, I have channels for connecting them together, and I can send messages back and forth at a few hundred thousand a second. So that's basically what's going on. Every service looks roughly like this: it has a listener on a channel, a channel that has traffic going over it, so all of those diagrams were fully connected.
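A stripped-down sketch of that mechanism (hypothetical names, not spigo's actual types): each simulated service is a goroutine draining a channel, and a request is just a message carrying a reply channel.

```go
package main

import "fmt"

// msg is what flows between simulated services over channels.
type msg struct {
	body  string
	reply chan string
}

// service models one simulated node: a goroutine sitting on a channel,
// handling whatever traffic arrives, like the simulated nodes described above.
func service(name string, listen chan msg) {
	for m := range listen {
		m.reply <- name + " handled " + m.body
	}
}

func main() {
	// Wire up a tiny one-tier system: caller -> data layer.
	data := make(chan msg)
	go service("restdata", data)

	// Send a request and wait for the response, as one node would.
	reply := make(chan string)
	data <- msg{body: "get movie42", reply: reply}
	fmt.Println(<-reply)
}
```

Because goroutines are so cheap, tens of thousands of these "machines" fit in a laptop's memory, which is what makes the hundred-thousand-node runs possible.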
Now, when you send messages through the system, it records the flow they took through the system, and I'll show you what that looks like. Here's my Staash making one request. This is a single request coming into the Cassandra system that I'm simulating, and the second thing says t, p and s. The s is the span; a span is a connection between two microservices.
Each span has a unique number. The t is the transaction, or the request, and every related request has the same transaction. So when you hit the edge of this system, it creates a new transaction number, and then the p in the middle is the parent span.
Every time you land in an inner microservice and you want to call out, you take the span that got you there and you stick it in as the parent, and that way you can actually build this tree. So what I've got here is a put going into a multi-region Cassandra cluster that wanted to replicate the data, and this is actually what happens in the simulation.
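The transaction/span/parent bookkeeping can be sketched like this. The types and helper are hypothetical, but each hop stores the span that delivered it as its parent, so the recorded flow reassembles into a tree, as in the put-and-replicate example.

```go
package main

import "fmt"

// event records one hop: which transaction it belongs to, the parent
// span that caused it, its own span id, and the two endpoints.
type event struct {
	trans, parent, span int
	from, to            string
}

var nextSpan int

// call records a hop from one service to another, linking it back to
// the span that got us here so the trace forms a tree.
func call(trans, parentSpan int, from, to string, trace *[]event) int {
	nextSpan++
	*trace = append(*trace, event{trans, parentSpan, nextSpan, from, to})
	return nextSpan
}

func main() {
	var trace []event
	trans := 1 // new transaction number created at the edge of the system

	// Edge -> Staash, then Staash -> two Cassandra nodes (replication):
	// both replication hops share the same parent span.
	s1 := call(trans, 0, "edge", "staash", &trace)
	call(trans, s1, "staash", "cassandra-a", &trace)
	call(trans, s1, "staash", "cassandra-b", &trace)

	for _, e := range trace {
		fmt.Printf("t%d p%d s%d %s->%s\n", e.trans, e.parent, e.span, e.from, e.to)
	}
}
```

This is the same parent/child relationship Zipkin-style tracers use, which is why emitting the flows in Zipkin format, as mentioned later, is a natural fit.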
When you're talking to somebody, you don't actually know what their shards are. It's not like Cassandra, which has this full view where every node knows everyone's token in the ring; the way I have it, I don't really know that yet, so I've got to tinker with it a bit more. The reason there's another series of calls is that I landed on a node that turns out not to be the right owner for this piece of data, so I have to copy it to one of the other nodes, in zone B.
It's basically a protocol simulator where you can create arbitrary protocols with arbitrary forwarding logic, and you can go and explore what's going on, and we're gradually getting better simulations and better visualizations of the flow. I want to take these flows and put them into a much better trace visualizer, and I want to get the output to be in Zipkin format, so that it can basically feed anything that knows how to consume that. All right.
So why am I building this? Partly because I think the tools are currently doing a bad job of monitoring these large-scale configurations, particularly systems with Cassandra in them. There's a lot of internal structure to Cassandra that very few tools really understand, so I'm trying to get people to understand those structures. I also want to be able to grow and shrink this, to have autoscaling, so that this network I've created is not a fixed network. It grows and shrinks; it's actually a dynamic graph, and I can grow it and I can shrink it.
I can actually forget nodes; I can basically have chaos monkeys killing things. I can forget links and I can create partitions, so I can have a globally distributed system with traffic running through it, then cause certain types of partitions and certain types of outages, and then try and make sure the simulations can handle that.
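A toy version of that dynamic graph (names hypothetical, not the simulator's actual structures): nodes and links can be removed at any time, the way a chaos monkey would terminate an instance.

```go
package main

import "fmt"

// graph is a tiny dynamic graph of services and directed links, of
// the kind the simulator can grow, shrink, and partition.
type graph map[string]map[string]bool

// link adds a directed edge from a to b, creating a's entry if needed.
func (g graph) link(a, b string) {
	if g[a] == nil {
		g[a] = map[string]bool{}
	}
	g[a][b] = true
}

// chaosKill removes a node and every link pointing at it, like a
// chaos monkey killing an instance out from under its callers.
func (g graph) chaosKill(node string) {
	delete(g, node)
	for _, peers := range g {
		delete(peers, node)
	}
}

func main() {
	g := graph{}
	g.link("lb", "api")
	g.link("api", "cass-a")
	g.link("api", "cass-b")

	g.chaosKill("cass-a") // simulate losing a node

	// api is left with one surviving dependency.
	fmt.Println(len(g["api"]))
}
```

Partitions work the same way: instead of deleting a node, you delete the links that cross a zone or region boundary and watch how traffic reroutes.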
So that's where we're going. Just as a takeaway on that: if you want to know what's really happening in enterprises now, the Lean Enterprise book really documents
the struggle a lot of people are going through to get things brought into this modern age; it's sort of the enterprises trying to learn how to do continuous delivery. Building Microservices is an awesome book for figuring out all the different transitions you have when you're trying to get to this place, and I think Cassandra has a really key place in the whole microservices and continuous delivery transition.
So, we started a little bit late, but I can take a few questions. Like I said, I work at Battery Ventures; these are the companies that I mostly play around with, our current portfolio. If you want to talk to me about any of these companies, I'm happy to do that. Questions? Yep.
[Audience question.] Talking about the mapping of tables to clusters and microservices: you can do it lots of ways, but the sort of best-practice way, I think, of building out a microservice architecture is that you want to own the data sources. You want to own a data access layer and force everybody to go through that data access layer; you don't want anyone else accessing your database. You want to be able to hide any maintenance work or upgrades or anything you might want to be doing.
You can hide it behind that data access layer, which means your business logic code is all using REST calls into the data access layer from everywhere else. At Netflix that's enforced using security groups, so the only thing that Cassandra trusts is its data access layer, and everyone else has to go through that. So that's one model. Then you're optimizing differently here: you've got a read-denormalized data model, your writes are sprinkled across the system, and you have to have checkers that make sure all these databases stay in sync.
B
So
you
do.
There
is
some
work
to
be
done
there,
but
that
can
also
happen
in
the
data
access
layer.
You
can
have
threads
running.
There
may
be
at
night
when
it
gets
quiet
and
traffic
drops.
They've
got
some
extra
capacity
and
you
can
use
that
capacity
to
make
sure
that
all
of
your
foreign
keys
exist,
and
you
know
all
the
references.
All
the
cross
referencing
between
clusters
actually
still
works
right.