From YouTube: Microsoft: Building a massively scalable system with DataStax and Microsoft's next generation PaaS
Description
Speaker: Rob Bagby, Cloud Architect
We have the challenge of how to reliably store massive quantities of data that are available even in the face of infrastructure failures. We have similar challenges on the application side. The most successful cloud architectures break applications down into microservices. How then do we deploy, upgrade and manage the scale of those microservices? This session will illustrate how to tackle these challenges by taking advantage of both Cassandra and Microsoft's next generation PaaS infrastructure called Azure Service Fabric.
B: Thank you, I appreciate that, Mike. Again, my name is Rob, and this is my colleague. We're both cloud architects for Microsoft, and we're going to be talking to you about building scalable systems, and about the marriage between Service Fabric, which is our next-generation PaaS infrastructure, and Cassandra.

B: The interesting thing that comes up when you start thinking about building scalable systems is that scalability — linear scalability — isn't necessarily the only challenge you're going to face. You're going to see a series of questions getting raised, and that's because the challenge is that we're running these massive clusters in data centers where we don't have physical control of the hardware. We've got these clusters running out there, and our services are running on commodity hardware, so the servers are going to fail.

B: Our data is sitting on servers that are going to fail, and we have to be able to deal with all of those eventualities, and so these questions come about. You can roughly bucket them into one of two areas: application focused or data focused. From the application side, some questions are: how do I know if my VM is running?

B: How do I know if my application is running? If my service isn't running, where do I move it? Do I have any nodes out there with capacity to run it? Do I have any nodes that meet the constraints my service needs to run? How do I move it over there? What happens when I'm running on something that's too hot? When do I move it, and do I have anything I can move it to? Then there's the data side.

B: From the data side, the questions are: how do I deal with linear scalability? How do I make sure that, in the eventuality that, say, a natural disaster comes up and a data center goes down, my application still has access to my data? How do I handle all of those things? Well, that's what we're going to be talking about today: how we answer those questions.
A: If we boil down everything Rob just said, we actually end up with more or less five problem domains that we need to keep an eye on when we're trying to build a system that is highly scalable. First of all, we need to keep in mind that at real scale we need to do things easily and fast. It's extremely important that we take that into consideration.

A: Along the same lines, we have to consider that whenever we are scaling, we need to make sure that we maximize the resources where our application is running. Think about a system that has a large number of nodes, a large amount of compute resources. You want to make sure that every single resource you are using for your services or your data is utilized, because otherwise you are wasting resources, which you don't want. Having a massive system also brings another set of problems into consideration. For instance, how do you maintain the system over time?

A: You know, maybe you have to rethink the way you version and manage updates to your application. So today we're going to show you how you can do that using the next iteration of our platform-as-a-service offering in conjunction with DataStax. Last but not least, we need to assume that failure is going to occur, so, embedded in our system, we need to come up with architectures and constructs that anticipate that scenario, so we can maintain the availability of the system no matter what.
B: So we've raised all these questions and said: here are the challenges that we have. Now we need to start looking at how we solve them, and I guess the big question for us is: how does Microsoft solve them? We've got a series of services that we run at scale on commodity hardware inside of our data centers. We have SQL Azure, which is running millions of databases. We've got a NoSQL offering out there, DocumentDB. We've got Cortana.

B: It becomes too complex when you have these massive systems running at scale, so the infrastructure we have is called Azure Service Fabric. You might have heard it referred to as our next generation of PaaS, or PaaS v2, but it's called Service Fabric. An interesting thing to note is that the Service Fabric we're packaging up for people to build applications on is not a watered-down version. It's not a scaled-back version.

B: It's not a lite version of what we're using. It's the exact same codebase we're running — not all of our services, but those services I mentioned run on the exact same infrastructure. All we're doing is packaging it up to make it easier to use for general purposes, and that's what we're going to be chatting to you about today. And here's a list of some two hundred odd customers that are running this in private preview right now, out in the wild.
B: I think the most interesting thing to glean from this slide is modern architectural patterns — it's the third point I have down there. If you talk about cloud architectural patterns, it won't be too long until you start hearing about microservices. If you're not familiar with them, think of microservices as the opposite of taking a big monolithic application with a big monolithic database and deploying that.

B: Basically, that approach tells you to take your system and break it down into smaller pieces, into smaller services. If you're familiar with domain-driven design, microservices roughly map to bounded contexts. So you take your system, you break it down into a bunch of smaller services, and this yields many results, many of which we're going to be talking about today.

B: The interesting thing about breaking your system down into a bunch of services, each one of them independent — managed independently, deployed independently, and versioned independently — is that you get many benefits, not the least of which is maintainability. And each of these services is wholly owned: they own everything, including their data, which gives you some additional capabilities.

B: You can use a different back end for different services. So, for instance, let's say I had one service that was search oriented; maybe I want to use Lucene in order to meet the needs of that search. Maybe I have another service which needs massive scalability: I need to handle massive concurrent writes, and I need to handle cross-data-center replication. Well, that's a perfect spot for Cassandra, which has established itself as the leader in that category.
A: Quick show of hands — how many of you know that you can run Linux on Azure? All right, so less than half of the people. Linux is a first-class citizen on Azure. All the open source solutions that you're familiar with, the ones you typically think don't run on the Microsoft platform — actually, they do. One particular example of that is DataStax and Cassandra. We actually have two mechanisms to deploy DataStax and Cassandra on Azure.

A: One is through our marketplace: it's a growing ecosystem of Azure Certified solutions that are ready to go, that you can instantiate whenever you need to, and these consist of open source solutions as well as enterprise solutions. In addition to that, we recently enabled a new technology where you can declaratively specify the resources that you want to deploy into Azure.
A: So, instead of programming a sequence of scripts and a sequence of events, you can declaratively specify what your solution is going to look like, and through that mechanism we have the ability to deploy Cassandra as well. So let me show you a quick demo of that. What we're going to do here is go into the marketplace and look for DataStax.

A: There are three offerings that pop up: one for production scenarios that has been certified, another one that is for testing, and one that is for developer purposes. We're going to instantiate an instance of DataStax for development scenarios. What we need to provide is very simple: just the name of our VM — in this case, it's going to be "dev".

A: As you can see, this is a video, so I don't need magic powers — it's a video that we recorded earlier. You provide a username and a password; if you want to use an SSH key, you can do that too. Then we need to provide the resource group — a resource group is a construct that we use for management purposes — and we just need to specify the location, the region where we want the instance to run.

A: We select the size of our instance, how big it is, and then we're good to go. There are some additional configurations that you might want to tune, but for the most part, for testing and prototyping purposes, you're going to be okay with the default values. We say OK, we click create, and it will go ahead and instantiate an instance, which I'll show you next. So this is the marketplace offering; as you can see, in a few steps you get going with a Cassandra instance on a single node.
A: But what about deploying a full cluster? Well, for that we use the templates. The templates are hosted on GitHub — we have a very strong relationship with the open source community — and the way we deploy things from GitHub is with that button over there. Basically, what it does is deploy that JSON template to Azure, and then, after getting authenticated, we get to a point where we need to provide the information to deploy our Cassandra cluster.

A: So we select the region; the name of the storage account where the VHDs are physically going to reside; the DNS name, which is the name that is going to be used for the OpsCenter instance; the virtual network; and the username and password for our instances — the OpsCenter admin account and password. We also select the size of the VM that we want to use for all the nodes, and then the number of nodes that we want to deploy — that's very important: you can deploy as many as you need.
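As a rough sketch, the same template can be deployed from a script instead of the portal. The cmdlet below is from the Azure PowerShell module; the resource group, template URI placeholder, and parameter names/values here are purely illustrative — check the actual template's parameter list before using it:

```powershell
# Sketch: deploy a quickstart-style JSON template declaratively from PowerShell.
# Replace <raw-url-of-azuredeploy.json> with the template's actual raw GitHub URL,
# and match the parameter names to what that template really declares.
New-AzureRmResourceGroupDeployment `
    -ResourceGroupName "cassandra-demo" `
    -TemplateUri "<raw-url-of-azuredeploy.json>" `
    -TemplateParameterObject @{
        location       = "West US"          # region where the cluster runs
        storageAccount = "cassandrademosa"  # where the VHDs physically reside
        dnsName        = "cassandra-ops"    # DNS name for the OpsCenter instance
        adminUserName  = "demoadmin"
        vmSize         = "Standard_D3"      # size used for all nodes
        nodeCount      = 2                  # number of Cassandra nodes
    }
```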
A: The template parallelizes the process of deployment. We're talking about three nodes in this particular case, because it's OpsCenter plus the two Cassandra nodes, and it's going to do it in parallel. It's going to take a few minutes, but in the meantime, let me show you something already in place and running, so you can understand the end-user experience.

A: After the deployment is successful, you end up with an instance of OpsCenter up and running, where you can just log in with the password that you specified earlier, when you set up the template. You can log in, and then this is a familiar screen for you guys: we can just click on it and create a brand new cluster. Since we deployed two instances — sorry, two nodes — I can just provide them; I'm going to do that in a second.

A: After we provide the username and password for connecting to the DataStax repository, we enter the IP addresses of the nodes, and then we're good to go. Let it run a little bit — I rushed; I was faster than the demo. We specify the IP addresses of the nodes right here. So again, we deployed two, so I'm going to enter two instances based on the private IP addresses, and then we deploy them.

A: But what is really cool about this is that all the best practices are in place for redundancy in terms of a local data center. In addition, you can extend that configuration: you can deploy another instance of these in another region, create a VPN connection between the two instances, and then you have a cross-region replicated Cassandra cluster.
A: So what happens when you have a massive number of nodes and you want to do something that is very cost-effective? Service Fabric introduces an interesting concept that is very important to understand, so you can maximize the return on investment. From the Service Fabric perspective, the idea of a cluster is independent of the hosting VMs where the cluster will be deployed.

A: So you can have a cluster that is running on a local machine — one single machine, multiple nodes — or you can have a cluster that actually runs across several machines. That layer of abstraction enables very powerful scenarios for you. Let me explain one of those scenarios, in contrast with when we didn't have that. Take, for instance, the scenario where you typically map one VM to one node in the cluster. Well, that creates a few challenges.

A: If each VM hosts only one service, you're basically leaving money on the table, and it also has implications in terms of scalability and failure recovery. From that perspective, if you want to increase the number of nodes, you have to wait until a VM is fully provisioned before you can deploy your application there. Similarly, if something goes down, you have to wait until another VM is provisioned and ready, and only then can you deploy your application. In contrast, with Service Fabric, there is that layer of abstraction between your cluster and the infrastructure.
A: You can achieve what the industry calls high-density scenarios, where you can deploy multiple services, independent of each other, within a single VM. In addition, Service Fabric has the capability to manage the amount of resources, so that you don't overtax or over-utilize those VMs. The other benefit is that, when scaling out, because of that decoupling from the underlying infrastructure, you can create and deploy those instances and replicas way faster than before, and the same applies from a recovery perspective.

A: In case of failure of a node, the orchestrator is going to take those instances and redeploy them very quickly to another existing node. So, given that we don't control the underlying infrastructure: how can I manage the way information flows? How do I deploy my instance? How do I deploy my application? How do I control where it goes?
A: Well, Service Fabric introduces the concept of placement constraints, where basically you specify the type of node, and based on that type of node — metadata that you assign to a node — you can control what part of your application goes where. In this particular example, let's assume we have a very simple application where we want to expose the web part to the outside — that is, outside the DMZ — and another internal part of our application that we want to expose only to an internal network.

A: So by defining a set of metadata on our nodes, we can control what part of the application goes where. Even though we don't have control of the underlying infrastructure, we have full control of how our application is deployed in Azure Service Fabric.
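As a sketch of what those placement constraints can look like in a service manifest: the service type names and the `NodeType` property here are hypothetical, and the corresponding node-side properties would have to be defined in the cluster configuration for the constraint expressions to match anything.

```xml
<!-- ServiceManifest.xml fragment (illustrative names; NodeType is a node
     property assumed to be set per node type in the cluster configuration) -->
<ServiceTypes>
  <!-- Web front end: only on nodes tagged as front-end (public facing) -->
  <StatelessServiceType ServiceTypeName="WebServiceType">
    <PlacementConstraints>(NodeType == FrontEnd)</PlacementConstraints>
  </StatelessServiceType>
  <!-- Internal API: only on nodes tagged as back-end (internal network) -->
  <StatelessServiceType ServiceTypeName="CartServiceType">
    <PlacementConstraints>(NodeType == BackEnd)</PlacementConstraints>
  </StatelessServiceType>
</ServiceTypes>
```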
B: This is going to be a little challenging while holding the mic. So what we're going to do now is show you a demo application we built, and we built it to help illustrate some of the challenges that you have and how Service Fabric helps you deal with them. This is a typical shopping application, and I think it's important to note, if you look at the URI up here, that we're running on localhost — this isn't deployed out in the cloud yet.
B: It's not deployed to Azure, it's not in Service Fabric. This is a local instance, running on local data, all sitting in memory. But I wanted you to see what the application did before we deployed anything. So we can go ahead and do a search — we can search on anything in general — get a product, and add it into our shopping cart. We've got a cart out here, and then we've got a "frequently bought together" feature: people who bought these might also buy these.

B: It's a recommender engine, and the interesting thing, when you start to look at this application, is to try not to think of it as one giant whole, but rather as a set of microservices. So over here is my search microservice, and, as I mentioned earlier, the back end for that may well be some kind of indexing engine like Lucene, or Azure Search, or something like that. Over here I've got top sellers, which may be running on any variety of NoSQL systems.
B: With everyone constantly updating their cart, I'm going to have a massive amount of concurrent writes. I need linear scalability, and I need to make sure people don't lose their shopping cart in the event of a data center going down. So for the back end of this microservice we've chosen to use Cassandra, because it's perfect for all of those challenges I just mentioned. So we've got Cassandra sitting behind this microservice. In this demo, instead of clouding the issue by making it very complex,

B: I'm going to deploy just that one microservice out into Service Fabric, so you can very clearly see how the deployment works, how upgrades work, and so on. The rest of the application I'm going to leave stubbed. OK, so let's minimize that. What I want to show you first of all is the cluster that's sitting out in Azure — this is actually the physical cluster.
B: These are the physical VMs I'm going to be showing you here. I've got a resource group and, as my colleague mentioned earlier, resource groups are just a logical grouping of physical assets — physical hardware. So I've got this resource group for my Service Fabric cluster, and you can see all the physical hardware out here.

B: I've got five VMs in my cluster, I've got a load balancer sitting in front of those, I've got my NIC cards, and they're all sitting inside of a virtual network. So I've got all of this hardware out there. Now, the key to what we're talking about is that we don't want to say, "hey, when I deploy this service, I want it to go onto this specific node." We want an abstraction, a separation between the two, and that's where Service Fabric comes into play.
B: So here's a view into my Service Fabric cluster. If we go to the Service Fabric cluster up here — it's associated with the physical cluster in Azure we just saw — and take a look at one view, the application view, you can see I have no applications deployed out there yet; and then there's the nodes view.

B: So let's deploy the service. Right here I'm going to use a script that I've written to do this deployment. Your choices are: you can use PowerShell, or we've got a cross-platform CLI that you can use. In that script I've basically said: deploy my service across three of the five nodes in my cluster. I didn't add any placement constraints in there, but I could have. So if we do a refresh here and take a look —
B: We can see that my shopping service has now been deployed as an application, and if I start looking at these nodes, you can see that my application has been deployed across three of those nodes, which is what I asked it to do — very declaratively. I said: deploy across three nodes; I don't care which nodes, you take care of it, as long as it gets deployed across three nodes sitting behind a load balancer. If we go back into our web page —

B: Here is a web application that's been deployed to the same data center that my Service Fabric cluster is in, and as it pops up here in a second — I guess it's got to seed itself — I'm going to run a quick search, we're going to grab an item, and that's now getting pushed into the service you just saw me deploy. This is talking to that service.

B: That service is now deployed out in Service Fabric, as I just showed you, and it's going to take a second because nothing has hit that service yet; it's now sitting out in the cloud. So with that, why don't we jump back. Again, the purpose of this portion of the demo was to introduce the application to you. As we start talking about those problems in more detail, we're going to show you, via this application, how we start to solve them. Let's go back to our slides.
A: I just want to mention something about those demos: those are live demos, by the way, guys. We took a lot of risk in making sure that's the case. That's a Cassandra cluster running on Azure; that's a real Service Fabric interaction. We wanted to show the real stuff, so that's live. We took a lot of risk, and so far, so good — knock on wood, it's working. So the next point that I want to tell you about is availability.
A: As I mentioned earlier, it's very important that we think of our system as a system that eventually is going to fail. So what do you guys think are the primary reasons why a system fails, at a very high level? Anyone? What do you think — hardware failure? What else? What other scenario do you think comes into the mix that would cause a failure? An upgrade failure — there you go, that's exactly it.

A: So we need to make the system resilient to at least those two scenarios: hardware failure and upgrade failure. In order to help you with that, I should introduce two constructs that are very important to understand. Whatever you're doing, whether with Service Fabric or any other system, these two concepts matter. So we have the concept of fault domains —
A: — fault domains and update domains. A fault domain you can think of as a single unit of hardware that can fail, and an update domain is a single unit of software that can be updated at the same time. So what does that mean? If you have an application that has at least two tiers, like the one I'm describing — a web tier and an API tier — you need to make sure that the full stack of that application spans at least two fault domains.

A: That way, in the event of a failure — because a fault domain typically correlates physically to a rack in the data center — if one of those goes down, you have the ability to fail over to the other one. Similarly, for update domains, you need to make sure that you have at least two instances of every tier in your application, so that when you are changing something in a particular tier, when you're updating the version of that tier, you stay in control and can fail over to the next one.
A: So how can you use this in the context of Service Fabric? Let's assume you have a system with multiple microservices deployed across several update domains, and that the versioning of the overall service is something very important to you, something you want to operate on deliberately. One of the things introduced as part of Service Fabric — and it's important that you think about the implications of this — is the concept of rolling updates.

A: You have full control of when something gets updated in Service Fabric. So, for instance, say you want to gradually update a version of your application because you want to see how the system behaves in a very narrow scenario — a technique called in-flight testing. You can do that, and you can gradually, using the construct of update domains, increase the number of instances that get upgraded over time. What's important to consider here is that, at any particular moment in time,

A: if you want to roll back, the infrastructure has the capability to do that. So you gradually upgraded one update domain, it didn't work out, your telemetry said the behavior you expected here isn't happening — the infrastructure can help you roll that back. That's what Service Fabric gives you, and we're going to show you a demo of that.
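A minimal sketch of what kicking off such a monitored rolling upgrade can look like from PowerShell, assuming the version 2 application package has already been copied and registered with the cluster; the cluster endpoint, application name, and version number here are illustrative:

```powershell
# Sketch: monitored rolling upgrade across update domains, with automatic
# rollback if health checks fail in any domain (names are illustrative).
Connect-ServiceFabricCluster -ConnectionEndpoint "mycluster.westus.cloudapp.azure.com:19000"

Start-ServiceFabricApplicationUpgrade `
    -ApplicationName "fabric:/ShoppingService" `
    -ApplicationTypeVersion "2.0.0" `
    -Monitored `
    -FailureAction Rollback   # roll back automatically instead of stopping for manual action
```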
B: Before we do that, we're going to talk just very briefly about maintainability. I don't want to beat this whole microservices thing to death, but when you break your application down into small pieces — into services that can be independently managed, independently versioned, and independently deployed — great things happen from a maintainability perspective. Think about the challenge you have if you're deploying your application as a monolith and you have to make one small change to one small piece: you redeploy the entire application.

B: Think about the risk that you've brought upon yourself. By breaking it down and having your services living independently, you greatly minimize that risk. I just wanted to touch on that very briefly. The next thing we're going to do is a demo on two fronts: one is handling failures, and the other is the rolling upgrades that my colleague talked about. So let's jump back into our application here. OK, let's start with handling failures; let's go back out and take a look inside of my view.
B: This is just a UI view into my cluster. If you will notice — I told you I asked to have three instances of my service running across my five nodes. What happens if one of those nodes goes down? I'm going to go ahead and kill this node — I'm going to kill VM two — so I'm just stopping that VM.

B: That node goes down; let's go ahead and refresh to see what's going on. You can see VM2 is down. If I refresh it again, it's going to turn red — well, now it's not going to turn red, but it should have. Either way, notice what happened: Service Fabric automatically noticed that that node was down, and it automatically moved the service over onto another VM. You can see it's moved over to VM one. It's not quite green yet, because it's not quite active yet, but it has been redeployed over there.

B: So Service Fabric is in charge of making sure that the topology I've requested is being adhered to. If a node goes down, it's going to take all those services, find the appropriate place for them — making sure we've got assets with the appropriate resources for them — and move them over without me lifting a finger.
B: The next thing I want to talk about is rolling upgrades. So let's take a look at this app out here. I've got a UI out here, and if you look just to the left of the price, you don't see anything — you just see the name of our book and then the description. But I've got a service out here, my shopping cart service, with two versions. Version one is deployed; version one does not return the year the book was written; version two does.

B: The UI is tolerant of both versions: if you supply it with a year, it'll show you the year; if you don't, it won't. So what I want to do is deploy version two of this service. I'm going to remove this item from my shopping cart, just to make things a little bit clearer — that's going to get removed — and before I do this rolling upgrade, I want you to take a look at something.
B: You can see down here at the bottom, where I'm highlighting: I've got version one of my service and version one of my application — and the same thing on this instance, version 1 and version 1. The other thing I want you to notice: if you look where I highlighted, at the very right, do you see the "UD 0"? That's upgrade domain 0. So this guy is in upgrade domain 0, he's in upgrade domain 2, and he's in upgrade domain 4. So when I do this rolling upgrade, what's going to happen

B: is that this guy gets upgraded first, then this guy, and then this guy, and we're going to be able to visualize that — we're going to watch it as it goes — and the requests hitting those services are going to hit either version as the upgrade goes forward. So I've got a script here to run my upgrade, so I'm going to run it.
B: Things are going now. I'm going to configure this thing to do an auto-refresh every two seconds, and you'll see, on the bottom right, a little blip will occur. Let's just watch and see this shopping service go from version one to two; that should happen in the next ten seconds or so. This is live, and if it doesn't happen, it's generally my colleague's fault; when it works out well, it's my hard work.

B: You can see it's gone to two. Now let's move up to this guy, and you can see he's still at one. So upgrade domain 0 is at version two; this guy has just moved up to two, so we're at two and two; and — no, he's still at one. Sorry, that's a UI glitch; you'll see this guy get bumped up to two in the next ten seconds or so. Or you can blame my colleague again — he's going to own the problem here.

B: We will hopefully see the year show up in our UI, and we'll have seen both how we handled failures and how we handled those rolling upgrades. The only other point to make is that, as the upgrade moves across those upgrade domains, Service Fabric is watching to see whether it was successful, and if it wasn't, you can configure it to do an auto rollback for you. So it's taking a lot of the pain away from you.
B: I want to just declare, in my application, the topology I want: I want these services running on nodes that look like this; I want those services running on nodes that look like that; these have to run inside of a DMZ; these I want public facing; these need hardware with this kind of storage; and so on. I want to be declarative, hand it over to Service Fabric, and let Service Fabric deal with placing those services on hardware that's running and isn't overburdened.

B: We want it to do the balancing for us. I don't want to be in charge of asking, "hey, is my application service running slowly? Have I used up all the memory on this box? Maybe I should move this service over to another box." I don't want to do that. I want to tell Service Fabric: look, I need this deployed here, and, in the event that memory exceeds this threshold, move it — I'm overburdened. OK.
B
Now, all that being said, we need some control. As Jesus talked about with the placement constraints, I can't just have my software thrown anywhere; I need to be able to define the type of hardware I want it to run on. I also don't want some infrastructure moving my services and rebalancing randomly, leaving it to guess, hey, when should I rebalance? I need that kind of control, and so we do have that control, and it comes in the form of thresholds.
B
So we have both balancing thresholds as well as activity thresholds. For a balancing threshold, you can think of it this way: you wouldn't want services to be moved from one node to another at every slight imbalance. You want to be able to say, hey, when this guy is using up twice as many resources as him, or thirty percent more resources than him, then I want you to rebalance. That's a balancing threshold. An activity threshold basically says, look, I don't need you to start rebalancing if no one's using this thing.
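That decision logic can be sketched as follows. This is my own rough illustration with invented example values, not Service Fabric's actual algorithm:

```python
def should_rebalance(node_loads, balancing_threshold=2.0, activity_threshold=10):
    """Decide whether one metric is imbalanced enough to trigger rebalancing.

    node_loads: per-node load for one metric (e.g. MB of memory in use)
    balancing_threshold: rebalance only when max/min load exceeds this ratio
    activity_threshold: ignore the metric while total load stays below this
    """
    total = sum(node_loads)
    if total < activity_threshold:
        return False  # nobody is really using this; don't churn services
    busiest, quietest = max(node_loads), min(node_loads)
    if quietest == 0:
        return busiest > 0  # any load next to an idle node counts as imbalanced
    return busiest / quietest > balancing_threshold
```

For example, `should_rebalance([100, 40, 30])` returns True because the busiest node carries more than twice the quietest node's load, while `should_rebalance([3, 2, 1])` returns False because total activity is too low to be worth moving anything.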
B
The last thing we want to do before we jump into questions is a very quick demo on scalability. Now, again, we're still in preview; as we approach GA, which is general availability, a lot of what you're going to see here you'll be able to configure, again through policy, saying, hey, when we hit this threshold, automatically scale up, add a couple more nodes to the cluster. Right now I'm just going to kick off a script to illustrate this to you.
B
So again, I've got my service running across three of my five nodes. I'm going to go ahead and just kick off a quick script which tells it, hey, I want to go up to four instances. So let's run that, and let's take a quick look; this should be fairly quick.
B
Let's see, this is live, not Memorex. And you can see, just that simply, I was able to scale my stateless service out across four nodes. So with that last final demo, I think we have 53 minutes left for questions. Do we have any questions? Yes, first, thank you guys very much for coming to the talk. Question over here.
C
B
Let me make sure I understand your question: how can we make sure we maintain quality of service? Is that a noisy-neighbor question? Exactly, okay. So we're deploying to our own cluster, and that cluster itself, that's our hardware, so there aren't going to be any noisy neighbors there. Now, when you're talking about one of your own services being the noisy neighbor...
B
That has to do with constraints that you can place, and I kind of alluded to those thresholds. We can configure into Service Fabric, saying, hey, when this certain metric, whatever it is (it could be CPU, it could be memory, but you can also define it yourself; it's an open system), crosses this threshold, then we need to balance, and you need to start moving software around so that we're not exceeding that threshold. So it's basically through policy.
A
Yes, so the important consideration is this: it's not only this offering; there are a bunch of other offerings that can do the same thing. What is important is that there is a lot of innovation in the area, right? There is a lot of innovation at that level of abstraction, where now we're bringing the application higher, so we're not doing a one-to-one comparison; a lot of the capabilities that you find in all the other competitors, we have them here, right. But in addition, you know, we have a programming model, and we're emphasizing developers, right? Rather than having an infrastructure and then a separate development effort, we're actually trying to integrate both. So the core foundation of Service Fabric is all the orchestration that we talked about, but also a very rich programming model based on the SDK.
B
A couple of things I'll add to that: you can have a very nice relationship between your services and Service Fabric. There's a model where you can tie into the health system. So I mentioned earlier: how do I know if my VM is running? How do I know if my application is running? My application can report its health out to Service Fabric, and so I can have that nice relationship.
B
One thing we didn't touch upon here is the stateful programming model, which is also available in Service Fabric. I'll kind of leave that to you to look at, but it's a nice offering, it's something worthy of taking a look at, and we can chat about it after if you want.
B
Some of what you saw me running manually through scripts will be done through policy. So, for instance, the scaling: in the end, when we hit GA, that will be done through policy, and we'll say, as we get to this point, it'll scale up or scale down, and you can do that through your policy.
B
Some of that you wouldn't have seen; as for the rest of it, I mean, scripting is the way of DevOps today, and so when you start to talk about versioning up, the typical way people want to do that is through scripting. That would be pretty much how they would handle it: you would write your scripts, they'd be parameterized, and you would say, I'm moving from version X to version Y, here's my back-out strategy, and if it fails, I want to do this.
B
So it's kind of a combination. I feel bad when I'm doing the scale-up demo and running a script, because no one's going to want to be sitting there; you can almost envision a guy going, should I scale up now? Or no? That's not the way it's going to be; that'll be done through policy. Thank you. All right, thanks again, guys.