Description
Speaker: Sean Usher, Software Engineer
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
Sean: Today we're going to talk to you about how Azure and DataStax Enterprise power the Office 365 user store. I'm going to start by giving an introduction of who we are and what it is that our organizations do, and we're going to talk about what we built and why we needed to build it. We're going to talk then about how we built it using Cassandra and Spark, and then why we chose Azure and DataStax Enterprise as the platform to build it on.
So my name is Sean Usher. I'm a senior software engineer in Office 365; my contact information is there. My team really focuses on currently running Cassandra for other teams in Office, as well as building platforms on top of Cassandra, so that we can understand our customers better. And this is Silvano Coriani.
He is a principal program manager in Azure, and he'll talk more about himself when he tells you how you can build it. So, Office 365: I'm assuming many people here are familiar with it. You've probably heard of Word and Excel and Outlook, but those are client-level products. There are also server-level products, so historically enterprises would buy things like Exchange for running email and SharePoint for document management. They would get the hardware and install it, and they would maintain the networking, the machines, the disks going bad.
In Office 365, we take all that overhead and move it into Microsoft managing it, so IT admins can focus on adding business value on top of that, instead of dealing with which disks broke last night. And then Azure, which I hope all of you know of: it's Microsoft's cloud computing platform. It provides compute, database services, machine learning, mobile services. If any of you aren't familiar with Azure, come talk to Silvano after the talk, and please sign up for a trial so you can start deploying things and see how it works, and maybe even deploy Cassandra.
So what did we build? In Office 365, we're hosting all these services for a lot of large organizations, and they all use the services in different ways. Some people rely heavily on email, some people rely on SharePoint, and some people use features of those services more than others. So what we need to understand, to provide good service to them, is how they're using it and what their experience is like. Historically, we've had synthetic monitoring that would run to try to simulate the user experience.
We would have passive monitoring that would look at server logs and give us aggregated data on how users were experiencing the platform on different pieces of hardware. But then we couldn't say: you, as an organization, are having this experience in our system. So we really wanted to build a way to understand our users and their organizations at a deep level, and to start doing that, we wanted to answer a few easy questions and a few really hard questions. So the first one is: are our users happy with the service they're receiving?
This is a really hard question, because how do you come up with a number that says they're happy or not? And then: are our users fully utilizing the service they're paying for? You know, if we have an organization that signed up for a thousand licenses and they're using 500 licenses, we don't want them to have to pay that extra cost. We want to be able to reach out to them, work with them, and give them a better experience. Are our users hitting issues we can proactively help them with? Issues can happen on our side.
They can also happen on the organization's side. They could go and make some licensing changes, and the users are still trying to use the service but they're not getting access to it, and we need to be able to reach out to them and say: hey, we see that a problem happened, it's impacting a lot of your users, how can we help you fix it? And: what does a user experience over their lifetime?

To provide good business value for a company, you have to be able to show them that you're providing business value, and one of the ways of doing that is to show them: here's your availability, here's the number of support cases, here's the number of incidents you've been impacted by, so they can really go to their leadership and say Office 365 is providing us with value. And then there's a really difficult one: how do we discover patterns in our data that we aren't aware of? Are our users going to certain pages,
following some kind of wizard, and then leaving? If we're seeing a lot of users doing that, that can mean that, well, maybe our page is broken, maybe it's not intuitive, and it's something we need to feed back into the product to say: here's how we can provide a better experience. So to do all of this, we need to store a lot of data, but, more importantly, we need to be able to aggregate on this data fast; if we take 24 hours to come up with an answer, it's not very useful.
So we had a few requirements. We wanted to run on the cloud; we manage bare metal in Microsoft and Office 365 all the time, but that adds overhead if you want to dynamically scale. We also wanted to, of course, be highly scalable, ingesting around 50,000 events per second to start with and rapidly growing, plus a few other common requirements. The biggest one for us is really tunable consistency.
We've used a lot of data services in the past that enforced full consistency, and when you're writing a lot of data and looking at aggregates of data, losing one piece of data is not really the end of the world. But having your whole pipeline backed up because you can't get full consistency means you can't do anything with the data, and then you can't tell your customers anything or help improve their experience. We also wanted to be able to do real-time and batch analytics, as well as machine learning.
Machine learning can help us understand what we don't know about our data. And long-term storage: we need a system that really can scale on the storage side of things. And one big requirement from management was that it gets done in a month. So we started off not knowing what data system we wanted to use.
We started looking at some things that are public and some things that aren't public, but what really met all of our requirements was Cassandra: linear scalability, tunable consistency, running well on Azure, and a big ecosystem around it, so that we wouldn't be taking on a technology with no support out there if something went wrong. So we said, okay, let's take the bet. We knew that some people in Azure were already partnered with DataStax and working on Cassandra, so we decided to try it, and we started deploying it.
We went with one physical data center. Eventually we want to geo-replicate, but for now it's one data center with two logical data centers: one for Cassandra and one for analytics. To deploy these, we decided to put them all in a VNet, assigned static IPs, separated the roles into two different subnets, and then ACLed all the endpoints to the subnet, so that we aren't exposing our back ends to the Internet, where any zero-day flaw could let someone come in and steal all of our data, which would be pretty bad for us.
Because we didn't understand how Cassandra would scale (we'd never used it before), we said: let's go and get really big machines. So we went with 16-core machines, 224 gigs of RAM, three terabytes of disk. And we decided we wanted to be able to set this up fast, so instead of building the integrations between Spark and Cassandra ourselves and configuring that, we would use the DataStax package, which meant we had to use Ubuntu. So here is just a table of what we deployed.
We have larger heaps on Spark because it ends up pulling in a lot of data from Cassandra when it wants to do aggregates, and we use G1 garbage collection, which, when I get to the problems section, was very big for us: it reduced our garbage collection times, our pause times, quite a bit. And everything uses replication factor 3 between our two logical data centers.
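For reference, G1 is usually enabled for Cassandra through the JVM options (via cassandra-env.sh in DSE of that era); the flags below are a generic illustration of that kind of setup, not the team's actual configuration:

```
# Illustrative JVM settings: fixed heap, G1 collector, capped pause target
-Xms24G
-Xmx24G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```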
So here's our OpsCenter picture, in case anyone is curious. We had to get data into Cassandra from Office 365, and I'm not going to say the number of servers we have, but it's a lot of servers in Office 365. To do this, we didn't want the servers talking directly to our Cassandra servers. We wanted to have a queue, so that the clients didn't have to deal with retrying data that needed to be replayed or couldn't be written.
So we decided we'd go with a REST API, and behind that API we'd put a queue, but we chose Azure Event Hubs. Where a lot of people would choose Kafka, we didn't want to go and try a whole other technology while we were learning Cassandra as a technology. Now, there are some people in our group who are looking at Kafka and trying to weigh the benefits of Kafka versus Event Hubs.
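The queue-behind-an-API idea can be sketched in miniature. `EventQueue` and `FlakyStore` below are toy stand-ins of my own (not the actual Event Hubs or Cassandra APIs); the point they illustrate is that retries live in one consumer instead of in every client:

```python
from collections import deque

class EventQueue:
    """Toy stand-in for the ingestion queue (Event Hubs / Kafka)."""
    def __init__(self):
        self._q = deque()
    def publish(self, event):
        # Producer side is fire-and-forget: no retry logic in clients.
        self._q.append(event)
    def drain(self, max_batch=100):
        batch = []
        while self._q and len(batch) < max_batch:
            batch.append(self._q.popleft())
        return batch

class FlakyStore:
    """Toy stand-in for the Cassandra write path; fails once, then works."""
    def __init__(self):
        self.rows = []
        self._failed_once = False
    def write(self, batch):
        if not self._failed_once:
            self._failed_once = True
            raise IOError("transient write failure")
        self.rows.extend(batch)

def consume(queue, store):
    """The consumer owns retries, so producers never replay data."""
    batch = queue.drain()
    while batch:
        try:
            store.write(batch)
            batch = queue.drain()
        except IOError:
            continue  # retry the same batch until it lands

queue = EventQueue()
for i in range(5):
    queue.publish({"event_id": i})
store = FlakyStore()
consume(queue, store)
print(len(store.rows))  # 5: all events land despite the transient failure
```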
So if anyone did look at the data, they wouldn't be able to tie it back to any of our users or organizations, but Microsoft systems can tie it back. For the next two slides I want to show an example of our data model, with two of our main entities: users and orgs (we also call organizations tenants). Our big ingested data is the user table.
The key is the create time. We use the date-tiered compaction strategy, and I'll talk a little bit about why we use date-tiered compaction when I get to the problems. Our other main entity is really the tenant, the organization. That slide might be hard to read, but what's important is that the partition key is the tenant, and then there is a set of SKUs that you can sign up for in Office 365, and that is our clustering key, so one tenant can have multiple SKUs.
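As a rough sketch of what such a pair of tables might look like in CQL (the table and column names here are illustrative assumptions, not the actual Office 365 schema):

```sql
-- Raw per-user events, clustered by creation time, date-tiered compaction
CREATE TABLE user_events (
    user_id    uuid,
    created_at timestamp,
    service    text,
    event_data text,
    PRIMARY KEY (user_id, created_at)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'};

-- Tenant metadata: one partition per tenant, one row per SKU
CREATE TABLE tenant_skus (
    tenant_id      uuid,
    sku            text,
    licenses_total int,
    licenses_used  int,
    auto_renew     boolean,
    PRIMARY KEY (tenant_id, sku)
);
```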
Now, this is not date-time series data; it gets bulk imported once a day, and then we import diffs as needed. This really gives us metadata about an organization: where are they, how many licenses do they have overall, how many licenses per system that they use, be it Exchange, SharePoint, or Skype for Business, are they set to auto-renew, and, what's key, which SKUs they signed up for. So we ended up taking those two tables, and with those two tables we could join them and answer a lot of questions.
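The kind of join those jobs run can be shown in miniature. In practice it's Spark over Cassandra tables; the plain-Python version below, with made-up field names, just illustrates joining user events against tenant metadata to find tenants whose users are hitting failures:

```python
# Toy join: which tenants have users hitting failures, and in which region?
events = [
    {"tenant": "contoso", "user": "u1", "status": "ok"},
    {"tenant": "contoso", "user": "u2", "status": "failed"},
    {"tenant": "fabrikam", "user": "u3", "status": "ok"},
]
tenants = {
    "contoso":  {"licenses": 1000, "region": "US"},
    "fabrikam": {"licenses": 200,  "region": "EU"},
}

failures_by_tenant = {}
for e in events:
    if e["status"] == "failed":
        meta = tenants[e["tenant"]]  # the "join" step against tenant metadata
        entry = failures_by_tenant.setdefault(
            e["tenant"], {"region": meta["region"], "count": 0})
        entry["count"] += 1

print(failures_by_tenant)  # {'contoso': {'region': 'US', 'count': 1}}
```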
Now we know who's using the system. Are users seeing failures? Are there certain failures from certain regions that we can reach out about? But as we were running these Spark jobs, for every piece of insight we wanted to get we were spending 20 or 30 minutes, which really limits the value you can get out of your data. So what we ended up doing was having a Spark job that aggregated all of this raw data into daily, hourly, and, most recently, weekly tables.
The other jobs can then look at those. This hourly job is batch right now, but we're looking to move to Spark Streaming so we don't have to pay the cost of writing data and then reading it back out. Once we switched all the jobs to the aggregated data, using the same schema, they started taking 20 to 30 seconds, so now we can get a lot more value out of the data that we have inside.
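The rollup idea is simple enough to sketch. The real job is Spark writing back to hourly/daily Cassandra tables; this toy Python version with assumed field names just shows collapsing raw events into per-tenant, per-hour buckets, so downstream jobs scan a few aggregate rows instead of the raw stream:

```python
from collections import defaultdict
from datetime import datetime

raw_events = [
    {"ts": datetime(2015, 9, 23, 10, 5),  "tenant": "contoso", "ok": True},
    {"ts": datetime(2015, 9, 23, 10, 40), "tenant": "contoso", "ok": False},
    {"ts": datetime(2015, 9, 23, 11, 2),  "tenant": "contoso", "ok": True},
]

# Bucket key = (tenant, hour); truncate the timestamp to the hour boundary.
hourly = defaultdict(lambda: {"total": 0, "failed": 0})
for e in raw_events:
    bucket = (e["tenant"], e["ts"].replace(minute=0, second=0, microsecond=0))
    hourly[bucket]["total"] += 1
    if not e["ok"]:
        hourly[bucket]["failed"] += 1

print(len(hourly))  # 2 hourly rows now summarize the raw stream
```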
So what were the results? Well, we were able to answer a lot of those questions.
Are users happy with the service they're receiving? We still don't have a good answer for that, other than the feedback we get from customers and being able to see the availability. So we do go reach out to customers, but just from the Cassandra data we don't have a concrete answer. Are users fully utilizing the service they're paying for? That we can see: the metadata tells us how many licenses they have, we can see how many users use those licenses, and we've been able to reach out to organizations and say: hey,
we can work with you to save money. Are users hitting issues we can proactively help them with? We have a whole team at Office 365 that gets the output of this data, the availability of customers and which users are hitting issues, and they're able to reach out to each one of those organizations and say: hey, we see that these users are having problems, can we go work with your admin and try to fix them? Sometimes it's on the admin side, sometimes it's on our side. And how's the user experience over their lifetime?
Well, that's hard to answer, since we just started this recently and we've had users for a long time, but we do now maintain long-term data on user availability and on how users are using our system. The great feedback we got from support and customers really inspired us and let us know we're doing the right thing. Support agents, when you call them, don't have to go and ask you all these questions: what services are you using, where are you failing? Now they just look it up and they know, which is fantastic.
So, everything worked perfectly, no problems? That's not true. Just like everyone else we're seeing at these talks, bad data modeling was the problem: huge partitions, one to two gigabytes. We had no idea what we were doing, and we didn't use date-tiered compaction for date-tiered tables, so we had a lot of compaction overhead going on. Our other biggest issue was not watching metrics: we had too many blocked flush writers, too many dropped mutations.
The file handle limit on the OS was set lower than the number of SSTables that we had, which immediately caused Cassandra to hit an out-of-memory error. And our SSTable count got too high, and that's where you really have to work to bring it back down; you have to monitor that closely. We use OpsCenter now for monitoring that.
So why did we choose Azure? You're probably thinking: well, they're Microsoft, of course they chose Azure. But we really wanted to look at this as a cost-benefit analysis. We knew that we wanted to use the cloud; we didn't want to manage bare metal. So we had to evaluate which cloud provider we wanted to use, and we looked at operational costs, because that's more important than the financial cost that you're going to pay regardless of which cloud provider you use.
We've been using Azure for five-plus years, writing tools around it for deployment, monitoring, and managing the service, so that was really the cheapest way for us to go and have our current DevOps team be able to manage the service. We also work closely with Azure support, and we love trying things out on Azure because we can try to break them. It's always fun to try to break something built by someone in your own company.
Why Cassandra? I talked a bit about this. The biggest thing for us was that it didn't enforce full consistency: we can decide which data is important enough that we get all of it, and which data is okay to lose some of, because we're looking at it in aggregates. The bad part is that we had to run it ourselves; we would have loved a hosted solution, less overhead on the team, but so far it actually hasn't been bad. And then why DataStax? Of course, training, and they provide the integration, so it's easy for us to get Spark and Cassandra working.
We have OpsCenter. But to me, it's support: there were so many times in the beginning (I see Chuck taking a picture over there) where we were actually hitting problems and we couldn't figure them out. We didn't know the code base, so we just went to DataStax and said: please help. And when you have, you know, VPs yelling at you saying, hey, this needs to get fixed, DataStax is there to back you up, which is fantastic.
Silvano: Thanks a lot, Sean. Yeah, I'm from Azure engineering; my name is Silvano Coriani. My team specifically, the customer advisory team, is helping first-party services like Office 365, Skype, and Dynamics, to give you some examples, but also external customers, to onboard onto Azure their largest and more complex solutions.
And, you know, we are talking about end-to-end solutions, including multiple Azure services, and specifically I'm more focused on the data tier; that's why I've had some experience working with customers that onboarded large Cassandra clusters on top of Azure. So, Sean was mentioning hosted services. We're not there yet in having a fully hosted, fully managed Cassandra service available on Azure, but we do have a number of options for you to deploy your own Cassandra topologies
on top of the Azure platform. I would divide this into two main buckets. One is what Scott Guthrie actually presented yesterday during the keynote: our marketplace-based offering for Cassandra. We worked a lot with DataStax to integrate into the marketplace section of the Azure portal a simplified way of deploying Cassandra.
With this offering, you essentially bring your own license and deploy production and non-production clusters based on a number of attributes and configuration options. You can start small with, for example, four-node clusters and go up to 90 nodes. These are pre-baked options that we have in the user interface whenever you click to create a brand new Cassandra cluster on top of Azure, so you can pick essentially some options,
like the VM size, the node size, and the VNet type. This particular marketplace offering today is limited to a single VNet within a single Azure region. When you've entered all the attributes, you basically get your cluster up and running. You can access OpsCenter at the URL that you see on the bottom of the slide, and essentially you can start interacting with it.
You can manage your Cassandra cluster, and you may add additional services, additional compute nodes, or whatever else actually consumes or interacts with the Cassandra service within that VNet, or you may have a more complex networking topology that is part of your overall solution. The other option that we have is to let you define your own deployment topology, and in general this goes through a number of options and selections that you need to make.
First of all, you need to decide how you want to group your resources within your larger deployment topologies. If you have a compute tier, a data tier, multiple services interacting with each other, you will need to define how you want to manage those different resources in Azure. Then you want to define how your compute and storage resources are typically configured.
You need to decide if you want to leverage ephemeral disks, local to the single compute node, or if you prefer to go to persistent, durable storage. We know this is kind of a sensitive topic in terms of selecting which one is best for your particular workload, but we offer both, actually, and we will see a little bit later more details on this and how, for example, to organize
your operational activities, like backups, taking snapshots of your Cassandra databases, and so on. A third typical area of selection for defining your topology is networking. Sean mentioned that they considered in their deployment a single-region, single-data-center deployment within a single VNet.
We have seen other customers actually deploying Cassandra nodes across multiple physical regions, multiple physical data centers, using VNet-to-VNet connectivity, and we will see some of the options and some performance considerations around these three layers: the compute, networking, and storage options that we have available today in Azure. In terms of organizing resources within your Azure deployments, we recently introduced Azure Resource Manager as the backbone and the provisioning and managing mechanism across all other services.
That brings a lot of new capabilities compared to what we used to have in the past. ARM, or as we call it, Azure Resource Manager, is fully RBAC-based, so you can define role-based access control for all resources and all actions or operations that you're doing across your subscriptions. And then deployments are template-driven, so you can define your templates using
a JSON-based dialect describing how all the resources will be deployed in a single fashion through the orchestration engine that is part of Azure Resource Manager itself. It offers you two models: the declarative model, where you're passing a template to the ARM engine, or the imperative one, where you're actually interacting with resources directly.
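As a minimal sketch of the declarative model: an ARM template is a JSON document listing parameters and resources. The fragment below is a simplified illustration of the shape of such a template, not a complete Cassandra deployment template:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "nodeCount": { "type": "int", "defaultValue": 4 },
    "adminUsername": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "2015-06-15",
      "name": "cassandra-vnet",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": { "addressPrefixes": [ "10.0.0.0/16" ] }
      }
    }
  ]
}
```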
A resource group is a fundamental entity within Azure Resource Manager. It's a logical container, as I said, that each resource needs to belong to, and these resource groups are effectively the unit of lifecycle management for your Azure resources. So whenever you're deploying, for example, a Cassandra cluster, you need to define, as an example, your networking topology and your compute node topology. You may want to separate these into different resource groups, and this gives you the ability to define different lifecycle terms for these different deployed resources.
So you may want to, for example, get rid of your compute nodes because you're in a development environment, keep all the infrastructural components deployed, and deploy, for example, a new configuration on top of that. A deployment is another first-class concept in Azure Resource Manager: basically, you can organize and track template execution. You can read diagnostic information out of these deployments to understand if everything went well, and you can also create nested deployments
if your deployment topology is so complex that it requires some form of organization, like, for example, dependencies and some more complex processes behind the scenes. So Azure Resource Manager is actually helping us describe the topology of our solution deployed to Azure, but within each node, of course, we will need to execute a number of configuration operations, and in Azure Resource Manager we have the option to install on each node a set of VM extensions that are basically dedicated to a particular set of tasks.
For example, we have third-party extensions like Chef and Puppet to automate configuration of single nodes. We have custom script extensions that can deploy scripts from our own repositories and start doing VM configuration. The marketplace offering that Scott showed today, for example, is injecting into each compute node, each Cassandra cluster node, everything from the Java JDK up to OpsCenter on the OpsCenter node, and then it's
automating all the deployment tasks for us, and at the end of the day it gives us the Cassandra cluster up and running. But these extensions can also help you, for example, in the compute tier, in your application server tier, to inject on both Windows and Linux machines whatever application framework your solution actually requires. And with these ARM templates, we see a lot of end customers, but also ISVs and system integrators, starting to rely on these templates
to organize their deployments. These templates may be related to a particular solution area within your bigger infrastructure, for example building a particular Cassandra cluster, or the templates may describe the entire solution end to end; using this nested template mechanism gives you the ability to orchestrate very complex deployments.
What we're seeing is that most of these templates are organized to simplify, essentially, the set of options that you have available for your deployments, grouping these options into t-shirt-size types of deployments, so that if you need a small cluster, you know that you automatically get a given configuration in terms of storage, networking, and compute nodes. If you need a medium or a large one, these configurations will be different, and you can automate these different options within the same template
by adopting this nested template mechanism: based on the parameters that you pass, different sub-templates, or nested templates, will be called behind the scenes. There is a very interesting white paper that actually describes how you can design these complex deployment topologies by nesting and joining together multiple templates. This also gives you the ability, for example, to maintain, evolve, test,
and debug single, smaller units of your deployment process, but orchestrate the entire set of units within a single environment. At the end of the day, every ARM template deployment will be based on a set of parameters that you pass: usernames and passwords for compute nodes,
network configuration, the region where you want to deploy that particular template into, plus a number of other parameters. And you will have this nested template structure that gives you the ability, for example, to configure your OpsCenter nodes differently from the Cassandra nodes in your Cassandra cluster. So you define these roles within your larger deployment, and ARM gives you the ability to automate both
the resource deployment and the resource configuration within each part of the deployment. So if you're interested in this topic, I highly recommend going to these two GitHub repos. The first one is maintained by Microsoft and right now contains more than 200, if I remember correctly, different deployment templates orchestrating large solutions, including a lot of OSS frameworks and applications, from Cassandra to Elasticsearch and a number of others, and it also shows, step by step, how to deploy them.
Basically, today they offer the ability to deploy your DataStax cluster in the best, most optimized way possible, and there are also in the deck a couple of links that can take you directly to the DataStax training center to see, for example, how to deploy Cassandra on top of Azure with the CLI or with the Azure Marketplace. So, in terms of compute and storage options: these are, of course, super interesting topics
for achieving your sweet spot in terms of performance. We do recommend a couple of VM families for hosting Cassandra nodes. We have the D series, which is based on local SSD disks and Intel processors; it's definitely recommended if you want to, for example, balance cost and performance, because it's cheaper than, say, the G series, which is our top-class VM family, but you can still get a lot of performance from the ephemeral disks.
Even the biggest VM from the D series can give you up to 60,000 IOPS and sub-millisecond latency on local disks. But we also offer the ability to attach remote, network-attached storage; in particular, for production clusters we recommend Premium Storage, our provisioned-IOPS mechanism, which gives you low-latency and high-throughput capabilities on top of the DS or GS series of VMs.
So you can, for example, stripe together a number of Premium Storage disks and get up to 80,000 IOPS and, if I remember correctly with the latest developments, up to two gigabytes per second of disk throughput for a single node. So you can start small with a few cores and local SSD disks, and go up to 32 cores and terabytes of high-performance, low-latency storage attached to each node.
Data and commit logs are better suited to the local SSD drives or to the Premium Storage option, in case you want durable storage for what sits in your data tier. In terms of networking deployment options, depending on your network topology strategy, your replication factor, and so on and so forth, we offer as a basic concept this idea of a VNet: a private network environment where your nodes can communicate with each other
with low latency and high bandwidth. Actually, we don't throttle nodes that are talking to each other within the same VNet; the only limit is essentially the bandwidth available to each single VM, and that can go up to 20 gigabits per second for the largest VM that we have, the G5 or GS5.
If you want to implement a cross-region deployment and, for example, partition data across physical data centers, we have VNet-to-VNet gateways available that give you the ability to create a single address space spanning multiple data centers. Right now we have two options in terms of gateways: the standard one and the high-performance one.
The high-performance one can give you up to 200 megabits per second of throughput, and of course the latency will depend on how far apart the two data centers you select are. We do recommend geo-pairing between regions if you want to maintain low latency. We don't have a latency SLA; actually, nobody in the industry has a network latency SLA
on that point, but the measurements that we took are around 20 milliseconds of latency between the geo-paired data centers that we have, for example West US and South Central US, or some of the other 19 or 20 regions (I don't remember exactly the number today) that we have around the world. These gateways also have the ability to disable encryption on the VPN tunnels.
In case you already have application-level encryption, like Cassandra can offer you, then by turning off the gateway encryption you can get some additional bandwidth, just because the CPU and compute capabilities on these gateway nodes don't need to handle all the encryption work, so we can give you some more bandwidth between your different VNets across multiple data centers.
So, just to summarize: in Azure we have multiple options to give you the ability to deploy your Cassandra clusters. We have a highly automated, simplified mode through the marketplace offering, where you just click to create your new cluster and the cluster will be deployed in minutes. The last test that I did, for a 40-node cluster, took something like fifteen to twenty minutes, and you get your environment up and running with your OpsCenter.
Audience: [question inaudible]

Silvano: Yeah, it really depends. To give you an example, the largest VM families and VM sizes that we have actually own the entire physical node, so I'm assuming performance is exactly on par with similar hardware equipment in your own environment. But depending on your requirements, the VM type, and the VM size, you can of course get different behavior. We think that at the hypervisor level we're doing a good job of isolating
the performance characteristics of different tenants sitting on the same physical host, so there is a good match between a given VM of a given size and the equivalent hardware equipment. Of course, depending on your storage option, local storage versus remote storage, this can introduce a little bit of variance.
Audience: I'm interested in your choice of date-tiered compaction, since it's kind of fundamentally broken in its implementation, so I was wondering how you've been working with it.
Sean: The good thing is, any time we've gone to DataStax, they've already gotten that same problem from another customer and they have a patch for us, so they've been able to help us through those problems, and so far our compaction overhead has really gone down with date-tiered; before, we were using size-tiered. But it looks like we're out of time, so we'll be around after the talk if you have any questions, and tweet and email us. Thank you. Thanks.