From YouTube: DASH High Availability Working Group June 28 2022
Description
June 28, 2022
@Marian sharing OpenCompute PR for SAI APIs in the experimental section
https://github.com/opencomputeproject/SAI saiexperimentaldashha.h
We did not get through the entire set of APIs
Administrative State up/down (add enum w/more states such as starting syncing, syncing progress, sync completed, etc…)
Peer ID (used to communicate state)
IP for the session
Role (optional)
@Michal presented an HA Deep Dive slide
A
So if we could get started, I guess I'll document the answer about the IP address for everyone else to hear. The IP address is needed to sync information between two peers, and it looks like we'll need to use the unique IP address; theoretically it should be provided by BGP.
B
I want to go over all the attributes and then, if anyone has comments... I saw one for now.
B
So here we define a set of attributes for an HA session, so that the two devices will be able to synchronize their state.
B
Let's start with the first one: the administrative state, up or down. Quite simple: we decide when we actually want to start syncing; it's up to the controller.
B
Second is the one that I had a question about: the peer IP. This is the IP address that we will communicate our state with. It should be unique.
C
Yeah, okay. The IP to initiate the peering: there will always be two IPs, at least, for this kind of thing. Each card will have a unique IP, which uniquely identifies the card, and this is the IP that will be used to initiate the peering.
C
So, for example, card A will have one IP and card B will have a second IP, and those are the unique IPs used to initiate the peering. On top of that, there is also the IP that is announced over BGP, which is the highly available IP: one card will be announcing the same IP as the other card, potentially, and those are a completely different set of IPs.
C
And then the BGP side, the highly available IP: this needs to be both IPv4 and IPv6. Underneath, in the underlay, we have some standard where the underlay is using IPv4, for which the card needs to announce this highly available IPv4 address, just like a virtual IP; and for those that are using IPv6, the card needs to announce over BGP, in addition, the IPv6 highly available IP. This is what the data path will be using.
C
So the data path will be using these two highly available IPs that the cards advertise, but then the cards also need to talk with each other using the unique IPs, and this is a separate set of IPs. We call them management IPs: the addresses on which the card can be accessed, so the cards can talk with each other on their own kind of private range.
B
Ah, sorry, I was confusing it with the switch's out-of-band management network. Okay, thanks for clarifying. So this is the unique IP that will be used for establishing the session; this was the missing information that we had up till now. I assumed only the virtual IP. Okay, thanks.
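The attribute set just recapped (the controller-driven admin state, plus the unique peer IP as distinct from the BGP-advertised VIP) can be sketched in C. This is a minimal illustration only: the type and field names below are our own assumptions, and the real definitions live in saiexperimentaldashha.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only: real attribute names are defined in
 * saiexperimentaldashha.h and may differ. */
typedef struct {
    bool     admin_state; /* up/down; the controller decides when syncing may start */
    uint32_t peer_ip;     /* unique per-card IP used to establish the session */
    uint32_t vip;         /* highly available IP advertised over BGP (separate set) */
} dash_ha_session_cfg_t;

/* A session may start syncing only once the controller sets the admin state
 * up and a unique peer IP has been configured. */
static bool ha_can_start_sync(const dash_ha_session_cfg_t *s)
{
    return s->admin_state && s->peer_ip != 0;
}
```
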
D
Yeah, Marian, a clarifying question: this header file here is a hand-generated set of, let's say, administrative attributes, I would call them. So these are going to be manually written, and then in addition we'll have the auto-generated, let's say, data-plane headers that get added to this, so they're additive. Is that correct?
D
I think we ought to explain that somewhere in our documentation, because it's not obvious to someone walking in here how this all merges together. So at some point we'll want to capture: hey, there are some hand-generated APIs that are administrative, and then there are the auto-generated ones, and they merge together into one composite set of DASH APIs.
B
Right
right,
yeah,
we
will,
as
soon
as
we
will
have
an
agreement.
We
can
merge
this
this
api.
Then
I
will
have
a
pointer
to
it,
explaining
that
this
was
done
manually,
you're
crack.
Okay,
thank
you
all
right,
yeah.
So,
just
to
recap,
what
we
have
is
the
admin
state
which
is
create
and
set
so
we'll
be
able
to
change
the
state.
E
A question, or comment, about the admin state: is the admin state the one that we discussed was related to the role, that is, the role of master and slave, in that sense?
B
Role
is
a
separate
attribute
which
is
also
optional,
I'll
get
to
it
next,
but
admin
state
is
just
up
or
down
either
we
we
are
okay
with
starting
the
sync
or
maybe
we
are
pending
some
some
other
input
from
the
control
plane
and
waiting
for
for
the
session
to
be
able
to
start
the
sync
up,
and
the
role
is
the
next
attribute,
which
is.
B
So we have two roles; this is an enum, let me go to it. We have two roles defined: active and backup. Active is the default one, meaning you can be working independently from your peer: because you are active, you can accept new connections even if the operational state of the session is down, even if your peer is not available. You can still accept new connections, and as soon as the peer becomes available, they will be synced up.
B
However,
if
someone
desires
to
take
high
availability
as
higher
priority
than
new
connections,
then
they
can
declare
one
of
the
gears
as
a
backup,
so
that
if
the
session
goes
down,
backup
will
start
receiving
all
of
the
existing
flows
that
it
will
still
forward
as
before.
However,
it
will
not
accept
new
connections
until
there
will
be
there
will
be
a
new
year
and
the
session
will
go
up
again.
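The two roles described above can be condensed into a small C sketch. The enum and helper names are illustrative assumptions, not the SAI definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the SAI enum in saiexperimentaldashha.h may differ. */
typedef enum {
    DASH_HA_ROLE_ACTIVE, /* default: accepts new connections even when the session is down */
    DASH_HA_ROLE_BACKUP, /* keeps forwarding existing flows, refuses new ones until the session is up */
} dash_ha_role_t;

static bool ha_accepts_new_connections(dash_ha_role_t role, bool session_up)
{
    /* A backup only admits new connections while its peer session is up. */
    return role == DASH_HA_ROLE_ACTIVE || session_up;
}
```
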
B
But again, this is up to the implementation and the underlying protocol of choice. The implementation decides whether the peer is available or not based on criteria of its own choosing: for example heartbeat messages, or, if it is a reliable protocol, the feedback of acknowledgements from your peer that everything was received.
B
HA session, meaning the peer-to-peer session.
C
So I have a clarification question. This is the operational state, right? Because one thing is the state of the peering: whether the peering is established, is starting to sync, is in the process of syncing, or the sync is fully completed and it's ready. And the second thing is whether the peer is announcing BGP. The idea is that, for example, one peer goes down and then the peer comes back up.
C
Okay
now
the
spirit
is
operational
from
the
point
of
view
of
c
completed,
and
now
I'm,
let's
say,
okay
to
start
announcing,
bgp
back
right,
because
one
thing
one
thing
is
that,
for
example,
bgp
should
not
be
announced
before
we
complete
the
sync,
because
if
we
start
announcing
bgp
that
some
of
the
connection
in
this
active
active
mode
right
may
land
on
this
new
peer,
the
connections
are
not
it
sync
right.
So
the
peer
will
not
be
would
not
know
how
to
process
this
right.
So
I
want
to
basically
separate
two
things.
C
I
want
us
to
be
able
to
separate,
bringing
bgp
up
and
down
operational
versus
versus
hs
session
that
basically
syncs.
The
connections
right
because,
like
I
would
like
to,
for
example,
be
able
to
send
the
connections
even
if
the
bgp
is
not
up
right,
and
this
should
be
completely
unrelated
from
bringing
bgp
up
or
down
right.
C
Yeah, I would say that the states of the state machine will be more than that, because there will be something like: the peer gets connected, or the peer can be completely disconnected, or the peer can be connected and then the state machine goes through, let's say, starting syncing, sync in progress, then sync completed and ready, and then ongoing sync, this kind of thing. Otherwise, just down versus up...
C
It's
not
sufficient
from
the
point
of
view
of
down
means
like
that
may
mean
that
I'm
not
disconnected,
but
that
may
also
mean
that
I'm
not
fully
ready,
because
not
everything
is
sync
right.
So
definitely
we
need
to
differentiate
down.
From
the
point
of
view,
the
connection
can
be
established
from
something
like
there
is
a
syncing
in
progress,
but
I
don't
have
yet
full
state
fully.
C
Sync
it
to
can
bring
basically
the
session
fully
up
from
the
bgp
and
then
the
third
state
that,
for
example,
right
now,
the
the
state
is
fully
sync
and
I'm
ready
and
everything
is
syncing
real
time.
So
I'm,
basically
that
the
period
is
fully
ready
with
everything
is
synced
from
one
side
to
the
another,
and
this
basically
only
means
that
this
is
the
time
when
I
can
raise
the
pg
position
up.
So
definitely
more
states
here
should
be
needed.
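The richer operational state machine requested here (matching the "starting syncing, syncing progress, sync completed" enum suggestion in the notes at the top) might look like the following sketch; all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Candidate operational states; illustrative only. */
typedef enum {
    HA_PEER_DISCONNECTED,     /* no peering established                     */
    HA_PEER_CONNECTED,        /* peering established, bulk sync not started */
    HA_PEER_SYNC_IN_PROGRESS, /* bulk sync of existing flows underway       */
    HA_PEER_SYNC_COMPLETED,   /* fully synced; ongoing real-time sync       */
} ha_peer_state_t;

/* Per the discussion: the VIP must not be announced over BGP before the
 * sync completes, or new connections may land on a peer that cannot
 * process them. */
static bool ha_may_announce_bgp(ha_peer_state_t st)
{
    return st == HA_PEER_SYNC_COMPLETED;
}
```
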
C
So I would say, because one DPU will be handling, I think, multiple ENIs: if we, for example, peer based on the ENIs, that means that one device will need to have multiple peers. For example, potentially one ENI would be paired with an ENI on one DPU and another with an ENI on a different DPU, this kind of thing. I really think that we should establish something else.
C
Let's
say
that
I
that
basically
there
will
be
something
like
appearing
groups
or
peering
sessions,
this
kind
of
stuff,
and
we
can
say
that,
for
example,
that
that
at
the
beginning
that,
for
example,
device
can
only
have
let's
say
one
periodic
session
with
with
those
other
cards.
Let's
say
right
and
those
e9s
belongs
to
the
appearing
session
right.
So,
for
example,
this
can
stop,
because
if
we,
if
we
do
the
appearing
granularity
based
on
the
eni,
I
think
it
may
be
too
granular
from
the
point
of
view
right.
C
So
I
because,
like
if
every
single
eni
can
potentially
land
on
com
like
peer
with
completely
different
card,
potentially
yes,
but
at
the
same
point,
I'm
more
like
freaking
like
like
in
this
design.
Here,
that's
kind
of
like
pinning
session,
and
this
spinning
session
has
basically
the
list
of
eni
object
to
sync
right.
Yeah.
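The "peering session owns a list of ENI objects" idea could be modeled roughly as below. The struct layout and the 64-ENI cap are assumptions for illustration, not an agreed data model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HA_MAX_ENIS_PER_SESSION 64 /* assumed cap, not from the spec */

typedef struct {
    uint32_t peer_ip;                       /* unique IP of the peer card */
    size_t   eni_count;
    uint32_t enis[HA_MAX_ENIS_PER_SESSION]; /* ENI object IDs synced over this session */
} ha_peering_session_t;

/* Attach an ENI to the session; returns 0 on success, -1 when full. */
static int ha_session_add_eni(ha_peering_session_t *s, uint32_t eni_id)
{
    if (s->eni_count >= HA_MAX_ENIS_PER_SESSION)
        return -1;
    s->enis[s->eni_count++] = eni_id;
    return 0;
}
```
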
D
Well, one could build the data model or the API to be generalized, to accommodate such a sophisticated approach in the future, and then we all agree we're going to implement peering one-to-one for the foreseeable future; but the data model won't prevent us from becoming more advanced if we choose to. That way we limit the scope of work, but we don't create backwards-compatibility issues or API funkiness in the future. But that takes more work, to even build in that road map.
D
I'm wondering, and Christina actually mentioned this to me once in the past: aren't there existing high-availability architectures and things that we use every day, like Redis or other databases, other places where people have already kind of worked this out and agreed on the states and the synchronization mechanism? Can we draw inspiration from some of those things, so we don't reinvent all the high-level administrative things?
D
Been
you
know
around
since
telecom
days,
and
people
have
solved
this
problem
about
a
thousand
times.
Then,
if
we
reinvent
everything
from
scratch,
we
might
be
making
more
work
for
ourselves.
C
Yeah,
I
want
to
add
to
it:
yeah.
We
should
definitely
put
as
much
as
existing
states
impo
it's
possible
as
possible
right
to
do
this,
and
for
me,
the
kind
of
like
sinking
mechanism
right,
because
the
eni
is
is
too
false
right.
The
eni
has
like
a
ghost,
which
is
the
configuration
right
which
is,
for
example,
what
kind
how
acos
looks
like
how
routes
looks
like
for
the
specific
eni
and
the
running
policies?
Right.
C
That's
one
thing,
and
the
second
thing
is
from
the
point
of
view
of
actual
flows
right
and
and
for
me,
the
most
important
here
would
be
really
to
design
the
synchronization
to
not
have
to
sync
the
eni
state,
which
is
their
policy
that
that
our
control
plane
can
sync
and
our
control
plane
already
has
a
way
to
basically
send
it
to
multiple
peers
right
in
normal
discipline.
Decisions
kind
of
like
eventual
consistency
right,
but
the
but
the
most
important
part
would
be
to
actually
sing
the
flaws
right.
C
So
I
would
like
to
use
this
has
session
to
releasing
the
flaws
and
make
sure
the
flaws
get
seeing,
because
that's
that's,
I
would
say
the
heavy
part
because
should
be
seeing
almost
like
really
real
time
with
watching
right,
but
from
the
point
of
view
of
the
eni
state,
which
is
the
which
is
the
basic,
let's
say:
entire
uni
policy
list
of
lists
of
kind
of
mappings
to
translate
this
kind
of
stuff.
C
That
is
that
we
can
sync
through
control
plane
through
eventual
consistency,
because
even
if
the
customer,
for
example,
update
the
accounts
right
so,
for
example,
customer,
let's
say
switches,
one
accola
and
ads,
for
example,
deny
all
this
kind
of
stuff
right.
They
do
expect
that
they
will
they
like
once
they
click
on
portal,
for
example,
to
add
samako
right.
They
do
expect
that
hey
this,
this
kind
of
new
deny
rule
will
happen
in
let's
say
one
seconds
few
seconds,
this
kind
of
stuff
right.
They
don't
expect
it
in
fully
in
real
time.
C
So,
in
this
case,
like
eventual
consistency
when
we,
when
you
basically
plump
this
in
in
specific
location
at
some
point
right
that
that
normally
happens
right
but
but
the
flow
this
is,
I
would
like
to
concentrate
this
kind
of
thinking
mostly
about
the
flows.
If,
because
like
once,
the
connection
is
already
established
right,
because
the
aha
literally
the
main
thing
is
like.
C
If
there
is
one
device
right,
if
this
device
never
dies
right,
then
we
don't
need
to
have
a
cha
right,
but
the
device
may
have
power,
outages,
kind
of
stuff
right,
and
this
is
the
one
and
the
main
reason
why
you
need
to
have
that
if
one
device
dies
for
some
reason
or
another,
either
networking
problem
or
the
power
plan
or
basically
just
hardware
dies
right,
then
the
other
device
can
can
take
over
right
and
and
for
the
other
device
to
take
override
the
important
part
is
yeah.
C
Definitely
the
acls
should
be
synced
right,
but
at
the
same
point
they
are
being
synced
with
eventual
consistency
which,
which
means
that
basically,
within
a
few
seconds,
all
that
all
the
cards
will
have
the
same
state
right
but,
most
importantly
from
the
point
of
view
ability
right.
If
I
initiate
the
connection
and
one
device
dies,
then
the
connection
should
be
other
part
on
the
other
part.
B
Yeah
right,
so
this
h.a
session
is
all
about
sinking
flows.
We
are
not.
We
are
not
looking
at
thinking
that
policies,
because
this
is
what
controller
will
do.
E
I see. What Gerald mentioned last time was that the BGP cost advertised for the two peers will be different: the higher cost will become the backup and the lower cost will become the active.
E
So
I
just
feel
that
the
role
has
a
lot
to
do
with
the
bgp
configuration
as
well,
and
if
a
dpu
comes
up
and
and
the
role
is
already
active
and
then
if
it's
an
if
then
another
dpu
you
know,
partner
comes
up
right,
which
is
also
sort
of
equidistant
from
the
vm.
Then
the
cost
will
have
to
be
manipulated.
So
you
know:
how
does
that
work?
How
what
is
your
thought
on
that.
C
Without
a
ace
and
prepends,
the
other
announcement
with
asn
prepends
right,
this
is
the
current
implementation
that
we
are
trying
in
azure
right,
but
at
the
same
point
no
one
else
in
azure,
there's
only
one
other
social
that
is
using
this,
and
this
is
huge.
It
causes
huge
impact
on
the
control
plane.
Sorry
in
the
control.
C
...the physical network, because all the ASN prepends need to be propagated with community strings, and that uses up the TCAM space on the ToRs, so it's basically not scalable to work with the active-backup design. That's why what we are proposing here, considering that we are making a new standard, is to create this the way all the other services on our platform work, like, for example, the software load balancer and similar: they actually have multiple instances which are announcing active-active, so there are no prepends or anything like that.
C
In this case the physical network doesn't need to propagate communities and doesn't need to use the TCAM space. The cards just announce the VIPs, which can be aggregated, and the ToRs will spread the traffic using a normal hashing algorithm, this kind of thing. Some of the traffic will land, for example, on this device, and if this traffic lands on this device, this device will sync...
C
This
flow
to
the
per
device,
if
some
other
traffic
lights
on
the
per
device,
this
per
device
is
saying
back
to
the
first
device.
So
there
is
availability,
but
there
is
no
distinction
from
the
point
of
view
that
one
is
strictly
active
and
one
is
strictly
passive
right,
so
we
don't
want
to
have.
We
potentially
may
want
to
have
this
distinction,
maybe
as
a
in
case
it's
needed,
but
this
is
not
really
the
primary
mode
of
operation
that
you
would
like
to
use
so
primary
mode
of
operation.
B
So this is the intent: the attribute is optional, and by default the peers should be considered active.
E
And
when
it
is
back
up,
it
is
still
active,
active
right.
It's
just
not.
What
is
the
exact
definition
when
we
call
it
backup.
B
Either we want another device to back us up, so that we are not left unsynced for some period of time, or we don't really care: we prefer to accept new connections and have the second device come up in the meantime, and that's why we would choose active. So backup is just: not accepting new connections.
C
Yes,
so
there's
no
accepting
new
connections
right.
I
wonder
from
the
implementation
point
of
view
right
because,
like
we
cannot
advertise
the
vip
during
this
time
right
because
like
if
we
advertise
the
vip
on
the
bgp
on
this
on
this
card
that
has
set
up
as
a
backup
right,
then
all
the
packets
will
go
to
let's
say
to
this
backup
device
right
and
it
does
not
accept
the
connections
we
kind
of
like
black
holy
customer
traffic.
C
Right
so
so
that's
why
I'm
more
thinking
that
not
accepting
new
connections
should
be
more
like,
like
separate
stuff,
that
if
I
don't
access
the
connection,
I
just
don't
advertise
the
vip
on
the
pgp
right.
This
means
basically
not
not
accepting
connections
right,
because
I,
if
I
will
be
advertising
the
vip,
but
at
the
same
point
they're
all
with
backup.
This
means
I'm
basically
black
calling
the
customer
traffic
in
general.
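The invariant being argued here (a backup that keeps advertising the VIP black-holes traffic) can be stated as a one-line check; again a sketch with assumed names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ROLE_ACTIVE, ROLE_BACKUP } role_t;

/* A card black-holes customer traffic when it still attracts traffic
 * (advertises the VIP over BGP) but refuses new connections (backup role
 * while the HA session is down). */
static bool black_holes_traffic(role_t role, bool session_up, bool advertises_vip)
{
    bool accepts_new = (role == ROLE_ACTIVE) || session_up;
    return advertises_vip && !accepts_new;
}
```
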
C
That's correct, yeah, we need to update it, because this was created over a year ago, I think, at a time when we always wanted to have active-active but it was deemed very, very hard to do at the beginning. That's why active-passive was proposed. But yeah, it should be updated.
G
Quick question: for any given card, I mean DPU, at any given time, isn't it true that it will act as active for some ENIs and act as a backup for some other set of ENIs?
G
When we made that point at that time, my basic understanding was that the card will be behaving as active for those 32 ENIs and behaving as the backup for the rest of the 32 ENIs, thereby supporting 64 ENIs, and then you multiply the flows and connections and so forth accordingly. But now you're saying something different?
C
Have
some
have
some
opportunities
like
like
in
a
way,
for
example,
the
active
passive
design
is
actually
working
right
now,
and
it's
kind
of
a
interesting,
interesting
way
of
basically
saying
that
one
card
is,
for
example,
not
receiving
any
traffic.
The
other
car
basically
is
receiving
full
traffic.
That
is
basically
doing
this
right
and
the
the
reason
why
we
have
basically
two
set
of
vnis
per
card
is
mostly
from
the
point
of
view
that
we
do.
C
...the active-active versus backup design. The proposal is this: there are some constraints with regard to flow replication, because replication in one direction is theoretically easier than flow replication in both directions, where, for example, one packet lands here and the other packet lands there. So active-active is slightly more complex than active-passive, because active-passive always has flow replication in one direction only. At the same time, we don't want to have one card fully dormant.
C
That's
why
we
propose
that
if
we
are
to
stick
with
active
passive
design,
then
at
least
have
two
groups
per
card,
and
one
one
group
basically
will
be
active
one
on
the
device
and
passive
for
the
other
device.
But
the
other
group
of
enis
will
be
active
for
the
another
device
and
pass
it
for
the
first
device.
C
So
so
both
cards
are
active
at
this,
like
from
the
point
of
view
of
receiving
traffic,
but
it
should
receive
traffic
as
the
primary
for
a
different
set
of
cards
versus
is
a
standby
for
a
send
off
in
eyes.
C
But
it's
not
going
for
differences
to
be
nice
right,
and
this
this
is
mostly
the
two
groups
is
mostly
to
to
make
sure
we
don't
have
like
a
card
that
is
completely
not
utilized,
so
this
was
the
only
reason
why
we
decided
to
have
two
groups
instead
of
one
group,
because
because
one
ineffective,
passive
kind
of
you
think
about
this
is
always
kind
of
like
a
one.
Car
is
active
for
a
nice.
The
other
is
standby,
but
in
this
case
the
other
card
is
completely
utilized.
C
So
that's
why
we
decide
to
have
two
groups,
so
both
cards
are
utilized
and
in
case
one
car
dies,
then
the
sure
the
capacity
of
one
cars
is
halved,
because
this
card
is
only
serving
traffic
for
both
the
the
ones
that
were
active,
plus
the
one
that
will
potentially
backup
which
are
right
now
is
active,
and
but
it
is
still
using
the
asm
prepared
right.
C
So
once
you
have
this
replication
right,
then
the
active
active
is
really
the
preferred
one,
because
the
bootcad
are
utilized
this
kind
of
stuff
right,
and
you
can
think
that
it
is
very
easy.
Once
you
have
this
flower
application
working
both
direction
in
the
irrespective
one
packet
lands
right
to
switch
the
in
effective
passive.
If
you
want,
because
switching
to
active
passes
is
really
the
the
bgp
configuration
right.
C
So
so,
if
we
consider
bgp
configuration
to
be
completely
separate
from
the
configuration
of
the
flow
sync
right,
then
if
the
flossing
happens
right,
irrespective
of
how
bgp
is
advertised
and
redirects
the
traffic
right,
if
the
flossing
happens,
when
the
packet
lands,
for
example
on
device,
I
and
it
gets
replicated
device
b
or
some
other
packets
lands
on
device
b
and
grades
duplicate
two
to
basic
device.
A
and
and
this
works
in
both
direction
right
then,
we
have
active
active,
which
means
the
flow
application
is
fully
working
right
and.
C
Moving from this to active-passive doesn't require any changes to the sync algorithm; it just requires potentially some adjustment to BGP. So I want to make sure that the BGP logic, which is, for example, a completely separate microservice that is advertising BGP, is capable of advertising...
C
For
example,
one
card,
I'm
advertising,
let's
say
without
asn
people
the
same
web,
the
other
card
would
accent,
and
this
will
allow
control
plane
of
of
whoever
will
be
using
this
to
decide
if
they
want
to
have
active
active,
which
is
the
preferred
way
right
or
they
want
to
play
with
active,
passive
or
those
kind
of
stuff
right.
So
that's
why?
Basically,
those
are
two
kind
of
whether
we
later
do
like
active
passive
versus
a
versus
active
active
will
be
only
depending
on
the
bgp
right,
but
this
is
only
on
the
condition.
C
This
will
be
only
possible
on
the
condition
that
the
flossing
algorithm
actually
works
in
both
direction
right,
so
it
because,
like
if
we
design
from
day
one
to
only
support
active
passive
design,
then
there
will
be
lots
of,
for
example,
for
example
like
shortcuts.
That
will
only
sync
flows
in
one
direction,
and
this
kind
of
stuff
right
and
we'll
never
be
able
to
do
active,
active
but
really
active,
active
is
kind
of
like
a
long-term
vision,
because
this
will
allow
us
to
save
the
camp
space,
and
this
is
also
how
other
aj
solutions
are
working.
G
Yeah
no
thank
you
michael.
I
I.
G
It's based on... again, we are not precluding active-active, and it's a more flexible model. Even if we want to start with the active-passive part, let's not preclude active-active by saying we are never going to do it. The understanding is always to have more flexibility around it, and that's why my initial question was: do we always think in those terms?
G
You
know
they'd
be
appearing
in
e
and
I
so
as
to
not
basically
say
that
hey
you
know
there
will
always
be
one
to
one
mapping
so
to
speak.
You
know
one
card
may
be
protecting
ens
from
multiple
different
cards.
So
if
that
flexibility
is
there
in
the
design
or
in
the
data
model,
then
we
are
essentially
are.
You
know
really
designing
it
for
the
more
flexible
and
and
more
scalable
sort
of
like
more
expandable
design.
Yes,
yeah
and.
C
But
one
thing
that
I
want
to:
I
want
to
tell
basically
that,
like
because
we
have
taken
space
issues,
because
we
know
that,
like
we
are
advertising
lots
of
prefixes
and
this
kind
of
stuff
in
data
centers
right.
We
know
for
sure
that
we
will
will
not
be
able
to
put
in
a
data
center
any
new
solution
that
is
basically
active
passive
right
only
because,
like
just
asn
prepares
scale
in
very
limited
way,
because
because
they
just
need
to
be
propagated
everywhere,
they
can
be
aggregated
this
kind
of
stuff
right.
H
So, Michal, just to clarify: the summary of what you're saying is that whether a particular appliance is in active or passive mode is completely dependent on the BGP reachability to it, right? Only the active would always receive traffic, because it will have the lowest cost in the network, while the other peers within that peering group will keep receiving the flow sync but will never be responding, since they have higher costs. Only one of them will be effectively active; there is no active-passive otherwise.
C
Yes,
that's
correct
and
one
clarification
on
this
yeah,
so
you're
right
that,
basically
it's
just
the
bgp
configuration
right
and
just
clarification
on
this
that
that,
in
those
car,
those
cars
or
those
devices
that
have
been
data
center
right,
we
want
to
actually
set
the
bgp
without
asm
prepaid
right.
So
so
the
scenario
that
we
are
discussing
right
now
that
will
be
advertising.
For
example,
something
could
be
gps
active
like
lower
cost
and
the
other
as
passive
as
a
as
smaller
cause,
like
bigger
causes
kind
of
stuff
right.
C
This
is
not
how
we
configure
bgp
like
this
is
not
how
we
will
configure
bgp
right.
We
may
decide
to
maybe
configure
this
in
some,
like
maybe
edge
site
for
the
customer,
for
some
crazy
reason.
I
don't
know
right,
but
but
how
will
basically
configure
those
cards?
I
can
tell
you
that
you
will
get
those
cards
established
peering
and
once
the
appearing
is
sync
right,
we
will
basically
both
cards
will
advertise
advertise
basically
bgp
with
the
same
cost.
C
Yeah, and let me maybe share quickly; I'm just checking whether the slides contain anything...
C
...confidential or not. Probably not at this moment, so I can quickly share it. This is roughly what we designed; the project on our side is called Sirius. You guys should see that now.
C
That
basically
indicates
one
eni
right,
so
so
in
this
case,
ignore
at
the
beginning
the
flow
splitter,
this
kind
of
like
additional
stuff
that
I
can
discuss
in
a
second
right,
but
if
you
guys
consider
just
only,
for
example,
the
the
most
inner
part
which
is
kind
of
the
card
that
is
basically
pairwise
replication
with
the
with
the
red
arrow
right.
C
I
think
it's
like
teal
and
yellow,
and
the
same
ghost
state
is
basically
on
the
other
card
stuff
right
and
this
card
is
basically
has
a
pairwise
replication,
which
means
that
the
flow
can
here,
it's
kind
of
like
higher
asn
and
lower
sense.
So
it's
kind
of
like
slightly
outdated,
but
imagine
there
will
be
no
higher
s
lower
asl
a
send.
Basically,
those
will
be
advertising.
C
Let's
say
the
same
beeps
like
let's
say
this
23001
will
be
advertised
by
both,
for
example,
right
and
then
the
traffic
can
actually
land
on
any
of
those
cards
right
and
because
these
cars
are
appearing
they
should
they
should
basically
replicate
the
flow
to
another
card
right.
So
this
normal
replication
and
and
the
reason
why-
and
this
is
basically
you
wanted
this-
to
be
working
active,
active
mode
right.
C
So,
for
example,
if
some
vm1
will
send
some
traffic
right,
it
should
not
matter
if
it
lands
on
the
on
the
card
on
the
left
series
appliance
on
the
card
on
the
right
series
appliance
right.
Wherever
this
rapid
glance
from
door
perspective,
it
should
be
basically
replicated
with
the
other
card
right,
and
this
is-
and
we
are
thinking
from
the
aha
of
kind
of
like
enabling
this
in
a
few
steps.
C
But
at
the
end,
don't
have
like
a
step
three
potentially
right,
but
the
but
the
kind
of
step
one
and
then
and
the
master
one
will
be
kind
of
like
have
eni
on
on
two
cards,
so
basically
program.
The
seminar
on
card
one
and
card
two
from
the
ghostly
perspective
right
and
don't
establish
yet
the
floor,
application
that
we
are
talking
about
right,
so
the
floor
application,
for
example,
there's
no
floor
application
right.
So,
for
example,
just
the
ghostly
on
both
cards
right
and
one
would
be.
C
Let's
say
on
the
bgp-
will
configure
that
one
is,
for
example,
with
a
higher
asl
like
lower
asn,
prep
and
higher
same
people.
This
kind
of
stuff-
and
it's
kind
of
this
kind
of
backup
active,
active
backup
right
when
we
can
potentially
start
testing
those
cards
kind
of
stuff.
But
this
also
only
guarantees
for
the
customer.
C
The
availability
from
the
point
of
view
of
the
goal
state,
but
not
from
the
point
of
the
flow
right,
so
here
you
guys,
can
imagine
that
like
if
one
because
there
is
no
flow
replication,
they
can
do
mice
as
well
micelle
one
if
one
car
dies,
then
the
other
card
can
pick
up
new
connections
because
the
the
eni
was
configured
so
the
gaussian
was
replicated
by
our
control
plane
right,
but
because
there
was
no
fraud
application.
Yet
all
the
connections
will
die
for
the
customer
right
in
case
one
car
dies
right.
C
So
that's
why,
for
example,
myself
one
is
kind
of
good
for
testing,
but
not
only
for
production
right,
so
we
can
test.
If
two
cards
can
can
pro
can
basically
support
supported
traffic
with
this
kind
of
like
asm
failover
right,
but
there
is
no
forward
application
yet
right
the
floor
application.
Actually,
what
we
need
here
is
is
what
we
are
discussing
is
basically
the
step
two,
which
is
the
two
cards,
the
same
unites
program,
two
cards
and
we
set
up
the
spareware
for
application
right
and
we
have
this
automatic
bgp
fill
over.
C
In
this
case.
The
step
two
is
kind
of
this
automatic
bgp
failover
and
here's
like
a
active,
active,
active
and
no
asn
prepaids
right
in
this
case
right.
So
so,
in
this
case,
it's
active
active.
So
so
both
cards
have
the
goal
state
and
they
set
up
the
pervs
application
right
and
if
something
fails,
like
collection
fails,
the
bgp
will
take
over
and
automatically.
Basically
all
the
connections
will
be
going
to
only
one
card
and
it's
active,
but
because
it
was
perfect
peris
for
all
the
ph
was
happening.
That's
why?
C
Basically,
this
card
has
all
the
connections
so
there'll
be
some
few
packets
that
transmit
from
the
customer
perspective
to
wait
for
wgg
converges,
which
is
like
5
or
10
seconds,
this
kind
of
stuff
sure
unfortunate,
even
event
and
during,
however,
no
already
established
connection
will
get
dropped.
So,
for
example,
custom
will
experience
some
particular
transmissions
right
if
they
were
like
transferring
some
bigger
data
from
one
vm
to
another
or
from
storage
right,
but
no
connection
will
be
dropped
because
because
tcp
expiration
window
will
not
hit
right.
C
So in this case, the second solution is the one that we want to have, and it's the one that we are designing. Basically, you want to have the same ENI configured on both cards, flow replication established between them, and BGP that will automatically cover failover, with this active-active mode and no ASN prepends. This is the one that we want to have and the one that we are discussing, and in the long term,
C
we also want to potentially spread the traffic. We call it kind of infinite scale, but not really fully infinite scale. For example, let's say one card will give, I don't know, like 4 million CPS, which means that if we put an ENI on this card, the customer can only reach, let's say, 4 million connections per second, and this is the upper limit of the card.
C
We actually want to put the splitter, which is on the left side, and do more intelligent hashing, in a way that, based on this consistent hashing of the connections that it gets, for example from VM1, we will actually send this traffic through this kind of tunnel, not to the specific card that is in pairwise replication with the red one, but, for example, to one of the other cards, which is also paired with another card.
C
That's why there are additional cards which are, for example, pairwise with a card that is paired with blue or green, and in this case we'll program the ENI on more than two cards: we'll program it, for example, on all six cards, and the cards are paired. So basically one card, for example, is paired with another card,
C
the middle card is paired with some other card, the same or a different one. There are just two rows of cards shown here, but you can imagine that those cards can be anywhere in the data center. So, for example, in total we'll have, let's say, six cards, and each of them will be paired.
C
So, for example, in this case we have like three cards here and three cards paired, and imagine that the same ENI is not only programmed on the main card and the paired card, but we program it on all the cards. So, from the goal state perspective, all the cards will have the same goal state, so we'll be able to serve new connections, but all the cards will be advertising a different set of VIPs. So, for example, the middle card will be advertising
C
a VIP closer to the VMs, which means that, based on the five-tuple hash of the connection, we will decide: okay, let's say one third of the connections that this VM originates go to the specific VIP, which means, for example, to the first set of cards, and the other part of the connections go to other VIPs. This would be kind of the same ECMP that is happening on the physical network, just more intelligent from our point of view.
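The five-tuple-to-VIP split described here can be sketched in Python. This is a minimal illustration, not the actual splitter: the VIP addresses are made up, and a stable hash stands in for whatever consistent-hashing scheme the real splitter would use.

```python
import hashlib

# Hypothetical VIPs, one per paired card group (illustrative values only).
VIPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_vip(src_ip, dst_ip, src_port, dst_port, proto, vips=VIPS):
    """Deterministically map a flow's five-tuple to one VIP.

    Every packet of the same flow hashes to the same VIP, so the
    splitter stays stateless; with three VIPs, roughly one third of
    the flows land on each card pair.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return vips[digest % len(vips)]
```

Because the mapping depends only on the five-tuple, all packets of an established connection keep going to the same card pair, which is what makes the split compatible with pairwise flow replication.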
C
It's not only doing ECMP on layer 2, which is basically this link versus this link; it'll be doing it on layer 3, which is basically IP-based ECMP, so more intelligent. So this will allow us to reach even more capacity than the 4 million. For example, if one card gives us 4 million, then theoretically we can evenly split the flows the customer is experiencing, so one card will be handling, let's say, one third of the flows, the other card another
C
third, so we can have like 12 million CPS, or even more, if we spread the traffic evenly. This assumes this active-active design between the cards, and it also decreases the blast radius, because if one card fails, only some part of the traffic gets affected; the other card needs to handle all of that traffic,
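The scale and blast-radius arithmetic here can be written out as a small sketch (4 million CPS per card and a three-way split are the figures used in the discussion; the function names are mine):

```python
def aggregate_cps(per_card_cps: int, vip_groups: int) -> int:
    """Total connections-per-second when the splitter spreads flows
    evenly across `vip_groups` card pairs, each serving one VIP."""
    return per_card_cps * vip_groups

def blast_radius(vip_groups: int) -> float:
    """Fraction of customer flows affected if one card fails: only
    the share hashed to that card's VIP, picked up by its peer."""
    return 1 / vip_groups

# With the figures from the discussion: 4M CPS per card, split 3 ways,
# aggregate_cps(4_000_000, 3) gives 12M CPS, and a single card failure
# touches blast_radius(3), i.e. one third of the flows.
```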
C
but only the one third of it. So basically we'll be doing stuff on our side which is outside of this engagement, because this engagement, from the hardware perspective, is to have flow replication between two cards. But on our side we will also have the splitter, which will basically evenly split this ECMP based on the VIPs, and this is kind of the next step on our side. But for this we need to have step number two, which is the flow replication, to make sure we provide availability between two cards,
C
so the connections are not getting lost. So that's kind of how we are thinking about this, and this is how we can reach higher connections per second than the 4 million per card, and the customer will be super happy about it,
C
I think. And this is kind of how the IP addresses look from the point of view of allocation: basically each card will have a unique address, which is, for IPv4, maybe, let's say, a /27. So each card, if we have, for example, six cards per chassis in the risers (because this is how much space there is in some of the devices),
C
This
is
what
we
will
do
and
we
will
allocate
this
unique
addresses
right,
and
this
will
each
card
will
have
this
like
unique
veep.
This
is
the
one
that
is
basically
being
advertised
on
the
on
the
bgp
right
and
potentially
unique,
really
128,
but
this
really
advertises
64
on
tours
right.
So
this
can
staff
plus
plus
some
additional
management
ip
for
the
management
network
right.
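The addressing scheme just described (a unique IPv4 peering address per card out of a small per-chassis block, plus a unique per-card /128 VIP covered by one /64 advertised to the ToRs) can be sketched with Python's ipaddress module. The prefixes below are documentation ranges, not the real allocations:

```python
import ipaddress

# Illustrative prefixes only (RFC 5737 / RFC 3849 documentation ranges).
CHASSIS_V4 = ipaddress.ip_network("192.0.2.0/27")    # one /27 per six-card chassis
TOR_V6 = ipaddress.ip_network("2001:db8:0:1::/64")   # advertised as a single /64 to the ToR

def card_peering_ips(n_cards: int = 6):
    """One unique IPv4 address per card, used to initiate HA peering."""
    hosts = list(CHASSIS_V4.hosts())  # a /27 has 30 usable hosts, plenty for 6 cards
    return [hosts[i] for i in range(n_cards)]

def card_vips(n_cards: int = 6):
    """One unique /128 VIP per card, all falling inside the ToR's /64."""
    base = int(TOR_V6.network_address)
    return [ipaddress.ip_address(base + i + 1) for i in range(n_cards)]
```

The point of the split is that the per-card addresses identify the card for peering, while the ToR only needs the covering /64 plus the shared highly-available VIPs in its routing table.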
E
Thank you, Michal. This is really useful and very good to see.
C
So, to be honest, the splitting is doing basic splitting, kind of like ECMP, so we will not be doing any intelligent monitoring of the DPU capacity and this kind of stuff. We are assuming that the flows the splitting element sees will be uniformly distributed, so we'll basically uniformly split the flows: let's say one third of the flows, or one fourth, depending on how many VIPs you decide to split across.
C
So definitely not intelligent; maybe in the future, but I don't see this happening right now. At the same time, the splitting, I would say, is outside of our discussion here, because this, plus even more fancy logic that we have on our side, we can do once we have the basic functionality of the flow replication between two cards, and this is basically the main idea of this work stream:
C
to make sure of this point number two, which is basically that we can put the same ENI on two cards, have flow replication, and then, on top of this, use BGP for failover. That's the main goal. Anything else, like the first thing, is mostly internal testing we are doing, not really related to this kind of stuff. Then there is the second step,
C
and the second step is really where we need this work stream: to give the cards the ability to do flow replication, so the customer connections are not being affected. The ECMP is really internal stuff; right now we are doing just normal ECMP using layer 3, but the splitter can be even more complex or whatever, and that's outside of this work stream.
A
I don't think we got through the full set of APIs. We may have to pick up where we left off next week. Did anyone have a comment before we drop off?