From YouTube: DASH High Availability Working Group Nov 8 2022
Description
BFD & ECMP - Prince
Overview, from a DASH perspective we would need to add-on
A
Into the high availability meeting for the day, November 8th. So last week we talked about covering comments on the AMD PR here, 271, and we also possibly had Prince coming to talk about BFD and ECMP, so I'm trying to get a hold of him here. Yeah.
A
Oh, hey, Prince. Hey, how's it going? Good, good, good. Okay, and I was trying to see if I needed to let someone else in. Lisa, great, okay, so yeah, why don't we get started with Prince? We went ahead... maybe we can go over this with Gohan, when and if he comes, on the proposal 271. So I'll stop presenting here, and maybe we can talk a little bit about BFD and ECMP. Yeah.
B
So I have one diagram to share, so I can... I...
B
So it should cover both the BFD and another proposal that we have been thinking about. So let me share the screen.
B
Yeah, so can you all see my screen?
B
Okay? So basically, in this approach, right, let's say we have appliance cards. One is primary and another is backup.
B
So from the T1 layer we can initiate some sort of configuration through the controller to advertise the BGP prefix that the appliance cards are advertising, right. And then the T1 can establish BFD sessions with the cards, basically the primary and backup, and based on the BFD health between the primary and backup, the T1 can determine what is the endpoint for that packet.
B
So, one second, let me also share one more document. So that would be...
B
So this is the BFD and overlay ECMP design document that we have in SONiC. So basically the idea is the routes, or the prefixes, are programmed at the T1 layer, and the T1 will distribute the packets to different endpoints, right. So let's say these are the appliance cards, and in our case there are some primaries and some backups. The T1 can be programmed in such a way that the controller can say: okay, hey, endpoint one is the primary, and two and three are the backups.
B
Now the T1 can initiate a BFD session to each of the endpoints, and when the primary goes down, based on the BFD session, the T1 layer can reprogram the route to point to the backup endpoints, so that when the packet lands on the T1, instead of going to the previous active endpoint it will now go to the backups. So the health of the cards is determined by the BFD sessions.
B
Does it make sense? So today we are using this same idea for overlay ECMP hashing: basically, the packet lands on the T1 and it distributes the packets to all the different endpoints in an ECMP fashion. But the same thing we can extend to an active-backup, or primary-backup, scenario as well.
D
There's one question here: the BFD session and the tunnel endpoint, are they kind of related, or is there some custom logic to map which tunnel? I would imagine there is a tunnel from leaf one to tunnel endpoint one, right, and there are three tunnels here, and if one of the tunnels goes down you reprogram to go to the other two, I would guess, right, when the BFD goes down.
D
So is the mapping in the schema, right? For example, does tunnel endpoint one have to be the source IP for the BFD session, or is there a separate mapping to say this BFD session corresponds to these tunnels, whatever the identity is?
B
What we have today: so basically you can say the endpoint... so this is the prefix. Let's say this is the VIP that is advertised, okay. It can specify what is the endpoint IP, which is this endpoint IP right here, for that prefix, and then you can also say, for that IP, what is the monitoring IP, or the BFD IP. This is the BFD session. Okay.
B
So yeah, so basically it means, like over here in the same schema, when we say endpoint and endpoint monitor, there will also be one more attribute that says primary, and it can say, like, IP address one comma IP address two are primary. And say if you have IP addresses one, two, three and four, the implementation can infer that three and four would be the backups and one and two would be the primary.
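The route entry being described, endpoints plus monitoring IPs plus a "primary" attribute, with backups inferred as everything not listed as primary, might look something like this. The field names and IP values are hypothetical, chosen only to mirror the wording above; the actual schema lives in the SONiC design document.

```python
# Hypothetical rendering of the route entry under discussion:
# endpoints, their BFD monitoring IPs, and a "primary" list.
route_entry = {
    "prefix": "20.0.0.0/24",      # the VIP prefix the T1 advertises
    "endpoint": ["100.0.1.1", "100.0.2.1", "100.0.3.1", "100.0.4.1"],
    "endpoint_monitor": ["100.0.1.1", "100.0.2.1", "100.0.3.1", "100.0.4.1"],
    "primary": ["100.0.1.1", "100.0.2.1"],   # one and two primary
}

def backups(entry):
    """Infer the backups: every endpoint not marked primary."""
    return [ep for ep in entry["endpoint"] if ep not in entry["primary"]]
```

With the entry above, `backups(route_entry)` yields endpoints three and four, matching the inference the speaker describes.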
B
So it will basically start the BFD sessions with all four monitoring IPs and make sure that all four are reachable, or the BFD sessions are up. And when both the primaries, let's say one and two are primary, and if both primaries go down, that is when it falls back to three and four: the next hop will be changed in the hardware, or the ASIC, to choose three and four, which are the new backups.
B
Only if, let's say, the BFD sessions are up, okay. But now one more thing: if, say, the BFD sessions come back up with one and two, then it will revert back to one and two, the primary cards.
B
See, it's just that the hardware gets programmed with only the primaries, or the active BFDs; there's no longer-prefix or shorter-prefix trick.
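The selection rule just described, prefer primaries with BFD up, fall back to backups only when every primary is down, and revert when a primary comes back, can be sketched as one function. This is an illustrative reading of the conversation, not SONiC code; all names here are invented for the sketch.

```python
def active_next_hops(endpoints, primaries, bfd_up):
    """Return the next-hop set the hardware should be programmed with.

    endpoints: all tunnel-endpoint IPs in the route entry
    primaries: the subset marked primary
    bfd_up:    set of endpoint IPs whose BFD session is currently up
    """
    live_primaries = [ep for ep in primaries if ep in bfd_up]
    if live_primaries:
        # At least one primary is healthy: program only primaries.
        return live_primaries
    # All primaries down: fall back to whichever backups are up.
    backups = [ep for ep in endpoints if ep not in primaries]
    return [ep for ep in backups if ep in bfd_up]

endpoints = ["100.0.1.1", "100.0.2.1", "100.0.3.1", "100.0.4.1"]
primaries = ["100.0.1.1", "100.0.2.1"]
```

Because the same rule is evaluated on every BFD state change, the revert-to-primary behavior falls out for free: as soon as a primary session comes back up, `live_primaries` is non-empty again.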
E
So the switchover happens...
B
Today in SONiC we have the hardware... this is hardware-offloaded BFD. So the notification will come up to the SONiC layer that says this BFD session is down, and then the SONiC orchagent will have the logic to map that BFD session to this endpoint, and it will remove that endpoint from the route.
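The mapping logic just described for the orchagent, a hardware BFD state-change notification is mapped back to a tunnel endpoint, which is then pulled from (or restored to) the programmed set, could be sketched as below. This is a hypothetical sketch of the control flow, not the orchagent's actual API; the class and method names are invented.

```python
class RouteManager:
    """Sketch of mapping BFD notifications to endpoint membership."""

    def __init__(self, session_to_endpoint):
        # session_to_endpoint: BFD peer (monitoring IP) -> endpoint IP
        self.session_to_endpoint = session_to_endpoint
        self.down = set()

    def on_bfd_state_change(self, peer_ip, state):
        # Notification from the hardware-offloaded BFD session.
        endpoint = self.session_to_endpoint[peer_ip]
        if state == "Down":
            self.down.add(endpoint)
        else:
            self.down.discard(endpoint)

    def programmed_endpoints(self, all_endpoints):
        # The endpoints that should remain in the route's next-hop group.
        return [ep for ep in all_endpoints if ep not in self.down]
```

The key point from the discussion is that the monitoring IP need not equal the endpoint IP, which is why an explicit session-to-endpoint map is needed at all.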
F
So what happens if one of the three T1s loses its BFD session to the DPU?
B
So yeah, one thing is, in this case it kind of assumes that all the T1s should be having the same view, right? Like, for example, leaf one cannot think that tunnel endpoint one is primary while leaf two thinks, okay, this other one is primary, because it lost the BFD session, right.
B
So for that, one thing is: whenever we do such a switchover, the plan is to have some sort of alerting and monitoring, so that the SDN controller knows that these are...
B
This is the view of each of the T1 layers. But maybe, to answer your question, there's no synchronization between the T1s. Okay, so the one thing that we thought about is to have the switchover be reported to the controller, and if the controller sees that, okay, that's not right, then we need to take some actions around that.
F
So from the DPU's perspective, if it loses connectivity to even one of the leaves, or T1s, then it's supposed to switch over, although it is still connected to the other leaves?
D
Yeah, I guess, because your question is: you're looking at the problem scenario where leaf one and leaf two had one view while leaf three had another view. We are looking at that situation from the DPU's perspective, where the DPU still has connectivity to leaf two, but the leaf one connectivity goes down. Is that correct? Was that your...
F
Also, from the DPU perspective: in the active-standby mode of operation the standby is dropping packets, right? So the DPU also has to do a switchover, correct?
F
So if the active DPU has lost connectivity to just one of the leaves, while it still has connectivity to the other leaves, the leaves need to synchronize and say, okay, we actually want to switch to the standby. The standby is not yet ready to accept packets unless the DPU level also has a switchover happening, right? Right. So there has to be some kind of handshake between the T1s and the DPUs. I don't know; probably we need to explore this. Okay.
B
The leaf... and disables the BFD between... okay.
D
Okay, so currently, I guess, from the DPU side there is no linkage between the number of BFD sessions and the...
B
The session is initiated from the T1 to the... so when the controller programs the T1, or the leaf one, that is when leaf one will start the BFD session with this endpoint, or the card. So I don't see a requirement for the DPU to keep track of the BFD sessions.
D
Oh, I guess, are you referring to the heartbeat itself?
G
See, if this BFD infrastructure can be used for the heartbeat's sake, right, that we are trying to achieve for HA between active and standby: we are trying to see if we can utilize this BFD infrastructure, right. So then we will need the path from the active DPU to the standby DPU also.
D
Right. So I guess, Prince, please correct me: my view, the way I'm understanding it, is, right, there could potentially be... let's say we use BFD as a protocol for monitoring the DPU-to-DPU heartbeat itself, right. But that would be a special BFD session. I mean, what I'm meaning is, we'll just call it heartbeat.
D
And it has got a special meaning too. Then these three, for example, these BFD sessions that we have, have no bearing on the HA state, while whatever is there in the heartbeat, whether it's BFD or a different protocol, that has a bearing on the HA state of the DPUs, correct? Right, yeah. It's incidental, yeah, incidental, that BFD could be used for the heartbeat too, but not related to this scheme.
G
So previously, when Gohan was suggesting to utilize the BFD infrastructure for the heartbeat as well, I was under the impression that, you know...
B
Yeah, so that's the idea of BFD with, probably, ECMP, so...
D
Here, this thing itself, in your previous slide, or even the route that we have here, Prince... I think I have some idea, but this thing, the T1s, the leaves that advertise that out: are they kind of related to where the DPUs are placed, where the appliance is placed, or could they be...
B
They could be in a different cluster as well, because these are, like, tunneled packets, right, so it...
B
Some of it can take a direct path through the T0, if it is connected in the same cluster, but otherwise it will take the T2 path and then go to another cluster, because this is within the same availability zone, and it can be reachable from this T1 to another.
D
Yeah, my question is also from the failure-domain kind of view, because if it's too far away, then I think the case you were saying, with leaf one losing connectivity but not leaf two, becomes more probable.
D
Right, if they are completely different failure domains, too far apart? Yes, yes.
B
It's within one availability zone, but it may be spanned across one or two clusters, because we want the actual packets to land on different T1s, right.
B
Yes, so it can, like I said: in the schema it can be configurable. So if your next-hop endpoint IP is, let's say, the card's PA, right, the monitoring IP can be the same, or it can be a different one. So it depends on what the card wants to have the BFD session for.
F
So the VIP is what the T1 is advertising, right? Correct. So there is another IP that we are using for the heartbeat between the... so each...
B
No, but that is up to the implementation, right. Like, we can have the PA IP also for the BFD, but the BFD session will be established only when the card is ready, say.
F
Also, there's another IP that we advertise, with which the heartbeat... so that when we want to connect to the peer for the heartbeat, the peer DPU, that IP is also advertised by the paired DPU, in addition to the VIP, today.
B
But if there is... because with this, the VIP advertisement is done by the T1, and the cards don't need to even advertise anything to the ToR, right. And the endpoint, or the next hop, for the VIP is nothing but the PA of the card. So...
F
And then another control-network IP, which today we are... yeah, I plan to start it. Is that also over BGP?
E
There are the directly connected L3 interface PA IPs, and then another loopback, which is used for the control-network IP; that's also advertised through BGP. But if that control-network IP is in the private space, no data traffic hits it; it's only used for our flow sync. So if it's in the private space we don't need the BFD mechanism. I guess that's the question.
C
Prince, I have a follow-up question. In the current SONiC, like you mentioned earlier, it's operating in active-active mode, so basically tunnel endpoint one and tunnel endpoint two can both be receiving traffic based on the load-balancing decision at leaf one. Yes. However, with the DPU, tunnel endpoint two will be standby, right? So let's say the BFD session between leaf one and tunnel endpoint one goes down for whatever reason, but we have not triggered a failover from a DPU point of view.
B
The expected... that's not my understanding. I thought, if, let's say for any genuine reason, tunnel endpoint one goes down, right, that's when the BFD has gone down, right. Right. In that case there is an automatic synchronization between the two cards, so that the other one will switch over. I don't think a controller needs to notify.
B
That depends on the VIP set: for each VIP you have an endpoint, so if it is across different ENIs, then it will basically...
D
The controller, yeah. There is an administrative state which says that, preferably, this is the standby; but the actual operational state is controlled independently, in the sense that if the active is not there, then the configured standby can become the active. Is that what you meant? Yeah.
C
So this will have to be... this will be a new trigger for changing the role.
C
Now, let's say the T1 BFD session goes down between leaf one and the active DPU, okay. So in that case the T1 will fail over the traffic to the standby, right? Correct. Okay, so the standby... with the current behavior, if it is a standby, it will not forward any traffic; it'll drop it, no?
B
Okay, so the point is: this is not a normal scenario where the active actually went down; it is like the active is still up, but for some reason the BFD went down. Exactly, yes. So the BFD also has multiple retries, like we have 300 milliseconds by three times.
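The "300 milliseconds by three times" figure implies the standard BFD detection-time arithmetic: with a 300 ms interval and a detect multiplier of 3, the T1 declares the session down after three consecutive missed intervals. A quick worked check, using the numbers from the discussion:

```python
# BFD detection time as described: interval times detect multiplier.
tx_interval_ms = 300      # transmit interval from the discussion
detect_multiplier = 3     # "by three times"
detection_time_ms = tx_interval_ms * detect_multiplier
# roughly 900 ms elapse before the T1 makes the switchover decision
```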
B
So
so,
if
all
the
three
went
down
right,
so
all
the
three
like,
let's
say
three
retries
of
the
bft,
went
down,
and
in
that
case
yes,
of
course
the
the
T1
will
make
the
switch
over
decision,
because
the
the
T1
doesn't
know
whether
it
has
actually
actually
went
down.
Or
is
it
a
physical
networking
connection
right.
B
Let's say this one is connected to this ToR.
C
Right, you're talking about multi-path for the underlay. Multi-path for the underlay, yeah, underlay. So what you're saying is: because this BFD session is running over the VXLAN tunnel, the only way for it to go down is if all the underlay paths are down, which is unlikely, or the other reason is that the tunnel endpoint is actually down, which you interpreted as the DPU being down. Exactly. So, in which case failover will happen. Yes, okay!
C
So then, in this case, even though you're using BFD as a fast detection for failover, how much traffic loss there is will depend on the failover in this case. Yes, unplanned failover. Okay.
G
Now, if endpoint one's connectivity to leaf one completely goes down, right, but endpoint one has connectivity to endpoint two via leaf two and others, because they're all connected to the ToRs and it's a Clos topology, yeah, right. So from the point of view of endpoint one's connectivity to endpoint two, there is still connectivity.
B
There is still connectivity, yeah. Any one of the T1s going down is not a concern for us, because it's just isolated from the network. But the only concern that we discussed in the beginning of the meeting is: let's say leaf one has connectivity to endpoint one, but leaf two lost it, and if leaf two decides to switch over, that will be inconsistent between the two T1s, right. And that's when we wanted to raise it as a signal and alerting to the controller too.
B
So anytime a switchover happens... so that, if for some unknown reason some underlay network has gone down at the T1, but the T1 is still advertising, which means it's not isolated, then that needs to be acted upon. There's no automatic way to...
B
How fast we can reconverge and fall back to the standby. And also the other reason is, with this proposal we don't need the cards to send, you know, a different BGP session with AS-prepend or something, to tell the physical network that, hey, I am inactive, or that I'm in standby, right. So...
A
Thanks, Prince. I wrote down the link, so I'll put that in the notes. So did you want the community here to go ahead and read through it and look at it?
G
Prince, are we planning to tie this to the discussion that we have been having so far in HA, right, about the connectivity between the two DPUs and what would be the reason for a switchover or failover, right? Yeah?
G
Is there any text in this PR that will connect with the DPU, and the connectivity between the two DPUs?
B
This is already one that is available.
A
Great, so I'll just post the link to it then. Yeah, okay, okay, great. And then, I don't see Gohan on the call, Prince, but we did do some follow-up to the PR, or the amendments, to PR 244, and I believe Gohan will see those because those were basically his comments. So did you want to look at them here, Prince, or are we good just to let Gohan handle it?
A
Okay, all right. Anyone on the call, do you have something you want to bring up before we close for the day?
A
No? Okay. The survey responses I've received have pretty much unanimously said cancel for Thanksgiving week, so I'll probably go ahead and do that, and I'll bring that up again in the larger community meeting, just to let you guys know that's what I'm hearing. Okay, well, a stop would be good. Yeah, yeah, okay, give everybody a break, right? Yeah.