DASH / High Availability WG

These are all the meetings we have in "High Availability WG" (part of the organization "DASH"). Click into individual meeting pages to watch the recording and search or read the transcript.

25 Apr 2023

F5 joining call, Q&A

Exposing the table to SONiC is not an issue; the concern is speed
Inline sync is not easily decoupled from the dataplane - would never get the required performance if we went with SONiC
Perfect sync/Bulk sync: there is no tight timeline
Think of it as a snapshot/restore feature
See:
https://github.com/sonic-net/DASH/blob/main/documentation/high-avail/AMD-Pensando_HA_Proposal.md
https://github.com/sonic-net/DASH/pull/271/files
https://github.com/sonic-net/DASH/blob/main/documentation/high-avail/xsight-labs-ha-proposal-new-ideas.md
  • 5 participants
  • 20 minutes
sonic
replication
dash
backup
gig
throughput
review
issue
f5os
interoperability

15 Nov 2022

Continued bulk sync discussion: what to do with config changes during bulk sync.
Guohan needs the SAI spec to be complete, so that SONiC can have logic to handle these cases.
Bulk sync should be accurate (vs. performant), with certainty about how to test it.
Create a lightweight bmv2 HA for experimental version to verify algorithms?
  • 8 participants
  • 57 minutes
sync
updates
discussion
pr
processing
meet
backup
setup
recording
ahead

8 Nov 2022

BFD & ECMP - Prince
Overview, from a DASH perspective we would need to add-on
  • 8 participants
  • 36 minutes
bft
t1
package
proposal
protocol
ecmp
primaries
controller
backup
maintenance

1 Nov 2022

Community to Review please
https://github.com/Azure/DASH/pull/271 - amendments to PR 244 (Sanjay & Mukesh)

Prince: BFD & ECMP - move to Nov 8, 2022 - Still WIP
  • 6 participants
  • 13 minutes
comments
planned
suggestions
finalized
present
reviewers
issue
week
thinking
prince

25 Oct 2022

-Suggestions & Comments added to https://github.com/Azure/DASH/pull/244/files
-Heartbeat process and format: Message format is in the header file (flow sync message notification data)
-SAI_DASH_HA_REGISTER_CP_CHANNEL_ATTR_NAMED_PIPE
-Get from named pipe, or standard SAI notification?
-Callback function: is this defined, so that syncd can register (because Redis is not used)?
-Should we use 2 different channels? Named pipe for data, another for all control messages?
-HB messages are over UDP for this paper. Should we use BFD? Need to think more on this…
-Multi-path vs Single-path - need to think through this problem. We should have options to choose
-Come to next session w/ideas :)
  • 6 participants
  • 58 minutes
protocol
session
proposal
discussion
finalizing
connectivity
ecmp
api
preemptive
encapsulated

11 Oct 2022

MetaData standardization
AMD update to proposal
Syncs near real-time, not necessarily real-time
  • 4 participants
  • 44 minutes
discussed
consensus
metadata
updated
protocol
issue
proposal
message
processing
gohan

4 Oct 2022

Discussion on MetaData policies/expectations
Proposals to define InterOp in the future
Future work suggested by Guohan: Community to work on CP and DP InterOp, and packet format
Need to get control channel defined
  • 7 participants
  • 1:10 hours
discussion
comments
metadata
having
message
proposals
details
acceptable
updates
processing

27 Sep 2022

Unplanned Switchover discussion
Requested updates to HA Proposal document
Thank you Sanjay for presenting!
Next week discussion on MetaData policies/expectations
  • 7 participants
  • 58 minutes
review
discussed
proposal
okay
retriggers
termination
concerns
switchover
gohan
flow

20 Sep 2022

Reshma hosting :)
  • 7 participants
  • 1:00 hours
dash
proposal
cmp
issue
protocol
monitoring
amd
gohan
ahead
hue

13 Sep 2022

Pick up w/SAI definitions on page 10
Q: will this be represented as a P4 model?
A: Sanjay - this could be challenging. Will look for a way to formalize and represent it.
Q: Guohan - would be great if we could get sample code to call the API.
A: SAI API common config examples (Marian)

Change CP messages opaque vs common
  • 5 participants
  • 56 minutes
communicated
status
discussion
amd
acknowledgement
threads
missions
psi
controller
ahead

6 Sep 2022

Continuance of AMD proposal - will need another session next week.
Next session: Pre-emption case to be discussed, SAI Spec, Overlay HA
  • 3 participants
  • 54 minutes
udp
packets
connectivity
tcp
configuration
process
dps
tested
theoretically
preempting

30 Aug 2022

AMD provided an overview of their HA proposal
  • 5 participants
  • 1:00 hours
protocol
message
overview
implemented
relaying
plan
current
structure
functioning
vm

16 Aug 2022

Small agenda this week, push to next week
  • 6 participants
  • 5 minutes
discussed
christina
control
chat
panzando
proposal
week
marion
hey
coverage

9 Aug 2022

Review of XSightLabs proposal
SONiC Team attendance & discussion
SAI APIs to query flow state?
Performance concerns
  • 11 participants
  • 1:06 hours
proposal
protocol
discussed
amd
implementation
message
proposed
interoperability
package
transmitting

2 Aug 2022

AMD HA presentation
  • 5 participants
  • 1:11 hours
syncing
switchover
connections
plans
setup
flow
replication
vp1
failover
coordinated

26 Jul 2022

1. Participants should read: https://github.com/Azure/DASH/blob/main/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md
2. Bring concerns to the table please
3. Proposal is an attempt to cross Reliable/Perf while remaining InterOperable
  • 4 participants
  • 34 minutes
protocol
messaging
acknowledgement
operational
updates
advanced
interoperation
sending
processing
synchronizing

12 Jul 2022

Continue API details

Definition of switch_id
Active/Active convo
Protocol definition convo
  • 8 participants
  • 30 minutes
synced
protocol
batches
updates
session
maintainers
gpu
notification
current
initialized

28 Jun 2022

June 28, 2022
@Marian sharing OpenCompute PR for SAI APIs in the experimental section
https://github.com/opencomputeproject/SAI saiexperimentaldashha.h

We did not get through the entire set of APIs

Administrative State up/down (add an enum with more states, such as sync starting, sync in progress, sync completed, etc.)
Peer ID (used to communicate state)
IP for the session
Role (optional)
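
The four attributes discussed above could be modelled roughly as follows. This is only an illustrative sketch in Python, not the SAI header: every class, field, and enum member name here (HaAdminState, HaRole, HaSession, and the individual state names) is hypothetical and was chosen to mirror the notes, and the active/standby role values are an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Optional, Union


class HaAdminState(Enum):
    """Hypothetical states: plain up/down plus the extra sync states from the notes."""
    DOWN = "down"
    UP = "up"
    SYNC_STARTING = "sync_starting"
    SYNC_IN_PROGRESS = "sync_in_progress"
    SYNC_COMPLETED = "sync_completed"


class HaRole(Enum):
    """Assumed role values; the notes only say that a role is optional."""
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class HaSession:
    """Illustrative container for the attributes listed in the meeting notes."""
    peer_id: int                                  # used to communicate state
    session_ip: Union[IPv4Address, IPv6Address]   # IP for the session
    admin_state: HaAdminState = HaAdminState.DOWN
    role: Optional[HaRole] = None                 # optional, so it defaults to unset


# Example: create a session and move it into a syncing state.
session = HaSession(peer_id=1, session_ip=ip_address("10.0.0.2"))
session.admin_state = HaAdminState.SYNC_IN_PROGRESS
```

Keeping the role optional and the state as a single enum follows the notes directly; the actual definitions are in saiexperimentaldashha.h and may differ.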

@Michal presented an HA Deep Dive slide
  • 8 participants
  • 54 minutes
ip4
ips
protocol
connections
access
api
host
routing
hardware
syncing

21 Jun 2022

Please review/leave Comments for SAI HA API proposal by Marian - thank you!
https://github.com/opencomputeproject/SAI/pull/1500
  • 5 participants
  • 17 minutes
experimental
psi
dash
session
p4
package
marion
latest
project
configuration

14 Jun 2022

June 14, 2022
Discussion of API requirements
  • 11 participants
  • 59 minutes
interface
routed
shared
dpu
protocol
hosts
api
virtual
ecm
interoperate

7 Jun 2022

June 7, 2022
  • 4 participants
  • 11 minutes
network
tweaking
optimize
mode
application
interoperability
algorithmic
operating
hardware
asic

17 May 2022

Reliable or stateless
Synchronization and state updates discussion
  • 6 participants
  • 50 minutes
bandwidth
dpu
updates
ports
protocols
retransmissions
synchronization
connection
throughput
reliable

10 May 2022

No description provided.
  • 6 participants
  • 1:01 hours
protocol
communicate
tcp
protocols
synchronization
session
servers
bandwidths
stateful
network

3 May 2022

May 5, 2022 HA WG Call
  • 8 participants
  • 1:03 hours
flow
protocols
connectivity
asymmetrical
rethink
synchronous
configuration
extensibility
efficient
migrate

19 Apr 2022

April 19, 2022 HA WG Call
  • 7 participants
  • 58 minutes
dash
vms
protocol
configurations
relayed
coordinating
connections
backup
simulation
interoperability