From YouTube: Observability In ArgoCD/Rollouts Using Streaming ML For Reducing MTTR - Vigith Maurice & Amit Kalamkar

Description

Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe 2023 in Amsterdam, The Netherlands from April 17-21. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Observability In ArgoCD/Rollouts Using Streaming ML For Reducing MTTR - Vigith Maurice & Amit Kalamkar, Intuit

Speakers: Amit Kalamkar, Vigith Maurice
At Intuit, one third of P1/P2 outages are caused by a change. Because Intuit runs ~2500 services on K8s, we need to detect and resolve problems quickly using AIOps. Our talk focuses on how we built a K8s-native, DAG-based stream processing platform (Numaflow) and a streaming ML platform (Numalogic), both open-sourced under Numaproj, to address this problem. We will show how we collect, process, and analyze in-cluster data in real time and how Numalogic computes anomaly scores for each deployment. This DAG-based ML platform has now been adopted by Intuit and lets our ML engineers focus on writing just the inference and pre/post-processing logic, while the platform takes care of building the dynamic execution model, retries, buffering between the vertices, back-pressure, conditional forwarding, and auto-scaling. We will also show how we integrated observability into Argo CD so users can understand and remediate the behavior induced by a change, and how this is helping Intuit reduce MTTD/MTTR.
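
To make the pre-processing, inference, post-processing, and conditional-forwarding split described in the abstract more concrete, here is a minimal sketch in plain Python. It does not use the Numaflow or Numalogic APIs. Every function name, the rolling z-score that stands in for a trained model, the 0 to 10 score range, and the threshold of 3.0 are assumptions made for illustration only.

```python
from collections import deque
from statistics import mean, pstdev


def preprocess(raw_value, window):
    """Pre-processing step: add the newest metric sample to a rolling window."""
    window.append(float(raw_value))
    return list(window)


def infer(values, min_history=5):
    """Inference step (stand-in for an ML model): z-score of the newest
    sample measured against the earlier samples in the window."""
    history, latest = values[:-1], values[-1]
    if len(history) < min_history:
        return 0.0  # not enough context yet, treat as normal
    spread = pstdev(history)
    if spread == 0:
        # Flat history: any deviation at all is maximally surprising.
        return 0.0 if latest == history[0] else float("inf")
    return abs(latest - mean(history)) / spread


def postprocess(z_score, cap=10.0):
    """Post-processing step: clamp the raw score into the range 0..cap."""
    return min(cap, z_score)


def route(score, threshold=3.0):
    """Conditional forwarding: pick the downstream step based on the score."""
    return "alert" if score >= threshold else "store"


def run_pipeline(samples, window_size=20):
    """Run every sample through the four steps in order.

    Returns a list of (sample, anomaly_score, destination) tuples.
    """
    window = deque(maxlen=window_size)
    results = []
    for sample in samples:
        values = preprocess(sample, window)
        score = postprocess(infer(values))
        results.append((sample, score, route(score)))
    return results
```

Calling `run_pipeline` on a hypothetical per-deployment latency series such as `[100, 101, 99, 100, 102, 100, 500]` routes the final spike to the "alert" branch and the rest to "store". In the platform the abstract describes, each of these functions would be a separate vertex, and the retries, buffering, back-pressure, and auto-scaling between them would be handled by the platform, not by a loop like this one.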