youtube image
From YouTube: Logs Told Us It Was DNS, It Felt Like DNS, It Had To Be DNS, I... Laurent Bernaille & Elijah Andrews

Description

Don’t miss out! Join us at our upcoming hybrid event: KubeCon + CloudNativeCon North America 2022 from October 24-28 in Detroit (and online!). Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Logs Told Us It Was DNS, It Felt Like DNS, It Had To Be DNS, It Wasn’t DNS - Laurent Bernaille & Elijah Andrews, Datadog

It all started with a team reaching out because they had DNS issues during rolling updates. Business as usual when you host hundreds of applications on dozens of Kubernetes clusters… Four weeks later: We are reading kernel code to understand the corner cases of dropping Martian packets. Could this be the connection between gRPC client reconnect algorithms and the overflowing conntrack table we can feel but not see? In time, we solved the issue. And for once… it wasn't DNS! In this talk, we will focus on one of the most complex incidents we have faced in our Kubernetes environment. We will go through the debugging steps in detail, dive deep into the mysterious behaviors we discovered and explain how we finally addressed the incident by simply removing three lines of code.