From YouTube: When the Logs Just Don’t Cut It: Root-Causing Incidents Without Re-Deploying Prod - Phillip Kuznetsov

Description

When the Logs Just Don’t Cut It: Root-Causing Incidents Without Re-Deploying Prod - Phillip Kuznetsov, New Relic

Speakers: Phillip Kuznetsov
We’ve all been there: your pod is crash-looping, you check the logs, and you realize you forgot to log something important. Now you can’t figure out what went wrong. You try to reproduce the problem locally with no luck: it only seems to happen in production. What do you do? Do you re-deploy to production with more print statements? You could burn hours doing that while risking more problems. What if you could instead get that same data without the headache of restarting prod? In this talk, I’ll show you how to magically collect this data using bpftrace. Bpftrace lets you capture a lot of useful data (function arguments, return values, and latencies of individual functions, to name a few) without re-deploying pods. It is very powerful, but it can be complex to work with, especially in multi-node environments like a Kubernetes cluster. I’ll show you how to cut past these problems by walking through a demo incident, and I’ll share some tips and tricks for working with bpftrace on Kubernetes, including how to leverage Pixie to easily deploy bpftrace scripts and collect data from them.