From YouTube: Reliable Observability at Scale: Error Budgets for 1,000+ - Fred Moyer, Zendesk

Description

Join us for Kubernetes Forums Bengaluru and Delhi - learn more at kubecon.io

Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam (March 30 - April 2), Shanghai (July 28-30), and Boston (November 17-20)! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

"Observability and reliability engineering have been on a convergent course for several years. Error budgets joined the reliability lexicon of engineering organizations in 2016 with the release of the SRE book. Practical implementation at the intersection of observability and reliability has largely remained the domain of specialists. How can one democratize these techniques and put them in the hands of a thousand engineers at once?

At Zendesk, we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and error budgets at scale using a number of observability tools. This talk will show the approaches developed and how we were able to manage observability instrumentation across dozens of teams quickly in a complex ecosystem (CDN, UI, middleware, backend, queues, databases, etc.)."
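For readers unfamiliar with the terms in the abstract, the relationship between an SLI, an SLO, and an error budget can be sketched with a few lines of arithmetic. This is a generic, minimal illustration of the standard request-based definition, not the algorithm presented in the talk; the function name `error_budget` and its parameters are hypothetical and chosen only for this example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based error budget for one SLO window.

    Hypothetical helper for illustration only; it is not taken from the talk.
    slo_target is the fraction of requests that should succeed (e.g. 0.999).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the observed fraction of good requests in the window.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests

    # Fraction of the budget still unspent; negative once the budget is overspent.
    budget_remaining = 1 - failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250))
```

With a 99.9% target over one million requests, the budget is about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget unspent. A time-based (uptime) budget follows the same arithmetic with minutes in place of requests.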