youtube image
From YouTube: How to Include Latency in SLO-based Alerting - Björn Rabenstein, Grafana Labs

Description

Join us for Kubernetes Forums Seoul, Sydney, Bengaluru and Delhi - learn more at kubecon.io

Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects

How to Include Latency in SLO-based Alerting - Björn Rabenstein, Grafana Labs

Chapter 5 of “The Site Reliability Workbook” is an excellent study of how to create meaningful alerts based on SLOs by measuring the rate at which the error budget is burned over different time windows. This rather complex approach is blissfully straight-forward to implement in Prometheus, as demonstrated in the chapter itself. However, all of it is based on error rates, leaving latency concerns out of scope. Björn “Beorn” Rabenstein will explore various options of applying the same ideas to latency-based SLOs. The foundation is a precise and meaningful definition of the SLO. From there, Beorn will explore various techniques to translate the SLO into an error budget and how to measure its burn rate with Prometheus. Once that is done, creating error-budget-based alerts is relatively simple. There are, however, pitfalls and trade-offs along the way, which Beorn will help cope with.

https://sched.co/UaZQ