From YouTube: Alerting in Cloud Native Environments [I] - Fabian Reinartz, CoreOS

Description

In a Cloud Native infrastructure, component failure is normal and expected. The loss of a single node or a dozen hard drives is automatically handled by the systems running a datacenter, removing the need to page someone at 4am.

This calls for an alerting system that understands service availability at a global scope, yet is still able to give detailed reports if and when there is a service-impacting incident. Prometheus achieves this by defining alerting conditions directly on time series data. The resulting alerts are grouped and aggregated into comprehensive and meaningful notifications.
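
As a rough illustration of the two ideas in this paragraph, the sketch below evaluates an alerting condition over time series samples and then groups the firing alerts into one notification per service. It is a toy model in plain Python, not Prometheus code: the sample data, the label names, the threshold, and the functions `firing_alerts` and `group_alerts` are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical samples: (labels, recent error ratios for that series).
# All names and numbers here are made up for illustration.
SERIES = [
    ({"service": "api", "instance": "api-1"}, [0.01, 0.02, 0.01]),
    ({"service": "api", "instance": "api-2"}, [0.20, 0.25, 0.30]),
    ({"service": "api", "instance": "api-3"}, [0.15, 0.22, 0.18]),
    ({"service": "db", "instance": "db-1"}, [0.00, 0.00, 0.40]),
]


def firing_alerts(series, threshold):
    """Return the labels of every series whose samples all exceed threshold.

    Requiring every sample in the window to exceed the threshold keeps a
    single spike (such as db-1 above) from firing an alert.
    """
    return [
        labels
        for labels, values in series
        if values and all(v > threshold for v in values)
    ]


def group_alerts(alerts, by):
    """Collect firing alerts that share the label `by` into one notification."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[labels[by]].append(labels["instance"])
    return dict(groups)


alerts = firing_alerts(SERIES, threshold=0.1)
notifications = group_alerts(alerts, by="service")
print(notifications)
```

In this sketch the two unhealthy `api` instances end up in a single notification for the `api` service instead of producing one message each, and the one-off spike on `db-1` produces nothing.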

Fabian will walk through the philosophy of time series based alerting, the Prometheus architecture behind it, and how practical anomaly detection can be implemented.

About Fabian Reinartz
Fabian Reinartz is a software engineer at CoreOS and one of the core developers of Prometheus, a monitoring system and time series database. Previously, he was a production engineer at SoundCloud and worked on information retrieval during his time at Saarland University.