youtube image
From YouTube: Make Prometheus Use Less Memory and Restart Faster - Ganesh Vernekar, Grafana Labs

Description

Don’t miss out! Join us at our upcoming events: EnvoyCon Virtual on October 15 and KubeCon + CloudNativeCon North America 2020 Virtual from November 17-20. Learn more at https://kubecon.io. The conferences feature presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Make Prometheus Use Less Memory and Restart Faster - Ganesh Vernekar, Grafana Labs

These days, the most common reason for a Prometheus server to run out of memory is an excessive amount of time series in the so called head block, the part of the internal TSDB with the freshest data, which has to be kept in memory prior to consolidation into a block on disk. A large head block leads to a long restart time because the head block has to be rebuilt from the write-ahead log. On large servers, the restart time can be 10 minutes or more. Since restarts happen regularly to upgrade the binary or to change flags, the resulting interruption of sample collection is problematic. Even worse: After an OOM crash, the same replaying from the WAL has to happen, often causing another OOM crash immediately. Ganesh Vernekar will talk about the work started in late 2019 to persist parts of the head block earlier, thereby reducing both the memory footprint and the restart time.

https://sched.co/Zeih