From YouTube: Is Kubernetes Monitoring Flawed?

Description

A 3-node Kubernetes cluster with Prometheus will ship around 40k active series by default! Do we really need all that data? Open source Kubernetes monitoring, in its current state, needs improvement. The high churn rate of pod metrics, the proliferation of rarely used metrics, and configuration complexity are some of the issues that need to be addressed.
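To make the churn problem concrete: Prometheus identifies a time series by its metric name plus its full label set, so when a Deployment rolls out and pods get new names, every per-pod metric starts brand-new series while the old ones go stale. The sketch below is a simplified model under assumed numbers (two cAdvisor-style metric names, three replicas, ten full rollouts, and a hypothetical `web-<generation>-<index>` pod-naming scheme). It is not taken from the episode and ignores all labels other than `pod`.

```python
# A minimal sketch (hypothetical numbers, not from the episode) of why pod
# churn inflates the number of series a Prometheus-style TSDB has to track.
# A series is identified by metric name + label set, so a new pod name means
# a new series even though the workload itself has not changed.

def series_ids(metrics, pods):
    """Return the set of series identities for the given metric names and pods."""
    return {(m, ("pod", p)) for m in metrics for p in pods}

def total_series_after_rollouts(metrics, replicas, rollouts):
    """Count distinct series seen when every rollout replaces all pods."""
    seen = set()
    for generation in range(rollouts + 1):
        # Each rollout gives pods fresh names (hypothetical naming scheme).
        pods = [f"web-{generation}-{i}" for i in range(replicas)]
        seen |= series_ids(metrics, pods)
    return len(seen)

metrics = ["container_cpu_usage_seconds_total", "container_memory_working_set_bytes"]
active = len(series_ids(metrics, [f"web-0-{i}" for i in range(3)]))
total = total_series_after_rollouts(metrics, replicas=3, rollouts=10)
print(active, total)  # 6 66
```

At any moment only 6 series are active, yet after ten rollouts the storage has seen 66 distinct series, which is the kind of growth the episode refers to as high churn.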

I discussed this topic with Aliaksandr Valialkin, CTO at VictoriaMetrics and creator of the open source project. We covered the common problems, along with directions and best practices for overcoming some of these complexities, both as individuals and as a community. We also talked about the VictoriaMetrics open source project and how it addresses some of these challenges.

Aliaksandr is a Golang engineer who likes writing simple, performant code and creating easy-to-use programs. Sometimes these hard-to-reconcile requirements work together, as in the case of VictoriaMetrics.

The podcast episodes are available for listening on your favorite podcast app and on this YouTube channel.

We live-stream the episodes, and you’re welcome to join the stream here on YouTube Live or at https://www.twitch.tv/openobservability.

Follow us on Twitter @openobserv to get the live stream times and other updates, and to pitch in with your thoughts and comments.

Have you got an interesting topic you'd like to share in an episode? Reach out to us and submit your proposal at https://forms.gle/9LDkYCmegyS5D8Li7


Dotan Horovits
============
Twitter: https://twitter.com/horovits
LinkedIn: https://www.linkedin.com/in/horovits/

Aliaksandr Valialkin
===============
Twitter: https://twitter.com/valyala
LinkedIn: https://www.linkedin.com/in/valyala/
VictoriaMetrics: https://victoriametrics.com/
On GitHub: https://github.com/VictoriaMetrics/VictoriaMetrics
VictoriaMetrics community channels - https://docs.victoriametrics.com/#community-and-contributions

Resources
=========
Why Prometheus cannot query remote storage in an expected way via remote_read protocol - https://github.com/prometheus/prometheus/issues/4456
VictoriaMetrics: scaling to 100 million metrics per second - https://www.youtube.com/watch?v=xfed9_Q0_qU

Chapters
========
00:00 show intro
02:07 topic and guest intro
03:13 monitoring a microservice system: apps and communications
05:43 high churn rate for pod metrics
12:02 Kubernetes produces too many metrics by default, most of which are unused
17:06 recommended list of metrics
21:50 removing unused metric labels to reduce cardinality
24:16 Prometheus native (exponential buckets) histograms
26:49 configuration complexity with multiple deployments
33:16 OpenTelemetry and OpenMetrics open specifications
36:11 collecting system metrics and application metrics uniformly
40:20 VictoriaMetrics essentials
48:46 VictoriaMetrics extensions beyond Prometheus
54:06 full-stack monitoring: collection, analysis and alerting
56:09 how to join the VictoriaMetrics community
58:05 industry update: 2023 cloud native predictions post by CNCF CTO
59:16 outro