Cloud Native Computing Foundation PrometheusDay EU 2022 Open Meetings

3 Jun 2022

Prometheus instrumentation: the Practical Way - Aditi Ahuja, Couchbase

Instrumenting applications to expose meaningful metrics is the key to harnessing the power of Prometheus. The native Prometheus client libraries offer a convenient way to define various metrics about essential behaviours of your application in the form of basic metric types: counters, gauges and histograms. Applying this to more complex cases can be challenging.

In this talk you will learn about instrumenting a real application, using the compaction microservice of Thanos (a metric data store that extends the long-term storage capabilities of Prometheus) as an example. The audience will learn practical instrumentation approaches for production-grade software, from basic to more complex cases. The complex case is monitoring the various stages of compaction and estimating the potential compaction durations, which can vary widely depending on the data. Aditi will explain the implementation using the official Prometheus Go client library, but the same approach can be generalized to other languages.

At the end of this talk, you will know how to instrument applications and how to unit test that setup! This talk is for everyone looking to start out with instrumenting code and feeding the results into Prometheus.

3 participants
22 minutes

19 May 2022

Alerting and Anomaly Detection – Best Friends Forever? - Björn Rabenstein, Grafana Labs

When Prometheus became publicly known starting in 2015, the Prometheus developers expected many questions. But one surprisingly stuck out: “Can you do anomaly detection?” Somehow, everyone expected a next-generation monitoring and alerting system to venture into anomaly detection. PromQL is powerful enough to support fundamental building blocks of anomaly detection, but the general direction of Promethean alerting is, in a way, exactly the opposite: Towards confident, non-noisy alerts based on your SLOs. In this talk, Beorn will share a few stories from the receiving end of the pager and why it is almost always a bad idea to put anomaly detection at the other end. He will talk about the “proper” Promethean way of alerting (including its limitations) and where anomaly detection (or even machine learning) might have its place in it after all.
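
The description contrasts anomaly detection with alerting on SLOs. As background rather than material from the talk, SLO-based alerts are often expressed as an error-budget burn rate: the observed error ratio divided by the allowed error ratio (1 - SLO). The Go sketch below computes it and applies a two-window check in the style popularized by Google's SRE Workbook; the threshold of 14.4, the window lengths and the request counts are assumptions for illustration.

```go
package main

import "fmt"

// burnRate returns how fast the error budget is being spent. A value of 1
// spends exactly the whole budget over the SLO period; higher is faster.
func burnRate(errors, total, slo float64) float64 {
	if total == 0 {
		return 0 // no traffic, nothing burned
	}
	budget := 1 - slo // allowed error ratio, e.g. 0.001 for a 99.9% SLO
	return (errors / total) / budget
}

// shouldPage is a hypothetical multiwindow check: the long window makes
// the alert confident, the short window makes it stop soon after recovery.
func shouldPage(longWindow, shortWindow, threshold float64) bool {
	return longWindow > threshold && shortWindow > threshold
}

func main() {
	const slo = 0.999
	longWindow := burnRate(180, 10000, slo) // e.g. errors/requests over 1h
	shortWindow := burnRate(20, 1000, slo)  // e.g. errors/requests over 5m
	// A burn rate of 14.4 sustained for 1h spends 2% of a 30-day budget.
	fmt.Printf("%.1f %.1f %v\n", longWindow, shortWindow, shouldPage(longWindow, shortWindow, 14.4))
}
```

Requiring both windows to exceed the threshold is what keeps such alerts non-noisy: a short spike alone does not page, and a resolved incident stops paging quickly.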

5 participants
29 minutes

19 May 2022

Fleeting Metrics: Monitoring Short-lived or Serverless Jobs with Prometheus - Bartłomiej Płotka & Saswata Mukherjee, Red Hat

Prometheus is the leading open-source monitoring solution when it comes to metrics and alerting. It is a single binary that provides you with all you need to monitor your infrastructure and services. It has seen the shift from on-prem to cloud environments and has proven successful for users with all kinds of use cases. Prometheus was always designed to aggregate long-lived metrics. However, this does not always fit the solutions that are emerging in the CNCF ecosystem. Short-lived workloads are increasingly common, in the form of Kubernetes batch jobs and serverless platforms like OpenFaaS, AWS Lambda and many more. This raises the question of whether and how we can use Prometheus to monitor and troubleshoot those kinds of jobs. In this talk, you will learn about the potential solutions that are emerging in the Prometheus ecosystem. Bartek and Saswata will dive into this problem and propose a set of solutions that could help in monitoring those short-lived workloads using the Prometheus data model. The audience will see a demonstration of a solution that uses best practices to capture fleeting metrics and integrates them with Prometheus.

4 participants
34 minutes

19 May 2022

How Prometheus Indexes Data and Why You Should Care - Harkishen Singh, Timescale

Prometheus is capable of ingesting and storing large amounts of metric samples. Prometheus users define queries and dashboards to extract insights from all that data, helping them ensure their systems are up and performing as expected. Good query performance is important, which is why Prometheus indexes incoming data. In this talk we will dive into how Prometheus indexes incoming data. We will aim to give you a visual understanding of the on-disk layout and the data structures used to store samples, in order to develop an intuitive understanding of data access complexity and costs. This will inform how you manage cardinality and show how PromQL queries leverage the index to speed up query execution.
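
The Prometheus TSDB index keeps an inverted index: for each label name/value pair, a sorted list of series IDs (a postings list). A selector with several equality matchers is resolved by intersecting those lists. The sketch below shows that intersection in plain Go; the in-memory map stands in for the on-disk index, and the series IDs are made up.

```go
package main

import "fmt"

// intersect merges two sorted postings lists in O(len(a)+len(b)).
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// lookup resolves equality matchers ("name=value") against the index.
func lookup(index map[string][]uint64, matchers ...string) []uint64 {
	if len(matchers) == 0 {
		return nil
	}
	result := index[matchers[0]]
	for _, m := range matchers[1:] {
		result = intersect(result, index[m])
	}
	return result
}

func main() {
	// Hypothetical index: "name=value" -> sorted series IDs.
	index := map[string][]uint64{
		`__name__=http_requests_total`: {1, 2, 3, 4, 5, 6},
		`job=api`:                      {2, 3, 5, 9},
		`status=500`:                   {3, 5, 7},
	}
	// Equivalent selector: http_requests_total{job="api", status="500"}
	fmt.Println(lookup(index, `__name__=http_requests_total`, `job=api`, `status=500`)) // [3 5]
}
```

The cost of a query grows with the length of the postings lists it touches, which is one concrete reason why high cardinality makes queries more expensive.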

1 participant
17 minutes

19 May 2022

How and Why We Rebuilt Auto-scaling in OpenFaaS with Prometheus - Alex Ellis, OpenFaaS Ltd

In The Six Million Dollar Man we get the quote “We can rebuild him. We have the technology. We can make him better than he was. Better, stronger, faster.” With that in mind, and prompted by customer feedback, we rebuilt the subsystem responsible for scaling OpenFaaS functions. The new and improved version serves customers' needs better, with the added ability to scale on in-flight requests and CPU (as well as RPS). This wasn’t an easy journey, and we think you’ll be able to learn from some of the PromQL we wrote, how we instrument and collect the data, and the issues we ran into along the way. There’ll be PromQL samples and live demos of scaling functions, linked back to end-user use cases.

2 participants
28 minutes

19 May 2022

How to Be 10x SRE? A Deep Dive to Prometheus Operator - Jayapriya Pai & Haoyu Sun, Red Hat

Prometheus Operator is a fairly well-known solution for monitoring Kubernetes workloads using Prometheus. Many cloud native users benefit from Prometheus Operator's CRD-based components, like ServiceMonitors, PodMonitors, PrometheusRules and Probes, which allow better configuration management, self-service or even multi-tenancy. Much has been said about Prometheus Operator in the past, but we believe there is room for a dedicated talk about the intended way of using Prometheus Operator on production Kubernetes clusters. In this talk, Jayapriya, a Prometheus Operator contributor from the Red Hat Monitoring team, and her teammate Haoyu will explain all you need to know about the common usage patterns. The audience will see practical examples and learn advanced features like securing Prometheus with TLS, enabling robust remote write and operating Alertmanager via Prometheus Operator. The talk will also summarize the monitoring and operating aspects of the Prometheus Operator itself, sharing first-hand experience of maintaining Prometheus Operator in thousands of OpenShift clusters.

2 participants
24 minutes

19 May 2022

Lightning Talk: Easy Anomaly Detection with PromQL - David de Torres Huerta, Sysdig

How can I create an alert on a service whose load changes over the hours of a day? How can I alert on a process whose usage differs across the days of a week? Anomaly detection is one of the main challenges that Prometheus users face while setting up alerts. Systems are usually dynamic, and their resource usage and behavior depend on external factors that vary over time. Setting up alerts with static thresholds in these environments generates a lot of noise, causing alert fatigue among operators, who then miss important notifications camouflaged among false positives. In this talk, we will see the different kinds of anomaly detection, when to use them and how to implement them in PromQL. Although PromQL does not have specific functions for anomaly detection, as it does for linear regression, it does provide the building blocks to create different kinds of anomaly detection. We will also discuss the possibility of creating new PromQL functions that would make it easier to create this kind of anomaly detection alert.

1 participant
6 minutes

19 May 2022

Lightning Talk: Monitoring Counter-Strike: Global Offensive with Prometheus - David Lorite, Sysdig

Everyone is using Prometheus in their infrastructure, but who is using Prometheus in their game server? The gaming industry is highly profitable, and servers are a critical component of its success. They also carry a great responsibility in maintaining quality of service (QoS): a rise in latency or a drop in computing power, especially in multiplayer games, seriously affects the user experience. In this talk, you will learn how to set up and monitor a Counter-Strike: Global Offensive server with Prometheus. We will show the installation and configuration of the Prometheus server and the following exporters:

- Node exporter: to monitor the infrastructure metrics.
- cAdvisor: to monitor the usage of the containers.
- SRCDS Exporter: to monitor the game server metrics.

With all these exporters, apart from monitoring the game itself, we will have visibility into the node and the applications on it, to be sure that the VM is running everything at an optimal service level and to avoid extra costs in our cloud bill.

1 participant
7 minutes

19 May 2022

Lightning Talk: Optimize UX and Performance Through Grafana, Prometheus and Lighthouse - Miki Lombardi, Growens

At MailUp we are always working to improve. Lighthouse is a tool that analyzes our pages and returns important metrics that help us optimize performance and UX. We have created a tool that, thanks to Docker containers, allows us to quickly analyze our platform and view the data in a Grafana dashboard. In this talk we will walk through our use case!

1 participant
6 minutes

19 May 2022

Lightning Talk: Troubleshoot Compactor Backlog with Ease - Ben Ye, ByteDance

This talk covers a common problem when running Thanos and Cortex at large scale: compactor backlog. As the compactor is a core component, it is important to make sure that compactors are running smoothly and are well scaled. In this talk, Ben Ye will explain why compactor backlog happens and how to prevent it. He will walk through ways to identify and troubleshoot it using existing metrics and tools.

1 participant
5 minutes

19 May 2022

No description provided.

1 participant
6 minutes

19 May 2022

Prometheus Data analysis and Event Notifications for Progressive Delivery - Ravi Hari, Intuit

Prometheus is the de facto monitoring tool in Kubernetes. Argo Rollouts is an open source Kubernetes controller that provides ways to perform analysis to drive progressive delivery in Kubernetes using Prometheus. While the analysis is crucial, it is also important to notify the user of the analysis status in near real time. Argo Rollouts uses a notification engine that triggers notifications based on the success or failure status of an analysis using Prometheus data. In this talk we will walk you through an example of how an application can be configured with Argo Rollouts, using analysis templates that rely on Prometheus and notification templates that send notifications to the user in real time. We will also show how this can be integrated with multiple notification channels, destinations and recipients based on the analysis status.
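
In Argo Rollouts, an analysis template runs a Prometheus query repeatedly and compares each result against a success condition, and the outcome drives both the rollout and the notification. The Go sketch below is a hypothetical, simplified model of that decision, loosely inspired by the successCondition and failureLimit settings; it is not Argo Rollouts' code. It counts measurements below a threshold, fails the analysis once failures exceed a limit, and formats a message for a stand-in notification channel. The threshold, limit, sample values and rollout name are assumptions.

```go
package main

import "fmt"

// Phase is a hypothetical analysis outcome used to pick a notification.
type Phase string

const (
	Successful Phase = "Successful"
	Failed     Phase = "Failed"
)

// evaluate checks each measurement (e.g. a success rate returned by a
// Prometheus query) against threshold; the analysis fails as soon as the
// number of failing measurements exceeds failureLimit.
func evaluate(measurements []float64, threshold float64, failureLimit int) Phase {
	failures := 0
	for _, m := range measurements {
		if m < threshold {
			failures++
			if failures > failureLimit {
				return Failed
			}
		}
	}
	return Successful
}

// notify stands in for a notification channel such as chat or email.
func notify(rollout string, phase Phase) string {
	return fmt.Sprintf("rollout %s: analysis %s", rollout, phase)
}

func main() {
	canary := []float64{0.99, 0.97, 0.93, 0.91} // hypothetical success rates
	phase := evaluate(canary, 0.95, 1)
	fmt.Println(notify("checkout-service", phase))
}
```

Tolerating a small number of failed measurements before failing the whole analysis avoids aborting a rollout because of a single noisy query result.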

3 participants
25 minutes

19 May 2022

Sponsored Keynote - Connecting Prometheus and OpenTelemetry Data for Faster Troubleshooting - Ramon Guiu, VP of Observability, Timescale

The last few years have been fantastic for observability practitioners with the growth of Prometheus as the standard for metrics monitoring and the emergence of OpenTelemetry as a standard for application monitoring. Interoperability is key for standards to be adopted and successful. In this case, these two standards can make it easier for engineers to both instrument their systems and troubleshoot problems faster. In this talk, we will show the true power of Prometheus and OpenTelemetry working together.

1 participant
13 minutes

19 May 2022

Storing Continuous Benchmarking Data in Prometheus - Matvey Arye, TimescaleDB/Promscale

Prometheus is most commonly used for observing live production systems. In this talk, we’ll cover another great use case: benchmarking. Usually, distributed systems are benchmarked by using a benchmark driver to apply load and measure performance. This is typically the only data recorded for the benchmark. The problem with this approach is that it gives you visibility into performance output but no ability to diagnose why performance issues occurred. By using Prometheus in your performance benchmarks, you can measure resource usage metrics and internal application metrics across all your components, giving you the insights you need to understand the cause of performance issues so that you can fix them. This does not require a lot of additional effort, because you can reuse the observability infrastructure that you should already be implementing in your application, as well as the dashboards already built in Grafana. It also allows retrospective analysis of benchmark runs, since the data is stored in Prometheus. In this talk we’ll explain how we set up such an environment and share lessons learned about tracking benchmarking runs and keeping the result data organized.

3 participants
17 minutes

19 May 2022

Warp-Speed Debugging with Prometheus Exemplars - Ian Billett, Red Hat

Effectively debugging distributed systems almost always requires inspecting more than just your Prometheus metrics data: logs, traces and profiles all provide essential information that helps you quickly and efficiently pinpoint the root cause of your bugs. However, navigating between different systems with disjointed data sources interrupts your debugging flow state and ultimately increases the time taken to identify and resolve your bugs. Wouldn't it be nice if Prometheus had a native capability to help you hop between data sources? Enter exemplars! In this beginner-focused talk, Ian Billett will walk you through what exemplars are and how they work, and provide practical examples of how you can leverage them in your applications today to supercharge your debugging experience.

4 participants
29 minutes
