Cloud Native Computing Foundation / PrometheusDay EU 2022

These are all the meetings we have in "PrometheusDay EU 2022" (part of the organization "Cloud Native Computi…"). Click into individual meeting pages to watch the recording and search or read the transcript.

3 Jun 2022

Prometheus instrumentation: the Practical Way - Aditi Ahuja, Couchbase

Instrumenting applications to expose meaningful metrics is the key to harnessing the power of Prometheus. The native Prometheus client libraries offer a convenient way to define various metrics about essential behaviours of your application in the form of basic metric types: counters, gauges and histograms. Applying this to more complex cases might be challenging.

In this talk you will learn about instrumenting a real application, using the compaction microservice of Thanos (a metric data store that extends the long-term storage capabilities of Prometheus) as the example. The audience will learn practical instrumentation approaches on production-grade software, from basic to more complex cases. The complex case is monitoring the various stages and estimating the potential compaction durations, which can vary widely based on the data. Aditi will explain the Go client implementation using the official Prometheus library, but the same can be generalized to other languages.

At the end of this talk, you will know how to instrument applications and how to unit test that setup! This talk is for everyone looking to start out with instrumenting code and tap that into Prometheus.
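
As a rough illustration of the three basic metric types named above, here is a minimal sketch in plain Python. It does not use the Prometheus client library or the Go client covered in the talk; the classes and the metric names (compactions_total, compactions_in_progress, compaction_duration_seconds) are hypothetical and only model the semantics: a counter that only goes up, a gauge that can move in both directions, and a histogram that counts observations into cumulative buckets.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Counter:
    """Monotonically increasing value, e.g. compactions completed."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


@dataclass
class Gauge:
    """Value that can go up or down, e.g. compactions in progress."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Counts observations into buckets, e.g. compaction durations."""
    bounds: List[float]  # sorted upper bounds, excluding +Inf
    counts: List[int] = field(default_factory=list)
    total: float = 0.0   # sum of all observed values

    def __post_init__(self) -> None:
        # One slot per upper bound plus a final slot for +Inf.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # Index of the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self) -> List[int]:
        # Each bucket also includes everything from the smaller buckets.
        out, running = [], 0
        for count in self.counts:
            running += count
            out.append(running)
        return out


# Hypothetical metrics for a compaction loop.
compactions_total = Counter()
compactions_in_progress = Gauge()
compaction_duration_seconds = Histogram(bounds=[1.0, 10.0, 60.0])

for seconds in (0.5, 7.0, 10.0, 300.0):
    compactions_in_progress.set(1)
    compaction_duration_seconds.observe(seconds)
    compactions_total.inc()
    compactions_in_progress.set(0)

print(compactions_total.value)
print(compaction_duration_seconds.cumulative())
```

Keeping the types this small also hints at why they are easy to unit test: each one is just state plus one or two update methods, so a test can drive the code under test and assert on the resulting values.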
  • 3 participants
  • 22 minutes
instrumentations
instrumentation
instrumenting
prometheus
beginner
introduction
implementation
thanos
platform
golang

19 May 2022

Alerting and Anomaly Detection – Best Friends Forever? - Björn Rabenstein, Grafana Labs

When Prometheus became publicly known starting in 2015, the Prometheus developers expected many questions. But one surprisingly stuck out: “Can you do anomaly detection?” Somehow, everyone expected a next-generation monitoring and alerting system to venture into anomaly detection. PromQL is powerful enough to support fundamental building blocks of anomaly detection, but the general direction of Promethean alerting is, in a way, exactly the opposite: Towards confident, non-noisy alerts based on your SLOs. In this talk, Beorn will share a few stories from the receiving end of the pager and why it is almost always a bad idea to put anomaly detection at the other end. He will talk about the “proper” Promethean way of alerting (including its limitations) and where anomaly detection (or even machine learning) might have its place in it after all.
  • 5 participants
  • 29 minutes
prometheus
anomaly
monitoring
alert
advance
carefully
discussion
intrusion
prophecy
julian

19 May 2022

Fleeting Metrics: Monitoring Short-lived or Serverless Jobs with Prometheus - Bartłomiej Płotka & Saswata Mukherjee, Red Hat

Prometheus is the leading open-source monitoring solution when it comes to metrics and alerting. It is a single binary that provides you with all you need to monitor your infrastructure and services. It has seen the shift from on-prem to cloud environments and has proven to be successful for users with all kinds of use cases. Prometheus was always designed to aggregate long-lived metrics. However, this does not always fit the solutions that are emerging in the CNCF ecosystem. Short-lived workloads are increasingly common in the form of Kubernetes batch jobs and serverless platforms such as OpenFaaS, Lambda and many more. This raises the question of whether, and how, we can use Prometheus to monitor and troubleshoot those kinds of jobs. In this talk, you will learn about the potential solutions that are emerging in the Prometheus ecosystem. Bartek and Saswata will dive into this problem and propose a set of solutions that could help in monitoring those short-lived workloads using the Prometheus data model. The audience will see a demonstration of a solution that uses best practices to capture fleeting metrics and integrates them with Prometheus.
  • 4 participants
  • 34 minutes
fleeting
moments
trivial
durations
seconds
quick
capabilities
term
theory
glance

19 May 2022

How Prometheus indexes Data and Why You Should Care - Harkishen Singh, Timescale

Prometheus is capable of ingesting and storing large amounts of metric samples. Prometheus users define queries and dashboards to extract insights from all that data that help them ensure their systems are up and performing as expected. Good query performance is important and that’s why Prometheus indexes incoming data. In this talk we will dive into how Prometheus indexes incoming data. We will aim to give you a visual understanding of the on-disk layout and data structures used to store samples. The aim is to develop an intuitive understanding of data access complexity and costs. This will inform you about how to manage cardinality and how PromQL queries leverage the index to speed up query execution.
  • 1 participant
  • 17 minutes
prometheus
indexes
promql
prompkill
observability
performance
evaluation
discussed
monitor
minutes

19 May 2022

How and Why We Rebuilt Auto-scaling in OpenFaaS with Prometheus - Alex Ellis, OpenFaaS Ltd

In The Six Million Dollar Man we get the quote “We can rebuild him. We have the technology. We can make him better than he was. Better, stronger, faster.” With that in mind, and prompted by customer feedback, we rebuilt the subsystem responsible for scaling OpenFaaS functions. The new and improved version serves the needs of customers better, with the added ability to scale on in-flight requests and CPU (as well as RPS). This wasn’t an easy journey, and we think you’ll be able to learn from some of the PromQL we wrote, how we instrument our code and collect the data, and the issues we ran into along the way. There’ll be PromQL samples and live demos of scaling functions, linked back to end-user use-cases.
  • 2 participants
  • 28 minutes
microservices
microservice
openfast
openvas
prometheus
software
capacity
tunable
kubernetes
views

19 May 2022

How to Be 10x SRE? A Deep Dive to Prometheus Operator - Jayapriya Pai & Haoyu Sun, Red Hat

Prometheus Operator is a fairly well-known solution for monitoring Kubernetes workloads using Prometheus. Many Cloud Native users benefit from Prometheus Operator CRD-based components like ServiceMonitors, PodMonitors, PrometheusRules and Probes, which allow better configuration management, self-service or even multi-tenancy. Many things were said about Prometheus Operator in the past, but we believe there is room for a dedicated talk about the designed way of utilizing Prometheus Operator on production Kubernetes clusters. In this talk, Jayapriya, a Prometheus Operator contributor from the Red Hat Monitoring team, and her teammate Haoyu will explain all you need to know about the common usage patterns. The audience will see practical examples and learn advanced features like securing Prometheus with TLS, enabling robust remote write and operating Alertmanager via Prometheus Operator. The talk will also summarize the monitoring and operating aspects of the Prometheus Operator itself, sharing first-hand experience of maintaining Prometheus Operator in thousands of OpenShift clusters.
  • 2 participants
  • 24 minutes
monitoring
prometheus
monitors
workflow
deploying
dashboard
kubernetes
repository
scripts
sres

19 May 2022

Lightning Talk: Easy anomaly Detection with PromQL - David de Torres Huerta, Sysdig

How do you create an alert on a service whose load changes over the different hours of a day? How do you alert on a process that has different usage over different days of a week? Anomaly detection is one of the main challenges that Prometheus users face while setting up alerts. Systems are usually dynamic, and their use of resources and their behavior depend on external factors that vary over time. Setting up alerts with static thresholds in these environments generates a lot of noise, causing alert fatigue in operators, who may then miss important notifications camouflaged among false positives. In this talk, we will see the different kinds of anomaly detection, when to use them and how to implement them in PromQL. Although PromQL does not have specific functions for anomaly detection, as it has for linear regression, it does provide the building blocks to create different kinds of anomaly detection. We will also discuss the possibility of creating new PromQL functions that would make it easier to create this kind of anomaly detection alert.
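
To make the contrast with static thresholds concrete, here is a minimal sketch of one common anomaly-detection approach, the z-score, in plain Python. It is not taken from the talk. The function names, the sample values and the threshold of 3 are assumptions for illustration, and the history is assumed to hold samples from comparable periods (for example, the same hour on previous days). In PromQL, the analogous building blocks would be functions such as avg_over_time and stddev_over_time.

```python
from statistics import mean, pstdev


def z_score(history, current):
    """How many standard deviations `current` lies from the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation
    if sigma == 0:
        # A perfectly flat history gives no scale to compare against.
        return 0.0
    return (current - mu) / sigma


def is_anomalous(history, current, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(z_score(history, current)) > threshold


# Hypothetical request rates observed at the same hour on five previous days.
same_hour_history = [100, 102, 98, 101, 99]

print(is_anomalous(same_hour_history, 101))  # close to the usual level
print(is_anomalous(same_hour_history, 110))  # far outside the usual spread
```

Because the baseline is computed from comparable periods rather than fixed in advance, the same rule adapts to a service whose normal load differs between a quiet hour and a busy one.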
  • 1 participant
  • 6 minutes
anomaly
anomalies
anomalous
samples
detection
deviation
measured
normal
investigating
model

19 May 2022

Lightning Talk: Monitoring Counter Strike Global offensive with Prometheus - David Lorite, Sysdig

Everyone is using Prometheus in their infrastructure, but who is using Prometheus in their game server? In gaming, servers are a critical component of the industry's success. The gaming industry is highly profitable and the enabling technology is critical to its success. Servers also carry a great responsibility in maintaining quality of service (QoS): degraded latency or a drop in computing power, especially in multiplayer games, seriously affects user experience. In this talk, you will learn how to set up and monitor a Counter-Strike: Global Offensive server with Prometheus. We will show the installation and configuration of the Prometheus server and the following exporters: Node exporter, to monitor the infrastructure metrics; cAdvisor, to monitor the usage of the containers; and SRCDS Exporter, to monitor the game server metrics. With all these exporters, apart from monitoring the game itself, we will have visibility into the node and the applications on it, to be sure that the VM is running everything at an optimal service level and to avoid extra costs on our cloud bill.
  • 1 participant
  • 7 minutes
monitoring
container
server
exporter
deploy
docker
project
crts
metrics
prometheus

19 May 2022

Lightning Talk: Optimize UX and Performance Through Grafana, Prometheus and Lighthouse - Miki Lombardi, Growens

At MailUp we always develop to improve. Lighthouse is a tool that analyzes our pages and returns important metrics that we can act on to optimize performance and UX. We have created a tool that, thanks to Docker containers, allows us to quickly analyze our platform and view the data in a Grafana dashboard. In this talk we will analyze our use case!
  • 1 participant
  • 6 minutes
monitoring
performance
optimize
microservices
dashboard
efficient
servers
improving
docker
browser

19 May 2022

Lightning Talk: Troubleshoot Compactor Backlog with Ease - Ben Ye, ByteDance

This talk covers a common problem when running Thanos and Cortex at large scale: compactor backlog. Because the compactor is a core component, it is important to make sure that compactors run smoothly and are well scaled. In this talk, Ben Ye will explain why compactor backlog happens and how to prevent it. He will walk through ways to identify and troubleshoot it using existing metrics and tools.
  • 1 participant
  • 5 minutes
backlog
compactor
compactors
compacting
compact
compacts
compaction
retention
troubleshooting
sonos

19 May 2022

No description provided.
  • 1 participant
  • 6 minutes
prometheus
prom
consensus
meet
attended
contentious
finally
2022
inconvenient
things

19 May 2022

Prometheus Data analysis and Event Notifications for Progressive Delivery - Ravi Hari, Intuit

Prometheus is the de facto monitoring tool in Kubernetes. Argo Rollouts is an open source Kubernetes controller that provides ways to perform analysis to drive progressive delivery in Kubernetes using Prometheus. While it is crucial to do the analysis, it is also important to send notifications of the analysis status to the user in near real time. Argo Rollouts uses a notification engine that triggers notifications based on the success or failure status of an analysis using Prometheus data. In this talk we will walk you through an example of how an application can be configured with Argo Rollouts, using analysis templates that rely on Prometheus and notification templates that send notifications to the user in real time. We will also show how this can be integrated with multiple notification channels, destinations and recipients based on analysis status.
  • 3 participants
  • 25 minutes
argo
kubernetes
arguable
progressive
provider
rollout
argorolots
capability
deployments
observability

19 May 2022

Sponsored Keynote - Connecting Prometheus and OpenTelemetry Data for Faster Troubleshooting - Ramon Guiu, VP of Observability, Timescale

The last few years have been fantastic for observability practitioners with the growth of Prometheus as the standard for metrics monitoring and the emergence of OpenTelemetry as a standard for application monitoring. Interoperability is key for standards to be adopted and successful. In this case, these two standards can make it easier for engineers to both instrument their systems and troubleshoot problems faster. In this talk, we will show the true power of Prometheus and OpenTelemetry working together.
  • 1 participant
  • 13 minutes
correlating
interoperability
observability
telemetry
prometheus
issue
infrastructure
standards
documentation
batching

19 May 2022

Storing Continuous Benchmarking Data in Prometheus - Matvey Arye, TimescaleDB/Promscale

Prometheus is most commonly used for observing live production systems. In this talk, we’ll cover another great use case: benchmarking. Usually, distributed systems are benchmarked by using a benchmark driver to apply load and measure performance. This is typically the only data recorded for the benchmark. The problem with this approach is that it gives you visibility into performance output but no ability to diagnose why performance issues occurred. By using Prometheus in your performance benchmarks you can measure resource usage metrics and internal application metrics across all your components giving you the insights you need to understand the reason for performance issues so you can fix them. This will not require a lot of additional effort because you can reuse the observability infrastructure that you should already be implementing in your application as well as the dashboards already built into Grafana. It also allows retrospective analysis of benchmark runs since the data is stored in Prometheus. In this talk we’ll explain how we set up such an environment as well as share lessons learned about tracking benchmarking runs and keeping the result data organized.
  • 3 participants
  • 17 minutes
benchmarking
benchmarks
benchmark
prometheus
performance
cloud
bench
observability
kubernetes
question

19 May 2022

Warp-Speed Debugging with Prometheus Exemplars - Ian Billett, Red Hat

Effectively debugging distributed systems almost always requires inspecting more than just your Prometheus metrics data - logs, traces and profiles all provide essential information that helps you quickly and efficiently pinpoint the root cause of your bugs. However, navigating between different systems with disjointed data sources interrupts your debugging flow state and ultimately increases the time taken to identify and resolve your bugs. Wouldn't it be nice if Prometheus had a native capability to help you hop between data sources? Enter exemplars! In this beginner-focused talk, Ian Billett will walk you through what exemplars are and how they work, and provide practical examples of how you can leverage them in your applications today to supercharge your debugging experience.
  • 4 participants
  • 29 minutes
exemplar
experimental
prometheus
things
useful
observe
exposition
effort
accelerate
er