Cloud Native Computing Foundation PromCon North America 2021 Open Meetings

1 Nov 2021

Don’t miss out! Join us at our next event: KubeCon + CloudNativeCon Europe 2022 in Valencia, Spain from May 17-20. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Auto-instrumentation of Prometheus for RED Monitoring with eBPF - Bartlomiej Plotka & Harshitha Chowdary Thota, Red Hat

Prometheus is known for robust client libraries in many languages, allowing you to instrument your application with useful metrics. With those capabilities, many solid patterns have emerged for getting unified monitoring views across a cluster of applications. One of those is called RED, which tells us which signals to monitor in our application (rate, errors, duration). While it’s easy to instrument such a RED view for one application, it can be challenging for heterogeneous workloads written in many different programming languages or distributed as closed source. What if we could remove the instrumentation step from the equation, similar to what service meshes give, but without added complexity and overhead? In this talk, Harshitha and Bartek will go through different approaches to “auto” instrumenting your workload using eBPF, a way of safely and quickly executing code in the Linux kernel. You will learn how to leverage eBPF to get essential data in a uniform format into Prometheus for RED monitoring. Join to see a demo showcasing these capabilities using open-source software.
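To make the three RED signals concrete, here is a minimal Python sketch that summarises a window of observed requests as rate, error ratio, and a latency quantile. It is not taken from the talk: the `Request` record, the `red_summary` function, the choice of "status >= 500" as the error condition, and the nearest-rank quantile are all assumptions made for illustration, and no eBPF is involved.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # observed latency of the request, in seconds
    status: int        # HTTP-style status code (assumed error condition: >= 500)


def red_summary(requests, window_s, quantile=0.95):
    """Summarise one observation window as Rate, Errors, Duration."""
    if not requests:
        return {"rate": 0.0, "error_ratio": 0.0, "duration": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank quantile: the smallest sample with at least
    # `quantile` of all samples at or below it.
    rank = max(1, math.ceil(quantile * len(durations)))
    return {
        "rate": len(requests) / window_s,          # requests per second
        "error_ratio": errors / len(requests),     # fraction of failed requests
        "duration": durations[rank - 1],           # latency at the quantile
    }
```

In a real Prometheus setup these values would normally be derived from counters and histograms at query time rather than computed in the application; the sketch only shows what each of the three signals measures.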

2 participants
33 minutes

ebpf

observability

cloudflare

implementation

instrumentation

infrastructure

debugging

monitoring

dashboards

concerns

1 Nov 2021

Connecting with Prometheus World: Logs and Metrics - Eduardo Silva, Calyptia

From a data collection standpoint, Logs and Metrics were always handled separately: different sources were handled by different "agents". For years, systems administrators have asked for a unified experience where both Logs and Metrics are collected, pre-processed, and shipped from a single agent. In the Logging space, the Fluentd and Fluent Bit projects are among the preferred choices and, surprisingly, our users have asked that we integrate native Metrics handling; today this has been implemented. As the Fluent project, we evaluated not only how to collect metrics, but also how to provide smooth integration with the de facto industry standards. We decided to launch our first Metrics support by fully integrating with OpenMetrics and Prometheus. In this presentation, you will learn about the integration, best practices for Metrics collection with Fluent, and how to leverage our Node Exporter, Prometheus Exporter and Remote Write implementations with your current Prometheus services and tooling without any disruptive change. In the Fluent project, we believe in lowering the friction for our users and integrating with their current standard services. These are interesting times where Prometheus and Fluentd are finally in sync.

3 participants
25 minutes

logging

log

connectivities

monitoring

metadata

users

messages

api

infrastructure

docker

1 Nov 2021

Deep Network Traffic Observability with Pktvisor and Prometheus - Shannon Weyrick, NS1

Observability of network traffic can prove very important to the successful operation of modern applications. The ability to divine key information from the flow of network traffic can provide insight useful for operations, debugging, and security. But efficient analysis and collection of deeply inspected, high throughput traffic is hard… especially as the trend towards globally distributed applications continues. How does one organize a fleet of at-scale agents which can analyze network traffic in real time and send the results to a modern observability stack? pktvisor is a free and open source observability agent designed to address these challenges. Developed by NS1 for their global DNS network, it makes use of real time streaming algorithms to efficiently extract counts, top-k heavy hitters, set cardinality, quantiles and other key information from the various networking layers. In this talk we will outline the challenges above and then work through solving them with pktvisor and Prometheus. We will cover installation with containers, configuration, metrics collection to Prometheus via scrape or remote write, and how to query Prometheus to visualize the results. Finally we will look at the future of the project which adds full remote configuration and fleet management.

2 participants
24 minutes

insights

infrastructure

analyzers

research

intelligence

functionality

servers

observability

strategy

nodes

1 Nov 2021

Ecosystem Tools to Select the Best Remote Storage Solution - Matvey Arye, Timescale

In the past year, the Prometheus ecosystem has developed multiple open-source tools to help users evaluate different remote storage systems and decide which one best fits their needs. This talk will give users a high-level overview of a number of those tools and discuss how they can help users decide which remote storage system to use. The tools fall into three categories: compliance, benchmarking, and data migration. The first class of tools checks remote storage systems for compliance with the Prometheus standards. In particular, there is now a PromQL compliance suite that makes sure that query results are correct, as well as a remote_write compliance suite. A remote_read compliance suite is in development. The second type of tool is a remote write benchmark which allows users to take their existing Prometheus TSDB blocks and “replay” them into any remote storage system that uses the remote_write protocol. During replay, users can choose to test various scenarios such as increased scrape speed or cardinality. The last type of tool allows users to migrate data from one remote storage system to another. This prevents vendor lock-in and promotes flexibility for users as their storage needs change. We will go over the basics of how prom-migrator works and show a quick demo.

1 participant
12 minutes

prometheus

prometheuses

storage

considerations

remote

performance

analyze

migrate

tuning

managed

1 Nov 2021

Lightning Talk: Learning from Cortex to Improve Promscale HA - Matvey Arye, Timescale

In general, deploying Prometheus high-availability replicas is critical for robust production systems, since they protect against a crash of any one server. Promscale has supported ingesting and deduplicating data from Prometheus HA replicas since the first release, but our original method was based on database locks, which led to complex deployments, had problems with scalability and coupling, and was less resilient to certain kinds of failures. Our new system, which takes inspiration from Cortex, solves these issues and makes Promscale both easier to use and more robust. In this talk, we will discuss how our understanding of support for Prometheus HA has evolved and use our experience as a lens through which to build a mental model of how Prometheus HA works, and how users should think about a robust end-to-end HA solution. We will start with the guarantees Prometheus HA aims to achieve and the correctness properties that are involved. Next, we’ll cover how all of the services in a Prometheus HA setup connect together and how each component can provide robustness. Finally, we’ll discuss some interesting edge cases that came up when designing our HA solution.
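One common way to deduplicate samples from HA replica pairs, and the general idea behind Cortex-style HA tracking, is to elect one replica per cluster as the leader, accept samples only from it, and fail over when the leader has been silent for longer than a timeout. The Python sketch below illustrates that idea only. It is not Promscale's or Cortex's actual code: the `HATracker` class, its `accept` method, and the 30-second default timeout are hypothetical names and values chosen for illustration.

```python
class HATracker:
    """Accept samples from one elected replica per cluster (illustrative sketch)."""

    def __init__(self, failover_timeout_s=30.0):
        self.failover_timeout_s = failover_timeout_s
        # cluster name -> (leader replica name, time the leader was last seen)
        self._leaders = {}

    def accept(self, cluster, replica, now):
        """Return True if a sample from `replica` at time `now` should be stored."""
        leader = self._leaders.get(cluster)
        if leader is None:
            # First replica seen for this cluster becomes the leader.
            self._leaders[cluster] = (replica, now)
            return True
        leader_replica, last_seen = leader
        if replica == leader_replica:
            # Leader is alive: refresh its timestamp and keep the sample.
            self._leaders[cluster] = (replica, now)
            return True
        if now - last_seen > self.failover_timeout_s:
            # Leader has been silent too long: fail over to this replica.
            self._leaders[cluster] = (replica, now)
            return True
        # Sample from a non-leader while the leader is healthy: drop as duplicate.
        return False
```

In a multi-node deployment the leader table would have to live in shared storage so that every ingesting node agrees on the current leader; the in-memory dictionary here is a simplification.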

1 participant
6 minutes

prometheus

storing

replicant

copies

promiscuous

promise

servers

remote

problems

cluster

1 Nov 2021

Lightning Talk: Machine Learning Observability with Prometheus - Shivay Lamba, Layer5

Evaluating the metrics of a Machine Learning system is a critical task during the research and development phase. However, once a machine learning model is deployed in production, it is just as critical to know how this model is performing, which requires good instrumentation and observability practices. Prometheus is a platform for monitoring application metrics, and this lightning talk will share how you can use Prometheus to monitor ML pipelines.

1 participant
5 minutes

observability

monitoring

machine

smi

performance

service

research

concerned

life

shiva

1 Nov 2021

Lightning Talk: Monitor Your SOA Stack with Prometheus & Grafana - Michel Schildmeijer, Qualogy

Monitoring performance, throughput, and the error rate is important for staying in control of your SOA transactions. If you use Oracle Service Bus or Oracle SOA/BPM Suite, there are a lot of out-of-the-box diagnostics waiting for you. The puzzle here is how to get them out in a useful way. Besides the many commercial solutions, open-source tools can also help. You can export runtime diagnostics out of the Diagnostics framework, monitor your SOA Composites, and track down Service Bus statistics using Prometheus and Grafana. The session will elaborate on how to set up proper monitoring using these tools, including proactive, automated monitoring, which is a must for every application environment.

1 participant
11 minutes

oracle

monitoring

software

weblogic

dashboards

server

analytics

docker

execution

soa

1 Nov 2021

Lightning Talk: Who Are You? A Heuristic Approximation to Service Identification - David de Torres, Sysdig

Standard Prometheus annotations allow scraping applications via service discovery. This works great when all the jobs are similar, but what if we need to apply some special relabeling, drop metrics or add new labels? Sometimes it is possible to tag the workloads with new annotations that help identify a specific application. Most of the time, however, restrictions on permissions or on restarting critical services make this hard to implement in production environments. In this talk, David will introduce techniques to identify the application running in a pod using the information provided by the service discovery. To do that, he will present the use of heuristic filters along with their different applications and use cases.
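As a rough illustration of what a heuristic filter over service-discovery metadata could look like, the Python sketch below checks a discovered pod against an ordered list of rules and returns the first application whose predicates all match. Everything in it is hypothetical: the `identify` function, the rule format, the metadata keys (`image`, `ports`), and the example heuristics are assumptions for illustration and are not the filters presented in the talk.

```python
def identify(pod, rules):
    """Return the first application name whose predicates all match the pod."""
    for app, predicates in rules:
        if all(predicate(pod) for predicate in predicates):
            return app
    return None  # no heuristic matched; treat the pod as a generic target


# Hypothetical heuristics: container image name plus a conventional port.
RULES = [
    ("redis", [lambda p: "redis" in p.get("image", ""),
               lambda p: 6379 in p.get("ports", [])]),
    ("postgres", [lambda p: "postgres" in p.get("image", ""),
                  lambda p: 5432 in p.get("ports", [])]),
]
```

For example, `identify({"image": "docker.io/library/redis:6", "ports": [6379]}, RULES)` returns `"redis"`, after which application-specific relabeling could be applied. Rule order matters, since the first match wins.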

1 participant
6 minutes

scrape

scraped

bots

annotations

service

prometheus

api

spot

container

implementation

1 Nov 2021

Opening Remarks - Richard Hartmann, Director of Community, Grafana Labs

1 participant
7 minutes

vaccination

conference

2023

gradually

kovac

attend

having

monitoring

expect

certain

1 Nov 2021

Practical Kubernetes Monitoring with Prometheus - Michael Friedrich, GitLab

Monitoring for microservices and distributed workloads isn’t an easy task. Imagine that you become responsible after years of traditional service monitoring. You start your research - Kubernetes as a container orchestrator already is a complex ecosystem to understand. Join this talk for a journey on the first steps, best practices with Prometheus, Grafana and the Prometheus Operator. The adventure does not stop here: Use client libraries to instrument your own application with metrics, deploy it to Kubernetes and what now? Explore how service discovery, long term storage with Thanos/Cortex and alerting help complete cloud native monitoring. It’s also an iterative process, with new workloads, changes, and a constant work in progress. This talk will help prepare you for cold winters and hot summers, silencing alerts before they exist. Spoiler: The end will peek into observability with logs, traces and SLOs, and show monitoring use cases on GitLab.com SaaS.

1 participant
24 minutes

monitoring

microservices

monitored

kubernetes

manage

cluster

opentelemetry

observability

deploying

replicas

1 Nov 2021

Sparse High-resolution Histograms in the Prometheus TSDB - Ganesh Vernekar & Dieter Plaetinck, Grafana Labs

The Prometheus TSDB has gained experimental support to store and retrieve the new sparse high-resolution histograms. The talk will present exciting results fresh off the press: savings in storage and indexing space, benchmarks for storing and retrieving the new histograms, interesting tidbits encountered during implementation.

2 participants
21 minutes

histogram

histograms

prometheus

monitoring

instrumentation

tsd

observations

visualization

distributions

version

1 Nov 2021

Sponsored Keynote: Aggregating, Alerting and Graphing on Millions of Prometheus Timeseries - Rob Skillington, CTO & Co-founder, Chronosphere

The key to quickly resolving problems is knowing about them as soon as possible. To accomplish this, you need fast and reliable alerting. However, many companies find that as their environments grow and their alerts and queries become increasingly expensive, it becomes challenging to maintain the speed of real-time queries and alerts as the metrics grow in cardinality. This is where aggregation can help!

In this keynote, Rob will discuss how M3 aggregation can be used to create derived metrics, similar to Prometheus recording rules, with any Prometheus remote storage that supports Prometheus Remote Write. This can help relieve the pressure on Prometheus or remote storage when querying and alerting on metrics at large scale.

1 participant
11 minutes

monitoring

observability

pods

large

cardinality

millions

micro

cluster

times

workloads

1 Nov 2021

Sponsored Keynote: Monitoring Production Clusters at Scale - Filip Petkovski, Senior Software Engineer, Red Hat

In this talk, Filip Petkovski & Ian Billett, engineers from the Red Hat Monitoring Group will explain how Red Hat leverages Prometheus and its ecosystem components to monitor tens of thousands of clusters running critical customer workloads. The audience will learn how tens of billions of metrics are gathered from user applications & infrastructure components using Prometheus, how this data is stored at scale, and how developers, admins and business stakeholders use this data to ensure that OpenShift is delivering a first-class experience for customers of Red Hat.

This talk will outline problems faced when running this infrastructure, and explain how we have managed challenges such as multi-tenancy, scale, authorization and more.

1 participant
10 minutes

prometheus

monitoring

dashboards

openshift

nodes

kubernetes

software

platform

project

proxy

1 Nov 2021

Streaming Recording Rules for Prometheus, Thanos, and Cortex Using the M3 Coordinator - Gibbs Cullen & Rob Skillington, Chronosphere

As the Prometheus ecosystem has matured, the compatibility between projects has accelerated and it is possible to use features from one project with another. In this talk, Gibbs and Rob will show how to use various parts of the M3 ecosystem to perform in-memory streaming metrics aggregation, without using M3DB itself, thanks to developments in core Prometheus and M3. Using two deployment options, they will perform a demo using a concrete example of creating a highly efficient cAdvisor overview dashboard that shows fast aggregate views of CPU, memory, network and disk activity for pod groups from tens of thousands of active containers. They will also demo how to use streaming recording rules to store per-metric 10m or 1h downsampled metrics with a global metric suffix in Prometheus, Cortex, and other Prometheus remote storage solutions, like M3 and Thanos, that don’t yet have the ability to downsample metrics.
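The core idea of a streaming recording rule, aggregating samples in memory as they arrive instead of querying stored series later, can be sketched in a few lines of Python. This is an illustration only and does not use M3 or its configuration: the `aggregate_stream` function, the sample tuple layout, and the `pod_group` label name are assumptions made for the example. It mimics what a `sum by (pod_group)` recording rule would produce, one aggregated value per group and timestamp.

```python
from collections import defaultdict


def aggregate_stream(samples, group_label="pod_group"):
    """Sum incoming samples per group and timestamp, dropping other labels.

    `samples` is any iterable of (timestamp_s, labels_dict, value) tuples,
    for example per-container CPU usage readings.
    """
    sums = defaultdict(float)
    for timestamp_s, labels, value in samples:
        # Only the grouping label survives; per-container labels are discarded,
        # which is what reduces the cardinality of the stored series.
        sums[(labels[group_label], timestamp_s)] += value
    return dict(sums)
```

Because only the running sums are held in memory, the raw per-container series never need to be stored or queried to build the aggregate view.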

6 participants
29 minutes

monitoring

dashboard

streaming

promethepromql

performance

m3

cortex

aggregators

server

cmo

1 Nov 2021

TSDB Data Compactions on Scale Without Headache: Now & Future - Ben Ye, ByteDance

Data compaction is common in modern databases. The same happens in most Prometheus-based systems like Thanos, Cortex and Prometheus itself. Compaction plays an important role in those systems, improving query performance and storage usage significantly. In this talk, Ben will explain how compaction works in TSDB and how a TSDB-based compactor works with object storage. The audience will learn about the challenges of operating compactions at scale and how to mitigate those issues. Next, Ben will introduce the new improvements that were contributed to this space and new tooling that rewrites TSDB blocks. Last but not least, Ben will outline the future work that is planned to improve compactions in the Prometheus ecosystem further.

2 participants
24 minutes

compact

compacts

tsdb

compaction

tsb

tsv

compacted

node

block

introduction

1 Nov 2021

Telemetry is Not Having to Hope - Owen Diehl, Grafana Labs

Telemetry isn't new, so why is it so important now? In this nontechnical talk, Owen reflects on his unorthodox journey into software, how open source enabled him, and what monitoring means in the face of ever-increasing complexity. In addition to the importance of monitoring and open source, Owen describes monitoring's natural progression from a reactive tool for managing complexity to a proactive tool for incentivizing and evaluating changes. The increasing scope of monitoring is only part of its success: it is also driven by the external growth of the systems it needs to watch. Finally, Owen elaborates on how Prometheus simplifies external complexity for both technical and organizational benefit.

1 participant
19 minutes

telemetry

functioning

launched

users

today

understanding

providers

talked

experience

ec2
