Cloud Native Computing Foundation PromCon North America 2021, 1 Nov 2021

From YouTube: Lightning Talk: Learning from Cortex to Improve Promscale HA - Matvey Arye, Timescale

Description

In general, deploying Prometheus high-availability replicas is critical for robust production systems, since they protect against a crash of any one server. Promscale has supported ingesting and deduplicating data from Prometheus HA replicas since the first release, but our original method was based on database locks, which led to complex deployments, had problems with scalability and coupling, and was less resilient to certain kinds of failures. Our new system, which takes inspiration from Cortex, solves these issues and makes Promscale both easier to use and more robust. In this talk, we will discuss how our understanding of support for Prometheus HA has evolved, and use our experience as a lens through which to build a mental model of how Prometheus HA works and how users should think about a robust end-to-end HA solution. We will cover what guarantees Prometheus HA aims to achieve and the correctness properties involved. Next, we’ll cover how all of the services in a Prometheus HA setup connect together and how each component can provide robustness. Finally, we’ll discuss some interesting edge cases that came up when designing our HA solution.

A

So right now, I'm going to tell you about using some ideas from Cortex to improve high availability in Promscale.

A

So this is a recap: Prometheus high availability works by just deploying two identical Prometheus servers, scraping the same endpoints and storing almost the same data. Right, Ben mentioned that the timestamps might not quite align, but it's close enough; it's very close data. But when you think about remote storage solutions, people don't actually want to pay the storage cost of keeping both copies of the data. So what most remote storage systems support is some ability to deduplicate this data, right: keep only one copy of the data for each time period.
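To make "keep only one copy of the data for each time period" concrete, here is a minimal Python sketch. The function name `dedupe_per_window`, the `(replica, timestamp, value)` tuple shape, and the "first replica seen in a window wins" rule are all hypothetical choices for illustration; they are not how Promscale or Cortex pick the writer.

```python
# Hypothetical helper, not Promscale code: two HA replicas scrape the same target
# at slightly different times; keep only one replica's samples per time window.

def dedupe_per_window(samples, window_seconds):
    """samples: iterable of (replica, timestamp_seconds, value) tuples.

    For each fixed-size window, the first replica seen in that window is treated
    as the writer; samples from any other replica in the same window are dropped.
    """
    writer_for_window = {}  # window index -> replica whose data we keep
    kept = []
    for replica, ts, value in sorted(samples, key=lambda s: s[1]):
        window = int(ts // window_seconds)
        writer = writer_for_window.setdefault(window, replica)
        if writer == replica:
            kept.append((replica, ts, value))
    return kept


# The replicas' timestamps differ by a fraction of a second, but the values match.
replica_1 = [("replica-1", t, 1.0) for t in (0.0, 15.0, 30.0, 45.0)]
replica_2 = [("replica-2", t + 0.3, 1.0) for t in (0.0, 15.0, 30.0, 45.0)]

kept = dedupe_per_window(replica_1 + replica_2, window_seconds=60)
print(len(kept), {r for r, _, _ in kept})  # prints: 4 {'replica-1'}
```

Half of the incoming samples are discarded, which is exactly the storage saving the deduplication is meant to buy.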

A

So, for example, replica 1 might be sending the data, and we keep its data. If replica 1 goes down, then the long-term storage might want to switch over to replica 2 and keep that until it goes down, and so forth. Promscale is based on SQL, so we originally implemented a very naive solution to this using database locks: all of the Promscale instances in one cluster tried to get the same database lock, whichever Promscale instance got the lock was the writer, and the other data was just dropped.

A

If the writer died, it would give up its lock and another instance would get the lock, and so on. There were two problems with this solution. One problem is that it created a tight coupling between the Prometheus instances and the Promscale instances: Prometheus itself couldn't take the lock, so it had to delegate that to Promscale.
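The lock-based scheme described above can be sketched with an in-memory stand-in for the database lock. In PostgreSQL, a session-level advisory lock taken with `pg_try_advisory_lock` is one real way to get this behavior, but the talk only says "database locks", so treat that mapping as an assumption. The class names `ClusterLock` and `LockingPromscale` are hypothetical. Note that each Promscale instance must be fed by exactly one Prometheus replica, which is the coupling problem.

```python
# Hypothetical in-memory stand-in for one database lock shared by the Promscale
# instances of a single Prometheus HA cluster. Not Promscale's actual code.

class ClusterLock:
    def __init__(self):
        self.holder = None  # name of the Promscale instance that is currently the writer

    def try_acquire(self, instance):
        # Succeeds if the lock is free or already held by this instance.
        if self.holder in (None, instance):
            self.holder = instance
            return True
        return False

    def release(self, instance):
        # Called when the writer goes away. With a real PostgreSQL session-level
        # lock, a dropped connection would also release it.
        if self.holder == instance:
            self.holder = None


class LockingPromscale:
    """Each instance is paired one-to-one with a single Prometheus replica."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.stored = []

    def ingest(self, batch):
        # Only the lock holder writes; every other instance drops its batch.
        if self.lock.try_acquire(self.name):
            self.stored.extend(batch)
            return True
        return False


lock = ClusterLock()
p1 = LockingPromscale("promscale-1", lock)  # fed only by Prometheus replica 1
p2 = LockingPromscale("promscale-2", lock)  # fed only by Prometheus replica 2

p1.ingest([("up", 0, 1.0)])   # True: promscale-1 becomes the writer
p2.ingest([("up", 0, 1.0)])   # False: the duplicate copy is dropped
lock.release("promscale-1")   # the writer dies and gives up its lock
p2.ingest([("up", 30, 1.0)])  # True: promscale-2 takes over
```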

A

But now you have to have this one-to-one coupling between the Prometheus servers and the Promscale servers, when really what you want in these types of systems is a Prometheus tier, a load balancer, and then a Promscale tier. That was impossible in this kind of system. The other problem is that, as with most database locking systems, you know who has the lock right now, but you don't know who had the lock an hour ago, so if you are getting delayed data now, you can't decide whether to keep that data or not.
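The delayed-data problem can be shown with a tiny sketch. `MutableLock` and `keep_delayed_batch` are hypothetical names; the point is only that a lock which is overwritten on failover cannot answer "who was the writer at time t?", so a late batch gets judged against whoever holds the lock now.

```python
# Hypothetical illustration: a mutable lock records only the current holder,
# so a delayed batch is judged against who holds the lock *now*.

class MutableLock:
    def __init__(self):
        self.holder = None

    def set_holder(self, replica):
        self.holder = replica  # overwrites the previous holder; no history is kept


def keep_delayed_batch(lock, batch_replica):
    # The batch's own timestamp is useless here: there is no record of who
    # held the lock at that time.
    return lock.holder == batch_replica


lock = MutableLock()
lock.set_holder("replica-1")  # replica-1 is the writer from t=0
lock.set_holder("replica-2")  # failover at t=100

# A batch stamped t=50 arrives late, at t=160.
# If it came from replica-1, the writer at t=50, it is wrongly dropped:
print(keep_delayed_batch(lock, "replica-1"))  # False
# If it came from replica-2, it is kept, although replica-1's copy of t=50
# may already be stored, producing a duplicate:
print(keep_delayed_batch(lock, "replica-2"))  # True
```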

A

We solved this second problem by switching to an immutable lease approach where, for each cluster and time period, only one replica took a lease, and once that lease was taken, it was immutable: you couldn't change the lease. At the end of the lease, if that replica was still alive, it could extend the lease, but if the replica went down, then another replica, say replica 2, could take the lease for a future time period. So you couldn't modify the current lease, but when that lease was up, you could switch over. That solved the second problem, because now you have a log of who had the lock an hour ago.
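An immutable, per-cluster lease log of the kind just described might look like the following minimal sketch. `ClusterLeaseLog`, its methods, and the fixed lease `duration` are hypothetical; the sketch assumes leases are contiguous, so each new lease starts exactly where the previous one ended. Because past leases are never reassigned, late data can be checked against the lease that covered its timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    replica: str
    start: float  # inclusive
    end: float    # exclusive


class ClusterLeaseLog:
    """Hypothetical per-cluster lease history. A lease is never handed to another
    replica; its holder may only push its end forward, and once it expires a new
    lease starts exactly where the old one ended."""

    def __init__(self, duration: float):
        self.duration = duration
        self.leases: list[Lease] = []

    def try_acquire(self, replica: str, now: float) -> bool:
        if self.leases and now < self.leases[-1].end:
            current = self.leases[-1]
            if current.replica != replica:
                return False                   # another replica holds an unexpired lease
            current.end = now + self.duration  # holder extends; the past is unchanged
            return True
        start = self.leases[-1].end if self.leases else now
        self.leases.append(Lease(replica, start, now + self.duration))
        return True

    def leader_at(self, ts: float) -> Optional[str]:
        for lease in self.leases:
            if lease.start <= ts < lease.end:
                return lease.replica
        return None

    def keep(self, replica: str, ts: float) -> bool:
        # Delayed data is judged against the lease that covered its timestamp.
        return self.leader_at(ts) == replica


log = ClusterLeaseLog(duration=30)
log.try_acquire("replica-1", now=0)   # True: lease [0, 30)
log.try_acquire("replica-2", now=10)  # False: replica-1's lease is still valid
log.try_acquire("replica-1", now=20)  # True: extended to [0, 50)
log.try_acquire("replica-2", now=60)  # True: replica-1 died; new lease [50, 90)

log.keep("replica-1", ts=45)  # True, even if this batch arrives late
log.keep("replica-2", ts=45)  # False: replica-2 did not hold the lease at t=45
```

Making leases contiguous means every timestamp has at most one owner; the trade-off in this sketch is that the gap between replica 1's last extension and replica 2's takeover is assigned to replica 2.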

A

To solve the first problem, we needed to decouple the network topology from knowing who the replica was, right? You want Promscale to know which replica data came from without having to constrain the network topology.

A

And for this we used a clever idea from Cortex, which was just to put the replica information into the data itself. In Cortex this is commonly done with external labels, where you define external labels on your Prometheus instances saying, hey, I'm sending data from cluster A, and I'm replica 1 or I'm replica 2. Now the data can be sent through a load balancer, and once the data is received, you still know which replica the data came from. This allowed us to create a new high-availability architecture, which is actually what you would expect.
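Here is a sketch of how a receiver might read the replica identity from the labels. The label names `cluster` and `__replica__` follow the defaults of Cortex's HA tracker; treat them as an assumption for Promscale and check the documentation of the version you run. `external_labels` under `global` is the standard Prometheus setting for attaching such labels. The helper `split_ha_labels` is hypothetical. Similar to what Cortex does, it drops the replica label so that both copies map to the same series.

```python
# Hypothetical helper. Assumed Prometheus configuration on replica 1 of cluster "a"
# (replica 2 would set __replica__: "2"):
#
#   global:
#     external_labels:
#       cluster: a
#       __replica__: "1"

CLUSTER_LABEL = "cluster"
REPLICA_LABEL = "__replica__"


def split_ha_labels(labels: dict[str, str]) -> tuple[str, str, dict[str, str]]:
    """Return (cluster, replica, series labels without the replica label)."""
    missing = [name for name in (CLUSTER_LABEL, REPLICA_LABEL) if name not in labels]
    if missing:
        raise ValueError(f"series is missing HA label(s): {missing}")
    series = {k: v for k, v in labels.items() if k != REPLICA_LABEL}
    return labels[CLUSTER_LABEL], labels[REPLICA_LABEL], series


from_r1 = {"__name__": "up", "job": "node", "cluster": "a", "__replica__": "1"}
from_r2 = {"__name__": "up", "job": "node", "cluster": "a", "__replica__": "2"}

c1, r1, s1 = split_ha_labels(from_r1)
c2, r2, s2 = split_ha_labels(from_r2)
# Same cluster and same series identity; only the replica differs, so the
# receiver can decide which copy to keep no matter which network path it took.
```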

A

You have your Prometheus instances sending data labeled with the appropriate external labels to a load balancer, which then sends the data to a Promscale tier, which uses a leasing mechanism to deduplicate the data and save it into our database, TimescaleDB, a database built on top of Postgres.
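The architecture just described can be simulated end to end in a few lines. Everything here is a hypothetical stand-in: `SharedLeases` plays the role of lease state kept in the database, a plain list plays the role of the TimescaleDB table, `itertools.cycle` acts as a round-robin load balancer, and lease decisions use each batch's timestamp as the clock. The point is that any Promscale instance can handle any batch, because the replica identity travels in the labels rather than in the network topology.

```python
from itertools import cycle

# Hypothetical end-to-end sketch: Prometheus replicas -> round-robin load balancer
# -> stateless Promscale tier -> shared lease state and storage (database stand-ins).


class SharedLeases:
    """Per-cluster lease history, as it might be kept in the database."""

    def __init__(self, duration):
        self.duration = duration
        self.history = {}  # cluster -> list of [replica, start, end]

    def owns(self, cluster, replica, ts):
        leases = self.history.setdefault(cluster, [])
        if leases and ts < leases[-1][2]:
            if leases[-1][0] != replica:
                return False
            leases[-1][2] = ts + self.duration  # the holder extends its lease
            return True
        start = leases[-1][2] if leases else ts  # new lease starts where the last ended
        leases.append([replica, start, ts + self.duration])
        return True


class PromscaleInstance:
    def __init__(self, name, leases, table):
        self.name, self.leases, self.table = name, leases, table

    def receive(self, batch):
        # Any instance can handle any batch: the replica identity is in the labels.
        if self.leases.owns(batch["cluster"], batch["replica"], batch["ts"]):
            self.table.append((batch["ts"], batch["replica"], self.name))


leases, table = SharedLeases(duration=25), []
balancer = cycle([PromscaleInstance(f"promscale-{i}", leases, table) for i in range(3)])

for ts in range(0, 70, 10):
    for replica in ("1", "2"):
        if replica == "1" and ts > 20:
            continue  # Prometheus replica 1 crashes after t=20
        next(balancer).receive({"cluster": "a", "replica": replica, "ts": ts})

print([(ts, replica) for ts, replica, _ in table])
# prints: [(0, '1'), (10, '1'), (20, '1'), (50, '2'), (60, '2')]
```

In this sketch, the samples at t=30 and t=40 exist only on replica 2, which cannot take over until replica 1's lease runs out at t=45, so they are dropped. Shorter leases shrink that gap at the cost of more frequent lease updates.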

A