From YouTube: Prometheus Deep Dive - Ben Kochie, GitLab
Description
Join us for Kubernetes Forums Seoul, Sydney, Bengaluru and Delhi - learn more at kubecon.io
Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects
Prometheus Deep Dive - Ben Kochie, GitLab
After the Intro session we will go into a mix of advanced use cases, news, and open Q&A with all Prometheus maintainers who are at CloudNativeCon.
https://sched.co/UahS
A: So, welcome to the Prometheus deep dive at KubeCon. My name is Ben. I'm a site reliability engineer at GitLab, and I've also been a contributor to the Prometheus team for quite a number of years now. Most of my work is not on Prometheus itself, but on all the exporters and integrations that people use. If you have questions, one of our other Prometheus developers will be handing out mics; if you have a question in the middle of the talk, feel free to interrupt me.
A: So now that you've got Prometheus installed, what do we do here? Well, there's a lot of good reading material on monitoring itself and why it's good to use monitoring to let you know when your systems are working properly. I'm a big fan of the RED method and the USE method, which are kind of two sides of the same coin. They're all about looking at your metrics from the perspective of your users, because your users don't care if you're out of memory or your CPUs are overloaded; they care whether their requests are going through, and going through quickly. There's a lot of great material on that. And for Prometheus itself, there are a couple of really great books you can read; they go through all the detail of getting you into Prometheus.
A: Prometheus itself is what I call an intentionally uncoordinated distributed system. The Prometheus design came from a need where the monitoring system had to be the most reliable thing on the network, which meant that Prometheus itself needed to have the least number of dependencies on anything else on your network. So as long as it's up and running, has a little local disk, and can reach the network, it can monitor, versus other monitoring platforms. So if you've got a thousand pods, and 500 of them are for one service and 500 are for another service, you split your Prometheus servers so one covers the 500 for one service and one covers the 500 for the other. So it's a minimal-dependency, super robust piece of software. Prometheus includes its own built-in time series database. It uses a write-ahead log for reliability during operation and restarts, and the time series database itself is written out immutable, so it's hard to corrupt, and it works pretty well.
A: Prometheus is collecting float64 data all the time, and the best practice is to count your events and expose those as counters. Because Prometheus is a polling-based system, if you lose a data point you don't want to completely lose the information that was contained in it. So you have a count of, say, one, then a scrape happens and now the count is two, and another scrape happens and now the count is three, and so on.
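The polling model above can be sketched in a few lines of Python (a toy model, not any Prometheus client library): because the counter is cumulative, a lost scrape costs only timing resolution, never the events themselves.

```python
# Toy model of a cumulative counter under a polling (scrape) model.
# All names here are illustrative, not Prometheus APIs.

def scrape_samples(counter_values, drop_indices=()):
    """Return the scraped samples, skipping any lost scrapes."""
    return [v for i, v in enumerate(counter_values) if i not in drop_indices]

def increase(samples):
    """Total increase between the first and last retained sample."""
    return samples[-1] - samples[0]

# The counter as seen at each scrape: 1, 2, 3, 4 events so far.
counter = [1, 2, 3, 4]

full = increase(scrape_samples(counter))        # every scrape arrived
lossy = increase(scrape_samples(counter, {2}))  # one scrape was lost

# The cumulative value carries the history, so the missed scrape
# does not lose any of the counted events.
print(full, lossy)  # 3 3
```

With a gauge, by contrast, a lost scrape really is lost, which is why counting events as counters is the recommended practice.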
A: That's going up, somewhere around fifteen hundred or so every fifteen seconds, and you can see that the scrape interval is fifteen seconds. So we get four data points within a minute: there's one at four seconds, the next one at 19 seconds, and so on. And that's just one instance of this server. This is from an HAProxy web reverse proxy, and of course we have more than one HAProxy to load balance all of our systems.
A: So here are the data points from a different HAProxy. It's not going up as fast, but also notice that the samples are offset differently. You tell Prometheus to scrape all of your targets every 15 seconds, but it's not doing that on a clock basis; it's actually taking all of your targets and spreading them out over the scrape interval, so that you get a nice, even flow of ingestion into Prometheus.
A: It actually extrapolates the two data points to find out what the total increase would be if there were more samples within that range, and this is a little bit confusing for some people, because they see a counter go up by one and say, "but I got a number that was 2.5."

Another interesting thing: what if I asked for the increase over a whole minute? I don't have any more pretty pictures, thank you, good gnuplot. Say the blue circles at the top of the graph, the first two data points, were missing. If I didn't do extrapolation and was missing those two data points, I would only get about 2,000 on the increase, versus the total of 6,000, which is what's really going on here.
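The extrapolation can be sketched as follows. This is a simplification of what PromQL's increase() does; the real function also clamps the extrapolation at series boundaries and handles counter resets, both of which this sketch ignores:

```python
def increase_extrapolated(samples, window_seconds):
    """Scale the raw delta between the first and last sample in the
    window up to the full window, assuming a constant rate.

    samples: list of (timestamp_seconds, value) pairs inside the window.
    """
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) * (window_seconds / (tn - t0))

# A counter that went up by exactly 1, with samples covering only
# 24 s of a 60 s window, is reported as an increase of 2.5:
print(increase_extrapolated([(0, 10.0), (24, 11.0)], 60))  # 2.5
```

That scaled-up estimate is where the surprising non-integer results come from: the raw delta was 1, but the query assumes the counter kept rising at the same rate across the parts of the window it couldn't see.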
A: The next big question a lot of people ask is: okay, now I've got a Prometheus and it's overloaded, or I now have two Prometheus servers and high availability, and we're going to start talking about scaling. The first question people ask is: well, how do I capacity plan?
A: Well, it depends mostly on the rate of ingestion of your data, because as Prometheus collects data it's got to store it somewhere, and it currently needs about 1.5 bytes per sample. So it's pretty easy to do a little bit of capacity-planning math, and it also depends a little bit on the data that you've got. So usually the recommendation is that you start out by scraping some of your data, and then you can start to build a capacity-planning story for your Prometheus. For example, we have a server in our production environment that's doing about a hundred thousand samples per second; multiply by about one and a half bytes per sample, for 60 seconds, for 60 minutes, and it's about half a gigabyte an hour. That's a bit of data, but it's not too bad, and a hundred thousand samples per second is quite big.
A: Now the question is: I've got all these multiple instances, so how do I scale it? There's a bunch of different ways. Prometheus has a technique called federation, where you take a single Prometheus that is collecting data from a lot of targets, say a bunch of individual auto-scaled pods, where the individual pods don't matter and what you really care about is the cluster-level data. So you use what Prometheus calls recording rules: a recording rule takes a Prometheus query and stores the result as a new metric, and then you have a federation server on top of that which pulls in just the recording rules and ignores the individual pod data. That's a simple hierarchical system. Of course, if you tried to pull everything into that federated server, it would blow up; federation is not a method for replicating everything.
A: So yeah, Prometheus uses an inverted index, so when you add a new label it doesn't really take up that much more space, because the index itself only needs to store the key from that label into the metric, and it's relatively efficient. So it's totally okay to have a bunch of different labels, especially if they're important to you.
A: You do want to make sure that you don't have tons of values for those labels, because the more values, the bigger the index is and the longer it takes to scan those indexes when you're walking a metric. But if you've got tens of data centers or hundreds of clusters, that's a reasonable amount of cardinality.
B: It also depends on how things are correlated. If you have an instance label already, and then you add a cluster label, but in one cluster you always have the same instances, that's actually not increasing cardinality; it's essentially a free label. But if you add something fully orthogonal, you get multiplicative growth of cardinality.
B: Should I take it? Sure. That sounds like retroactively evaluating recording rules, is that it? Okay, that's not yet a feature, but the plumbing is all in place, so it's on the roadmap. For whoever works on it, it's actually pretty straightforward; not much to do there, technically.
B: The TSDB is structured in a way that makes inserting individual time series later a really expensive operation, but it is implemented as an operation, and now we just need the tooling that re-evaluates a rule over old data, or that backfills data you got from another source, and this will all happen fairly soon.
A: So, GitLab runs Thanos because it was really, really nice to deploy. Before we were using Thanos, we were smaller, and as we were growing we were adding more Prometheus servers, and it was tedious to set up dashboards that pulled from multiple sources. We use Grafana for our dashboards, and it was tedious to have one data source for this cluster and another data source for that cluster, and mixing data sources was really annoying. So we added Thanos as an overlay proxy to make it easier to query, but we weren't actually using any of the storage; we actually had six months of data in our Prometheus servers, and that was working quite well. Thanos was just an add-on to do the overlay layer and deduplication. Between our Grafana and our Prometheus servers we had a little nginx failover: it would pick the first one, and if that one was down it would grab the second. I forget what the nginx config for that is, but it was just a simple failover, and it would sometimes produce weirdness, because of course the two servers have different gaps. So adding Thanos to do the gap fill was really, really nice. Then, after we rolled that out, we started to experiment with pushing our TSDB data into object storage and using the Thanos store for that.
A: We're using it for our public dashboards, so we're using Trickster, and it works okay. But I've been talking with the Thanos developers about adding a caching layer; specifically, there's talk about bringing the Cortex query caching layer into Thanos, and there's some code shared between all these projects.
F: Prometheus Alertmanager is a nice tool for monitoring alerts and such, and I like its simplicity. But do you have any other integrations that might empower the ops personnel collecting Prometheus data to tune their alerts without necessarily having to create configurations at the code level, like PromQL and such?
G: Yeah, any recommendations or resources on how to go about comparing and contrasting the various storage solutions, between Cortex, M3DB, or Thanos, and how to choose one for your environment?
A: I don't have anything specific. Darren, do you have any recommendations for picking a storage layer? It kind of depends on what your network layout is, what kind of storage systems you have, and how much complication you want. Thanos is really interesting because it only requires object storage, and it works really well.
H: You asked us what's our greatest problem with Prometheus right now. For us, it's that the Jenkins plugin doesn't work: we installed it and it crashed immediately. Now, before I go fix that and submit a PR myself, what is the plan for things like Jenkins or other integrations that are popular but might not be in the core of what you guys are working on?
A: Yeah, we actually have a separate GitHub organization called prometheus-community, and we're slowly trying to build that up as the place where popular things can go, get additional maintenance, and be more official than just some random other GitHub org. We're slowly trying to get things on board and help them, so check out the prometheus-community GitHub org.
I: Maybe this is a bit more of a starter question. We've been using the Prometheus Operator and getting all the metrics for our cluster, and that's working really well; we've got Grafana running all sorts of interesting dashboards. As soon as we did that, the application developers started saying: hey, I want to use that for some application-specific metrics we want to put in. I'm wondering if you could talk a little bit about what we have to do to do that. I know there are SDKs for Java, Go, and so on.
I: But now, are we running an additional web server on each one of our pods to host all the metrics? Are we sending them somewhere? Are we registering them with something? I mean, the Prometheus Operator was great for the cluster, but now, to build this in, how complicated is that going to be?
A: The Prometheus metrics protocol is a simple HTTP GET, and all the Prometheus client libraries include hooks into whatever web servers are available for the language. If you've got Java, it uses the Java HTTP server; if you have a Go client, it uses Go's HTTP server. You just use the Prometheus client libraries in your code, and if you've got something like a request router, you can register the Prometheus metrics registry, which is the usual name for the internal metric counter trackers, and just add a route to /metrics. Prometheus can then scrape that data, collect it, and put it into the database. There are cases where you might want to put it on a different port instead and set up service discovery so that you've got your main API port and your metrics port, but it's up to you.
A: We have a large Rails app, and we're moving the Prometheus metrics from an inline controller to its own dedicated port on the Unicorn controller process, because it turns out that whenever we get slammed with traffic, Unicorn queues up and then we lose monitoring. So we've moved it to a separate port because of the limitations of Ruby; in Go it's all goroutines and it's no big deal. Got one on the right?
A: That's more of a question for things like Cortex and Thanos. I believe Cortex takes the HA pair and automatically throws away one of them, whereas Thanos keeps both around. Prometheus itself is not a remote-write receiver, so you can't stream one Prometheus to another; that's more of a question for Thanos and Cortex.
A: So yeah, the Prometheus web interface is intentionally simple; it's mostly there to provide a basic debugging interface, for getting started and testing queries. My usual workflow is to be on the Prometheus or Thanos interface, do a bunch of test queries, figure out what I want, and then copy and paste those into Grafana. But we actually just started a rewrite of the web interface.
A: Other people, you know, there's the operator stuff, and you can use jsonnet; there's a lot of jsonnet, and it's super interesting. We're actually moving all of our dashboards into Grafana via jsonnet, so instead of hand-clicking to make all of our dashboards, we auto-generate them, so they're all consistent across all the services.
B: Prometheus is super opinionated in many aspects, but this is intentionally up to you, to pick your poison. Having said that, a bunch of projects keep jsonnet mixins in subdirectories, just as examples of how you could create rules and dashboards with jsonnet, but that's not meant as the one canonical way; it's just a way for us to document possible rules and possible dashboards. We have time for a few more minutes, so...
M: As an infrastructure or operations person, I think Prometheus makes a lot of sense, and the scalability aspect of it is very logical. But there's a bit of an overhead, I think, when you ask regular day-to-day application engineers to start using it, because there are some non-intuitive concepts in there; they're different from, say, a metrics-as-a-service kind of company. Do you have any recommendations on how to onboard people mentally?
A: There are the Prometheus books, and we have a bunch of how-to guides and tutorial material on our website; we're always looking to improve those. We also have a YouTube channel: every year we have the Prometheus conference, with lots of great talks, and after we get the 2019 talks posted online I'm going to put together a Prometheus 101 playlist on YouTube. I think that's all we have time for.