From YouTube: CNCF Live Webinar: Intro to open source observability with Grafana, Prometheus, Loki, and Tempo
A: Okay, today we are talking with Richard Hartmann of Grafana Labs: an intro to open source observability with Grafana, Prometheus, Loki, and Tempo. Everyone, please remember that during the webinar you're not able to speak; as an attendee you use the chat, and I see some messages already, so say hello. For Richard, we'll get you as many of those questions as we can at the end, and we can even stay on a little bit longer, since we're running late, to take care of that.
A: Please also note that the recording and slides will be posted later today to the CNCF Online Programs page. They are also available via the registration link, which will take you to our Online Programs YouTube playlist. With that, I'm going to hand it over to Richard so we can kick things off.
B: Thank you, Libby. Thank you, everyone. A word of warning: I keep getting pop-ups from the platform that my internet connection is unstable, which I don't believe is the case, but something is broken-ish, so if I drop or anything, I'll try to rejoin. So let's get started: intro to open source observability. A little bit of validation first: most of my life I've worked in the engineering, architecture, and operations worlds, so I have strong opinions about the right tools, and about not-perfect or not-good-enough tools.
B
Oftentimes.
You
have
this.
This
parrot
thing
there,
where
you
have
breaks
between
different
media,
where
you
have
breaks
between
different
trains
of
thought
like
how
to
how
to
index
your
data,
how
how
the
mental
modeling
works.
Maybe
one
thing
has
the
one
color.
The
other
thing
is
the
other
color
or
one
of
the
things
left
to
right.
The
other
is
like
right
to
left
doesn't
matter.
You
have
breaks
between
your
different
systems
too
often,
which
in
turn
means
that
way.
B
Too
often
you
you
end
up
paying
extra
cost
in
mental
overhead
or
in
in
automation,
overhead.
It's
not
seamless!
You
need
to
switch
mental
modes
when
you
go
from
your
logs
to
your
traces
or
what
have
you,
which
is,
is
not
nice
and
it
just.
It
adds
friction,
and
you
don't
really
need
that
at
like
five
in
the
morning
on
a
sunday
when
you've
got
when
you've
gotten
a
pager.
B
So
let's
try
and
rethink
what
what
we
actually
want
to
do
here
and
I'm
going
to
to
go
through
a
little
bit
of
of
like
the
philosophy
of
observability
and
a
few
buzzwords
as
a
as
a
foundation.
Let's
say
for
what
we
are
then
talking
about.
B
There
is
a
thing
where
the
cloud
native
scale
is
basically
what
internet
scale
was
two
decades
ago
and
that's
kind
of
important
to
keep
in
mind,
because
a
lot
of
of
issues
which
we
see
in
the
cloud
native
world
have
already
been
solved
in
different
contexts
before
us,
and
it's
always
a
good
idea
to
to
look
at
what
engineers
before
us
did
to
to
solve
problems
like
not
the
specific
implementations,
because
usually
they
don't
fit
their
age
if
they're,
too
old
or
the
new
age.
B
But
the
underlying
concepts
like,
for
example,
computer
networks.
The
internet
also
power
networks,
a
lot
of
those
tend
to
run
on
metrics,
because
this
is
already
a
predestination
of
of
what
you
care
about
as
as
a
domain
subject
expert.
B: As always in tech, we have buzzwords. Buzzwords usually have a kernel of truth, but by the time they are buzzwords they have lost most of that meaning, which is a pity, but it also explains why they were so successful.
B
It
comes
from
from
indigenous
people
who
observed
soldiers,
building,
building,
runways
for
planes
and
small
control
towers
and
such
and
then
the
gods
send
send
stuff,
from
heavens,
which
was
basically
just
logistics
of
the
army.
But
the
perception
was
that,
just
by
building
runways
and
such
you
could
get
gifts
from
the
gods
and
to
this
day,
those
those
things
still
echo
in
in
a
few
religions,
so
that
is
observed
behavior
it
becomes
part
of
culture,
but
it
it's
not
actually
doing
anything.
B
It's
not
actually
pursuing
the
the
goals
or
or
the
the
underlying
rationale
and
that's
something
which
you
always
need
to
be
worried
about.
It's
not
about
just
changing
the
name
for
a
thing,
and
anyone
who
was
assistment
yesterday
is
sae
today
and
you're
done
it's
about
actually
changing
the
behavior
and
actually
understanding
why
something
is
successful,
not
just
observing
that
it
is
successful
monitoring.
B
While
I
personally
use
monitoring
and
observability
more
or
less
interchangeably-
and
that
is
buzzwordy
definition-
monitoring
has
taken
a
little
bit
of
a
meaning
of
collecting
data,
not
using
it.
You
have
two
extremes
in
this.
One
takes
one
thing
where
you
have
the
full
text:
indexing
where
you
just
in
in
a
vain
attempt,
go
after
everything
which
you
can
find
or
data
lake,
which
outside
of
batch
analysis,
is
often
a
euphemism,
for
no
one
is
ever
going
to
look
at.
B
The
thing
observability
is
is
trying
to
reframe
that
a
little
bit
about
being
able
to
ask
new
questions,
just
observe
what
inputs,
what
outputs
a
system
has
and
being
able
to
deduce
the
internal
state
of
that
system
from
those
inputs
and
outputs,
as
in
ask
questions
which
you
didn't
know
you
wanted
to
to
ask
before,
and
that
enables
humans
to
understand
complex
system.
But
it
also
allows
you
to
automate
a
lot
of
this.
So
it's
not
just
about
determining
that
something
is
in
a
certain
state.
B: Another super important concept is complexity. You have what I call fake complexity, a.k.a. bad design, which you can reduce and, in my opinion, should reduce, unless you have other engineering constraints: money, go-to-market, maybe compliance reasons, what have you. Outside of actual reasons for having complexity, you should always strive to get rid of it. But you have real, system-inherent complexity as well, and that can be moved, but it cannot be made to go away; state is always someone else's problem.
B
You
have
all
your
micro
services
they're
stateless,
but
someone
has
to
maintain
the
database
so
that
that
complexity
has
to
live
somewhere.
So
yeah
you
can
move
it
back
and
forth.
You
can
comparison
mentalize
and,
in
my
opinion,
my
strong
opinion.
You
should
comparison
mentalize
it
and
you
should
distill
it
meaningfully
and
we
have
two
different
definitions
of
of
distilling.
This,
a
the
apis
towards
whatever
the
consumer,
slash
user
of
the
thing
is
and
b
already
start
thinking
about
what
you
need
to
emit
towards
the
observers
towards
your
operational
teams.
B
So
they
can
look
at
the
thing
that
is
basically
slis,
sli
slo
sla,
often
times
people
are
confused.
What
they
mean.
It's
really
really
simple.
Sli
are
several
service
level
indicators.
What
you
measure
objectives
are
what
you
need
to
hit
and
agreement
since,
when
you
need
to
start
paying
course,
someone
broke
a
contract.
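The relationship between the three terms can be sketched in a few lines of Python; all numbers here are hypothetical, just to make the distinction concrete:

```python
# Hypothetical counters for one service over a measurement window.
good_requests = 999_543      # requests served successfully (measured)
total_requests = 1_000_000   # all requests (measured)

sli = good_requests / total_requests   # SLI: the thing you measure
slo = 0.999                            # SLO: the target you need to hit
# SLA: the contract that says what you pay if the SLO is missed.

print(f"SLI = {sli:.4%}, SLO met: {sli >= slo}")
```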
B
You
have
operational
people
who
are
paid
for
for
stuff,
not
breaking
so
you
have
diametry
diametrically
opposed
incentives
where
the
one
group
wants
to
move
super
quickly
and
the
other
group
wants
to
move
rather
slowly
and
carefully,
and
so
they
always
they
always
fight.
They
always
have
strive.
Course,
that's
built
into
literally
into
their
compensation
structure
and
into
their
complete
organizational
structure.
B
There
is
one
of
the
main
things
of
sre
to
me:
is
the
concept
of
error
budgets,
where
everyone
shares
a
budget
for
how
many
errors
a
thing
can
have,
and
if
you
hit
those
budgets,
it's
fine,
but
it
doesn't
matter
if
this
is
due
to
new
features
or
a
b
testing
or
a
new
deployment
where
the
pm
needed
something
really
really
urgently
or
things
always
breaking.
If
things
break
too
often
in
operations,
the
devs
don't
have
error
budget
for
their
testing
and
deployment
velocity
anymore.
So
you
align
those
incentives.
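The shared budget itself is just arithmetic; a small, hypothetical calculation for a 99.9% SLO over a 30-day window:

```python
# Hypothetical: a 99.9% availability SLO over a rolling 30-day window.
# Whatever consumes the budget (deploys, A/B tests, outages) is irrelevant;
# only the total matters.
slo = 0.999
window_minutes = 30 * 24 * 60            # 43,200 minutes in 30 days

error_budget_minutes = (1 - slo) * window_minutes
print(f"Shared error budget: {error_budget_minutes:.1f} minutes of downtime")
```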
B
Another
nice
thing
is,
if
you're
able
to
build
a
shared
understanding,
not
just
align
incentives
between
people
and
that's
where
dashboards
are
coming
in,
where
all
those
dashboards
ideally
are
shared
between
all
those
different
teams,
because
then
you
have
an
incentive
to
invest
in
shared
tooling,
and
everyone
improves
a
little
bit
and
everyone
else
benefits
from
the
thing
you
pull
all
your
institution
knowledge
around
a
thing
from
a
lot
of
different
angles,
and
everyone
works
together
in
making
this
better.
It
also
means
you're,
building
the
same
language
and
you
built
the
same
understanding.
B
Everyone
has
the
same
dashboard.
The
pm
doesn't
need
to
fight
the
engineers
about
what
that
one
metric
is
course.
They
literally
look
at
the
same
data.
They
don't
use
different
words
for
different
aspects.
Of
course,
all
of
them
use
the
same
dashboards,
the
same
alerts,
the
same
reports,
which
in
turn
means
they
use
the
same
language
services
are
not
a
super
important
concept;
they
could
basically
comparison
mentalize
complexity
and,
if
you
remember
just
now,
I
said
one
of
those
two
abstraction
layers
would
be
an
interface
towards
the
user.
B: A lot of those implicit misunderstandings just go away, because once it's written down and agreed, and that's the basis for what you actually do and how you operate, a lot of people will take a second and third look and actually start negotiating details, instead of everyone being like "yeah, whatever, it'll work", and then it breaks, and everyone is fighting about why it broke, and then you realize you had a lot of misunderstandings. It doesn't matter if the customers or consumers are internal or external: treat them as if they were external.
B
Of
course,
they
are
depending
on
your
thing.
Anyone
coming
from
networking
like
myself,
layers
or
another
way
of
thinking
about
this.
The
internet
wouldn't
exist
without
proper
layering,
because
I
can
literally
rip
out
layer,
one
and
layer
two
and
I
have
instead
of
ethernet,
I
have
wi-fi
or
what
have
you
and
that
wouldn't
be
possible
without
those
clean
and
long-term
stable
interfaces
between
the
different
layers.
B
Other
things
like
cpus,
hardness,
compute
nodes,
your
lunch.
Even
if
you
cook
from
scratch,
you
will
not
grow
every
last
cucumber
yourself.
You
have
certain
interfaces
where
you
buy
other
services
and
just
consume.
Those
alerting
also
super
important
customers,
don't
care.
If
I
don't
know,
you
have
20
database
notes,
they
don't
care
if
if
15
of
them
are
down
or
five
of
them
are
down
or
all
of
them
are
healthy,
they
care
about
that
service
which
they
are
consuming
being
healthy
and
responsive.
B
And
what
have
you
so
that's
the
perspective
to
mainly
take
define
your
slas,
your
sli's,
your
slos
from
that
perspective
of,
is
it
user
interfacing,
or
is
it
user
visible?
The
nice
thing?
If
you
do
this
in
depth,
what
is
your
provider's
sla
and
sli
is
perfect
for
you
to
debug,
of
course,
if
the
database
is
down,
you
don't
need
to
debug
where
your
webshop
is
not
working,
you
kind
of
know,
so
you
you
structure
again,
you
use
the
same
language
across
the
complete
stack
of
what
you're
doing
important
to
avoid
burnout.
B
Anything
or
anything
which
is
currently
or
imminent.
Customers
must
be
alerted
upon
and
nothing
else
raise
a
ticket.
Do
it
during
business
hours,
if
it's
not
customer,
impacting
just
don't,
of
course,
you'll
burn
out.
So
that's
the
intro
part
now
gets
to
the
tech
part
prometheus
prometheus,
if
you
don't
know,
is
inspired
by
google's
pokemon.
It's
a
time
series
database
internally.
It
uses
64-bit
values
for
pretty
much
everything
which
is
relevant,
there's
thousands
or
tens
of
thousands
of
insta
thousands
of
instrumentations
and
exporters
that
are
public.
There's
millions
of
installations
of
prometheus.
B
Built-In
services
cover
that
is,
will
notice
they're
next,
like
not
impossible,
it's
very
uncommon
to
run
kubernetes
without
a
prometheus
of
some
sort,
because
they
are
literally
designed
from
each
other.
Even
back
from
the
google
work
in
pork
mondays
and
more
or
less
by
a
happy
little
accident
with
kubernetes
and
prometheus
within
cncf
low
and
behold.
Those
are
the
two
founding
projects
of
cncf
they
go
together.
You
haven't,
you
have
no
hierarchical
data
models,
so
you
don't
have
your.
B
I
don't
know
your
region,
your
your
city,
your
customer,
and
then
you
need
to
select
by
customer
and
all
of
a
sudden,
you
need
to
walk
up
your
hierarchical
area.
While
you
need
to
walk
down
blah
blah
blah
now
you
have
an
n-dimensional
label
set,
which
you
slice
and
dice
as
you
need
it.
So
you
select
by
label.
Customer
equals
x
and
you're
done.
Prom
kl
is
a
function
label
a
functional
language
which
allows
you
to
do
vector
math
on
on
on
your
data,
which
is
super
efficient,
like
highly
efficient
in
particular.
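A minimal sketch of that idea in plain Python (hypothetical series and values, standing in for real PromQL label selectors):

```python
# Each series is identified by an n-dimensional label set; selection is
# matching on any subset of labels, with no hierarchy to walk.
series = [
    ({"region": "eu", "city": "berlin", "customer": "x"}, 42.0),
    ({"region": "us", "city": "nyc",    "customer": "x"}, 17.0),
    ({"region": "eu", "city": "berlin", "customer": "y"}, 99.0),
]

def select(series, **matchers):
    """Return all series whose labels match every given matcher."""
    return [(labels, v) for labels, v in series
            if all(labels.get(k) == v for k, v in matchers.items())]

# Equivalent in spirit to the PromQL selector {customer="x"}:
print(select(series, customer="x"))   # both customer-x series, any region
```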
B
Of
course,
the
label
matches
matching
usually
does
more
or
less
by
magic.
What
you
want-
and
this
is
used
for
everything-
processing,
graphing,
alerting
exporting
data
every
every
way
you
work
on
the
data
is
always
through
promptly.
So
it's
a
language
you
have
to
learn,
but
it's
the
one
language
and
then
everything
works.
Simply
operation,
don't
need
to
convince
you
probably
highly
efficient,
it's
pull-based
for
good
reason.
Of
course,
this
makes
a
lot
of
things
easier
to
reason
about
about
correctness
and
up-to-date
correctness
of
the
state
of
of
the
wider
system.
B
Push
versus
pull
is
a
borderline
religious
debate,
but
in
particular,
coming
from
the
networking
space.
There
are
some
properties
of
pull
which
are
next
to
impossible
or
super
hard
to
to
emulate
in
push-based
system.
Unless
the
push-based
system
has
complete
information
of
of
everything
which
should
be
sending
data
at
which
point
pulling
is
more
efficient.
Anyway,
white
box,
black
box
monitoring,
one
looks
at
the
thing
from
the
outside
without
further
information,
whereas
white
box
monitoring
looks
at
all
the
inerts,
you
instrument
your
code,
you
emit
data
from
internal.
B
Every
service
should
have
its
own
metrics
and
endpoint.
With
things
like
the
prometheus
agent,
which
we
announced
today,
with
my
promises
team
head
on
look
at
the
block
of
promises,
io
slash
blog,
we
we
can
also
accumulate
this
data
for
you
and
then
even
push
it
to
other
backends
yeah
and
super
hard
api
commits
stronger
than
anything
I've
ever
seen
in
my
life,
maybe
except
for
the
linux
kernel
time,
series
yeah
most
certainly
except
for
the
linux
kernel,
these
defined
as
user
interfaces,
which
are
not
deprecated
anyway.
B
What
are
time
series
recorded
values
which
change
over
time,
for
example
the
temperature
in
your
room?
That's
a
time
series
you
usually
merge
those
individual
events
of.
I
don't
know
tens
of
thousands
of
people
accessing
that
thing
into
counters
and
their
histograms
typical
examples
would
be
requests
to
web
server
temperatures
service
latency
this
kind
of
thing.
It's
super
easy
to
omit
the
parse
and
read:
that's
literally
how
it
looks
on
the
wire.
So
it's
like.
B
I
know
people
who
print
f
in
their
c
code
and
then
just
dump
that
file
onto
web
server
and
that's
how
they
instrument
their
code
and
it
works
like
there
are
easier
ways,
but
for
them
that
works
and
it's
totally
fine
scaling
kubernetes
is
a
spork.
Prometheus
is
a
sport.
One,
so
yeah
scale
is,
is
kind
of
built
in
prometheus
and
kubernetes
are
designed
and
written
with
each
other
in
mind,
borg
and
borgmon
again
yeah
just
looking
at
prometheus.
B
I
have
a
typo
there's
a
two
missing
in
that
in
that
sentence.
So
roughly
1
million
samples
per
second
is
not
a
problem
on
current
hardware.
B
2200K
samples
per
second
and
core
is
is
roughly
where
we're
at
and
but
that's
already
slightly
old,
and
the
single
largest
prometheus
instance,
which
we
saw
in
production,
was
125
million
active
times
years
like
we
as
in
prometheus
team.
B
I
know
of
someone
who
ran
it
at
700
million,
so
yeah
it's
kind
of
scalable,
but
it's
also
painful.
At
that
point,
you
probably
would
cortex
or
thanos
or
something
speaking
of
there's,
two
two
projects
which
have
high
overlap
with
with
prometheus
team
members,
thanos
and
cortex.
Historically,
thanos
is
easier
to
run
and
scales.
Storage
horizontally.
B
Cortex
is
a
lot
easier
to
run
these
days
and
it
started
with
scaling
storage
in
gestures
and
querying
horizontally.
It
took
the
code
of
of
thanos
to
also
scale
storage
horizontally,
guess
what
thanos
was
working
on
with.
B: One customer is running at 3 billion, but that's kind of more than pushing it; still, it did not completely die in a fire. Loki: it is basically like Prometheus, but for logs. So it follows all the same design principles; it's the same label-based system, it has the same indexing type, and it takes tons of code from Cortex, for obvious reasons. The nice thing is you don't need a full-text index.
B
Usually,
if
you
work
on
logs,
you
don't
need
every
last
bit
and
piece
of
your
thing,
indexed
most
often
you're
able
to
to
extract
a
few
relevant
bits
and
pieces
of
information.
You
index
that
you
search
on
that
and
the
rest
is
just
an
opaque
string
which
is
which
is
stored
without
indexing,
which
means
you
have
a
lot
less
overhead
and
cost
in
storage
and
in
particular,
indexing
in
lookups.
B
Sorry
and
one
of
the
nice
properties
which
are
initially
non-obvious
to
a
lot
of
users,
is,
as
you
use
literally
the
same
label
based
system
as
prometheus.
It's
trivial
to
to
turn
your
logs
into
metrics
to
extract
metrics
from
your
logs
for
alerting,
graphing
blah
blah
blah
blah
blah,
basically
pre-processing
or
processing
logs
into
metrics,
again
remember
same
like
internet
scale.
Two
decades
ago.
That's
kind
of
the
same
trick,
which
is
literally
the
same
thing
where
a
lot
of
singular
ones
were
turned
into
metrics
and
then
just
the
metrics
exposed
in
loki.
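That logs-to-metrics distillation can be sketched in plain Python (hypothetical log lines; in practice Loki's query language does this for you):

```python
# Hypothetical raw log lines: only the level is extracted and counted;
# the rest of each line stays an opaque, unindexed string.
logs = [
    'level=info  msg="request served"',
    'level=error msg="db timeout"',
    'level=error msg="db timeout"',
]

counts = {}
for line in logs:
    level = line.split()[0].split("=", 1)[1]   # extract the one indexed label
    counts[level] = counts.get(level, 0) + 1

print(counts)   # a per-level counter, ready for alerting or graphing
```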
B
You
have
that
mechanism
built
in
which
is
super
nice
and
except
for
google's
m
tail,
which
kind
of
was
that,
even
when
it
was
released,
something
which
which
we
haven't
seen
in
in
the
open
source
or
in
the
open
world
like
certain
search
engines
and
such
have
this
internally,
but
not
not
others
prior
to
loki
at
least,
and
you
can
pump
basically
all
type
of
of
of
text-based
information
into
into
loki.
B
One
of
the
elite
deaths
at
welsh
even
puts
his
his
car
telemetry
and
pictures
from
his
dash
cam
into
low
key
course.
He
can
and
he
likes
to
because
again
the
content
back
here
is
unindexed,
which
means
you
can
just
put
whatever,
and
it's
just
an
opaque
string
or
blob.
To
be
precise,
you
might
remember
the
prometheus
exposition
format
we
saw
earlier
or
the
open,
metrics
format
which
we
saw
earlier.
That's
actually
the
same
with
the
labels.
Here
you
just
need
a
timestamp,
of
course.
B
Obviously
an
event
is
is
always
at
a
specific
point
in
time.
So
you
need
to
emit
that
specific
point
in
time,
whereas
the
metrics
are
handled
differently.
On
a
conceptual
level,
you
can
emit
precise
timestamps,
but
usually
for
mathematical
reasons,
which
we
are
not
going
into
here.
It's
it's
better
to
to
have
prometheus
or
cortex
or
thanos
handle
handle
the
timestamping
versus
with
low
key.
It's
better
to
have
the
emitter
handle
timestamping
some
numbers.
Our
queries
at
grafana
labs
regularly
see
40
gigs
per
second
gigabytes
per
second.
B: We regularly query terabytes of data in under a minute, and ideally you then emit this back into metrics, so you don't have to run those expensive, or relatively expensive, queries regularly. What you really care about you already emit into metrics, and then again you reduce the total amount of information, and also the computational complexity, by orders of magnitude. Tempo, the last of the bunch, with OpenMetrics.
B
One
of
the
things
which
stuck
with
me
is
when,
when
the
googlers
mentioned,
how
how
searching
for
for
traces
didn't
scale
and
when
google
tells
you
that
searching
doesn't
scale
searching
for
something
you
better,
listen
which,
which
I
did
so
x,
employers
are
just
an
id.
B
You
already
know
that
this
is
a
relevant
thing
course.
It
came
from
that
high
latency
bucket,
where
I
know
your
p99
was
two
seconds.
What
have
you
doesn't
matter,
but
you
know
you
have
a
high
latency
there.
You
know
you
had
that
one
error,
you
know
you
had
that
one
security
exception.
What
have
you-
and
you
know
that
this
one
trace
is
relevant
to
the
thing
which
you're
currently
working
on,
which
you
saw
in
your
logs
or
your
metrics,
so
you
don't
need
to
search.
B
They
are
built
into
pretty
much
everything
which
we're
talking
about
of
course,
kind
of
obvious
they're
nice,
but
tempo
also
also
allows
you
to
search,
of
course,
some
users
and
some
use
cases
just
require
searching
of
more
or
less
raw
traces
and
spends
my
own
personal
opinion.
At
some
point.
B
It
would
be
nice
to
optimize
this
out
if,
if
you
need
to
do
search
as
of
today,
but
if
you
need
to
rely
on
search
going
forward,
that's
also
completely
doable
better
would
be
if,
if
you
go
through
ex-employers,
because
it's
just
so
much
more
efficient,
only
works
on
object,
storage,
you
don't
need
cassandra
elastic
anything
expensive
in
the
background,
given
an
object,
store
and
you're
done,
it's
compatible
with
all
the
things
open,
telemetry
tracing,
zipkin
jager
by
default.
We
are
not
sampling,
you
can
sample
if
you
want
to,
but
we
don't
sample.
B
I
also
need
to
update
that
slide.
I
see
because
as
of
four
months
ago,
which
is
eons
in
in
this
production
velocity,
we
had
over
2
million
samples
per
second
at
350
megabytes
per
second,
and
we
have
14
day
retention,
three
copy
stored
at
a
cost
of
240
cpu,
400,
gigs,
450,
gigs
of
ram
and
132
terabytes
of
object,
storage
and
the
p99
of
2.5,
it's
better
already,
but
like
tempo
scales
and
it
scales
insanely
high.
B
Bringing
all
of
this
together
this
this
more
holistic
thing
allows
you
to
jump
from
logs
to
traces,
from
metrics
to
traces,
from
traces
to
logs
and
all
the
all
the
other
different
ways,
of
course,
like
it's
literally
designed
for
each
other
and
while
they're
all
distinct
projects
and
you're,
not
forced
to
use
all
of
them
to
to
reap
benefits.
B
If
you
so
choose
you
get,
you
get
the
most
bang
for
your
non
buck.
B
Of
course,
a
those
things
have
been
designed
for
each
other
and
personally
speaking
since
at
least
2015,
I
have
been
working
towards
having
those
three
things
for
metrics
logs
and
traces
as
a
holistic
thing.
So
there
is
a
long-running
underlying
design
as
to
the
bank
for
the
buck.
All
of
this
is
open
source.
You
can
run
it
yourself.
B
I
like
food
and
shelter
so
you're,
also
more
than
welcome
to
go
to
grafana
cloud
or
or
buy
enterprise,
or
what
have
you
and
there's
some
more
features:
rough
sniff
test.
If
the
user,
the
intended
user,
has
more
money
than
time,
it
tends
to
be
a
paid
feature.
If
they
have
more
time
than
money,
it
tends
to
be
open
source,
like
that's
roughly
the
the
sniff
test
for
our
monetization
strategy.
B
Again
most
or
anything,
we
talked
about
right
now,
it's
completely
open
source.
You
can
run
it
yourself,
a
few
screenshots.
Most
of
you
know
how
how
grafana
looks,
but
still
those
blue
lines
are
relatively
new
and
super
nice.
You
can
have
events
you
can
have
you
can
have
your
alerts.
You
can
have
things
like
this,
which,
which
give
you
a
lot
more
context.
You
can
also
have
examples
visualized
and
things
like
this
and
tons
of
other
visualizations.
B
As
just
last
week
we
had
observabilitycon
2021
online.
Obviously
a
lot
of
what
we
just
talked
about.
You
can
find
in
more
depth
without
that
rush
to
to
cover
as
many
questions
as
possible
at
this
location,
grafanacon.
B
Anyone
that's
also
part
of
the
slice.
It's
even
a
click.
B
Thank
you
very
much.
You
can
post
talks
on
github
like
all
of
them
for
last
decade
or
so.
Email
twitter
are
there
for
your
per
user
and,
let's
see
what
we
have
as
questions.
B
Do
we
get
created
questions
and
they're
read
out
or
how
does
it
work?
I
honestly
don't
know
sorry.
I
didn't.
B: Orchestrate: can you expand on what you mean with orchestrate? Because I think you're mixing, on the one hand, your own orchestration of applications versus how to emit data towards Grafana Cloud. I can try to give a partial reply to the second part of that question, as I understand it. The easiest way, for most things, is the Grafana Agent, which is what the Prometheus Agent, released today, is based upon.
B
Of
course,
this
allows
you
to
to
channel
all
your
your
signals
towards
grafana
cloud.
If
you
have
any
of
the
other
interfaces
like
the
common
ones,
they're
all
supported
like,
ideally,
you
you
put
things
somehow
into
into
a
prometheus
remote
right
to
to
emit
towards
graphite
cloud.
If
it's
metrics
for
traces,
open,
telemetry
tracing
is,
is
the
gold
standard?
So
you
should
absolutely
do
this.
B
If
you
have
non-prometheus
things
and
there's
an
exporter
for
pretty
much
or
for
probably
everything
on
the
market
to
get
data
into
prometheus
format,
and
then
you
can
use
the
agent
or
other
mechanisms
to
to
push
towards
grafana
cloud.
If
you
want
to
the
open,
telemetry
collector
also
supports
prometheus
remote
right,
so
you
can
also
use
this
yeah
pretty
much.
Everything
which,
which
is
on
the
market,
is
supported,
prom
tail
and
such
for
loki
and
everything
is
built
into
into
the
grafana
agent.
B
If
you
just
want
the
bare
bones
open
metrics
to
to
promise
this
remote
right
pipeline,
the
prometheus
agent
is
better.
If
you
want
built-in
exporters,
if
you
want
prom
tail,
if
you
want
to
have
open
telemetry
tracing
all
those
things
built
into
a
single
binary,
the
grafana
agent
is
better
depends
on
your
trade-off.
Some
deployment
models
like
to
have
a
single,
huge
binary,
which
does
pretty
much
everything
other
deployment
models,
mandate
that
you
have
tons
of
smaller
services,
both
as
valid
both
as
covered.
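As a rough sketch only (the endpoint, job name, and credentials are placeholders, and the exact schema should be checked against the current Grafana Agent documentation), a minimal metrics-plus-remote-write agent configuration looks roughly like this:

```yaml
# Hypothetical sketch of a Grafana-Agent-style metrics config:
# scrape one local app and forward everything via Prometheus remote write.
metrics:
  wal_directory: /tmp/agent-wal
  configs:
    - name: default
      scrape_configs:
        - job_name: myapp            # placeholder job name
          static_configs:
            - targets: ['localhost:8080']
      remote_write:
        - url: https://example.com/api/prom/push   # placeholder endpoint
          basic_auth:
            username: "<user>"
            password: "<api key>"
```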
B: Do you have an off-the-shelf Helm chart for getting this whole setup?
B: I think we do. There's tons of work going on in our integrations crew (and we're hiring like crazy for the integrations crew), where all of this is made more seamless. Internally we use Tanka, which is Jsonnet, which is then compiled into Helm and others, and it's also able to ingest Helm charts, which means you don't have the common problem of those super static, slash brittle, Helm charts which are hard to change and hard to track, in particular if you have both upstream changes and your own local changes, where you functionally need to fork pretty much everything and carry your own forks. If you need to do anything more than really baseline changes, I suggest you look at Tanka and Jsonnet.
B: How to integrate apps to send metrics or emit data to Grafana Cloud? It depends on the type of... well, okay, no, they said metrics, not signal, sorry. Okay, let's go with metrics and then with data. For metrics, Prometheus client libraries are the gold standard for emitting metrics as of today. For data, defined as traces, OpenTelemetry tracing is the gold standard. For logs it doesn't really matter, because logs are just, historically, kind of a mess, as most of you will probably agree.
B: So Promtail can ingest pretty much everything and just hammer it into shape for Loki to consume. Again, all of this is built into the Grafana Agent, but for your own applications, when you need to emit the actual raw data from your own code, and you need to instrument your own code: for metrics, Prometheus client libraries; for traces, OpenTelemetry tracing; and for logs it doesn't really matter, because Promtail eats it all.
B
How does correlation happen between Loki logs and Tempo traces? Going from your logs to your traces, the ideal case is you have an exemplar on your logs: there you know the ID for that trace, or that span, or both. Exemplars support free-form text, so, as per the W3C tracing standard, we support both span and trace IDs.
B: Should Kubernetes application services be designed in any particular way to use these tools? What is a good starting point to integrate these tools with custom Kubernetes services running in a cluster? Great question, and it's not basic, not at all. For Prometheus, slash the others, it's super simple.
B: Prometheus has a thing called service discovery, which is an interface through which Prometheus understands how other services run their things; first and foremost Kubernetes, but there are also things like text files, where you just write YAML and populate your service discovery yourself. For anyone more on the networking side: zone transfers are possible, so your BIND, or whatever, or Unbound DNS server allows zone transfers by Prometheus, and it just ingests the complete zone and starts monitoring, or scraping, everything which is defined in that zone. And again, that is also the case for Kubernetes.
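A minimal sketch of such a Prometheus scrape configuration using Kubernetes service discovery (the opt-in annotation shown is a common community convention, not a requirement):

```yaml
# Sketch: discover all pods in the cluster, but only scrape those that
# opt in via the conventional prometheus.io/scrape annotation.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```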
B: So you literally just point your Prometheus at your Kubernetes, and you tell your Kubernetes that, yes, this thing may get the data, and automatically Prometheus gets all the data from that Kubernetes cluster, or from the pods. For services' internals, blah blah, it might be different depending on your precise setup; maybe you need a sidecar, blah blah blah, the usual. But for the pods themselves and such, all of that is automatically emitted, which is super nice, because it's literally one thing to set up and automatically you have all that data in your local Prometheus.
B
If
you
don't
want
to
have
local
storage
or
you
have
issues
with
state,
which
was
the
reason
why
we
created
the
prometheus
operator
ages
ago
to
handle
state
within
within
kubernetes,
you
can
also
just
run
the
grafana
or
the
prometheus
agent
and
just
shove
all
that
data
into
eg
grafana
cloud
or
one
of
the
other
prometheus
compatible
offerings.
Speaking
of
hermes
compatibility,
also
on
the
prometheus
block,
again
promises
io,
slash
blog.
B
We
did
start
a
prometheus
with
my
prometheus
head
on.
We
did
start
a
prometheus
compliance
thing
there
or
prometheus
conformance
thing
where,
if
you
are
compliant
to
the
relevant
apis
and
service
interfaces,
you
get
certified
as
prometheus
compatible,
which
means
for
the
users
that
you
actually
know
that
a
thing
is
promises
compatible
and-
and
you
can
just
use
it
without
fear
of
of
something
breaking
prometheus
cortex
grafana
cloud
are
prometheus
compatible.
B: So, if you have normal scale... like, if you're working at a huge company, or you run a team and they have I-don't-know-how-many users, blah blah blah, that is not as applicable; but if you have normal-sized amounts of data, it's pretty easy, because you just start a Prometheus, or a Cortex, or a Thanos. Cortex and Prometheus have single-binary modes, so you just start the binary and you're done. In this case I would recommend Prometheus myself, if you're getting started.
B: DigitalOcean also has quite a few super nice Prometheus tutorials, which are, I think, four years old, but they are super nicely written. Also, we are extending the tutorial section on prometheus.io.
B
Does
prometheus
integrate
with
tools
like
istio?
I
think
I
know
the
answer,
but
I
don't
want
to
give
a
wrong
answer,
so
I
can
follow
up
and
shoot
me
an
email
or
something
I'll
I'll
get
you
the
authoritative
answer
from
robot
or
from
joe
sorry,
not
from
robert
and
and
before
I
say
something
wrong.
A
Do
you
want
to
include
a
slack
channel
or
something
in
the
chat,
richard
or
julie,
just
to
for
any
follow-up
questions?
Anything
like
that.
B
Yeah
we
have
the
I
mean,
for
we
have
to
split
this
for
cortex
and
prometheus.
You
have
you,
have
the
cncf
slack.
A
I
do,
let
me
put,
let
me
put
ours
in
and
online
programs
and
then,
if
anybody
has
any
other
questions,
you
can
hit
each
other
up
here.
A: Okay, well, if there are no other questions, I want to thank you, Richard, and thank you, everyone, for hanging in there with us as we got things started; a little bit of a rough start, but I think this was a great one, and we got tons of great questions. Let's keep those conversations rolling. Thank you again, and the recordings will be up in a little bit this afternoon.