From YouTube: 2022-07-05 CNCF TAG Observability Meeting
Description
Metrics at Scale: Issues in OTEL Collector, Prometheus by Kevin w/Google
C: Good. Today might be a really quiet day, we'll see; there was no TOC meeting today and we're coming off the July 4th holiday. So I wouldn't be surprised if there's like you, me, and one other person. I haven't heard from Alolita either. But how was your Fourth?

A: It was good. We actually... our fireworks got canceled, unfortunately, for unforeseen reasons. So... oh, really?

A: No, no, no, no rain. I don't actually know what it was. We actually went to like an amusement park near us, and they were supposed to have fireworks on site, and then like an hour before they're like, due to technical reasons we have to cancel them. I'm like, well, that kind of stinks, because that's why we came. How about you?
C: So my girlfriend and her three kids are in Colorado for almost two weeks, so they come back on Friday, or the third Friday, and they're like having a vacation with family. And my oldest, he's 17 and will be a high school senior next year. He was accepted to Rhode Island School of Design, RISD's pre-college program. Oh wow, yeah, that's awesome, there you go. On Saturday, I think, and I pick them up the first week of August, so it's like me and the dog.

C: Yeah, I'm pretty psyched about it. I'm actually plowing back through the notes from like a year and a half ago, maybe, because I had identified way back then some SIG roles or some TAG roles that we really need to fill. You know, both tech leads, as I talked to you about, a chair, but there's a bunch of other stuff; you know, TAGs are free to make their own roles.

C: So, for example, we need someone to work on social media and communications, like in an intentional way. It's an actual job, it takes time. You know, we need artists and/or design folks to work on a TAG website and/or other, you know, collateral and artifacts and things like that. So I remember I captured a lot of this a long time ago, and so I'm being a little bit lazy here and hoping to copy-paste, because none of it's changed. But a couple others too, yeah.
C: Well, we are now two minutes over, so I'll give people another minute or two to filter in. But like I said, given the holiday and the proximity, I wouldn't be surprised if we have a really short meeting today.

C: Let's see what else has been new. But I should say, for the recording: this is the first-Tuesday meeting of the TAG Observability, the CNCF-sponsored group and channel and stream. So please don't do anything that would be a violation of the code of conduct.
C: How are you? I was telling Steve that, as we're, you know, one day after July 4th and no TOC meeting today, which always happens just the hour prior to this meeting, yeah, we might have very light attendance.

D: Yes, I think most folks, in the US at least, are still out on break.

E: Hey, does it work? Yes? Okay, good.

E: Yeah, so my team specifically is GKE metrics, so we collect metrics in our Kubernetes clusters, mostly system metrics, in like Prometheus format, and from like the kubelet, container metrics, and then send them to Google monitoring for customers to see. And right now, yeah, we do that using OpenTelemetry, and we're slowly moving away from OpenTelemetry to do more optimizations on memory efficiency, because we noticed that parsing the Prometheus format is expensive, and we're doing a few experiments on how to handle metrics more efficiently.
D: No, that's cool, that's very interesting. I mean, you said that you're moving away from OTel, or... yeah, I see. Is that mainly because... yeah.

E: Right now, on every GKE node there is a container called gke-metrics-agent, and it's a slightly customized OpenTelemetry Collector that finds all pods on the node that export Prometheus metrics, like all system pods.

E: It scrapes every one of them using OpenTelemetry and the Prometheus receiver, and sends the metrics. And the problem is, it's like this one single thing and it scrapes a lot of metrics, and with the amount of metrics and the scrapes involved, we're getting to a point where it consistently goes out of memory. At some point it doesn't scale well enough, and we've had other issues where, like, OpenTelemetry seems to be still changing a lot. So like, we updated OpenTelemetry...
E: No, the reason the metrics were gone was changes in OpenTelemetry, like in the Collector specifically, I think the exporter. And like, we use batching to send 200 metrics at once, and that doesn't always work correctly. So we ran into a few problems, and mostly that, like, the Collector is great for specific use cases. But in our case we're collecting so many metrics in one process, and it's no longer...
E: Not... whatever, yeah. And we want to split it up into a mix of sidecars and other options, and right now we're working on the sidecar to scrape Prometheus metrics, convert them into the monitoring format, and send them away. And that's where we came across OpenMetrics, because Google Cloud Monitoring relies on the start time to be there for cumulative metrics, and right now OpenTelemetry... well, not in the current version.
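For readers following along: Google Cloud Monitoring's cumulative (counter) metrics need an explicit start timestamp, which the plain Prometheus text exposition does not carry. A minimal sketch of the usual workaround is below, tracking the first-seen time per series and resetting it when the counter goes backwards; the types and names are illustrative, not the GKE agent's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// cumulativeTracker assigns a start time to each cumulative (counter) series.
// The first scrape of a series sets the start time; if the value ever drops,
// we assume the target restarted and reset the start time to "now".
type cumulativeTracker struct {
	start map[string]time.Time // keyed by series identity (name + labels)
	last  map[string]float64
}

func newCumulativeTracker() *cumulativeTracker {
	return &cumulativeTracker{start: map[string]time.Time{}, last: map[string]float64{}}
}

// observe returns the start time to attach to this data point.
func (t *cumulativeTracker) observe(series string, value float64, now time.Time) time.Time {
	st, seen := t.start[series]
	if !seen || value < t.last[series] { // first point, or counter reset
		st = now
		t.start[series] = st
	}
	t.last[series] = value
	return st
}

func main() {
	tr := newCumulativeTracker()
	t0 := time.Now()
	fmt.Println(tr.observe(`http_requests_total{code="200"}`, 10, t0))                   // start = t0
	fmt.Println(tr.observe(`http_requests_total{code="200"}`, 15, t0.Add(time.Minute)))  // start unchanged
	fmt.Println(tr.observe(`http_requests_total{code="200"}`, 2, t0.Add(2*time.Minute))) // reset detected, new start
}
```

OpenMetrics can avoid this guesswork by exposing a _created timestamp alongside each counter, which is presumably part of the interest in the format mentioned here.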
D: That would be super interesting, because, you know, when... I mean, I worked on the Prometheus interop, you know, for the OTel Collector and OTLP specifically, and, you know, one of the issues, as you said, is, you know, handling scaling up the Collector to be able to handle...

D: You know, sharding, as well as, you know... not sidecars, sidecars are used typically, but, you know, really StatefulSet support. So that was something that we built out, but we did actually make sure that it was OpenMetrics compliant, so from a, you know, format and compatibility standpoint.

D: That certainly exists, but... I think, in the scaling that we did, I did see that, you know, there was scaling of metrics; especially when you have like a stream of metrics coming in in a really large number, there's still work to be done there for the Collector, so...
E: That's like... I'm also joining the Prometheus maintainer sessions now, because we want to, like, maybe help there and implement things. But like, what we noticed is, for example, re-labeling is very expensive. Like, we just... yeah, we recently forked the Prometheus library, because, like, we know what metrics we want, so we can already...

E: If we see the metric name, we can already stop parsing and re-labeling if we don't care about the rest; that's just, like, an optimization. And like, in our case, we collect metrics from many containers, and we noticed that if a single container has like a cardinality explosion or sends too many metrics, that's enough to lose metrics for everybody else. And so that's why we're kind of splitting things up a little, to reduce the blast radius we have.
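A rough illustration of the "stop parsing early" idea: if the scraper already knows which metric names it wants, whole lines of the exposition text can be skipped before any label parsing or relabeling work is done. This is only a sketch over the plain text format using the standard library, not the fork described above; the allow-list contents are invented.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// allowed is the set of metric names we actually keep; everything else is
// skipped before any label parsing or relabeling work is spent on it.
var allowed = map[string]bool{
	"container_cpu_usage_seconds_total":  true,
	"container_memory_working_set_bytes": true,
}

// metricName cuts a sample line at the first '{' or space,
// e.g. `name{label="x"} 1.5` -> `name`.
func metricName(line string) string {
	if i := strings.IndexAny(line, "{ "); i >= 0 {
		return line[:i]
	}
	return line
}

func filterExposition(payload string) []string {
	var kept []string
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // HELP/TYPE comments ignored in this sketch
		}
		if allowed[metricName(line)] {
			kept = append(kept, line) // only now would full parsing be worth paying for
		}
	}
	return kept
}

func main() {
	payload := `container_cpu_usage_seconds_total{pod="a"} 12.5
some_noisy_metric{pod="a",id="0001"} 1
container_memory_working_set_bytes{pod="a"} 4096`
	for _, l := range filterExposition(payload) {
		fmt.Println(l)
	}
}
```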
D: Have you proposed some of these changes back to the Collector SIG in OTel?

E: We have a regular sync with David Ashpole, yeah, and we're proposing some of those changes. But in general, I think... well, I'm not sure if what we're doing is actually the target use case for the Collector. So that's why we're kind of branching off in this other direction. Also because, for the Prometheus metrics we scrape, like, we don't need most of the logic from the receivers; we only need their parsing for the text format, and hopefully proto soon, we'll see, with OpenMetrics and so on.

D: Have you looked at the operator in OTel, for the OTel Collector? Because that's where that scaling work was done. And, you know, one of the areas, as you rightly said, is that there's a lot of stuff you do not need from the Prometheus, you know, dependencies, and optimizing and making a lower, you know, smaller footprint there would actually be ideal, because it's actually carrying a lot of baggage in that whole process, to my understanding.
E: Right now, though, like, that's why we have a fork of OTel, OTel Collector contrib, and Prometheus, because we had to make optimizations in the Prometheus library and then in the receiver library, and it's a whole chain. And the optimizations we did were kind of... like, OTel right now takes the Prometheus library, so you pass it the Prometheus config, and any optimization we're doing on the Prometheus side you can't really do, because it's nested in three libraries, yeah.
D: I think there was another issue, and this was something that has been ongoing, which is how to ensure, from a compatibility... a configuration-compatibility standpoint, full compatibility between the two formats of configuration, right.

D: So there is a proposal for a, you know, standardized or more sophisticated configuration manager, which is actually handled with the remote... there's a remote agent work group in the OTel SIGs, and that's where, you know, this agent management, especially configuration management specifically, is, you know, being discussed. Right, that said, again, there were... there is...

D: There are actually a few design proposals that were made in terms of improving the configuration management for OTel and making it more sophisticated and fully compatible, with, you know, users not having to change an existing Prometheus configuration, for example, and the format, and it being interoperable with the OTel configuration injection. So again, I think David's also aware of it, but yeah.

D: And Steve, it might be worth, you know, kind of having a more detailed discussion, because I think that that work got a bit less prioritized by the maintainers. So we should definitely, you know, figure out how we can get some of these requirements aligned, because these are known issues, they're not new. And Kevin, we'd love to, you know, make sure that OTel does address them, because it's not only you, it's actually many other users who are having the same issues, and it's the same...
C: You've also touched on sort of an operational, pragmatic, real-world kind of issue. That was my experience as well, running in particular multi-tenanted clusters, you know, where you're using namespaces to get sort of pseudo-isolation. You know, ultimately, you know, the purpose-built clusters worked better for us, but even in that scenario we found the need, you know, to have many Prometheuses.

C: You know, one for the cluster itself, and then, you know, teams, or whatever... whatever self-service, you know, or team-ownership scheme is at play. You know, that was the granularity that we needed, to give everybody their own Prometheus, because cardinality bombs would take out not only...

C: Well, there's actually something we used. I don't know if it's still called this, but it's called the cardinality bomb detector, and it literally looks at the number of series and some other things as a rate, and then sets up Alertmanager configs, you know, if something is blowing up. But, but again, in practice, if you don't have those blast radii, you know, then, you know, it blew up, that's great, and... that's...
E: Something we're kind of... that is one part of the list of things we're building into this new scraper, whatever it is, yeah. We can, like, measure how many points we get for a single metric, and we know how many to expect. So if we get too many, we're like, okay, hey, this is thousands... a thousand, like, I don't know. We had, like, bugs where there was, like, one metric stream that was like 250,000 lines, and usually it's like 10...
E
and
we're
like.
We
can't
tell
like
okay,
let's
start
dropping
metrics
instead,
because
that's
we
alert
the
exporter
and
like
say,
hey,
you're,
sending
too
many
metrics
and
there's
a
bug
and
we
start
drop
and
fail
gracefully,
which,
right
now,
with
the
current
architecture
of
open
telemetry,
I
think,
would
be
very
hard
to
get
into
and,
like
our
team,
developed
kind
of
the
stance
of
open
telemetry
is
great,
but
it's
way
too
general
for
the
optimizations.
We
want
to
have.
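To make the "drop and fail gracefully" behavior concrete, here is a hedged sketch of a per-metric sample budget: if a single scrape returns far more series for one metric than expected, the excess is dropped and the offending exporter is flagged, instead of the whole pipeline falling over. The metric names, the 10x rule, and the budget values are invented for the example.

```go
package main

import "fmt"

// sample is one parsed series from a scrape: metric name plus rendered labels.
type sample struct {
	metric string
	labels string
}

// budget is how many series we expect per metric name; anything beyond
// expected*10 is treated as a cardinality bug in the exporter (made-up rule).
var budget = map[string]int{"workqueue_depth": 10}

// applyBudget keeps samples within budget and counts what it drops per metric,
// so the caller can alert on the exporter instead of losing everyone's data.
func applyBudget(samples []sample) (kept []sample, dropped map[string]int) {
	counts := map[string]int{}
	dropped = map[string]int{}
	for _, s := range samples {
		counts[s.metric]++
		if exp, ok := budget[s.metric]; ok && counts[s.metric] > exp*10 {
			dropped[s.metric]++ // fail gracefully: drop the excess, keep the rest flowing
			continue
		}
		kept = append(kept, s)
	}
	return kept, dropped
}

func main() {
	var in []sample
	for i := 0; i < 250; i++ { // a runaway exporter: 250 series for one metric
		in = append(in, sample{metric: "workqueue_depth", labels: fmt.Sprintf(`{id="%d"}`, i)})
	}
	in = append(in, sample{metric: "up", labels: `{job="node"}`})
	kept, dropped := applyBudget(in)
	fmt.Printf("kept %d samples, dropped %v\n", len(kept), dropped)
}
```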
D: I think, I think to that point, and sorry, Steve, I just wanted to complete this thought: Kevin, there was a discussion, and there is actually an open discussion, on having a lighter-weight, you know, optimized Collector, and I think that, again, I'd invite you to, you know, kind of chat with us about it. Because, again, we totally understand that, you know, not all use cases fit one...

D: You know, snapshot, and there are obviously... that's the reason why distributions also exist downstream, because, you know, you can actually shed the number of processors, or tune the, you know, components which are part of a general release, if you will, of the Collector, and that's one of the ways it's been handled. But, that said, again, I really would love to see a, you know, high-cardinality-optimized version of the Collector, with a smaller footprint.
C: And there's really two scenarios there, right? There's the high cardinality all by itself, which can be challenging, but, you know, it was a straightforward optimization problem and a resourcing problem. But the sudden cardinality bombs or spikes... you know, I found myself wishing... maybe this is already in OTel, I don't know; at the time we were running, you know, Prometheus directly and using remote write, the OTel Collector wasn't quite as grown up.

C: This was like three years ago, but I found myself almost wishing for like a circuit breaker pattern inside the scrapers, right, so that for a particular target, and there's different ways you can implement it, you know, you could do it based on deltas or absolutes or, you know, all manner of rules. But, you know, if you could have a per-target, you know, "this is just too many, I'm going to just stop and lose metrics for this one target," rather than try to drink from the fire hose and tank everything, by the way, too.
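The per-target circuit breaker being wished for here could be sketched roughly as below: once a target's recent scrapes keep exceeding a series limit, stop scraping or ingesting that one target for a cool-down period rather than letting it take the whole collector down. Prometheus does have per-scrape limits such as sample_limit, but the richer delta- or rate-based behavior described above is the part this illustrates; the thresholds, field names, and cool-down are invented.

```go
package main

import (
	"fmt"
	"time"
)

// breaker trips a single scrape target when it repeatedly exceeds a series
// limit, and stays open for a cool-down period so one noisy target cannot
// starve everyone else. All numbers here are purely illustrative.
type breaker struct {
	seriesLimit int
	maxStrikes  int
	coolDown    time.Duration

	strikes   int
	openUntil time.Time
}

// allow reports whether the target should be scraped/ingested right now.
func (b *breaker) allow(now time.Time) bool { return now.After(b.openUntil) }

// record feeds the series count of the latest scrape into the breaker.
func (b *breaker) record(seriesCount int, now time.Time) {
	if seriesCount <= b.seriesLimit {
		b.strikes = 0 // a healthy scrape resets the strike count
		return
	}
	b.strikes++
	if b.strikes >= b.maxStrikes {
		b.openUntil = now.Add(b.coolDown) // lose this one target, keep the rest healthy
		b.strikes = 0
	}
}

func main() {
	b := &breaker{seriesLimit: 10000, maxStrikes: 3, coolDown: 5 * time.Minute}
	now := time.Now()
	for i := 0; i < 3; i++ {
		b.record(250000, now) // a runaway target, scrape after scrape
	}
	fmt.Println("scrape allowed?", b.allow(now)) // false: the breaker is open
}
```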
C: What do I do, right? Right. You know, and if we did have that, with some sane defaults, you know, as well, or at least the potential for them, then the complexity of, like, what we had to do, which is like every namespace gets its own Prometheus, you know, the cluster gets its own Prometheus... But then people might want to do rules that branch across Promethei, so now you need like a multi-tenant, cross-tenant...

C: ...you know, query capability and/or scraping capability, if you wanted to do recording rules to record rates, for example, of, like, you know, something in your app combined with something from the cluster, right. So it very quickly gets complicated, and in a sea of painful RBACs. So I think, like, an elegant fix a little bit upstream on the collector side might really have a lot of knock-on benefit. But, Kevin, definitely...
A: I think there's several things, right? Like, there's definitely some missing features. I totally agree with the circuit breaking, by the way; like, we should totally implement something like that to offer kind of flexibility. But you're spot on, Kevin: like, today the OTel Collector is kind of generic, like, just trying to support a bunch of use cases.

A: We are definitely going to head down the path of needing to optimize, like we're going to have like an edge instance or like a high-cardinality instance or what have you, because these use cases... sure, I mean, Google has massive scale, totally appreciate that, but some other users of this system are also going to have some similar type behaviors that need to be handled, and just having a generic solution, the Collector's not going to be great.

A: I want to just second what Alolita was saying, which is: we would love to chat, understand those use cases better, figure out how we can better support you. And even if it's not today, even if it's like in the future, how do we get there? Because some of this, I think, is applicable to many users out there.
E: And, like, one more thing I think I can share is... we've spent... I think, like, I joined Google in October, and since I joined I've been mainly working on the collector and optimizing memory stuff, and one thing we noticed is that we're getting to a point where we've optimized a lot and now we're mostly fighting the garbage collector, and so...

E: We're looking into, for metrics, like collecting, parsing, sending, whether we would be better off using a non-garbage-collected language to do that. Because, like, the metrics-collecting part, it's a bunch of strings and numbers, and then we throw them away and collect them again. And that's kind of an experiment I'm into right now, because, like, at a certain scale you notice that after a scrape, even if you use 60 seconds, it takes longer for the garbage collector to clean up behind the metrics it just collected.
E: So that's kind of an experiment. I don't know if that was ever discussed with OTel, to do the collector in... like, yes.
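On the garbage-collection point: one common mitigation in Go, short of moving to a non-GC language, is to reuse scrape buffers and parsed-sample slices across scrape cycles, so that each 60-second scrape does not allocate and immediately discard the same memory. A minimal sketch with sync.Pool follows; this is just the standard-library idiom, not the team's actual approach.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable byte buffers for scrape payloads, so each scrape
// cycle does not allocate a fresh multi-megabyte buffer for the GC to reclaim.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// handleScrape copies the payload into a pooled buffer, stands in for parsing,
// and returns the buffer to the pool for the next scrape cycle.
func handleScrape(payload []byte) int {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()      // keep the capacity, drop the contents
		bufPool.Put(buf) // hand it back for the next scrape
	}()
	buf.Write(payload)
	// ... parse buf.Bytes() into samples here ...
	return buf.Len()
}

func main() {
	fmt.Println(handleScrape([]byte("container_cpu_usage_seconds_total 12.5\n")))
}
```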
D: Yeah, yeah, actually, Kevin, I mean, these are all known issues. It's more that, I think, there are a couple of things here. One is, again, it's on the list for the OTel Collector, you know, to address all of these areas, but it's also, you know...

D: We've worked with Dave Ashpole here, you know, pretty closely, as well as some of the other engineers who are involved. But I think there's also... your input there, as well as other feedback, would be useful, because, I mean, this is something that even we are seeing, and, you know, so it's something that I definitely would love to see more prioritization on, and... please.
E: Yeah, and one thing about that, very quick: we found something surprising in a profile a while ago. I think it's Prometheus in the OTel... the Prometheus library imports, for its discovery stuff, a lot of third parties, and in this case I think it was an AWS library that always allocated, I think, one or two megabytes of memory just by being imported as Go code. Which, like, we now do, like, magic to get rid of dependencies.

D: Because there's a lot of dependencies on the Prometheus side that automatically, you know, you... your...

C: You mentioned that you had forked a few of the pieces of OTel. I was curious what your experience was, you know, in running with those forks. Like, for example, was it easy for you to replicate CI? Was that straightforward? And, you know... and also, like, what's your predicate for pushing these changes back up, so that you're not having to work off a fork? So, you know, I'm kind of curious, like, for someone new who wants to make a change like what you've done.
E: Well, the forks are not on GitHub; they are in an internal system, because of, like, build security, when you have the code around. But I mean, the forking was clone, push; the CI part we didn't replicate.

E: I don't know if the OTel Collector fork has a CI; we don't maintain that. But the forks are really there, mostly, especially the Prometheus and the OTel ones, to give us a way to quickly test and develop fixes and build images internally, until we know that it works, and then contribute it back upstream, because importing the dev versions into our builds is a pain.
E: For the Collector, we have a separate repo and we build the Collector ourselves; we have our own CI and testing and stuff, and yeah, so we don't use the OTel Collector binary, we have built our own binary, because we add stuff to it. And for the other parts, like, my process right now is: I have all the forks I need checked out locally, and when I test something I overwrite the go.mod file to point to local paths instead of a version, and that works.
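The go.mod trick described here is the standard replace directive: while a fix is being tested across the forked repos, the module file points at local checkouts instead of released versions, and the replace lines are dropped once the change is upstreamed. A hedged example follows; the module name, versions, and local paths are placeholders, not the actual build setup.

```go
// go.mod (excerpt) -- point the OTel and Prometheus dependencies at local
// checkouts while developing a fix; remove the replace lines afterwards.
module example.com/metrics-agent

go 1.18

require (
	github.com/prometheus/prometheus v0.37.0 // placeholder version
	go.opentelemetry.io/collector v0.55.0    // placeholder version
)

// Local working copies sitting next to this repo (hypothetical paths).
replace github.com/prometheus/prometheus => ../prometheus

replace go.opentelemetry.io/collector => ../opentelemetry-collector
```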
E: You know, I think what was more challenging was whenever we updated the OTel libraries. There's a lot of connected libraries, like there's the Collector and then there's the contrib, and for me, like... a few times we had it where we updated the OpenTelemetry Collector, I think the one was from 0.35 to 0.37 or something, and a lot of stuff went weird, and we didn't, like, understand why, and then you have to go through a lot of diffs and understand, like, across three, four repositories.

E: In that case, no, because it was a very subtle change, and we... but it, like, it's also because we relied on implementation, probably: we relied on labels, and they were no longer there at some point, or renamed in some way, and we didn't notice.
E: One of the custom things in our collector is, we define in our config what metrics we accept and, based on that, what labels we accept. And if we see a metric with a label we don't know, we just drop it. So there was a label added to every metric, and as it's not on our allow list, we just dropped them all. And, like... but it's hard to debug, and we didn't know, and, like, David helped us a little and we found it out, but it took a while.
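As a concrete illustration of how a newly added label can silently wipe out every series under a strict allow-list, here is a tiny sketch; the label names, including the one "added by an upgrade," are invented.

```go
package main

import "fmt"

// allowedLabels is the per-metric label allow-list described above: a series
// carrying any label name we do not recognize is dropped entirely.
var allowedLabels = map[string]bool{"pod": true, "namespace": true, "code": true}

// keep reports whether a series (represented by its label set) survives the filter.
func keep(series map[string]string) bool {
	for name := range series {
		if !allowedLabels[name] {
			return false // one unknown label drops the whole series
		}
	}
	return true
}

func main() {
	before := map[string]string{"pod": "a", "code": "200"}
	// After a library upgrade, every series suddenly carries one extra label:
	after := map[string]string{"pod": "a", "code": "200", "http_scheme": "http"}
	fmt.Println(keep(before), keep(after)) // true false: everything gets dropped
}
```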
D: But I mean, again, you know, if you run into issues like this in the future, just ask, because, I can... you know, again, there is a collector maintainers channel and you can just ask.

E: Right now we ask David; I don't know if we should also ask somewhere else.

D: Yeah, yeah, I mean, typically...

D: I mean, if you're on CNCF Slack, I can bring you the names, but you can reach out to... you can reach out to Alex Boten, you can reach out to Bogdan, you can reach out to Juraci. There are a few maintainers that you can reach out to, so, in case you ever, you know, have an issue where you are seeing dependency changes or any kind of breaking changes.

C: I think, I think you had mentioned that you and/or your team were looking for ways to contribute to the technical advisory group, you know, this body itself. Do you have any idea about the kind of things you'd like to do? I mean, I think it is the case that we have a pile of work streams defined but not resourced, and, you know, the way that TAGs work, like, if there's something that you or your team want to work on...

C: ...that's in scope for the TAG, and there isn't an issue for it or something like that, then we'll just make one, right? It's a... it's a community-driven group.
D: Yeah, and then especially, I think, Kevin, as you called out, you know, there are specific use cases, even for the OTel Collector, that actually can be defined in this forum, for example, and, you know, then worked on collaboratively with OTel, with the, you know, maintainers, right. So again, leverage it, because that reiterates, you know, the necessity, and...

C: ...if they're pragmatic, you know, anyone-would-hit-this-at-scale, or even not at scale, kind of things. I mean, I've noticed over the last year, you know, there's a dramatic disparity between, like, you know, the 900-plus people in our channel and the, you know, generally less than a dozen that come every couple of weeks, right. So I feel like there's a huge untapped reservoir of people who are lurking, right, and so if we engage in work that meets people where they are, we might find ourselves...

C: ...you know, with more resourcing and more contribution from the community, as well as more feedback to vet some of these, you know, ideas, you know, in this case for OTel, but just generally for observability. So...

D: Right, right, and, I mean, going back to, Kevin, your point of, you know, when you're really looking at high-volume streams of metrics, you know, what architecture actually works. Because even Prometheus has, you know, issues, and again that needs to be addressed for a, you know, better user experience; that needs to be addressed in all of these stacks, no matter what, right. So those discussions, you know, actually need to be had across projects and, you know, really addressed, to scale to, you know, what is needed today, right. Because again, these stacks have evolved over time.
E: Yeah, and, like, I think that's mainly my... well, mainly my focus. We are only four people, so we split things up a little; mine is, yeah, performance and efficiency... performance, efficiency, scalability.

E: So I, or we, noticed this, well, ongoing push for Prometheus as a standard, which I'm mostly on board with; I'm excited for OpenMetrics and would like to work on that, because I think it's an opportunity. Because one thing is, Prometheus, like, I've been using it for I don't know how long, and it never seemed scalable; like, that's why we see products like Mimir or Thanos or whatever. And I, like, I wanna see how this TAG is an opportunity to...

E: ...multi-dimensional... like, my biggest issue right now is, for example, the Prometheus format is the text-based format, where, like, you have a lot of strings, like the metric names alone. It's a lot of overhead that might be worth optimizing away if possible. Like, that's...

E: ...where I'm curious about the ups and downs of the proto format; I know it was dropped and now it's coming back. And, yeah, like, optimizing the processes that collect, scrape, send, store metrics, yeah, because that's where my pain is right now. That's kind of where, over the last few months, I built my opinion, and now I'm looking at how to use it somewhere.
D: I think that feedback, as well as, you know, how we actually look at it at an overall scale, where some of these issues are standardized... because performance benchmarks, for example, can be standardized, right; I mean, that's depending on, you know, what the definition of different categories of scalability is, right. And those are, you know, those are some of the areas that both Prometheus as a project, as well as OTel as a project, have been interested in, and...

E: Oh, and, like, just for me personally, I see, like, with the ecosystem also narrowing down on Prometheus, I just want to make sure we still keep thinking about, I don't know, the, like... what comes after. Like, yeah, Prometheus is awesome, I really like it, but I don't want to stay stubborn on, okay, it's Prometheus, and we stop thinking about the next innovation for metrics, or no?

D: No, and, and absolutely, because I think that, you know, as the platforms that we are pulling data from for telemetry, for observability, change and they evolve, right, the formats need to be compatible, the scalability, you know, requirements and stability guarantees need to accommodate that, right. So it's something that's always a work in progress.
C: Also the protocols: like, some of the things you said are analogous to some of the things that have been coming out of the OTel profiling discussions, particularly around, and here again Google has a stateful protocol, versus a stateless protocol, so you lower the overhead of, like, these repeated strings over and over and over. So, you know, as we do look at this ecosystem...
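The overhead of "repeated strings over and over" can be made concrete with a tiny dictionary-encoding sketch: send each distinct string once, then refer to it by index, which is roughly what stateful or dictionary-based protocols do. This illustrates the general idea only, not any particular protocol discussed here.

```go
package main

import "fmt"

// dictionary assigns a small integer to each distinct string so that repeated
// metric/label strings can be sent as indexes instead of full strings.
type dictionary struct {
	index map[string]uint32
	words []string
}

func newDictionary() *dictionary {
	return &dictionary{index: map[string]uint32{}}
}

// ref returns the index for s, adding it to the dictionary on first use.
func (d *dictionary) ref(s string) uint32 {
	if i, ok := d.index[s]; ok {
		return i // already transmitted once; reuse the index
	}
	i := uint32(len(d.words))
	d.index[s] = i
	d.words = append(d.words, s)
	return i
}

func main() {
	d := newDictionary()
	names := []string{
		"container_cpu_usage_seconds_total",
		"container_cpu_usage_seconds_total",
		"container_cpu_usage_seconds_total",
	}
	var encoded []uint32
	for _, n := range names {
		encoded = append(encoded, d.ref(n))
	}
	fmt.Println(encoded, "with", len(d.words), "distinct string(s) actually transmitted")
}
```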
C: I couldn't agree more that, you know, we want to be looking forward, not, you know, saying, well, we've got this one thing and it's good forever, because then you get surprised. But I think inclusive to that discussion is also, you know, the protocols and their overhead and the nature of them, and how to do that in a way that doesn't have a one-size-fits-all kind of...

D: ...approach. Here we...

C: We do need to transition to a couple of bookkeeping items, but I can do them really quick; I just need to make some announcements before we hit the 50-minute mark, so we've got another 10 minutes. There are a couple people who joined while we were talking. Do you guys want to say hi, or was there anything specific you wanted to put on the agenda, either for this meeting or a future one?
C: All right, well, I'll just do... do we want to... is it okay to transition here? I don't want to be too abrupt, because this is actually, yeah...

D: I think, Kevin, again, just to wrap up: Kevin, thanks again for, you know, kind of discussing these different areas. As an action item, Steve and I'll follow up from, you know, the OTel side. We have to make sure that we, you know, have not missed any of these requirements, as well as, you know, work towards figuring out how we can accelerate some of these changes that, you know, are already on our radar but should be made...

D: ...you know, sooner than later. And also, we would love to, you know, have a little more time from you, if possible, on the OTel side, you know, to actually review design proposals as well as provide feedback on, you know, some of the code that's been developed already. The Collector SIG may be a bit noisy, but I do welcome you to, you know, kind of join in there; it's on Wednesdays at nine Pacific time. And, you know, happy to also help pull in the Collector leads to be able to...

D: ...you know, discuss with you. We work very closely with David Ashpole, but, you know, he also has a lot of different projects he's working on, and with Josh Suereth also, but, you know, Josh is also kind of spread thin, right. So please get more involved yourself, and, you know, ping me on Slack; let's, you know, work with Steve to figure this out.

D: Yes, there's the Prometheus work group, right, that we have, and we work with OpenMetrics very closely there; most of the, you know, active maintainers on OpenMetrics join in there. It's on Wednesdays at eight a.m. Pacific time; there's a work group meeting every week. So, you know, if that's a time that works for you, please, you know, please do join, tomorrow or next week, whatever works for you.
C: Cool, cool. I wanted to make just a few brief sort of announcements, because, again, looking at the YouTube video download and viewing numbers, you know, who's actually watching these, it's quite a bit larger than the folks that can be present presently.

C: So one piece of that is, our meeting times are almost... are always, you know, the first and third Tuesdays at noon Eastern, which is great for the various folks in the Americas, but we probably should move to a cadence where, you know, every other meeting is a little bit more suitable for Asia-Pacific and Europe. So, so that's just one thing.

C: You know, I think maybe next... in two weeks we can kind of make something more formal around that, but I'm going to put that out there. As far as nominations go, we've kind of been running lean, if you will, in terms of roles, so I intend to nominate Steve Flanders as a co-chair; co-chairs have a two-year duration. And as some folks might not know Steve, and he's here today, I want to give him an opportunity to introduce himself and just say who you are.

C: Maybe for those that are not familiar. And I'll say I met Steve close to a decade ago, virtually; I don't know if he remembers, but when he was working on Log Insight, we were both in a similar, in a similar part of VMware, but...
A: Yeah, so I'm Steve. I've kind of been in the monitoring and observability space for more than a decade now. I've worked in the logging space at VMware; I did APM, or distributed tracing, at a stealth startup called Omnition that was acquired by Splunk three years ago. Now I'm at Splunk, and I'm also working on the metric side of the house, so Splunk Infrastructure Monitoring, which was previously SignalFx.

A: At Omnition I was working on the OpenCensus project and the OpenCensus Service, which is now the OpenTelemetry Collector, and I've been involved in basically OpenCensus and OpenTelemetry, both projects, since very early days, since the very beginning. So I'm very, very passionate about what's happening in this space; I think it's really innovative and necessary.

A: I have lots of customer conversations across many different companies that I've been at, and kind of feel the pain of how hard it is to actually do observability in general, but also how to do it in a way that's kind of, like, vendor-agnostic, because you don't really want to rip and replace; that's a very heavy lift. People are just looking for a generic solution, and then kind of flexibility through configuration, so they can leverage whatever back ends they want, whether it's open source, proprietary, local, SaaS, whatever, right.

A: So I really think that the OpenTelemetry project has a lot to offer. There's a long way to go, still quite early days, but I'm really proud of the work that everyone's been kind of pitching in, and given that we see broad adoption across cloud providers, end users, observability companies and the like, I think it really points out a pain point. So it's an area that I'm very passionate about. I used to be a pretty extensive blogger...

A: I haven't been doing that as of late, because life has been busy, but I'm hoping to get back to it here soon; that would be great. But, yeah, definitely passionate about observability in general. I love the CNCF and I would love to be helping out in this TAG. So thanks for your consideration.
D: Awesome, that's, that's great, Steve, thank you again. And I think, you know, going forward, I think, you know, our meetings, as you know, Steve, and I know you've joined...

D: ...are twice... every first and third Tuesday, you know, at 9:00 a.m., but I think, as Matt was suggesting, we might also... we should at least address one meeting in APAC times, once every month if not, you know, so alternating, if you will. And I think 4 p.m. Pacific might be a good time; although it's 7 p.m. on the East Coast, it's still, still doable, right? Because again, it seems to work out.
C: I'm on the East Coast, I believe, as is Steven, and, you know, we've enjoyed, I think, two and a half years of noon meetings. So, you know, I think...

C: And the second one, he's not here today, he had a conflict, but Henrik Rexed: he runs a couple of podcasts, one is called Is It Observable, that's been running for some time now, and he covers a lot of issues around observability and OpenTelemetry. And also there's something that he and Michael Hausenblas have been doing a little bit, called, I believe, open source news.

C: He's been vocal and helpful, and is a great example of someone from the community that's just persistently been here and is helping out in a variety of forums. So he should be here, I think, in two weeks; so that was the other one. And then, lastly, I put in a couple of links; we talked about it before, about other sorts of roles that would make a lot of sense, I think. So...

C: If anyone knows anyone looking for an excuse to contribute, we either have one or can make one. So that's really all I have for today, and it's 12:49; I think for the first time in about a year, at least, the agenda items have been covered by the 50-minute mark, so I'm going to take that as a win. But if there's anything else folks want to chat about, I think the CNCF Zoom doesn't cut off for another 12 minutes or so.
D: No, I think, I think this is a good discussion, and Kevin, thank you again for, you know, kind of bringing in some of these areas. We definitely love to see, you know, OTel and OpenMetrics and Prometheus continue to work closely in terms of metrics support, you know, and handling the different use cases. So let's, let's continue to work together, and I'll also reach out to the Prometheus and OpenMetrics communities.

D: Typically Richard joins in from OpenMetrics, but I would love to see other folks also joining in for the larger discussions, yeah.

D: I pinged you on Slack, Kevin, so let me know if you saw it. Thank you. Sorry, thanks, Matt, thanks for setting this up. Thank you, Steve. Thank you. Chat later, folks, take care.