From YouTube: OpenTracing Monthly Meeting - 2018-02-09
A
So here's the agenda for this talk: we'll start with the Pintrace infrastructure, basically how our distributed tracing works. Then I'll go over some data visualization tools that we have built to look at this data, then some more tools that we've built to do deeper analysis, and then some future plans for this project.
A
This is how our tracing pipeline looks. We have instrumented most of our services, starting from the clients like iOS, Android, and web, including the CDN, the front-end API, and all of the backend services. When a request passes through these services, they log trace data to our Kafka pipeline. In this diagram, the black arrows are the user request path and the dotted arrows are the trace data path. So all this data from all these services goes to Kafka.
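A rough sketch of the kind of span record a service might log to this Kafka pipeline; the actual schema isn't shown in the talk, so the Zipkin-style fields, topic name, and endpoint below are illustrative assumptions:

```python
# Illustrative only: log one completed span as JSON to a Kafka topic.
import json
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],                        # placeholder brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

span = {
    "traceId": "7f46165474d11ee5836777d85df2cdab",
    "id": "5e2b7f3b4a8c9d10",
    "parentId": "3c1d2e4f5a6b7c8d",
    "name": "get /v3/users/boards",                          # hypothetical endpoint
    "timestamp": 1518190496000000,                           # start time, microseconds since epoch
    "duration": 45000,                                       # 45 ms
    "binaryAnnotations": [{"key": "http.status_code", "value": "200"}],
}

producer.send("trace-spans", span)                           # placeholder topic
producer.flush()
```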
A
From Kafka, we have a Spark collector that picks up this data and pushes it to Elasticsearch. On top of Elasticsearch we have built a bunch of tools to view this data, to search this data, and then to do deeper analysis on this data. In this talk I'll quickly go over all these tools and explain what the different use cases of these tools are. Before I move forward, I just want to highlight a few things.
A
The first thing is end-to-end: our tracing is enabled from the very start, from the client all the way to the back end, so we capture all steps of this process. This includes time spent on the network between the user and the CDN, time spent on the CDN, and the time spent by all back-end services.
A
The third thing is that we built the Spark collector. This is basically to consume data from Kafka and push it to Elasticsearch, but we added some additional functionality in this collector as well to do some kind of post-processing, so that we can do feature extraction, data cleaning, data blacklisting, and those kinds of things. I just want to highlight one more thing here: this Spark collector does all the processing at span level, and we're planning to extend it.
A
We want to add windowing in Spark so that we can do some post-processing at trace level. We need this windowing functionality because different spans of a trace can arrive at different times. Unless we have a window of one minute or so where we can collect all the spans for a given trace and then process them, we cannot do trace-level processing. And the last thing is scalability.
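A minimal sketch of that windowing idea, assuming a Spark Structured Streaming collector; the broker, topic, and span field names are illustrative, and the simple count stands in for the real trace-level post-processing:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, from_json, window
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("trace-window-sketch").getOrCreate()

span_schema = StructType([
    StructField("trace_id", StringType()),
    StructField("span_id", StringType()),
    StructField("timestamp", LongType()),   # span start, microseconds since epoch
    StructField("duration", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder brokers
       .option("subscribe", "trace-spans")                 # placeholder topic
       .load())

spans = (raw.select(from_json(col("value").cast("string"), span_schema).alias("s"))
            .select("s.*")
            .withColumn("event_time", (col("timestamp") / 1000000).cast("timestamp")))

# Hold spans for up to a minute so late-arriving spans of the same trace land in the
# same group; real trace-level processing (e.g. in foreachBatch) would replace count()
# before the result is written to Elasticsearch.
traces = (spans
          .withWatermark("event_time", "1 minute")
          .groupBy(window(col("event_time"), "1 minute"), col("trace_id"))
          .agg(count("span_id").alias("span_count")))
```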
A
Our current pipeline is very scalable. Right now we handle around 20 million data events per day, and we have seen 10x spikes in the data volume; so far our system is scaling pretty well for these spikes. Towards the end of the presentation I'll circle back to this point and explain why having a scalable system is very important. So we'll start with how we view our traces. To view our traces we use the Zipkin UI; it's a really nice tool to look at the details of a single trace.
A
I'll just quickly show a couple of screenshots of how we use it in the UI. Basically, in this screenshot you can see that we highlight errors in our traces. We also add in-process spans; for example, here we can see there's a sub-span for authentication and a sub-span for request handling and processing. This really helps us in performance tuning our services: we know exactly how much time is spent in user authentication, how much time is spent in request processing, and then in encoding the response.
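A minimal sketch of what adding such in-process child spans can look like with the OpenTracing Python API; the tracer setup and the handler functions are placeholders:

```python
import opentracing

tracer = opentracing.tracer  # assume a concrete tracer was configured at startup

def authenticate(request):   # placeholder for the real auth logic
    return {"user": "anonymous"}

def process(request, user):  # placeholder for the real request handling
    return {"ok": True}

def encode(result):          # placeholder for response encoding
    return str(result)

def handle_request(request):
    with tracer.start_span("handle_request") as parent:
        with tracer.start_span("authenticate", child_of=parent):
            user = authenticate(request)
        with tracer.start_span("process_request", child_of=parent):
            result = process(request, user)
        with tracer.start_span("encode_response", child_of=parent):
            return encode(result)
```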
A
OK, so now we know how to look at a trace. The next problem that we wanted to solve is how to find the most relevant traces for a given issue. The Zipkin UI gives you some functionality to find these traces, but you still don't get a visual representation. So the first thing we did was integrate our tracing data with our metrics visualization.
A
We overlay this trace data on top of our metrics data, and from here users get a really good visual representation of spans. For example, in this case we're looking at the metrics for one endpoint of a service, and we're overlaying the corresponding spans for the same endpoint. Here a user can see which traces correspond to the p99 latency of that endpoint, and they can quickly see that, okay, at this time there was a spike; let's look at this set of traces, and they would be a good representative of that spike.
A
The next thing that we wanted to look at is trends in our traces: is the latency increasing, is the error rate increasing for a service or not? For that, what we use is Kibana dashboards. Since our data is in Elasticsearch, and Elasticsearch comes with the Kibana plugin, these dashboards were really helpful to give us a high-level trend in our traces. We have three main dashboards. One dashboard is for services, where service owners can look at their trace trends.
A
The second dashboard is for clients like iOS, Android, and web, and the third dashboard is for the Pintrace developers and operations people, so that we can see how the Pintrace pipeline is doing and whether there is any latency or delay in our data. So, for example, here we can see the dashboard for a service, and the service owners can see how many spans are coming in for their service and what the breakdown per endpoint is.
A
But without a tool like this, it would have been really hard to figure out what the source of this bug was. Similarly, on the right side we see a spike in spans coming from our dev servers. So again, someone had some problem on their dev server, it started sending too many spans, and we were able to catch it quickly.
A
Okay, so now we have a way to look at traces and a way to look at trends. The next thing that we wanted to do was to get a deeper analysis of our traces and answer more complex questions, like how much time is spent in a single service for a given user operation, or what the difference in performance is between two different versions of software. So for this we built a trace analyzer, and this is the tool that we explained in our blog as well.
A
So the requirement for this tool was that we wanted to get an aggregated view of traces. Looking at one trace might not be the right way to go, because it might not be a representative trace for a service, so the first requirement was to have a way to look at an aggregated view. The second requirement was to compare two sets of traces; for example, you may want to compare traces from last week to this week, or between two versions, or between two devices.
A
Or you want to compare traces from Android and iOS. Then we also wanted to provide a way to do root-cause analysis. Let's say there's a regression in your performance; because we have 80 services in our backend, it's really hard to find out which service is causing that regression. So we wanted an automated way to surface the root cause of any performance problem, and then we wanted it to be scalable.
A
So here's basically the architecture of the tool that we built. For the UI part we used Jupyter notebooks; it's a tool that's used here at Pinterest, and people are used to it. From this UI, users can provide their input, their parameters for the analysis. The notebook then triggers a Spark job, and this Spark job reads spans from Elasticsearch.
A
It does lots of processing and analysis, builds a report, and writes it back to Elasticsearch; then the Jupyter notebook reads the report from Elasticsearch and displays it to the user. This whole process can take from seconds to minutes, so once the report is done the user gets an email, and then they can go and look at the report.
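A rough sketch of the notebook side of that flow, assuming an elasticsearch-py client; the report index, ID, and document shape are illustrative assumptions:

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://elasticsearch:9200"])   # placeholder host

def wait_for_report(report_id, timeout_s=600):
    """Poll until the Spark job has written the report document back to Elasticsearch."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if es.exists(index="trace-analyzer-reports", id=report_id):
            return es.get(index="trace-analyzer-reports", id=report_id)["_source"]
        time.sleep(5)
    raise TimeoutError("report %s not ready" % report_id)

report = wait_for_report("2018-02-09-ngapi-batch-compare")   # hypothetical report ID
print(report["summary"]["avg_latency_ms"])                   # hypothetical field
```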
A
Here's the UI for running this report. Basically, you can specify the time period that you want to look at, the service, the endpoint, and some other binary annotations, and you can provide this for two different sets of traces if you want to compare them. And here's the sample report summary for one of these reports. In this summary you can see things like what the average latency was and what the calls per trace were,
A
and the total number of calls made. The most interesting part is the three graphs at the bottom. This report, by the way, is for ngapi, so the first graph shows the self latency of ngapi, and it's a histogram.
A
We can see that in batch one and batch two, the two sets of traces that we were comparing, there wasn't much difference. But in the second graph, the overall latency, we see a clear shift of the histogram towards the right, and this kind of highlights that ngapi did not add any performance delay to the request; the backend services did add some performance delay, because the overall latency has increased. So this kind of shows you how this tool can be used to do root-cause analysis and find where this extra latency is coming from, and from here we can dig in further.
A
This report has lots of other things. One more thing is downstream service latency: this table lists all the latency that was introduced by each downstream service, and this is their self latency, so it does not include the latency incurred by any further downstream calls made by those services.
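A small worked example of that self-latency idea; this is illustrative only, not the analyzer's actual implementation, and it ignores overlapping child spans:

```python
def self_latency_us(span, children):
    """span and children are dicts with 'timestamp' (start, µs) and 'duration' (µs)."""
    child_time = sum(c["duration"] for c in children)
    return max(span["duration"] - child_time, 0)

parent = {"timestamp": 0, "duration": 120000}                 # 120 ms total
kids = [{"timestamp": 10000, "duration": 30000},              # downstream call A
        {"timestamp": 50000, "duration": 40000}]              # downstream call B
print(self_latency_us(parent, kids))                          # 50000 µs spent in the service itself
```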
A
And again, we have a table for the number of downstream calls as well, and in these tables the tool highlights the services that have the highest impact, so the engineers can quickly look at the dashboards for those services and see how they're performing. This last table is for looking at the per-endpoint latency impact: basically, each row shows you a span name, client, and server.
A
So this gives you a further detailed overview of which endpoint added the highest latency in the set of traces that we are looking at. From here the engineers can see that, okay, this endpoint of this service had the most impact, and they can go and look at the code changes that they made for that endpoint and see what added this extra latency. Okay, so we are kind of running out of time.
A
I'll quickly go over a few more slides. The tool that I explained earlier, the trace analyzer, is not very flexible, so some of the engineers requested a library where they can just pull all the traces and do some ad hoc analysis. So we built a Python client for them that has all the helper functions to do the analysis. Here's the sample code for this client.
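The slide with the actual sample code isn't captured in this transcript; a hypothetical sketch of such a client, pulling spans from Elasticsearch into pandas for ad hoc analysis, might look like this (all index, field, and function names are assumptions):

```python
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch(["http://elasticsearch:9200"])   # placeholder host

def fetch_spans(service, start, end, index="trace-spans-*"):
    """Return all spans for a service in [start, end) as a DataFrame."""
    query = {"query": {"bool": {"must": [
        {"term": {"service_name": service}},
        {"range": {"timestamp": {"gte": start, "lt": end}}},
    ]}}}
    docs = (hit["_source"] for hit in scan(es, index=index, query=query))
    return pd.DataFrame(docs)

spans = fetch_spans("ngapi", "2018-02-01", "2018-02-02")
print(spans.groupby("name")["duration"].quantile(0.99))   # p99 per endpoint
```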
A
I don't know if you guys are familiar with Netflix's Vizceral tool, but it lets you view the real-time state of your services, and we are using this tool to display a real-time view of our trace data and the output of these analysis tools. So let's say one of the services is having some issue, some errors or a latency spike; this will clearly show you where the spike is coming from, and you can see the root cause of that spike.
A
So far we did not have any trouble with Kafka, and again, we have other services at Pinterest that have high requirements for Kafka; even for them it's working pretty well, and our service is not one of those that pushes Kafka to its limits. So far we haven't seen any problems with Kafka.
F
I have a couple of questions about Pintrace. First of all, thanks so much, awesome presentation. Second of all, I'm curious how this works from a product standpoint and from a technology standpoint. How do most people who use the various UIs you've shown get into them? Is it that they go to their browser and navigate to an internal URL, or are they diverted through some other tool, like a metrics tool or something else?
F
And then my second question, as a follow-up: the last time I spoke to someone about this was probably a year or so ago, and they hadn't managed to get the kind of instrumentation coverage that you've gotten at this point. I was curious, from an organizational standpoint, what you all did to get such high-quality coverage of your system. It's a challenge for a lot of companies, and I was curious to hear about both. Thanks so much.
A
Sure. For the second question, how did we get such good instrumentation coverage: what we did was delegate the instrumentation work to the service framework owners for each language. If just me or my team were doing all the instrumentation, it wouldn't be scalable, and again, we don't have the skill sets in iOS and Android and CDN and all those places.
A
So we worked with other teams and gave the ownership for every language and framework to the right team, and since they have all the skill sets, it wasn't too hard for them to instrument their code and their frameworks. That's how we were able to get such good coverage, and moving forward, those teams maintain their own instrumentation.
A
From our perspective, we just give them guidelines: okay, let's use this standard, let's use this schema. And we help them debug issues as well; when they start implementing the instrumentation, they usually run into some problems, some encoding problem or those kinds of things, and we help them and work with them. But in the end, those teams are the owners of the instrumentation, and that's why we were able to scale. Did I answer your question?
F
You did, thank you. I mean, some companies don't have the benefit of having a central framework team for each language. I think that's probably something that helps a lot for Pinterest: you actually captured that in your org chart in some way, which is really smart. So that's great, thank you so much.
A
I haven't looked at how many traces are viewed, but once in a while I go and see how many traces are archived, because they are in a different index, so it's really easy to go and view them. We don't have a whole lot, but every day we have a few traces, and the biggest use case is that, let's say, someone finds a bug in their code or they see some interesting trace.
A
They want to save it, because the timeline of fixing that bug and then deploying that code can take more than seven or ten days. Our retention policy for regular traces is 10 days, but for any such use case we want to keep those traces for longer. Again, people are using it; I would say a few traces a day, but even then those are really critical traces that we really want to save.
B
I'll just give a high-level report back, and then maybe Erika and Ben and others can chime in with their thoughts. The main thing that was discussed there was the trace context specification, which really has two parts: trace context, which is information being propagated about the trace itself, from one tracing system to another potentially, and then another header called correlation context, which is sort of like baggage propagation, a way to just transport a set of key-value pairs down the stack through some header that's been whitelisted by the various proxies.
B
There was a lot of discussion about these two things. In general, it seems like more and more agreement is being reached on the trace context and trace context extension headers. I would encourage people to go look at the repo to read up on where that's currently at, but it felt to me like that was really getting into hair-splitting territory.
B
As far as the OpenTracing project is concerned, I think it's reaching a point where we should start thinking about it. In particular, do we want to expose some of these fields on the span context API? There's going to be a span ID and a trace ID that come with this new header, and people have been asking for the ability to correlate spans, you know, doing span observers and things like that.
B
Not having an exposed span ID or trace ID has kind of been a blocker for us being able to do some useful things, so to me that was the most important thing that came out of it. The second most important thing, which relates to the correlation context header, was a discussion on security: when you add baggage in your program, you then have some third-party piece of instrumentation that's serializing that baggage onto your HTTP calls or message queues or whatnot, but in particular HTTP calls.
B
There's not really any API mechanism for indicating whether a request is outbound to another third-party system or staying inbound within your own system. That means this is a security issue: there's no clear way of identifying when that correlation context header should be populated, and no kind of encryption on the data inside of it.
B
It's really kind of handing a footgun to application developers. In fact, in the OpenTracing API right now there's literally no mechanism for deleting baggage or indicating not to propagate it, so I think that's a real problem that we should think about. There are some other things that happened there; OpenCensus showed up, and it was nice to see them. But I think the two main points are the ones I just mentioned: exposing the trace context fields on span context, and really thinking hard about baggage in terms of security.
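To make the baggage concern concrete, a minimal sketch of baggage propagation with the OpenTracing Python API; the tracer setup and the outbound call are illustrative:

```python
import opentracing
from opentracing.propagation import Format

tracer = opentracing.tracer  # assume a concrete tracer is configured at startup

with tracer.start_span("checkout") as span:
    # Anything set as baggage travels in-band with every downstream hop.
    span.set_baggage_item("project_id", "12345")

    headers = {}
    tracer.inject(span.context, Format.HTTP_HEADERS, headers)
    # 'headers' now carries the trace context plus the baggage key/value, and would be
    # attached to the outbound HTTP call; note there is no API here to say "drop the
    # baggage because this request is leaving our own system".
    print(headers)
```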
H
Somehow I volunteered myself to write a trace context implementation in basictracer. Someone was going around the room after you guys left asking who was going to write reference implementations; someone volunteered for Zipkin, and I think I said I'd do the basictracer one, okay.
H
You know, who cares if there's a trace ID or not; I thought that was sort of interesting and fundamentally challenging to some of the models that the tracing vendors have been using, yeah.
B
I think what was convincing, specifically, is that we were trying to solve lower-level problems at a higher level. Because some form of standardized context propagation doesn't really exist, and we need it for tracing, we baked it in, and then everyone comes over to the tracing system, which is an observability system, and says: can I please ride shotgun on this thing?
H
Right, yeah, because you're going to do the work at the same time, and it's maybe a nice carrot to provide the instrumentor, if they're having to propagate context for tracing, to tell them: oh, also, you can use this channel for all this other wacky stuff that you probably shouldn't rely on for your application.
H
I was often thinking of trace context as a way we could somehow productize the propagation of context on its own, and then it seemed like a bad idea, because we would be on the hook to keep providing automatic instrumentation, and if they missed a spot, they would be mad at us. So I think it's much safer to give that to the manual instrumentor, the engineers themselves at their own company, rather than have the automatic instrumentation vendors try to promise to hit every context propagation opportunity perfectly.
E
I'd say in a previous life, I was at a very large-scale microservice deployment, with probably on the order of 2,000 microservices, where engineers put a lot of stuff into context, and we kind of let it be freeform: if you want to use it, go ahead and use it. There wasn't really much governance around what should or should not go into context.
E
I think at one point we incurred something like 50% extra latency, because we were sending that much more unnecessary context through the microservice architecture. So we had to actually dial it back, because there was just so much in there. But it's an interesting problem to hit at that point.
B
You can tag spans and you can attach baggage in a tracing system, and baggage is sort of like trace-level context in a way, right: you're saying this trace has this project ID associated with it, things of that nature. But because there are getters on baggage, the semantics are that this is a way of propagating in-band information, and that implies there's a cost to adding baggage to your system, because you're going to be propagating it.
B
So you have to think about the size of that. Something various tracing vendors were bringing up at this workshop was: if you're just doing it for the purposes of monitoring within your tracing system, there's no need to be propagating that information in-band. It would be much better to be telling your tracing system, hey, I'm just tagging this trace with a project ID, and out-of-band it indexes it or does whatever it does with that, and you're not worrying so much about it.
B
So baggage seemed to be doing double duty, right: if the tracing system is trying to use that as information and you're trying to provide users with indexing based on it, then you've got this sort of spurious overhead associated with it. Versus baggage that is not for the tracing system at all, where the tracing system is just the mailman delivering it to some other system, in which case the tracing vendors were like: that is not my job.
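A small sketch contrasting the two mechanisms being discussed, using the OpenTracing Python API (tracer setup assumed):

```python
import opentracing

tracer = opentracing.tracer  # assume a concrete tracer is configured at startup

with tracer.start_span("render_board") as span:
    # Out-of-band: recorded on the span and shipped only to the tracing backend.
    span.set_tag("project_id", "12345")

    # In-band: copied into the headers of every downstream call made under this trace,
    # which is where the propagation cost and the "mailman" duty come from.
    span.set_baggage_item("project_id", "12345")
```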
G
Agreed. I enjoyed talking to vendors, but as a vendor, I enjoy talking to customers and trying to really nail down why they wanted baggage, because I was hoping the answer would be that they just really want to tag their traces, not that they actually need to use it at every single hop, because that would make my life a lot easier.
B
Yeah, it seems like that API is really doing double duty, and when you throw in the extra added fun of security, and propagating these things out of your system by accident without any kind of encryption on the data, or adding encryption to the data and thus potentially incurring a lot of overhead in this whole affair... I don't know, I think baggage needs to get reviewed in a very serious way.
B
All right, possibly some of these things may show up in the specification issue backlog. I think a couple of things that we should start thinking about, based on that workshop, are exposing span ID and trace ID: just assuming trace context is going to come through the pipe, what would we like that API to look like in OpenTracing, and would it be problematic? So I think that's one issue we're going to raise again.
B
It was sort of called correlation ID or debug correlation ID in the past, but we're going to bring that back up, and then I think an issue around reviewing baggage with respect to security and some of these other things will get brought up. I bring it up on this call because the OTSC is nominally supposed to be reviewing and driving that stuff, and we haven't really been doing a lot of it because we've been focused on in-process context propagation; there hasn't been a lot of movement on the spec outside of that.
B
So now that we've sort of passed that particular gallstone, I think the time is right to pick up some of these other issues and start moving on them quickly. I would ask that when those things pop up in the specification issue backlog, people who are members of the OTSC pay attention to them and make sure we're not sitting on them; I don't think we have an official SLA on that.
B
All right, so I think that gets us through that. The next item here was the baggage and span context relationship, where the getters live. In the interest of time I want to just hop over that for now; I think we can discuss those nuances on GitHub, I don't think we need to talk about them here. But it's basically, you know, where do the getters live?
B
And around whether span context is immutable or not; but I think we should discuss that on GitHub, since it's very dry. Moving down the list, I think this next one is a fairly non-controversial issue. The CNCF has requested that we switch the licenses for all of our repos to Apache, I believe Apache v2 or whatever the standard Apache license is. We have a number of them that are currently under the MIT license; the rest are already Apache.
F
It makes sense to license it that way, and I don't think it'll be controversial. The reason, for people who are curious, is just that the MIT license has some loopholes in it that can become problematic, like submarine IP; it's a long story, just super wonky legal stuff. For most people it makes literally no difference; the only reason anyone used MIT is that I think MIT was more popular in some of the languages.
D
We went through this process with Jaeger when we joined the CNCF. The way we approached it, and I consulted with our legal team, they said: open an issue in that particular repo, ping every contributor who is in the GitHub history for that repo, and just let it sit for a couple of weeks. If no one objects, done, feel free to upgrade to Apache.
D
By the way, we also adopted something else. The Jaeger code used to be in an Uber repo, which had a CLA; we killed that when we moved to an independent org, and we also adopted the Developer Certificate of Origin from the CNCF, like the one that Linux uses. So every commit that happens in the repo has to be signed off, which is just a -s switch to git commit. It's not much, but there is a checker which verifies that all commits in the PR are signed.
B
So in order to keep the OTSC focused on steering-committee, specification-level stuff and not fill this time up with a lot of nuts and bolts, while still getting those nuts and bolts put into place, we created a new set of working groups. One is called the cross-language working group, which is tasked with figuring out some of the day-to-day project management around these various backlogs: the language API backlogs in OT core and then the backlogs for all of the contributed instrumentation.
B
There's just a certain amount of process we want to make sure we have in place around processing issues and PRs that currently is not really nailed down: things like templates, and making sure that when people make a PR or an issue it's focused on a single subject.
B
Who should be assigned as reviewers to these things? Not that other people can't comment, but who should be assigned to ensure that these things are getting shepherded through to completion? If they can't reach a completed state and have to be set aside, where do we put them? A bunch of that kind of day-to-day project management needs to get sorted out. So this is a group focusing on that, and as far as API decisions are concerned, it's also focused on making sure all of the language APIs are implementing the spec.
B
So, for example, we added context propagation as a concept; this group makes sure that actually goes into all the languages that would need it, and that the APIs that result feel coherent as a whole if you're doing cross-language OpenTracing. That's sort of the mandate of that particular group, and I would welcome people to join it. There's a new Gitter channel under OpenTracing called cross-language, and we'll be having regular workshop meetings to get this stuff moving. Any questions or comments about that working group?
B
Cool, feel free to ask questions on Gitter related to that. There's also a new documentation working group that's been kicked up, and that's tasked with overhauling our documentation. Basically, we're missing a bunch of styles of documentation that would be really helpful; there are a lot of consistent questions people have when they come to the project that we could clear up, so I think making that a lot cleaner would be helpful.
B
You can kind of dig around the opentracing-contrib GitHub organization right now and find things, but I think that's just opaque enough that it's not doing the ecosystem justice. So that's the other project I think that group's going to work on. For the docs working group, I would highly encourage OpenTracing members who come from organizations large enough to employ technical writers
B
to consider having your technical writers join this group and donate some of their time, and to also think about how the OpenTracing documentation could get worked back into your own vendor-specific documentation, because a certain amount of your API is, of course, the OpenTracing API. So there's another reason to come to this group.
B
Yeah, so, open floor: anyone have any other questions or topics for discussion?
F
There was an event with a fancy title a couple of days ago in San Francisco, and I think we're going to try and find some time. It's unfortunate that the next distributed tracing workshop, where the trace context stuff is being discussed, is concurrent with KubeCon in Europe in May, the second and third I think. So there was talk about trying to get that workshop moved a couple of days, or to maybe have some separate thing that happens either just before
F
or just after the KubeCon thing, to try and get together and talk a little bit more about how to consolidate a bunch of APIs. Dynatrace is also pretty interested in being involved in OpenTracing, in the capacity of contributing higher-level APIs that build on top of OpenTracing. They have a bunch of APIs that track individual users and things like that, which wouldn't make sense at OpenTracing's layer of abstraction but would make a lot of sense one level up from that.
B
But there is a request to sort of bake that into the APIs; other higher-level APIs might be useful now that the lower-level ones are settling into place, so that's something worth considering. We haven't spent much time focusing on tags, since we've been so focused on the APIs, but tags: how should things be tagged, and what counts as an official, you know, HTTP request span or something like that?
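For reference, tagging an HTTP request span with the standard OpenTracing semantic-convention keys from opentracing.ext.tags looks roughly like this (tracer setup assumed, values illustrative):

```python
import opentracing
from opentracing.ext import tags

tracer = opentracing.tracer

with tracer.start_span("http_request") as span:
    span.set_tag(tags.SPAN_KIND, tags.SPAN_KIND_RPC_SERVER)
    span.set_tag(tags.HTTP_METHOD, "GET")
    span.set_tag(tags.HTTP_URL, "https://example.com/api/boards")
    span.set_tag(tags.HTTP_STATUS_CODE, 200)
```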
B
Yeah, or you can, but maybe, I don't know, they're not going to help you automatically slurp the information out of the request object, right. It would be more that it's shaped in such a way that you were forced to extract all the information this thing needed to tag things correctly, yeah.
B
I would be just as satisfied with very clear guidelines, and then ensuring that for stuff that's in the OT contribs there's a review to prove these things. It's just like, if you do, say, a DB instrumentation, you have to have a test that proves that you're assigning these various fields, various tags and things, to it, and I'd hope people would be satisfied with that, that other people actually have tests.
B
You could have a testing tracer whose whole job in life was to help verify these things, right. It seems like the kind of thing where it would be easy to maybe make some standardized tests, a test harness that runs on this stuff. It's worth thinking about whether we just have a review process of saying things can't come in without reviewing them and asking: do you have a test, and does the test show you've tagged this properly?
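A minimal sketch of that testing-tracer idea, using MockTracer from the opentracing-python package; the instrumented code under test is a placeholder:

```python
from opentracing.mocktracer import MockTracer
from opentracing.ext import tags

def test_http_span_is_tagged_properly():
    tracer = MockTracer()

    # Placeholder for the instrumentation under test: it should create and tag a span.
    with tracer.start_span("http_request") as span:
        span.set_tag(tags.HTTP_METHOD, "GET")
        span.set_tag(tags.HTTP_STATUS_CODE, 200)

    finished = tracer.finished_spans()
    assert len(finished) == 1
    assert finished[0].tags[tags.HTTP_METHOD] == "GET"
    assert finished[0].tags[tags.HTTP_STATUS_CODE] == 200
```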