From YouTube: OpenTracing Monthly Call - 2018-07-06
C: Thanks, man. So yeah, I didn't prepare a whole lot. This is usually a pretty laid-back kind of call, and I just threw together a few slides starting about 20 minutes ago, so I apologize for the lack of polish here. Also, feel free to jump in — I don't know how many folks are on the call, but if someone wants to jump in and kind of guide the conversation to the stuff you all find interesting, that's fine by me. So I do have a couple of slides, and I also just wanted to show a couple of things live.
C: Okay, well, we'll play it by ear then — jump in anytime. So yeah, a 30-second intro. Next slide, I guess — I already kind of covered it. It's a distributed column store. For those of you who aren't database people, column stores basically organize the data you store in a column-oriented manner, so that each column is stored together, and when you want to scan just one column out of, say, 100 columns in your table, you can do so without having to waste the I/O reading all the others, which makes it very well suited for analytics.
C: One thing that we did include, though, is trying to actually make it also efficient for random access. So, while Kudu is typically used for analytics, we do have some use cases that are pretty random-access oriented, and we do run some benchmarks using the more NoSQL, random-access-style benchmarking tools like YCSB that people might be familiar with. It's also a distributed system, so we use Raft for replication.
C
Imagine
people
here
probably
know
about
rafts
is
basically
another
implementation
of
consensus
very
similar
to
multi
taxes.
So
we
do
care
about
latency,
I,
wouldn't
say
the
latency
is
our
number
one
concern
we're
not
typically
running
directly
web
facing
properties.
We
do,
but
we
do
usually
have
end
users
who
are
on
some
bi
tool
and
they
expect
queries
to
come
back
sub
second
and
oftentimes.
That
sub
second
query
actually
boils
down
to
hundreds
or
thousands
of
requests
underneath.
So
the
Taylor
agency
is
actually
pretty
important
where
one
tael
outlier
at
the
99th
percentile.
C: That whole idea is covered in a great paper that I really like, called "The Tail at Scale," from Google, maybe six or eight years ago. If you haven't read it, you definitely should if you're working in tracing. I don't want to talk too much about Kudu itself, though. I think the way we approached building Kudu was essentially to build a bunch of kind of generic systems infrastructure. People who work at companies like Google or Uber probably have a lot of that stuff already in-house, not open source.
C
Unfortunately,
we
started
from
scratch
on
a
lot
of
stuff,
so
we
built
a
lot
of
these
things
that
probably
seem
familiar
to
people
from
either
other
companies
or
other
ecosystems
that
are
pretty
generic
to
any
distributed
system
software.
That
cares
about
this
kind
of
stuff
other
things
most
of
the
stock.
You
don't
need
to
know
anything
about
what
kudi
does
just
think
of
this
as
a
platform
for
building
high
performance,
low
latency
system
software,
so
I'm
going
to
jump
right
in
to
come
some
of
the
various
things.
This
is
kind.
C: This is kind of a grab-bag talk — it's not like there's one core story to it; it's just "here are the various things we do" that tend to be useful. The first one is pretty simple: request-scoped tracing. This is probably the thing we do that's most similar to OpenTracing. By the way, Kudu is almost all in C++, so all of this talk is about our C++ back end. We've got a little trace macro — it takes a substitution string with dollar-sign placeholders — and pretty much that's it.
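To make the idea concrete, here is a minimal sketch of a substitution-string trace call along those lines. The names (RequestTrace, TRACE_TO, Substitute) are hypothetical, not Kudu's actual implementation; it only illustrates appending "$0"-style formatted lines to a per-request log.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-request trace buffer: an append-only log of timestamped lines.
class RequestTrace {
 public:
  void Append(const std::string& msg) {
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    int64_t us = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
    lines_.push_back(std::to_string(us) + " " + msg);
  }
  void Dump(std::ostream& out) const {
    for (const auto& l : lines_) out << l << "\n";
  }
 private:
  std::vector<std::string> lines_;
};

// Substitute $0, $1, ... placeholders with the stringified arguments.
std::string Substitute(std::string fmt, const std::vector<std::string>& args) {
  for (size_t i = 0; i < args.size(); ++i) {
    std::string ph = "$" + std::to_string(i);
    size_t pos;
    while ((pos = fmt.find(ph)) != std::string::npos) {
      fmt.replace(pos, ph.size(), args[i]);
    }
  }
  return fmt;
}

// The "macro" just appends a formatted line to the trace attached to the request.
#define TRACE_TO(trace, fmt, ...) \
  (trace).Append(Substitute((fmt), {__VA_ARGS__}))

int main() {
  RequestTrace trace;
  TRACE_TO(trace, "scanning tablet $0, budget $1 ms", "t-123", std::to_string(50));
  TRACE_TO(trace, "read $0 blocks", std::to_string(17));
  trace.Dump(std::cout);  // the whole log is only emitted if the RPC gets sampled
}
```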
C
This
is
not
actually
a
hierarchical
tree
if
it's
not
really
dapper
or
open
trading
style,
it's
really
just
a
log,
and
when
we
accumulate
the
vlog
associate
with
our
PC,
we
sample
it.
So
we
have
different
sampling,
buckets
for
different
latency
profiles
and
also
we
actually
have
a
timeout
to
propagate
different
clients.
So,
whenever
a
client
sends
an
RPC,
it
says:
hey,
my
timeout
is
one
second,
and
on
the
back
end,
if
we
realize
that
we
responded
to
that
RPC
after
one
second
we'll.
C
Trace
that
RPC,
so
it
gives
us
a
pretty
good
idea
of
what's
happening
on
the
our
pcs
that
are
too
long,
very,
very
simplistic.
But
it's
again
it
took
like
you
know,
two
hours
to
write,
whereas
open
tracing
is
a
much
more
complicated
thing
and
it's
super
super
lightweight.
There's
no
infrastructure,
it's
all
in
process.
We
don't
need
to
hook
up
to
any
collectors.
Anything
like
that,
so
it's
limited
in
scope
I
mean
that
both
in
the
future
science
sense
of
the
word
scope
and
also
in
the
how
much
it
accomplishes.
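A sketch of the deadline-based sampling just described: the client's timeout rides along with the call, and the server keeps the in-process trace only when the handler overruns it. Everything here (InboundCall, MaybeSampleTrace) is illustrative, not Kudu's code.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Illustrative: the client sends its timeout along with every RPC.
struct InboundCall {
  std::string method;
  std::chrono::milliseconds client_timeout;
  Clock::time_point arrival = Clock::now();
  std::string trace_log;  // accumulated in-process trace for this call
};

// After handling, compare elapsed time against the propagated timeout and keep
// (here: print) the trace only for calls that blew their deadline.
void MaybeSampleTrace(const InboundCall& call) {
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      Clock::now() - call.arrival);
  if (elapsed > call.client_timeout) {
    std::cout << "SLOW RPC " << call.method << " took " << elapsed.count()
              << " ms (client timeout " << call.client_timeout.count()
              << " ms)\n" << call.trace_log << "\n";
  }
}

int main() {
  InboundCall call{"Scan", std::chrono::milliseconds(10)};
  call.trace_log = "0us handling\n12000us responding";
  // Simulate a handler that took longer than the client was willing to wait.
  std::this_thread::sleep_for(std::chrono::milliseconds(15));
  MaybeSampleTrace(call);
}
```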
C
But
it's
been
very,
very
useful
for
us
one
thing
actually
I
didn't
put
in
the
slides,
as
we
also
have
for
each
of
these
traces,
a
very
simple
map
with
counters.
So
if
you
look
at
an
RPC
trace,
we'll
have
the
log
and
we'll
all
have
a
bunch
of
counters.
Some
of
them
are
pretty
generic,
so
our
spinlock
implementation
will
count
how
many
cycles
where
it's
been
spinning,
and
it
reads
that
to
their
PC
and
then
we
also
have
a
lot
more
specific
to
the
particular
requests.
C
So
if
you're
doing
it
right,
we
have
to
write
it
right
ahead
log
and
you
might
have
time
we've
spent
waiting
to
write
to
the
right
ahead.
Log
becomes
a
counter
on
that
trace,
so
some
examples
in
just
a
minute.
Actually
we
don't
have
to
do
this
in
line
and
show
examples
while
I
talk
so
I
have
another
browser
here.
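A sketch of the per-trace counter map described above, with a couple of illustrative counters (spinlock spin cycles, write-ahead-log wait time). The names are made up for illustration; Kudu's real counters are richer.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Illustrative per-request trace holding a simple name -> counter map alongside the log.
struct TraceMetrics {
  std::map<std::string, int64_t> counters;
  void Increment(const std::string& name, int64_t delta) { counters[name] += delta; }
};

// Code anywhere under the request can attribute work to the active trace,
// e.g. a spinlock recording how long it spun, or the WAL recording wait time.
thread_local TraceMetrics* active_trace = nullptr;

void RecordSpinCycles(int64_t cycles) {
  if (active_trace) active_trace->Increment("spinlock_wait_cycles", cycles);
}
void RecordWalWaitUs(int64_t us) {
  if (active_trace) active_trace->Increment("wal_wait_us", us);
}

int main() {
  TraceMetrics trace;
  active_trace = &trace;   // attached for the duration of the RPC
  RecordSpinCycles(1200);  // generic counter from shared infrastructure
  RecordWalWaitUs(850);    // request-specific counter from the write path
  active_trace = nullptr;
  for (const auto& [k, v] : trace.counters) std::cout << k << "=" << v << "\n";
}
```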
C
So
here's
a
server
I
just
started
running
on
localhost
I'm,
showing
the
RPC
v
page
I
wish
shows
the
running
our
PCs
and
sampled,
our
pcs,
but
never
made
any
other
pcs
to
the
server.
Yet
so
there's
nothing
in
there.
But
if
I
go
a
Python
shell
over
here
and
how
lift
tables
there's
no
tables
in
this
cluster
Kalitta
turn
did
it,
but
that
would
have
made
an
RPC.
So
if
I
reload,
this
page
I
can
see
the
current
RPC
connections
that
are
open
where
it's
from
the
state.
C
If
there
were
an
RPC
currently
running
at
show
up
in
this
inbound
connections
list,
and
then
we
can
see
a
sample
dirty
tea
trays,
so
the
trace
here.
Unfortunately,
my
browser
doesn't
show
the
new
lines,
but
you
can
see
the
time
that
it
arrived
how
many
microseconds
it
took
coming
on
with
a
call
to
you
coming
off
the
call
queue
handling
and
then
229
microseconds,
later
rq8,
a
success
response.
I
think
this
is
a
debug
build.
All
the
times
are
much
slower
than
you
normally
expect
in
early.
C
This
probably
would
take
you
know
a
few
microseconds,
not,
however
money
this
deck
is
100
microseconds.
This
is
very
simple.
You
know.
If
I
do
a
bunch
of
these
calls,
probably
all
of
them
are
going
to
fall
into
the
same
bucket,
we're
not
going
to
actually
see
it
as
change
as
a
weari
sample
once
a
second.
C
If
I
go
to
a
actually
one
of
our
production
servers
for
an
internal
use
case
here,
Clara
and
I
check
out
rbtv,
we
can
see
there's
a
lot
more
going
on,
there's
a
bunch
of
connections
open
from
Levites
and
hosts.
In
fact,
there's
one
called
it's
currently
in
flight,
and
you
can
see
that
the
client
sent
a
three
minutes
timeout
on
this.
This
is
a
scan
call
so
far,
astern
running
for
11
milliseconds.
C: If I go down to look at some of the more interesting things, you can see here a StartTabletCopy call, which is one of our re-replication RPCs, and the whole story of what happened. And then here are the metrics that I mentioned — every RPC has various metrics. Underneath, our I/O code will count these metrics, like fdatasync: how many we did, how many microseconds they took, how many microseconds we spent waiting on mutexes, DNS resolutions for some reason, whether we started a thread and how long it took to start the thread.
C
Every
thread
pool
that
we
use
has
few
time
and
run
time
a
little
CPU
run
time.
I,
don't
know
why
they're
not
in
alphabetical
order
here,
but
so
that
we
have
a
thread
pool,
is
called
raft
and
a
thread
pool
called
tablet,
copies
that
this
request
used.
We
can
see
this
tablet.
Copy
thread
took
quite
a
long
time.
This
is
actually
downloading
a
bunch
of
data
from
another
server,
so
it's
a
longer
request.
C
So
this
is
one
particular
sample
that
took
82
milliseconds,
but
if
we
scroll
down,
you
can
actually
see
there's
another
sample
of
the
same
RPC
that
took
longer
and
if
we're
lucky,
you
might
even
have
an
example
of
a
very
long
one.
This
is
pretty
useful
to
find
out
like
what
are
the
outliers.
What
happened
in
that
layer
that
was
different
from
other
outliers?
Maybe
it's
the
update,
I
think.
C
Maybe
it's
the
next
time
you
can
go
through
and
see
all
the
different,
our
pcs,
that
we
do
that's
sort
of
the
simple
RPC
tracing
that
we
do.
Another
thing
that
I
really
like
is
that
I
found
that
oftentimes
a
single
RPC
trace,
won't
tell
you
a
whole
lot.
It
will
tell
you
hey.
This
RPC
took
a
long
time
waiting
on
a
lock
or
took
a
long
time
waiting
at
I/o.
We
don't
really
know
what
happened
that
actually
caused
that
it's
some
across
request
interaction,
so
we
separately
have
an
infrastructure
called
process
while
tracing,
unfortunately,.
C: ...it honestly doesn't look great when I'm zoomed way in for the screen share, but you can see on the top there's a timeline of CPU usage, and then various threads down the left. You can see that one RPC worker was actually involved — I think in that request I called list tables four times; you can see one, two, three, four. If I zoom way in here, I can actually see on the timeline that this call started here...
C: ...and there's an arrow from here to here showing that the call was queued here and picked up by a different thread. And you can see it actually includes in this view the traces that we just looked at — you can see when it was picked up, when it was handled, along with the metrics. In this case it's a pretty uninteresting call with no metrics, and then it responds with success. So this is, again, not a super interesting RPC.
C: Here you can see there's a lot more going on — a lot more threads, a lot more RPCs — and there are actually some RPCs that are taking pretty long. So if I click on a scan, I can see that it took seven hundred five milliseconds, and I might be able to zoom in and see: this is continuing a scan, meaning that it started in a previous RPC; it's reading some blocks; it got a cache miss — that's probably going to be blocking I/O. It does give you a pretty good idea of what might be going on.
C: This is all very useful — you can actually see, kind of cross-request, when one thing might actually be causing an impact on another. We're also able to see pretty interesting patterns in thread pools. We used to not have LIFO-ordered thread pools, so it would round-robin across all of our workers, and we wouldn't get this kind of nice chunking where only a small handful of the RPC workers is active.
C
It
would
actually
be
round-robin
across
100
threads
and
really
hurting
the
cash
performance,
and
things
like
that,
so
this
has
been
very,
very
useful
for
us
to
find
process
wide
lockups.
We
found
some
issues
at
TC
Malik.
For
example,
we've
seen
some
issues
with
the
linux
kernel,
where
the
MSM
before
gets
held
and
all
the
other
threads
block
for
apparently
no
reason
but
they're
actually
they're
all
blocked
on
the
lock
in
the
kernel.
But
I
found
this
very
useful.
C
It's
way
more
information
than
you'd
actually
get
from
something
like
open
tracing
and
it
captures
the
cross
request.
So
I
think
things
like
open
tracing
are
useful
to
pinpoint
hey.
The
server
has
high
latency,
but
when
you
actually
want
to
dig
into
what's
going
on
on
that
server,
this
can
be
more
useful.
Another
nice
feature
of
this.
This
is
actually
the
trace
viewers.
It's
built
into
Chrome,
so
I
can
type
save
here.
C
I
can
actually
save
a
JSON
file
and
we
often
are
playing
at
a
customer
site
on
premises
and
they
can
make
these
JSON
files
and
attach
it
to
a
support
ticket
and
then
I
can
load
it
into
any
other.
Kuti
server
or
even
in
chrome
I,
think,
is
good
about
tracing
and
load
the
wherever
that
JSON
file
went
and
it'll
load
in
and
display
on
anyone's
Chrome
browser.
In
fact,
I
might
even
display
a
little
bit.
Nicer
is
probably
a
newer
version
that
we've
embedded
in
q2
itself.
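The Chrome trace viewer loads the JSON Trace Event Format, so "save" just produces a file of events like the ones this sketch writes. The event fields ("ph":"X", ts and dur in microseconds, pid, tid) follow that public format; the writer itself is a hand-rolled illustration rather than Kudu's exporter.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// One duration event in Chrome's Trace Event Format: "X" is a complete event
// with a start timestamp (ts, microseconds) and a duration (dur, microseconds).
struct TraceEvent {
  std::string name;
  int pid, tid;
  int64_t ts_us, dur_us;
};

void WriteChromeTrace(const std::string& path, const std::vector<TraceEvent>& events) {
  std::ofstream out(path);
  out << "{\"traceEvents\":[";
  for (size_t i = 0; i < events.size(); ++i) {
    const auto& e = events[i];
    if (i) out << ",";
    out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\",\"pid\":" << e.pid
        << ",\"tid\":" << e.tid << ",\"ts\":" << e.ts_us
        << ",\"dur\":" << e.dur_us << "}";
  }
  out << "]}";
}

int main() {
  // A file like this can be attached to a support ticket and loaded in
  // chrome://tracing (or another server's embedded viewer) on any machine.
  WriteChromeTrace("trace.json", {
      {"ListTables", 1234, 7, 1000, 230},
      {"Scan",       1234, 9, 1500, 705000},
  });
}
```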
C
So
that's
the
process,
white
tracing
terms,
an
inner
process
racing
we
actually
haven't
had
a
big
need.
Yet
we
don't
have
a
lot
of
super
deep
RPC
call
stack
at
least
within
coop
I,
think
there's
some
cases
where
a
user
application.
If
it's
building
like
a
website
they
might
want
to
do
tracing,
in
which
case
we
want
to
support
it
for
the
consumers.
But
in
terms
of
Q
itself,
when
we
get
a
request,
our
request
is
going
to
maybe
wait
on
one
other
server
for
application,
but
that's
about
it.
So.
C
C
Macarons
I
probably
wrote
on
the
first
week
when
I
started
writing
to
do,
which
is
Coke
log,
slow
execution,
see
past
number
of
milliseconds
and
then
some
string,
and
it's
just
checks
if
this
particular
scope
that
you
put
it
in,
takes
more
than
X
number
of
milliseconds
it'll
log
out
a
statement
saying:
hey
I
took
a
long
time
to
do
X.
This
was
incredibly
useful
in
customer
environments
when
they
they
kind
of
called
up
and
say,
hey,
dude,
who's
being
a
little
bit
slow.
Oh
I,
upload
that
I
own.
C
That
I
don't
know,
here's
a
lot
of
figure
it
out
and
just
having
these
kind
of
markers
in
the
logs.
That
say,
hey,
look
right
into
the
right
ahead:
log,
a
bunch
of
threads
blog.
This
thing
saying
that
it
took
a
long
time
to
write
right
ahead.
Log
is
a
good
point,
as
maybe
you're
right
ahead.
Log
viscous
flow
or
overly
contended
by
other
applications,
and
things
like
that.
So
super
simple,
but
pretty
useful
for
the
amount
of
effort
it
took.
So
we
didn't
have
these
sprinkled
around
our
code
base
in
various
interesting
places.
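A minimal sketch of a slow-execution guard like the macro he describes: wrap a scope with a millisecond budget and a label, and log only when the scope blows its budget. The names here are invented; they are not Kudu's actual macro.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// RAII guard: logs a warning when the scope it lives in takes longer than the
// given threshold. Costs almost nothing on the fast path.
class SlowExecutionWarner {
 public:
  SlowExecutionWarner(int max_ms, std::string what)
      : max_ms_(max_ms), what_(std::move(what)),
        start_(std::chrono::steady_clock::now()) {}
  ~SlowExecutionWarner() {
    auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start_).count();
    if (elapsed_ms > max_ms_) {
      std::cerr << "WARNING: took " << elapsed_ms << " ms to " << what_
                << " (threshold " << max_ms_ << " ms)\n";
    }
  }
 private:
  int max_ms_;
  std::string what_;
  std::chrono::steady_clock::time_point start_;
};

// Usage mirrors the description: wrap a scope, give it a budget and a label.
#define LOG_SLOW_SCOPE(max_ms, what) SlowExecutionWarner _slow_warner(max_ms, what)

void AppendToWal() {
  LOG_SLOW_SCOPE(50, "write to the write-ahead log");
  // ... fsync, etc. If this scope exceeds 50 ms, a warning line is emitted.
}

int main() { AppendToWal(); }
```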
C
So,
just
by
default
we
run
with
this
diagnostic
lock,
which,
if
put
in
a
long
directory,
in
this
case
it's
a
dead
built
in
temp.
So
if
I
look
at
that
file,
it's
semi
human
readable
and
basically,
you
get
stacked
race
records
which
are
by
default
once
a
minute
with
some
jitter.
So
we
don't
actually
correlate
with
any
kind
of
schedule
once
a
minute
tack.
So
this
one's
45
seconds
apart
this
one
is
another
45.
C
This
one
is
a
little
longer
and
then,
in
order
to
make
a
little
bit
smaller,
we
do
a
little
bit
of
dictionary
encoding
of
the
symbols
seals
here
in
the
stack
trace
line.
The
stacks
just
have
hex
addresses
and
then
inner
leaves
there's
these
symbols
lines
which
map
those
hex
addresses
to
particular
particulars
and
bolts
and
function
names,
the
other
type
of
until
we
put
in
these
logs
as
metrics
dumps.
So
we
have
a
lot
of
metrics
that
are
captured
from
the
server
histogram
counters
things
like
that
I'll
talk
about.
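A sketch of the once-a-minute-with-jitter cadence of those diagnostics records; the actual stack capture and dictionary encoding are elided, so this only shows the scheduling side, with a placeholder standing in for the record contents.

```cpp
#include <atomic>
#include <chrono>
#include <fstream>
#include <random>
#include <string>
#include <thread>

std::atomic<bool> shutting_down{false};

// Background loop: write a diagnostics record roughly once a minute, with
// random jitter so the dumps don't correlate with other once-a-minute tasks.
void DiagnosticsLogLoop(const std::string& path) {
  std::ofstream log(path, std::ios::app);
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<int> jitter_s(-15, 15);
  auto next = std::chrono::steady_clock::now();
  while (!shutting_down) {
    if (std::chrono::steady_clock::now() >= next) {
      // In the real log this would be a dictionary-encoded stack record or a
      // metrics snapshot; a placeholder line stands in for it here.
      log << "stacks <elided stack trace record>" << std::endl;
      next = std::chrono::steady_clock::now() +
             std::chrono::seconds(60 + jitter_s(rng));
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
}

int main() {
  std::thread t(DiagnosticsLogLoop, "/tmp/diagnostics.log");
  std::this_thread::sleep_for(std::chrono::seconds(1));  // sketch: run briefly
  shutting_down = true;
  t.join();
}
```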
C: We found that even though customers may have centralized metrics collection, oftentimes those systems do a lot of downsampling or aggregation, and it's hard to get down to what happened at this exact minute, or in between these two exact minutes — what was the 99th percentile log-append latency on this particular server? I think the best companies in the world can probably answer that question; most companies can't. But if you just have this really dumb gzipped metrics log that you can get from the customer and look at...
C
We
have
various
tools
you
can
take
these
logs
and
graph
them
and
calculate
various
derived
metrics.
That's
thank
you
very
useful
again.
It's
kind
of
the
simple
thing
but
works
pretty
well
and
we've
got
a
description,
tag
notice,
parse
stacks
and
get
some
some
our
own
part
stacks
on
this
slug
it'll
print
out
a
lot
more
information,
so
the
stacks
and
it
does.
The
symbolization
shows
if
my
thread
groups
together
threads
that
are
all
having
the
same
stack.
C: ...and can point out these issues. So techniques like this have allowed us to find issues like the one in glog, for example: if you just use the Google logging library in its default mode, there's a mutex around logging, and that mutex can be held while it's actually doing the I/O, and the I/O can take a long time. So we've seen these issues where all threads end up blocked on glog; we moved to async logging and those things got a lot better. So these kinds of techniques — again, pretty simple, but they work really well.
C
The
stack
traces
are
also
viewable
on
a
slash
stack
web
page
again,
unreasonably
effective,
simple
thing.
So,
if
I
go
to
one
of
our
production
servers
go
to
slash
stacks,
pretty
quick
and
I
call
it
a
kind
of
a
poor
man's
profile.
Also,
if
I'm
curious,
what
a
workload
is
doing
is
its
can
heavy.
Is
it
doing
a
lot
of
I/o?
Is
it
a
way
to
not
have
to
be
you
on
something?
C
Usually
just
a
couple
reload
this
page.
It
gives
you
a
pretty
good
idea
of
how
busy
the
server
is
and
what
might
be
some
bottlenecks.
So
it's
interesting
to
me
to
see
a
hash
table.
Look
up
on
the
serialize
row
block
Hall.
This
is
actually
unknown
performance
issues,
at
least
and
I
think
fixed,
so
very
poor
man's
profile.
Every
loaded
again,
I
also
see
the
standard
hash
table
signs
here
on
this
conflict,
and
that
probably
shouldn't
be
in
that
call.
We
should
have
something
a
little
faster
there.
C
All
right
as
a
slash
metrics
is
pretty
simple
a
lot
of
metric
stuff.
We
built
our
own
metric
subsystem.
We
couldn't
Lee
sign
much
good
for
C++
I.
Think
now
the
maybe
the
census
project
is
trying
to
do
a
little
bit
with
this,
but
we
implemented
the
HDR
histogram
data
structure
for
high
resolution,
histograms
and
all
of
our
pcs,
as
well
as
a
bunch
of
other
things
throughout
the
code
base
track
really
fancy
histograms.
So
you
can
see
in
this
example
that
this
particular
right
RPC
has
two
significant
digits.
Precision.
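To show what keeping raw bucket counts and deriving percentiles from them means, here is a toy fixed-bucket latency histogram; Kudu's real implementation is the far more precise HdrHistogram data structure, so treat this purely as an illustration of the idea.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Toy histogram: power-of-two microsecond buckets. Raw counts are kept so that
// percentiles can be recomputed later (or diffed between two snapshots).
class LatencyHistogram {
 public:
  LatencyHistogram() : buckets_(32, 0) {}
  void Record(int64_t us) {
    size_t b = 0;
    while ((1LL << b) < us && b + 1 < buckets_.size()) ++b;
    ++buckets_[b];
    ++total_;
  }
  // Upper bound (in us) of the bucket containing the given percentile.
  int64_t ValueAtPercentile(double p) const {
    int64_t rank = static_cast<int64_t>(p / 100.0 * total_);
    int64_t seen = 0;
    for (size_t b = 0; b < buckets_.size(); ++b) {
      seen += buckets_[b];
      if (seen >= rank) return 1LL << b;
    }
    return 1LL << (buckets_.size() - 1);
  }
  const std::vector<int64_t>& raw_buckets() const { return buckets_; }
 private:
  std::vector<int64_t> buckets_;
  int64_t total_ = 0;
};

int main() {
  LatencyHistogram h;
  for (int i = 1; i <= 1000; ++i) h.Record(i);  // mostly fast requests
  h.Record(250000);                             // one slow outlier
  std::cout << "p50 <= " << h.ValueAtPercentile(50) << " us, "
            << "p99 <= " << h.ValueAtPercentile(99) << " us, "
            << "max <= " << h.ValueAtPercentile(100) << " us\n";
}
```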
C
We've
done
some
number
of
them,
I
mean
all
the
percentiles,
and
these
actually
keeps
the
raw
socket
counts
as
well
underneath,
so
you
can
fetch
it
from
slash
metrics
if
you
had
a
special
query
parameter
and
they
end
up
in
that
metrics
log.
So,
given
the
metrics
log
and
given
snapshots
of
the
raw
bucket
count,
you
can
actually
say
between
any
two
points
in
time.
C
We've
found
some
bugs
there,
so
in
this
case
the
right
ahead.
Log
at
this
line
of
code
was
stuck
first
200
milliseconds
and
it
takes
the
kernel
stack
as
well.
So
we
can
actually
see
that
inside
the
kernel
it
was
waiting
on
jbjb
d2,
which
is
the
file
system
journal.
We
didn't
get
right
access
to
the
file
system
journal,
so
this
is
something
that
you
know.
You
probably
would
expect
right
ahead
luck.
The
fact
you
had
to
wait
for
a
600
milliseconds
is
a
little
bit.
C: ...surprising, and it's just due to Red Hat Enterprise Linux 6 being really old and having a pretty bad implementation of a lot of this stuff. As you can also see, the user stack is in the write path, which kind of makes sense, because the kernel stack is in the writev system call. So I think that's all the slides I prepared — I didn't want to go too long; I think questions and discussion are more interesting.
D
I've
got
a
question
thanks
Todd.
This
is
pretty
interesting
and
I.
It's
fun
to
see
I
kind
of
knew
that
you
would
do
this
with
just
what
I
wanted
you
to
do
this,
but
it's
nice
to
see
a
presentation
about
performance,
analysis
and
stuff
like
that.
That's
not
just
like
100
percent
about
distributed
tracing,
because
these
other
techniques
are
really
interesting
relevant,
but
one
thing
that
comes
up
in
my
head,
I
think.
Actually,
this
is
a
fine
example.
D
It
sounds
like
in
this
case
it
was
the
issue
had
to
do
with
red
hat
kind
of
price
effects,
not
being
a
very
good
implementation,
but
a
lot
of
the
things
that
you're
probably
dealing
with,
have
to
do
with
contention
for
some
shared
resource,
whether
it's
the
disc
or
something
else.
Then
I'm
curious,
like
what
you
know.
Do
you
have
techniques
that
you're
using
to
understand
the
source
of
load
when
there's
just
no,
you
know
contention
issues
overloaded
resource
that
type
of
thing
like
what?
C
We
don't
have
any
super
generic
things
for
that
MINIX,
specifically
for
lock
contention.
Our
spin
locks
are
instrumented
with
apology
to
talk
about
that
here.
Our
spin
locks
have
some
instrumentation
where
they
collect
the
stack
trace
of
the
unlocking
node,
with
an
unlock
that
sees.
There
was
a
waiter
and
collected
spectrum,
so
it
kind
of
knows
which
folders
were
causing
attention
of
somebody
else,
and
then
we
expose
that
through
the
peopre
web
interface,
so
I'll
see
if
I
can
actually.
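A sketch of that idea: a spinlock that notices, at unlock time, whether anyone was waiting, and captures the holder's stack (via glibc's backtrace()) when someone was. The class and the aggregation here are illustrative; Kudu's real instrumentation and its pprof-style contention endpoint are more involved.

```cpp
#include <atomic>
#include <cstdio>
#include <execinfo.h>  // backtrace(), glibc/Linux only
#include <thread>

// Illustrative spinlock: on unlock, if someone was waiting, the *holder*
// captures its own stack, since the holder is the code that caused the wait.
class InstrumentedSpinLock {
 public:
  void lock() {
    waiters_.fetch_add(1, std::memory_order_relaxed);
    while (locked_.exchange(true, std::memory_order_acquire)) {}
    waiters_.fetch_sub(1, std::memory_order_relaxed);
  }
  void unlock() {
    bool contended = waiters_.load(std::memory_order_relaxed) > 0;
    locked_.store(false, std::memory_order_release);
    if (contended) RecordContentionStack();
  }
  long contended_unlocks() const { return contended_unlocks_.load(); }

 private:
  void RecordContentionStack() {
    void* frames[16];
    int depth = backtrace(frames, 16);  // stack of the thread releasing the lock
    (void)depth;
    // A real implementation would aggregate these stacks into a table keyed by
    // the frames and expose them on a pprof-style contention endpoint.
    contended_unlocks_.fetch_add(1, std::memory_order_relaxed);
  }
  std::atomic<bool> locked_{false};
  std::atomic<int> waiters_{0};
  std::atomic<long> contended_unlocks_{0};
};

InstrumentedSpinLock spinlock;
long counter = 0;

void Worker() {
  for (int i = 0; i < 50000; ++i) {
    spinlock.lock();
    ++counter;
    spinlock.unlock();
  }
}

int main() {
  std::thread a(Worker), b(Worker);
  a.join();
  b.join();
  std::printf("counter=%ld contended unlocks=%ld\n", counter,
              spinlock.contended_unlocks());
}
```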
C
Show
that
yeah,
so
if
I
go
to
a
special
URL
which
can
be
read
via
that
Gopi
prof.
tool
as
well
it'll
tell
us
over
this
one.
Second,
the
various
stack
traces
where
we
had
some
contention
and
you
can
be
symbolized
if
you
have
the
binary
as
well.
So
this
is
super
useful
for
the
generic
kind
of
spin,
lock
contention.
C
Similarly,
you
can
get
CPU
profiles
from
this
kind
of
endpoint.
Honestly
I
find
the
flash
stacks
to
be
unreasonably
useful
for
this
kind
of
thing
as
well.
So
one
interesting
example
is:
maybe
six
months
ago
we
learned
that
TC
Malik,
which
is
the
alligator
we
use,
has
sort
of
six
free
lists
for
all
sizes,
less
than
one
megabyte
allocations,
but
one
megabyte
and
above
actually
goes
to
like
a
central
span
list
which
was
actually
implemented
as
a
linked
list
until
very
recently.
C: I definitely agree. I think we have a lot of systems that are useful in the right hands but where it's hard to expose what you should be looking at, so we're trying to document things better; we're starting some runbooks for our internal support team to understand how these things might be useful. In terms of correlating — oh, I saw one outlier, but I wasn't collecting the traces at that time — yeah, we definitely have that; you kind of have to hope that you catch the thing happening that you want to see.
C
So
it's
a
little
hard.
That's
why
we
started
to
add
more
of
these
features
like
the
diagnostic
plug
was
just
always
digging
stack
and
that's
now
on
by
default.
It
took
us
a
little
bit
of
nervousness
to
be
like.
Is
it
actually
safe
to
have
this
thing
taking
factories
in
once
a
minute,
because
when
we
first
implemented
it,
we
actually
found
unlocked
in
the
dynamic
loader
if
you're
trying
to
back-trace
a
thread
while
it
didn't
the
loader,
and
we
have
awful
worker
appetiser,
read
that
so
that
I
think
there's
always
risk.
C
When
you
add
this
instrumentation,
either
performance
or
bugs
and
remember
actually
the
first
time
we
added
the
contention
profiling
I
introduced
this
awful
memory
correction
book
where
I
was
writing
outside
of
the
stack
and
that
almost
got
released
to
customers
and
it
would
be
really
bad
because
we
had
a
lot
of
crashes
and
things
like
that.
So
there's
always
risk
and
I.
Think
for
us,
it's
okay
to
have
even
like
a
5
or
10
percent
performance
reduction.
I
think
our
customers
are
not
so
performance
sensitive
and
there
are
a
lot.
C
Stuff
is
down,
and
they
don't
know
why
or
stuff
is
performing
badly
and
they
don't
know
why
and
it
takes
us.
You
know
it
takes
about
three
weeks
to
understand
what
the
performance
problem
is.
There'll,
be
a
lot
more
upset
versus,
if
you
say
well,
you've
got
a
5%
overhead,
but
we
can
pinpoint
that
problem
in
an
hour
instead
of
three
weeks.
It's
usually
a
good
trade-off
for
us,
it's
probably
not
the
case
for
every
company,
but
we
tend
to
lean
more
towards
that
side
of
the
spectrum.
A: Yeah, that sort of gets at it. Trying to figure out the right granularity often seems to be part of the trick, the other part being that it's potentially dangerous. You know, there's always some overhead that comes with this stuff, and sometimes it just seems — especially writing databases, C++ stuff — people can be very, very obsessive about maximal efficiency, and then you're saying, well, we're just gonna add 5% overhead to figure out what's wrong with it.
C
The
best
example
I
can
give
there's
like
yeah.
We
always
have
a
5%
overhead,
but
this
time
percent
overhead
has
allowed
us
to
pinpoint
performance
issues
that
have
saved
us,
40
or
50
percent.
I.
Think
we've
got
any
huge
gains
from
things
based
on
using
this
infrastructure,
so
we
never
added
a
spark
4%.
We
fully
stock
way
back
and
yeah
a
year
ago,
and
it
was
much
much
slower
to
give
to
spend
a
little
to
win
a
lot
cool,
great.
C
There's
one
last
thing
that
I
didn't
show
is
the
heat
profile,
which
is
another
thing
we've
turned
out
more
recently.
Oh,
it's
not
even
on
the
servers
we
turned
on
so
recently,
but
the
TC
Milan
keep
sampling
is
one
of
these
things.
It's
not
really
well
advertised,
it's
quite
low
overhead
and
I.
Think
probably
our
next
release
we're
going
to
turn
it
on
by
default.
C: Yeah, I think the most advanced users would probably find it useful; a lot of users probably don't want to have to care about this — they just hope it works. We certainly use it on the dev team, at least. Anyway, if anybody has any further questions, feel free to come on the Gitter — the OpenTracing Gitter — and ping me there, and I'll check in later today. — Awesome, thank you. Thank you so much for presenting.
A: Okay, so back to our regularly scheduled programming. We've got a couple of things on the agenda around OpenTracing API questions.
A: I think we should just get moving on it. I know, just with my team, we've been really focused on getting the scope and scope manager release for Python out the door, and so we haven't felt like we've personally had bandwidth to also release and manage this in other languages while that's going on.
A: So I would say that there are, like, two issues there. One: it's a breaking change for tracing implementers — well, not exactly a breaking change; it's a change that's backwards compatible. You now have to expose these on your tracer and issue a new version of your tracer, but that tracer will still conform to the older API, so it's not like you need to fork and maintain two versions, and for users of the code nothing changes.
A
The
other
issue
that
that's,
maybe
more
serious
or
harder
to
see,
is
around
naming
these
methods
should
they
be
called,
trace,
ID
and
span
ID
or
trace
identifier
and
span
identifiers,
which
is
a
big
mouthful,
but
definitely
it's
the
chances
of
a
collision
with
a
pre-existing,
pre-existing
method
that
returns.
Something
else.
We've
seen
one
example
of
that
which
is
the
Mach:
tracer
has
Trey's
ID
and
span
ID
and
it
returns
like
a
UN,
but
we've
been
asking
around
actual
implementers
and
no
one
with
a
tracer
currently
binding
to
open
tracing
has
spoken
up
and
said.
A: ...no, that won't work. So I think that's really the final bikeshed. There's been a lot of push from everyone to say trace ID and span ID are nice names, and it doesn't seem to mess up any real code, so let's do it. I am a little nervous that somebody will show up too late and say, hey, this messed with me.
A: The only case where this is potentially a breaking change would be if you literally had the methods trace ID and span ID with the same capitalization and everything else — which most tracers had, I think. That's been the question we've been asking around: who literally has this method signature returning something else — and no one has spoken up saying that they do. The other answer is just to name it something slightly different, right? I think that's the final question that has to get resolved.
A
If
you
call
it
just
a
slightly
different
name,
then
you
massively
reduced
the
chance
of
there
being
a
collision
yeah.
No
one
called
it
trace
identifier,
because
that's
really
long
to
type
it's
just!
You
now
have
this
API
we're
asking
everyone
to
use,
and
it's
and
it's
got
a
funky
method
signature
as
a
result
of
this.
So
really
maybe
it's
like.
Can
we
do
a
more
exhaustive
audit
of
existing
tracers
that
bind
to
open
tracing
and
really
get
an
active
confirmation
that
it
will
or
will
not
be
a
problem?
Well.
F
As
far
as
Jager,
every
single
librarian
Hagar
had
a
traced,
ID
and
span
ID
in
the
most
idiomatic
form
for
the
language
right,
so
in
Ingo
it
would
be
like
upper
ID,
etc.
So
definitely
gonna
have
class
and
they
would
return
like
native
types
rather
than
strings.
Did
you
say
something
man
we've
got
that
question
anywhere,
but
you
said
like
Mach
tracer
had
the
same
thing
and
I
would
assume
most
of
the
traces
well.
A: Then it's just a matter of calling it something else. I think that's the solution — maybe something that's not as long as "identifier," for people who have to type this out manually; that's hard to remember how to spell and very long. So I really think that's what we need.
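For illustration, here is a hypothetical sketch of what exposing such accessors could look like: an interface method returning strings, implemented by a tracer whose span context stores native integers (as with the MockTracer and the Jaeger clients mentioned above). The method names are placeholders — the naming is exactly what is being debated here.

```cpp
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>

// Hypothetical interface-level accessors (names are placeholders).
class SpanContext {
 public:
  virtual ~SpanContext() = default;
  virtual std::string ToTraceID() const = 0;
  virtual std::string ToSpanID() const = 0;
};

// A tracer that stores the IDs as native integers can still satisfy a
// string-returning API, which is why the change can be backwards compatible
// for implementers: add the accessors, keep the existing fields.
class NumericSpanContext : public SpanContext {
 public:
  NumericSpanContext(uint64_t trace_id, uint64_t span_id)
      : trace_id_(trace_id), span_id_(span_id) {}
  std::string ToTraceID() const override { return ToHex(trace_id_); }
  std::string ToSpanID() const override { return ToHex(span_id_); }

 private:
  static std::string ToHex(uint64_t v) {
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016llx",
                  static_cast<unsigned long long>(v));
    return buf;
  }
  uint64_t trace_id_, span_id_;
};

int main() {
  NumericSpanContext ctx(0xabcdef, 42);
  std::cout << "trace=" << ctx.ToTraceID() << " span=" << ctx.ToSpanID() << "\n";
}
```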
D: I mean, I think Yuri's question is, like — you know, this has been a known issue for a long, long time in some languages, and, you know, the natives are getting restless: people file issues frequently about this, and we kind of concluded that we should add something, but we haven't made progress.
D
10,
I
think
you're
accurately
saying
that,
like
there's,
basically
resourcing
issues
we're
doing
this,
it
seems
like
a
simple
change,
because
conceptually
it
is
that
like
it
does
require
a
bunch
of
roll
out
care
because
of
these
issues
that
were
bringing
up
so
I.
Think
the
question
is:
what's
the
next
step,
I
mean
and
now
I
would
rather
not
if
possibly
get
into
the
discussions
the
PR
discussion
in
this
call
or
whatever
event,
and
we
could
be
something
where
we
mean
the
opening
up.
D
The
PR
without
merging
it
in
most
languages
is
a
very
easy
thing
to
do.
I
mean
truly
easy
thing
to
do,
and
it
could
be
done.
You
know
like
without
getting
everything
through
opening
the
PRS
advertising
them
soliciting
comments
from
from.
You
know
implementers
that
sort
of
thing
could
probably
be
done
without
a
lot
of
tiny
investment,
at
least
that's
my
two
cents
and
is
the
stuff
that
could
be
taking
us
here.
D
It
would
also
allow
people
who
are
coming
in
and
filing
these
issues
to
see
that,
in
fact,
this
is
this
is
like
there's
something
in
motion:
I,
don't
know
how
you
feel
about
a
time
that
I
think
like
did
that
stuff
itself
could
be
kind
of
paralyzed,
so
to
speak.
So
it's
gonna
be
a
happy
but
remain
open
for
a
while,
anyway,
just
to
make
sure
people
see
them
and
got
a
chance
to
comment.
A: I would agree with that, yeah. And again, my apologies for being, you know, maybe too focused on Python right now, but there has been a long-running PR about this. We could essentially socialize it a bit more and just kind of put it out there, make tracking issues in every language, and kind of announce that it's coming. But yeah, I...
A: ...I do think what you just said — for me it hinges on coming up with a different name for these things, just to make sure we don't collide. So I think that's the final bikeshed, but we should move on it very quickly once we've resolved it. I think, if we go with the approach of just picking a name that has a low chance of collision with anything, there's no reason why we can't get a release candidate out in every language quickly and get people to start binding to it.
A: Let me get into presentation mode here. So this mostly comes down to having both scopes and spans. We added a sort of active-span concept to OpenTracing, so that the tracer would be responsible for managing which span was active in which context, and if you have some kind of context switching — whether it's threads or some async, userland-level thing — the tracer would be tracking that using a scope manager.
A: So each context that has a span is called a scope, and you can ask the scope manager for the currently active scope and pull the span off of it. Scopes have to be closed when they're done, and that doesn't always necessarily line up with a span being finished, because you may be moving spans from context to context — so you may make a span active in one scope, then close that scope, move the span to another scope, and so on and so forth.
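A small hypothetical sketch of the scope/span split being described: a scope activates a span for the current context, and closing the scope does not finish the span, so the same span can be re-activated elsewhere before it is finished. None of this is the actual OpenTracing API; it just models the concepts.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical stand-ins for the concepts under discussion.
struct Span {
  explicit Span(std::string n) : name(std::move(n)) {}
  void Finish() { std::cout << "finished span " << name << "\n"; }
  std::string name;
};

// The "scope manager": tracks which span is active on the current thread.
thread_local Span* active_span = nullptr;

// A scope activates a span for its lifetime and restores the previous one when
// closed. Closing a scope does NOT finish the span.
class Scope {
 public:
  explicit Scope(Span* span) : previous_(active_span) { active_span = span; }
  ~Scope() { active_span = previous_; }
 private:
  Span* previous_;
};

int main() {
  auto span = std::make_unique<Span>("handle_request");
  {
    Scope scope(span.get());            // span active in this context
    assert(active_span == span.get());
  }                                     // scope closed, span still unfinished
  assert(active_span == nullptr);
  {
    Scope scope(span.get());            // same span re-activated elsewhere
    assert(active_span == span.get());
  }
  span->Finish();                       // finishing is a separate decision
}
```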
A: This usually doesn't feel too onerous when you're writing code inside a plug-in or an interceptor, where most of the code you're writing is really focused on tracing that higher-level concept — so that doesn't feel too bad. At least to me it doesn't feel too bad. But for application developers, if your start-work and finish-work contain a lot of application code and you're doing quite a bit of this, it gets onerous pretty fast.
A
It's
also
hard
to
get
application
developers
up
to
speed
on
your
team,
because
there's
kind
of
these
like
extra
concepts,
you
know
you're
saying
build,
span,
start
active,
but
you
don't
get
a
span
back.
You
get
a
scope
and
then,
if
you
make
the
span
automatically
finish
when
you
close
the
scope,
that's
nice,
but
now
you're,
saying
scope
close
at
the
end
and
you
never
touch
the
span
there
either,
so
that
this
has
like
some
cognitive
load,
that's
sort
of
above
and
beyond
the
the
simpler
model
that
we
had
initially
envisioned.
A: So if you look at a simpler API — if you make some assumptions that you can make when you're writing application code, such as the presence of a global tracer — you can make this a lot more declarative, right? It's possible to create an API where you just say "start a span" and it's automatically made active, and then you can access it declaratively, because you have access to a global tracer. You don't even necessarily have to track the tracer or do any kind of object method chaining; you could just say, hey, tag...
A
The
current
span
log
on
it
and
then,
when
you're
done,
you
can
say
hey
justjust
finish
this
thing,
so
I'm
not
proposing
this
precise
API
I'm
just
proposing
that
it
should
be
possible
to
produce
an
API.
That's
that's
this
simple,
and
in
order
to
get
application
developers
more
comfortable,
I
think
as
a
community,
we
should
push
for
for
providing
some
more
official
ergonomic
API,
if
not
looking
like
this,
at
least
at
least
something
with
this
level
of
complexity.
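A hypothetical sketch of the kind of ergonomic, declarative surface being proposed: free functions that assume a global tracer and an implicit active span, so application code never handles scopes or span objects directly. The function names are invented for illustration, not a proposal of concrete signatures.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Minimal span record; a real tracer would do far more.
struct Span {
  std::string name;
  std::vector<std::string> tags;
};

thread_local std::vector<Span> span_stack;  // stands in for the scope manager

void StartActiveSpan(const std::string& name) { span_stack.push_back({name, {}}); }

void TagActiveSpan(const std::string& key, const std::string& value) {
  if (!span_stack.empty()) span_stack.back().tags.push_back(key + "=" + value);
}

void FinishActiveSpan() {
  if (span_stack.empty()) return;
  const Span& s = span_stack.back();
  std::cout << "finished " << s.name << " with " << s.tags.size() << " tags\n";
  span_stack.pop_back();
}

// Application code reads declaratively: no tracer handle, no scope objects.
void HandleCheckout() {
  StartActiveSpan("checkout");
  TagActiveSpan("customer.tier", "gold");
  // ... application work ...
  FinishActiveSpan();
}

int main() { HandleCheckout(); }
```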
A: So that's my pitch. I'm going to be pushing for this in the cross-language working group starting next week as well, but I was interested in whether anyone had comments on this at this time, or thoughts about how to do it, or any kind of experience reports from working with scopes and active spans in the field.
A: Yeah, I've heard from several people who couldn't be on this call that they're very interested in something like this, so, you know, we'll have a discussion online on Gitter. But there's also just the sort of general issue of, you know, do we need scopes and scope managers — that kind of thing. Pavel, I know you were asking about that. Do you have any thoughts on this?
D
You
haven't
I
think
that's
really
hinges
on
whether
you're
talking
about
instrumenting
stuff
in
library
versus
just
trying
to
get
your
work
done
as
an
application
developer,
and
these
sorts
of
higher
level
abstractions
I
think
make
a
lot
of
sense
for
the
latter,
where
we
want
an
easy
mode
type
of
experience.
But
as
Pablo
saying,
for
you
know,
very
meticulous
instrumentation
of
shared
libraries,
it
probably
makes
more
sense
to
avoid
the
Global's
and
stuff
like
that.
A: I think a side effect of making something like this more official is to make it clear that there would be two style guides for when you're writing instrumentation. There's a style guide that says, you know, don't presume a global tracer — always take in a tracer as an option and fall back to the global tracer if no one gives you one — and basically you wouldn't get to use this cleaner API there, because this API makes a bunch of assumptions.
A: Yeah — start span, finish span, maybe. But basically, the long and short of it is: can we take the scopes and scope managers and make them a concept that, as an application developer, you never have to think about — you're not necessarily even aware that they exist until you get into some tricky situation, and then you dig into the docs and discover there are actually these lower-level APIs that you can use to deal with those situations.
A: Well, let's have the discussion on Gitter. This is mainly just a sort of advertisement to people that we want to get moving on this, and really we should have it, you know, in a forum where people in time zones that can't make this call can participate. But if people have ideas about what this kind of API might look like, or, you know, if they're already working with application developers who have written something like this, it would be great to start...
A
You
know
some
contribute.
Oh,
that
are
experimenting
with
this
one.
Nice
thing
is
I'm
fairly
certain.
We
can
write
all
this
without
actually
touching
the
tracer
API.
That
I
think
would
be
one
of
the
goals,
so
there's
a
lot
of
room
to
sort
of
experiment
with
different
approaches
to
this
and
contribute
moving
on
that.
F
One
thing
I
want
to
add
is
when
I
saw
there's
an
agenda.
I
thought
that
would
be
a
different
topic.
More
about
high-level
API
is
for
specific
operations
like
HTTP
requests
or
database
requests,
so
which
kind
of
I
mean
works
in
a
similar
way.
That
people
often
ask
like
for
some
standard
way
of
doing
these
things.
A
Yes,
I
definitely
think
we
need
those
as
well
and
that
could
get
get
wrapped
up
in
this.
For
example,
if
you
see
tag
where
we
say
some
tag
key
some
tag
value,
that's
fine
for
your
own
custom
tags,
but
actually
you
know
going
and
finding
the
constants
and
kind
of
gluing
them
together.
When
you
want
to
do
something
like
say
you
know,
login
error
or
an
exception.
D
Yeah
another
thought
on
this
is
that
we,
you
know
in
this
discussion
earlier
around
traces
span
IDs.
We
would
need
to
make
a
change
like
that
in
some
kind
of
coordinated
fashion
across
languages,
I
think
that
for
some
of
the
higher
level
primitives
they
they
actually
naturally
should
deviate
from
language
to
language
like
if
you're
working
in
a
Ruby
or
rails
environment,
or
something
like
that.
D
But
types
of
primitives
that
you
might
want
for
convenience
are
actually
different
than
what
you'd
want
and
go
and
so
on
so
forth
and
and
that
can
actually
make
the
stuff
go
bit
faster.
I
think
when
we
have
to
do
cross
language
stuff,
because
I
think
we're
now
dealing
with
like
nine
languages
or
something
like
that.
It's
a
bit
daunting
to
start
one,
those
projects,
knowing
how
much
parallel
work
is
going
to
have
to
take
place
and
for
this
I
hear
you
just
mentioned
around
HTTP,
and
things
like
that.
A: Yeah, totally. I think another way of thinking about this is: there's been a lot and lot of work trying to figure out what the correct low-level API for tracers to bind to is, and that work has been slow going — it's very difficult work — but it feels to me like we're getting to the end of it and it's starting to gel. And now it's sort of time, between this kind of work and things like getting span and trace...
A: ...identifiers out there to allow people to start building middleware and other things — we're sort of moving up the stack to the application developer zone and the things they would like, and that world is definitely much more opinionated and nuanced, and there's room, even within a single language, to have more than one way to do this. I think we should probably offer some...
A
You
know
official
version
of
this
at
some
point
just
to
lower
the
cognitive
overhead,
but
I
totally
anticipate
you
know
in
Java
there's
some
people
who
may
want
to
do
this
kind
of
thing
with
annotations
some
people
who
may
want
to
do
it
using
some
other
declarative
strategy.
Like
you
said,
Ruby
there's
a
lot
of
different
metadata
magic
approaches
to
doing
things,
and,
what's
great
about
doing
these
is
higher-level.
Api
is
like
not
everyone
has
to
agree.