A: Okay, we are live, so welcome everyone to the Node.js diagnostics working group meeting, 2019 September 25. Today's meeting is different than the usual one, meaning that today we will discuss one of the deep-dive topics around diagnostic use cases, user journeys, and tutorials, and we picked crash analysis as the first one. So today we will talk both about how we want to work together on these topics, kind of figuring out what use cases we have available to us, and about what the user journeys are today.
A: So today is the first one, as I mentioned, and it will be a little bit mixed: both figuring out how we want to work on this, and discussing the actual deep-dive topic itself, which is process crash analysis. The document for today's meeting is actually cross-linked both in the original diagnostics meeting-notes document, and we have this separate document which will be used today. So you can find the link in both the GitHub issue and in the Zoom meeting.
A: Okay. So what was in my mind for how we could facilitate this meeting today is to kind of start to come up with use cases: what we know, as both users and as diagnostics working group members, that people may want to do around process crash analysis. Then we could kind of discuss how we are debugging those today, what the tools are, and what the gaps are.
A: Okay, about the use cases: I have no better idea than to start with one right now, and then I would like to hear what issues other people are facing around this, or know that their customers are facing, and hopefully we start to have some kind of list that we can work with. Does that sound good to the people here?
A: Yep, okay. So what's probably really Netflix-specific, or platform-specific, is that our use case is basically: my team is building a platform where people run, let's say, arbitrary code; black-box code from our team's point of view, because we are just providing the platform and operating it. So when we start to see crashes, it can sometimes be really hard to kind of inspect the process post-mortem in a way that you're gaining enough data to really figure out what happened.
A: So sometimes the error message and the stack trace which was thrown have enough information for that; sometimes they don't. And the case where they don't have enough information is usually when the stack trace, like, is happening in a third-party library which has a really deep stack trace, and the error message is not very useful. This is an example: when, for example, you use the official gRPC library.
A: So when you hit an issue, like when you pass a negative number into a strongly-typed positive-integer field for a buffer, it blows up: the error message is not useful and the stack trace is not useful, because it's inside the library, so it's kind of a black box what really happened in the process. And another use case is when someone is catching the error and rethrowing it.
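A minimal sketch of the failure mode described here: a bad value throws deep inside a library, so every captured stack frame points at library internals rather than at the caller. The function names are illustrative, not taken from the real gRPC library.

```js
// Hypothetical library internals: a negative value eventually
// reaches a Buffer API that requires a non-negative size.
function encodeField(value) {
  return Buffer.alloc(value); // throws RangeError when value < 0
}

function serializeMessage(msg) {
  return encodeField(msg.size);
}

try {
  serializeMessage({ size: -1 }); // caller bug: negative size
} catch (err) {
  // The RangeError's frames all sit inside the "library"
  // (encodeField/serializeMessage); the application call site
  // that supplied the bad value is the part that matters, but it
  // is buried at the bottom of a deep trace or missing entirely.
  console.error(err.stack);
}
```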
B: I guess, like, if we look at the document that Gireesh put together with the crash use case: I mean, crashes could also be, like, a C-level crash in the process, right? Yeah. So I don't know if, in terms of the original documentation, it makes sense to combine those, right.
C: The way I look at it, there are at least two different classes of issues which come under "crash", after listening to Peter's explanation. One is basically low-level crashes, such as a C crash, etc., which do not leave a trace, a stack trace, for whatever reason, whereas the crash context and the failure data are crisp and precisely available if you debug through a core dump debugger. That is the first class of issue. The second class is basically through some abstraction, such as a third-party library, etcetera.
C: Yeah, I mean, the only commonality would be, like: there is something, such as low-level tracing, which intercepts the origination of the first error and generates a core dump in response to the creation of the first error; and then that core file has the complete information that you are looking for, and the debugger can show everything that is required from the generated core dump. Yeah.
A: So, before we jump to core dumps: it kind of resonates with the second one, the error message being tampered with by the error-handling pipeline, which is what I shared, so I think. And can you talk a little bit about the low-level C crash use cases? I know they can happen, but to be honest, I haven't run into this too much, so I have really limited context on why it can happen and how it can happen. Could you maybe share some examples? Yeah.
C: So, basically, if you look at the way Node.js is abstracting the JavaScript APIs: internally it makes use of V8 APIs almost everywhere, and with the V8 API, be it within the API or at the call site of the API, you have macros to validate the sanctity of the input and the output. For example, you will see CHECKs and DCHECKs. These are basically to make sure that you don't get unexpected results because of a user error or a completely bogus input or output.
C: Now, it's also possible that, because of some bug, these checks are missing on some control-flow paths, and when the application goes down that control-flow path, the side effect is that it can crash in a highly unexpected manner; that is, without having any proper JavaScript call stack. Possibly it will be as good as a plain C program crash. That is correct.
B: Right, and I guess in that case, even if it does include the JavaScript stack, you need to be set up to be able to get those core dumps, introspect them, and get them off the production machine, which is part of what the other doc Gireesh put together covers, yeah. The other thing I'll add to that one is: in addition to something inside V8 crashing, any time your application uses native modules, you know, the native modules can also cause Node to crash.
A: It can be caused by native-module crashes, and memory leaks in native modules. And we have another, bigger category, which is mainly when the error is tampered with by the error-handling pipeline. A couple of examples: it can be third-party libraries kind of not doing a really good job of surfacing the errors, or it can be that we are simply swallowing errors, wrapping errors, or doing other things with the error, which causes the original error message and stack trace to be kind of hidden.
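A minimal sketch of this "error-handling pipeline tampering" category: a wrapper catches the original error and rethrows a new one, discarding the original message and stack. The function names are illustrative.

```js
function lowLevelOperation() {
  throw new TypeError('port must be a number'); // the real cause
}

function doWork() {
  try {
    lowLevelOperation();
  } catch (err) {
    // Rethrowing a fresh error drops err.message and err.stack.
    throw new Error('operation failed');
  }
}

try {
  doWork();
} catch (err) {
  // Prints "operation failed" with a stack that starts inside
  // doWork; the original TypeError and its frames are gone.
  // (Newer Node versions support `new Error(msg, { cause: err })`
  // to carry the original error along.)
  console.error(err);
}
```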
B: Isn't the second one more like: you can have things like unhandled exceptions and unhandled promise rejections which, depending on your options, can cause you to exit; and then, under that, you can either have one that gives you a nice, good stack trace where it's, like, obvious what is happening, or one where you've got that problem of the tampering.
C: And I would like to clarify the first bullet, which is "V8 checks: when an input check is missing, it can lead to a crash". Yes; basically, what I mean is that the missing check is not necessarily the root cause of the crash. The missing check is what causes the manifestation to be a vanilla crash. If the check is present, then we would see a more decent error message, probably an intentional abort, whereas the actual root cause could be any bug in the virtual machine, or in Node, or in the application.
A
E
D
What
about
like
until
I
think
no
12?
We
never
showed
properties
on
arrows,
for
example,
when
we
lock
them
that's
different
now,
so
now
we
definitely
lark
properties
all
the
time
when
they're
directly
on
the
arrow
as
an
enumerable
property,
but
we
often
create
arrows
in
various
ways
and
often
also
have
properties
on
them
that
are
non
enumerable
due
to
being
on
the
prototype.
For
example,
what
opinion
do
you
have
about
these
things?
Should
these
properties
be
visible
by
default,
it
might
be
a
bit
off.
Okay,
I'm,
not
totally
sure.
B: I guess, 'cause, like, yeah: I mean, Gireesh, correct me if I'm wrong, but even in the case where there's a missing check, some other bug must have allowed that call to be made; or it's a bug where, you know, you shouldn't be able to pass that data. It would be a bug in Node, potentially, calling V8 in a way that wasn't intended, right.
A: So at Netflix we have many of them, because our philosophy is that we crash the process instead of keeping it in an inconsistent state, all right. But I know, for example, for people who are using Express.js: Express.js has a try-catch in the middle, right? So a synchronous error happens, then Express.js basically catches that and doesn't crash, and promises also catch errors. So my feeling is that maybe in the new type of, I mean, more modern applications...
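A rough sketch of why a framework can keep programmer errors from ever crashing the process. This mimics what a router like Express does around a handler; it is illustrative, not Express's actual source.

```js
function runHandler(handler, req, res) {
  try {
    handler(req, res); // a synchronous throw lands here
  } catch (err) {
    // The framework converts the programmer error into a 500
    // response, so the process stays up and crash-analysis
    // tooling (core dumps, reports) never gets a chance to run.
    res.statusCode = 500;
    res.end('Internal Server Error');
  }
}
```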
A: So the question was: what's the percentage, nowadays, of application crashes that come from programmatic errors? And then we started to discuss that, because promises catch them and Express has a try-catch, maybe it's less frequent now that applications actually crash because of programmatic errors. And I was just asking: what's your experience? Because you are probably the most involved with many user applications.
A: Okay, so we have three tools in this table. Again, I mainly worked from what was in my head: at Netflix we use core dumps heavily, meaning we are running our processes with --abort-on-uncaught-exception, and we are ready to download the core dump and analyze it. And we use it to find the original stack trace, and it's usually useful when the error got tampered with; at least that's the most common use case where we need to use it. I know IBM was heavily involved with the diagnostic report, and Ruben was involved with error properties.
C: It also has the ability to collect reports on demand, basically API-based report generation. The whole idea of the report is to provide a snapshot of the running virtual machine, looking at it from different aspects: the JavaScript heap, the overall native memory, the versions of the subsystems, the libuv state, the loaded shared libraries and environment variables, and things like that. Basically, looking from a diagnostic engineer's perspective to see what the most useful information is that they would be interested in, and capturing all of those things at once in a readable text file.
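For reference, the on-demand, API-based generation mentioned here looks roughly like this; on Node versions contemporary to this meeting the feature sat behind the --experimental-report flag (it became stable in later releases).

```js
// Write a report of the current process state; the filename is
// generated automatically when none is given.
process.report.writeReport();

// A filename and an error can also be passed, so the report embeds
// that error's JavaScript stack.
try {
  JSON.parse('{not json');
} catch (err) {
  process.report.writeReport('parse-failure-report.json', err);
}

// Reports can also be produced automatically on crashes, e.g.:
//   node --report-uncaught-exception --report-on-fatalerror app.js
```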
C: That was the philosophy behind node-report. When it was integrated into core, the objective changed slightly: we made it into a JSON-formatted report. Other than that, fundamentally, the content remains unchanged; the philosophy is basically the same: provide as much useful information as possible, collected in one shot from the running application.
B: You know, for example, say it includes information about ulimits; those are the kinds of things where, like... so, for example, if you had a ulimit that constrained your memory to be too low and that's what's causing the process crashes, or if you're getting an unhandled exception because, you know, you can't create a socket or open a file, you know, the ulimits can help you understand that in one quick shot.
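As a sketch, the resource limits can also be read from the report programmatically: process.report.getReport() returns the report as an object with a userLimits section mirroring the process's ulimits. The exact key names may differ between Node versions.

```js
const report = process.report.getReport();

// Each limit carries soft/hard values taken from the OS ulimits.
console.log(report.userLimits.open_files);            // fd limit
console.log(report.userLimits.virtual_memory_kbytes); // memory limit
```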
C: If you are looking at the plain crashes that are controlled by V8, you get the JavaScript stack trace, whereas with the one which is controlled by the diagnostic report, you get both the JavaScript stack as well as the native stack. Of course, depending on the use case and the specific control-flow point at which the crash has happened, the native stack can be of great value or of no value, right.
C: Yeah, that was basically: we have different types of distributions of Node.js binaries and, for example, certain Linux distributions have the option to bundle OpenSSL versus linking against the existing OpenSSL shared libraries in the system, etc. Now, just by looking at the Node.js version, you wouldn't know which SSL library the executable is being linked against. In this particular scenario, the application was crashing in one of the SSL routines.
C: Just by looking at the listings from the diagnostic report, it was very evident that, I mean, it was supposed to use the in-built library, but because of their overriding mechanism it was linking to the wrong library, and that led to wrong offset calculations and the crash. So it was a no-brainer once we had the diagnostic report.
B: And you can also see the error properties now. Are you always using llnode for that? Yes? Right, so I wonder if, above, it should almost be "core dump / llnode" in the table, because it's kind of like, you can't use it... the core dump is very hard to use by itself. Just a thought. Yes.
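A sketch of the setup this workflow assumes: crash-time core dumps that llnode can later open. The flag and the ulimit command are real; the sample app is illustrative.

```js
// Run with:
//   ulimit -c unlimited
//   node --abort-on-uncaught-exception app.js
//
// With the flag, an uncaught exception makes Node abort() instead
// of printing the error and exiting, so the OS writes a core file.
// That core can then be opened with lldb plus the llnode plugin to
// recover JavaScript stacks, objects, and error properties.

setTimeout(() => {
  throw new Error('boom'); // triggers the abort under the flag above
}, 100);
```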
D: Just as I said before: like, for a long time Node.js did not print errors fully; it just printed the stack trace, and the stack trace contains the actual stack frames, and it also contains the error message and the error name. While it might not be the actual name of the error class, because if you have a subclass of an error, then it normally only prints the native class name. So let's say you have an error of class Foo extends TypeError.
D: The whole error is normally visible by now, if it's not nested too far, and it will contain all the properties that are enumerable. But we still have multiple properties that would not be visible as such, especially when you have error subclasses; then you might have some properties, not static properties, but kind of ones on some classes that you just have on the prototype, and, you know, every instance can access that property programmatically.
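A small sketch of the situation being described: a property on an Error subclass's prototype is reachable from every instance but does not show up in the default inspection output.

```js
const util = require('util');

class FooError extends TypeError {}
// Shared via the prototype instead of being set per instance:
FooError.prototype.code = 'ERR_FOO';

const err = new FooError('something failed');

console.log(err.code);          // 'ERR_FOO': accessible in code
console.log(util.inspect(err)); // stack plus own enumerable
                                // properties only; the prototype's
                                // `code` is not displayed
```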
D: I think it was not a conscious choice; I think it was just historical, because, like, if you think about error toString(): it is exactly what it was before, and that's how it was printed since, I think, the beginning of Node.js, until I changed it to also include... no wait, I'm not sure about all properties. It might have printed, in some cases, some properties that it normally did not, and now it definitely does. Sorry, it did not show them when you crashed; that was it. It did show them when you actively logged them. Yeah.
D: Until recently, when you crashed, we only inspected the error on the C++ side, and we didn't use util.inspect on it. Now we actually use util.inspect to inspect the error and then print that on the terminal when crashing.
D: We might show... like, it's difficult to distinguish all these properties, because, if you think about prototypes and how people might create an error, there are really lots of ways people do that. It's not only about ES6 classes, but also about the old prototype style, and then whether they have, like, a kind of a super call or not. And so the way you could detect these properties is actually by going down the prototype chain, then checking the prototype properties and then removing...
D: That's how it's working, so you can't actively say that we do not want to track Node core frames; we cannot distinguish that before getting the stack frames. What we can do is hide them after serializing the stack frames, and that's what we are currently doing. I think I implemented it in Node 12; I'm not sure if it went out in a semver-minor before that, where we now gray out the Node core frames, but it's done after serializing them.
D: So we already collected the stack frames, and now the default is 10. Let's say nine of those are Node core frames and only a single one is a user frame, but maybe it's not the original user frame; and in the worst case we have a recursive algorithm in Node core and it's more than ten stack frames, so the user stack frame is not visible anymore.
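A quick sketch of the frame-limit behavior being discussed: V8 captures Error.stackTraceLimit frames (10 by default), counting internal frames too, so deep recursion inside Node core or a library can push every user frame out of the captured stack.

```js
// Raising the limit is the existing workaround; it has to be set
// before the error object is created.
Error.stackTraceLimit = 30;

function internalRecursion(n) {
  if (n === 0) throw new Error('deep failure');
  return internalRecursion(n - 1);
}

try {
  // With the default limit of 10, the ~15 recursion frames would
  // fill the whole captured stack and this call site would not
  // appear in err.stack at all.
  internalRecursion(15);
} catch (err) {
  console.error(err.stack);
}
```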
D: We might try to reduce the amount of recursive algorithms in Node core, and also I would like to start thinking about how we can reduce Node core stack frames in general to the absolute minimum; maybe even have, like, a discussion with the V8 team about whether it's possible to mark specific frames as "track" or "don't track", pretty much, and/or not to count them towards the number 10 as a limit. That would be pretty cool.
D: So let's say you have the 10-stack-frame limit as it is by default now, but we could tell V8 not to count specific frames towards that number, towards that counter, so we would get at least, up to, 10 frames from the user. That would improve a couple of use cases, because you also have modules: it's not only applying to Node core in this case; it could also apply to node modules, where the module implementer could say, I don't know, "try to skip tracking those" or something like that.
B: So I mean, I would say, you know, let's say that that's one of the limitations. 'Cause, you know, it'd be good to have a "here's what we have; here's the use case; here's what we have today; here's our recommendation for how you, you know, basically debug; and then here's a list of things that are still a challenge in that flow, that we can work on to improve", yeah.
A: Okay, I think it would be too ambitious, in these remaining seven minutes, to really go into user journeys. So let's try to focus on what we started: the use cases we have, and kind of the tools we have today, because that sounds reasonable for the people here, sure. Okay, so we discussed, quickly: the report, core dumps, and error properties, and what the error properties are today and what they could evolve into in the future.
B: Well, we, I mean, we have support for that in Node as well; we just don't have a lot of trace points. We could introduce more trace points like that. If you have an actual C-level crash, that can help; in terms of the exceptions, sometimes it's a bit harder, because, you know, you've already unwound from where the actual error happened, but it can still give you an idea of, like, what you were doing before you got there.
C: Yeah, the trade-off with the trace option is: on one hand, it is very, very good, in the sense that it penetrates every JavaScript method (and, with trace points, every C++ method as well) with the most useful information. But on the other hand, it's a little bit less consumable, because the large amount of data it produces usually makes it impossible to use in production; and, more importantly, there is no control mechanism for us to be able to say "start tracing only when this method kicks in" or "start tracing after this many hours".
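For context, Node's built-in trace-point support mentioned here is exposed through the trace_events module (and the --trace-event-categories CLI flag). A minimal sketch of the runtime control that does exist, which also shows why volume and coarse on/off granularity are the limiting factors:

```js
const trace_events = require('trace_events');

// Enable two built-in categories; every matching JS/C++ trace
// point in the process starts emitting events.
const tracing = trace_events.createTracing({
  categories: ['node', 'v8'],
});

tracing.enable();   // events are written to node_trace.*.log files
// ... run the workload of interest ...
tracing.disable();  // on/off per category is the only control; there
                    // is no "start when method X is hit" hook
```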
A: Okay, we are running out of time. We kind of collected the use cases and kind of collected the tools we have today. I think what we could do offline is kind of describe what the usage of these tools is: like, what it means to use the diagnostic report, or what it really means to use core dumps.
A: And what we want to recommend that people do, and how to do it. And the more interesting conversation, which we kind of started to touch on when we were talking about what the error properties could be, or what the trace points could be, is: what would be the ideal user journey and the ideal tool? I think that could be a really interesting conversation: why do we have three different tools, and how could we make it one which works on every platform?
B: There's going to be, like, "what do you need to do in advance so that you're ready when it happens", and that's, you know, part of what Gireesh has put together there. And if we can actually agree on that and land it, that smaller piece, I think that's good. Then the next step would be the "okay, now your application is crashing: what do you do?", and then we could talk through it. In my mind it's kind of a layered thing of, like, hey...
B: Yeah, I would almost say, like, you know: if we can agree that that's a reasonable recommendation for setting up to be ready for crashes, we should land that. And then, you know, we could add another document underneath "crash", which is like the readme, that says: okay, you know, crashes; here's some of the use cases; and maybe point back to the larger document that says, you know, you've got this situation and here's the summary. Now...
B: What do you do? Well, first of all, make sure that, you know, your setup... point over to the setup document. Second, here's the sort of initial steps we recommend. And that document could be its own thing, and then within it we could have new documents that go in there, like: how to use the diagnostic report, how to use core dumps, how to use error properties.
B: Right, and I think we should, like, personally: if we can break it up a little bit like that, if we can land little pieces... like, even for the "using core dumps / llnode" one, if the initial thing we land is simply, like, a collection of links to articles or things that are already public, that's at least a starting point, right? And we could land that. Then the next step would be to go through and actually extract the most important information and put it into the doc itself, right.
A: I have to go, and we're running out of time. Is there anything that we should discuss now? Otherwise, I would recommend that in two weeks we do the normal diagnostics meeting, kind of with a little conversation about how to proceed on this one, and then the next one could again be a deep dive, either continuing this one or attacking other topics. So, yeah, if that makes sense. Okay then: thank you, everyone, and a wonderful day, all.