From YouTube: Overview of Data Flow Analysis in the OMR Compiler
Description
Vijay Sundaresan describes the OMR data flow analyses in the OMR compiler.
A
B
Okay, thanks Phillip. So, yeah, let me talk first about data flow analysis. The data flow analysis engine in OMR is really a part of the code that isn't very front and center in most people's attention. From a user's perspective it's just supposed to more or less work, and it's been battle tested over the years. So it's probably a good thing that people aren't in there debugging problems all the time.
B
The design of the data flow analysis engine is based on virtual inheritance. There's a class called DataFlowAnalysis in the file DataFlowAnalysis.hpp, which is in the compiler/optimizer directory, and the data flow analysis class is the root of that hierarchy.
B
So there are helper classes which specialize certain functionality more and more as you go down the hierarchy, but they're not an actual analysis themselves. Then the second set of subclasses is the set of subclasses that are actual analyses. In this category would be things like reaching definitions, or things like liveness: analyses that you're sort of more used to from a compiler theory perspective.
B
These analyses that I just mentioned, the second set of subclasses, tend to be the leaves of the class hierarchy. The data flow analysis class, of course, is the ultimate abstract class in the hierarchy; it only defines the very high-level concepts, helper functions and so on.
B
The DFSetAnalysis sorts of classes fundamentally represent the bit vector type of container, and I'll explain shortly why I say "type of container". To my knowledge, we have only one data flow analysis that is not in the bit-vector-like container category, and that is one of the analyses driven by partial redundancy elimination.
B
It's called exception check motion; it's a subclass that's in PartialRedundancy.hpp, and at its core it actually does a data flow analysis on lists. Its container is a linked list, and it's trying to do various operations on those lists as they are flowing around. Perhaps that's an interesting enough topic that I will spend a couple of minutes on it eventually, but we don't need to get bogged down in that area to begin with; I think the large majority of our data flow analyses are in the bit vector analysis family.
B
These classes actually used to be called BitVectorAnalysis and UnionBitVectorAnalysis and ForwardBitVectorAnalysis and so on; those were the names of these classes. However, a few years back we renamed them to be more generic, so DFSetAnalysis, as I was saying, or ForwardDFSetAnalysis became the names used in the code, and this was because we abstracted out the notion of a bit vector itself.
B
There's a bit container class as well, and the idea was to allow the core engine to function the same way whether you're operating on actual bit vector class instances or on bit container instances. So if you look at the code in DataFlowAnalysis.hpp, there's some templatization that goes on there as well with respect to the container.
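As a rough, self-contained sketch of that templatization idea (invented names, not the actual OMR code): the same propagation logic can be written once and instantiated with either a full bit vector or a cheaper single-bit-style container.

```cpp
#include <bitset>
#include <cstddef>
#include <iostream>

// Illustrative stand-in: the real OMR engine templatizes its analysis
// classes over the container type in DataFlowAnalysis.hpp.
template <typename Container>
struct ToyDataFlowEngine
   {
   // The same merge logic works for any container that supports |=.
   static void compose(Container &dst, const Container &src)
      {
      dst |= src; // union confluence in this toy
      }
   };

// A full bit vector...
using WideSet = std::bitset<64>;

// ...and a container that only remembers one bit number, mimicking the
// space-saving idea behind a single-bit container class.
struct OneBit
   {
   std::size_t bit = 0;
   bool set = false;
   OneBit &operator|=(const OneBit &other)
      {
      if (!set) { bit = other.bit; set = other.set; }
      return *this;
      }
   };

int main()
   {
   WideSet a, b;
   b.set(3);
   ToyDataFlowEngine<WideSet>::compose(a, b);

   OneBit x, y{7, true};
   ToyDataFlowEngine<OneBit>::compose(x, y);

   std::cout << a.test(3) << " " << x.bit << "\n"; // prints: 1 7
   return 0;
   }
```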
B
In particular, the kinds of things that we try to offer as virtual functions are things we think a subclass is going to override in a particular way, and most of these functions are actually overridden when you look at the sum total of the subclasses that we have in the data flow hierarchy. There's very little that isn't actually overridden somewhere; for the most part, the functions that we made virtual, we are overriding and plugging in different functionality as we go along.
B
So what are some simple APIs to wrap one's head around, in the sense of these virtual methods? One, as I said, is the number of bits and how you count that, depending on the particular analysis that you're implementing. We offer the user, or the implementor of a new data flow analysis, the capability to define his or her own way of getting at the number of bits. Sometimes the number of bits depends on the number of symbol references, for example.
B
Then, as you can imagine, in a bit vector analysis as you are going through, the flow is really all about doing operations at merge points, for example, or, as you're encountering new treetops, merging the data that's flowing through into those bit vectors. So there's a virtual method named compose that we offer, which basically allows you to take two containers and create a composition of those two containers.
B
One such set is the gen set: the set of bits that are being generated in this section of the code. There's the sort of converse, or inverse image, which is the kill set; the kill set is the set of bits that are killed by this section of code. If you open up any sort of conventional compiler back-end textbook, all these data flow analyses are described in terms of gen and kill sets.
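For reference, the standard textbook formulation being alluded to (the generic scheme, not code lifted from OMR): a forward analysis repeatedly evaluates, for each block $B$,

```latex
\mathrm{in}(B)\;=\;\bigsqcup_{P\,\in\,\mathrm{pred}(B)}\mathrm{out}(P),
\qquad
\mathrm{out}(B)\;=\;\mathrm{gen}(B)\,\cup\,\bigl(\mathrm{in}(B)\setminus\mathrm{kill}(B)\bigr)
```

where the confluence operator $\bigsqcup$ is set union for "any path" analyses like reaching definitions and set intersection for "all paths" analyses, and a backward analysis runs the same scheme over successors instead of predecessors.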
B
So, going back to the composition operator: when you're looking to generate a consolidated gen set, the composition operator is often handy in merging two different sets as you're going along, for example, and the inverse compose operator is useful when you're merging together the kill sets from different treetops, for example. Basically, intuitively: if you have union as the compose operator, you have intersection as the inverse compose operator. So if union says that if this property is true on any one path...
B
...let it flow through, then kill needs the property to be killed on all paths in order for the property to be killed. That's the intuition behind why the inverse composition is an intersection when the composition is a union. In the interest of time, let me keep going here. There are, fundamentally, notions of whether an analysis even supports gen and kill sets; nothing forces our data flow analyses to support gen and kill sets. There's an API for it.
B
An analysis can return true to supportsGenAndKillSets and then proceed to do its analysis based on those gen and kill sets. What is the alternative to using gen and kill sets in the infrastructure? The alternative is to walk the trees and the nodes every single time you encounter a block during the data flow analysis. As you know, data flow analysis can result in iteration over the CFG, and it can cause you to visit a particular block several times in an effort to get the solution that you're after to converge.
B
So imagine you were visiting a block repeatedly because it was inside a deeply nested loop, inside several cycles, and you had to visit the whole block, walk every tree and every node, every single time, versus having a consolidated gen set for the whole block and a consolidated kill set for the whole block, generated up front, that the engine can simply use when it gets to that block.
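A sketch of the saving (toy code, invented names): once per-block gen and kill vectors exist, each of the possibly many revisits of a block during convergence is just two bitwise operations rather than a walk over every tree and node in the block.

```cpp
#include <bitset>

using Set = std::bitset<128>;

struct BlockInfo
   {
   Set gen;  // bits generated by the block, computed once up front
   Set kill; // bits killed by the block, computed once up front
   };

// The transfer function applied on every revisit of the block during the
// fixed-point iteration: two bit operations, instead of re-walking every
// tree and node of the block on each visit.
Set applyBlock(const BlockInfo &b, const Set &in)
   {
   return b.gen | (in & ~b.kill);
   }
```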
B
So this notion of gen and kill sets is there; some of our data flow analyses actually implement gen and kill sets, and some don't. So you can look up which analyses are overriding this and returning true, and see how their structure looks compared to an analysis that does not return true for this. Then there are other virtual routines in the core classes, things like the way in which you are expected to initialize a container.
B
So, for example, if you want to initialize the container for doing an intersection operation, really the initial value ought to be all the bits set, in other words all ones in the bit vector, because you're going to apply intersections to it. If you started with all the bits being 0, the intersection would get blown away as soon as you started. So for intersection the initial value is all bits set; for union...
B
...the initial value is all bits unset. That is, if you're going to do actual composition operations as you're going along. If you don't actually do any compositions, if you just let the initial value pass through, then the rule changes as well: just because you didn't have anything to compose doesn't mean you should get everything, for example.
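A small illustration of that seeding rule (a toy, with a plain bitset standing in for the container): an intersection analysis must start from all ones or the very first merge would empty the set, while a union analysis starts from all zeros.

```cpp
#include <bitset>
#include <cassert>

int main()
   {
   std::bitset<8> allOnes;
   allOnes.set();           // seed for an intersection analysis
   std::bitset<8> allZeros; // seed for a union analysis

   std::bitset<8> firstPred("00001111");

   // All-ones is the identity for &: the first merge keeps the
   // predecessor's facts. An all-zeros seed would wipe them out forever.
   assert((allOnes & firstPred) == firstPred);

   // All-zeros is the identity for |.
   assert((allZeros | firstPred) == firstPred);
   return 0;
   }
```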
B
So really, these sorts of APIs are there to be used in different contexts in the subclasses that exist under the bit vector analysis, that is, the higher-level superclasses. As I said, most of these APIs are overridden in the subclasses; in fact, most of them are overridden in that first category of subclasses that I talked about, which don't really need that much external exposure. They're just the guts of the engine that implement whether you're doing a forward bit vector or a backward bit vector analysis, and then, more specifically...
B
...if you are forward: are you a forward union or a forward intersection? Or, more specifically in the case of backward: are you a backward intersection or a backward union? These sorts of classes are where a lot of this stuff gets overridden. When it comes down to the leaves, as I was saying, the actual implementations of specific data flow analyses...
B
...that's usually not the place for overriding these types of routines, the initialization routines or the composition routines. Those are done for you, basically, by these higher-level subclasses that are doing the right compose and doing the right initialization at the right place, and so on and so forth. The idea is to make it really simple, in the end, for a user to write a new data flow analysis.
B
If you look at the code for many of our core data flow analyses, and I'll pick the two that are most well-known in the literature, reaching definitions and liveness, and I encourage people to go and read those data flow analyses because they're fairly short (I would say reaching definitions is somewhere between 250 and 300 lines right now; liveness is a bit shorter, but that's deceptive in that some of the logic is in another class, LiveVariableInformation.cpp), the core point that I want to make...
B
...if you go and look at those classes, or any of the other quote-unquote leaf data flow classes, is that you won't find very much code in them. You'll find basically two kinds of code. One is a constructor, which generally establishes what kind of analysis you are. And that is important, obviously, because that's how the engine is going to instantiate itself to do the right operations. So, for example, for reaching definitions...
B
...if you go and look, it extends UnionBitVectorAnalysis; that's what it extends. That's a way of saying what type of analysis reaching definitions is; essentially you've stated that by extending that particular class. So you have a short constructor, usually with the most important part being which data flow class you extend. Then, for what's really specific to your own analysis, if you go to ReachingDefinitions.cpp, for example, you set something called the use/def info as a field in that class...
B
...for that specific sort of thing for your particular analysis at hand. Then you have a perform routine, which generally just calls initializeBlockInfo and then very simply calls performAnalysis after that. The point of this is that you do want to offer a place for users to do their diagnostics, or any sort of logic of their own, for what they want to do up front before they launch into the data flow analysis.
B
The data flow analysis itself is launched by a call to a routine called performAnalysis, which is another one of these virtual routines, and then that activates the engine, and that's pretty much it: performAnalysis does the whole propagation across the CFG. You don't need to know what happened at all.
B
It just magically does it for you, and it generates a bunch of sets that you can then pull out afterwards and start consuming as the results of the analysis. There are a few other routines, like how you get the number of bits: in the case of reaching definitions, that's returned based on the underlying use/def info class. And then pretty soon it gets into initializing the gen and kill sets for that particular analysis.
B
Now this, again, is one of those operations that has to be done by the specific analysis; the general data flow engine cannot know what the right logic is to generate gen and kill sets. So you have to write this piece of code anyway, and even there, if you look at initializeGenAndKillSetInfo, it's actually not very complex at all: it just walks over the trees and calls initializeGenAndKillSetInfoForNode for each node that it encounters.
B
So it's basically a tree walker over a block, and the real guts of the logic are inside this TR::Node-specific routine called initializeGenAndKillSetInfoForNode. That's kind of as basic as it gets for your data flow analysis: you have to define what the logic is for a given node. Nobody else can do that for you, so that's as atomic as it gets in this whole engine. In an ideal world, you just focus on what you want to do for your node.
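Pulling the pieces from the last few paragraphs together, here is a self-contained toy that mirrors the shape just described: a base class playing the role of a forward union analysis, and a "leaf" that supplies only the node-level gen/kill hook. Every name is illustrative; the real API lives in compiler/optimizer/DataFlowAnalysis.hpp and differs in its details.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

using Set = std::bitset<32>;

struct Block
   {
   std::vector<int> preds; // predecessor block indices
   std::vector<int> defs;  // stand-in "nodes": bit numbers defined here
   };

// Plays the role of a forward union ("any path") analysis base class.
class ForwardUnionAnalysis
   {
   public:
   virtual ~ForwardUnionAnalysis() {}
   // Mirrors the API shape; unused by this toy since Set is fixed-size.
   virtual int32_t getNumberOfBits() = 0;
   // The atomic hook a leaf analysis must supply: gen/kill for one node.
   virtual void initializeGenAndKillSetInfoForNode(int node, Set &gen, Set &kill) = 0;

   // "performAnalysis": build gen/kill once, then iterate to a fixed point.
   void performAnalysis(const std::vector<Block> &cfg)
      {
      std::size_t n = cfg.size();
      std::vector<Set> gen(n), kill(n);
      in.assign(n, Set());
      out.assign(n, Set());
      for (std::size_t b = 0; b < n; ++b)
         for (int node : cfg[b].defs)
            initializeGenAndKillSetInfoForNode(node, gen[b], kill[b]);

      bool changed = true;
      while (changed)
         {
         changed = false;
         for (std::size_t b = 0; b < n; ++b)
            {
            Set newIn; // all-zeros seed: the identity for union confluence
            for (int p : cfg[b].preds)
               newIn |= out[p];
            Set newOut = gen[b] | (newIn & ~kill[b]);
            if (newIn != in[b] || newOut != out[b])
               { in[b] = newIn; out[b] = newOut; changed = true; }
            }
         }
      }

   std::vector<Set> in, out; // results the client consumes afterwards
   };

// The "leaf": a reaching-definitions-flavoured analysis, one bit per def.
class ToyReachingDefs : public ForwardUnionAnalysis
   {
   public:
   virtual int32_t getNumberOfBits() { return 32; }
   virtual void initializeGenAndKillSetInfoForNode(int node, Set &gen, Set &)
      {
      gen.set(node); // this def is generated; a real analysis would also
                     // kill the other defs of the same symbol
      }
   };

int main()
   {
   // Tiny CFG: 0 -> 1 -> 2 with a back edge 2 -> 1 (a loop).
   std::vector<Block> cfg(3);
   cfg[1].preds = {0, 2};
   cfg[2].preds = {1};
   cfg[0].defs = {0};
   cfg[1].defs = {1};
   cfg[2].defs = {2};

   ToyReachingDefs rd;
   rd.performAnalysis(cfg);
   std::cout << "defs reaching entry of block 2: " << rd.in[2] << "\n";
   return 0;
   }
```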
B
So that's, in a nutshell, kind of how things work. There is another complication that I wanted to mention just for some sort of completeness here, which is that gen and kill sets are generally thought of in terms of basic blocks. However, if you look at the engine, the OMR data flow engine is based on OMR's notion of structure, so the engine itself operates on structures; it does not operate directly on blocks or on the CFG.
B
The reason behind this was that we wanted to give ourselves the freedom to devise analyses where gen and kill sets are created not just on a per-block basis: in theory you could create a gen and kill set for an entire structure, and we wanted to take advantage of that more powerful abstraction and have gen and kill sets per structure, and therefore, like I was saying earlier, not have to walk trees and nodes within a block, because if you have gen and kill sets you don't need to.
B
As for the classes themselves, there's a block structure and a region structure. A block structure is just a wrapper around a block in the CFG, so it's kind of the atom, if you will, of the structure hierarchy. Region structures are more complex control flow constructs, or control flow structures, that are built up bottom-up, as it were. A good example would be a loop: a loop has the property that you can only get into the loop from the single entry.
B
The entry of the loop dominates all the cycles, and all the cycles involve that entry, so there are no internal cycles within that region: if you have a cycle in that region, it always goes through the entry of that region. That's what a well-behaved loop is in structural terms, and so we keep track of those kinds of well-behaved cyclic structures. In the parlance of the OMR compiler...
B
...those are called natural loops; they're well-behaved loops. Then there's the notion of an acyclic region structure, which is a structure that is composed of several smaller substructures, and this is recursive, by the way: a region structure can have other region structures as sub-nodes within it.
B
So an acyclic structure is one that does not have any cycles at all; it's just a collection of structures, and the control flow between those structures is well-defined, but you know that there is no cycle there. Whereas for a natural loop, obviously, you know that there is a cycle, and it will always involve the entry of the region. And then there is a third class, again in OMR compiler parlance...
B
...referred to as an improper region. An improper region structure is really one that has internal cycles, meaning a cycle exists, or may exist, which doesn't involve the entry of the region. It's not a guarantee that such a cycle does exist, but if there is a chance that such a cycle may, then we call that an improper region, and we flag it as containing internal cycles, which means that it gets disqualified from certain kinds of analyses.
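Schematically, the taxonomy just described looks something like this (a paraphrase in code, with invented names and fields; the real OMR structure classes are considerably richer):

```cpp
#include <vector>

struct Structure
   {
   virtual ~Structure() {}
   };

// The atom of the hierarchy: wraps a single CFG block.
struct BlockStructure : Structure
   {
   int blockNumber;
   };

// Built bottom-up and recursive: sub-nodes may themselves be regions.
struct RegionStructure : Structure
   {
   std::vector<Structure *> subNodes;
   bool isNaturalLoop;           // every cycle passes through the entry
   bool containsInternalCycles;  // "improper": a cycle may avoid the entry,
                                 // disqualifying per-structure gen/kill sets
   };
```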
B
Maybe I shouldn't get into it in much more detail than that here, but hopefully that's good enough for people to follow along in this discussion. In fact, it's a good segue into where I was going with this: if you have an improper region, in other words internal cycles in the structure, it gets disqualified from...
B
...the gen and kill sets on a per-structure basis. We get scared that something odd is happening here and we throw up our hands: we can't do gen and kill sets on a per-structure basis if you have internal cycles, if you have improper regions. The whole notion of structure is kind of easier to wrap your head around from a data flow perspective going forward, because every structure has a single entry, and the data that is flowing in comes in through that entry point into the structure.
B
The whole notion of data flow becomes much messier if you're dealing with backwards flow, because a region, or even a block, really is defined, or can be defined, as single entry, multiple exit. So going backwards you have multiple places to enter the structure; going backwards into a block, there are multiple successors that can flow back into the block, and so on. So things get pretty hairy.
B
If you're trying to do gen and kill sets for structures in the backward direction, with improper regions, with multiple successors leading out from the structure, and with several possible entry points flowing backwards into the structure, in that sort of situation it becomes very difficult to create gen and kill sets that work for backwards flow. So there was an attempt at doing that in the code, I would think a partial attempt, and then it was abandoned, because there were too many toggles and it was messing up the code too much.
B
There was also an attempt to do gen and kill sets on a structure basis in the forward bit vector analysis, and that is enabled and works, except if there are improper regions; I think it throws up its hands in that scenario and doesn't work, but otherwise, for well-behaved structures, it does try to use it. When we originally did that sort of exploration...
B
...we were expecting more savings in compile time than we ultimately got, and some of our early results also led us to get pessimistic about this whole structure-based gen and kill set approach, because the savings simply didn't match up to the complexity that we were introducing in the code. In the backward case especially, that's what led to its outright abandonment. But even in the forward case...
B
...I sometimes wonder if it's worth the complexity that we added in the code. But it's there now, and it provides a small savings; I think, from memory, only around a ten-ish percent savings in the time that we spend in the forward bit vector analysis code. So it wasn't huge. If you didn't have any notions of wanting to do this structure-based gen and kill, in theory I think you could have written the data flow engine on a block basis.
B
You didn't really need to go down the structure road if you weren't actually going to go the whole way. So you could say we've gone part of the way and we are getting some of the benefits, and that's what's keeping the engine in terms of structures, but strictly speaking you could have just written it in terms of blocks and the CFG as well.
B
But then you would have closed the door firmly on ever exploring this sort of area again. The other point I will make there: there are some optimizations within the engine. In particular, a structure keeps track of the in-set that came in the last time it was analyzed, that is, the information that flowed in, whether that's forward or backward; you can just visualize it as the inflow of information into that particular block.
B
It keeps track of the last inflow information, and if it sees the same inflow the next time, then it will just produce the same out-set, which it also keeps track of, so it's kind of a cache that it maintains. So it doesn't even need to operate on the gen and kill sets if they're present, and obviously, if they're not present, it doesn't need to walk the trees or the nodes at all. It just says: the same input provided to the analysis generates the same output.
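A toy version of that memoization (invented names): each structure caches the last in-set it was analyzed with and the out-set it produced, so a revisit with identical inflow is answered from the cache without touching gen/kill sets or trees.

```cpp
#include <bitset>

using Set = std::bitset<64>;

struct StructureAnalysisCache
   {
   bool valid = false;
   Set lastIn;  // inflow seen the last time this structure was analyzed
   Set lastOut; // the outflow it produced for that inflow
   };

// `analyzeStructure` stands in for the real work: applying gen/kill sets,
// or walking every tree and node when gen/kill aren't supported.
template <typename F>
Set analyzeWithCache(StructureAnalysisCache &cache, const Set &in, F analyzeStructure)
   {
   if (cache.valid && in == cache.lastIn)
      return cache.lastOut; // same inflow: reuse the cached outflow
   cache.lastIn = in;
   cache.lastOut = analyzeStructure(in);
   cache.valid = true;
   return cache.lastOut;
   }
```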
B
It's kind of quite effective at saving compile time anyway, and maybe that was the reason why we didn't get as big a gain when we went to gen and kill sets on a full structure basis as well: those kinds of little optimizations ate into the advantage. Now, in terms of compile time itself, we don't find these bit vector analysis classes showing up prominently, I would say; I mean, obviously, they're there.
B
The data flow engine is a very big compile time hog at higher opt levels, but at lower opt levels, like warm, I think we only run one or two passes that require data flow analysis. Obviously it depends on your definition of warm and what you should go with for your language, in a normal sense, but in some of the front ends and some of the opt strategies that I've seen, it runs no more than once, and sometimes at warm it doesn't even run at all, if I'm not mistaken.
B
So, given that a large number of methods compiled in big applications tend to be warm-compiled or at a lower opt level, you don't see the data flow analyses very prominently. But as you go up to higher opt levels, this code is more critical: you run several passes of things like use/def analysis, you run several passes of things like liveness, and then all of those other data flow analyses also come into the mix a little bit more at the higher opt levels.
B
So let's take a quick look at an enum that we have, although I'm not sure how accurate this is or how up to date it has been kept. There is a file called OMRDataFlowAnalysis.enum, which lists sort of all the analyses that, I'm going to say, existed at one point in that hierarchy, and it's kind of nice: you can see them all in one place.
B
Then there is a bunch of data flow analyses that have to do with the PRE analyses: global anticipatability, earliestness, delayedness, latestness, isolatedness. These are all data flow analyses that are dictated by the partial redundancy elimination optimization, which generally only runs at hot and above. Then there's liveness, and there's a bunch of those very generic subclasses that I talked about that aren't really analyses of their own, like BasicDFSetAnalysis and so on, but they're all in there and you can see them.
B
It all should be fairly self-explanatory, given what I've said so far in this talk. And then there's this thing called exception check motion, which is the one I talked about with the list-based data flow analysis, and a few other sort of more conventional tree-based data flows, like a flow-sensitive escape analysis, which we run only on Power, because that platform's unique weak memory model considerations require us to eliminate memory fences more aggressively, so we're willing to pay the cost of flow sensitivity.
B
The reason I'm iffy about this file, and about taking it too far, is that I know there are some analyses that have been added for the on-stack replacement (OSR) infrastructure or implementation. There's an analysis called OSR def analysis, and a certain OSR live range analysis; those are OSR analyses that initiate data flow analyses.
B
So it's a bit like there's a wrapper analysis class, which then invokes, say in the case of OSR def analysis, reaching definitions; in the case of OSR live range analysis, it invokes liveness from inside that analysis. So you could say that they're not new data flow analyses as such, but they're doing data flow analysis and then using the results, massaging the results in some way. So look at that set as the mostly complete one, I would say, rather than the most accurate one, at least as I look at it.
B
I believe so; that's not tied to the data flow engine as such. These are separate classes that are just offered in the OMR compiler. The single-bit container is just a bit vector that has only one bit set, guaranteed to be only one bit, and so it doesn't actually allocate the OMR compiler's bit vector class; it doesn't instantiate that. It just internally, I believe, stores the number of the bit that should be set.
B
When there were many bugs in this sort of area, we used to do debug builds quite regularly, and that used to enable a bunch of code that was under ifdef DEBUG. So I know for a fact that there is tracing that you get for those specific leaf classes; they have tracing of their own for sure, so things like liveness and things like reaching definitions will print out their own information even under prod. But the base engine itself...
B
...may have its tracing hidden under debug, and if that's the case, there's really no real reason for it: we have trace statements elsewhere in the compiler, and I don't expect this to be such a big compile time hog that it just cannot be done. In fact, more broadly, I think we should take a look at everything under ifdef DEBUG and either delete it or enable it by default, and just deprecate that mechanism for the most part.
A
So, if I have the high level correct: to implement a new data flow analysis, basically what you need to do is pick one of the base classes, like the DFSet or the forward DFSet, whichever is conceptually closest to what you're trying to do, whether an intersection or a union, extend that class, and then implement the custom logic for the gen and kill sets. That's pretty much it, basically.
B
And yeah, the other thing you have to select is whether you want to use a bit container or a bit vector; you have to pick that as well. And you have to be specific about whether you want to use a union DFSet analysis or an intersection DFSet analysis, or a backward union DFSet analysis or a backward intersection DFSet analysis. Those are the only four choices from the bit vector family that you should be extending.
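To restate those four extension points in one place (spellings approximate, after the renaming described earlier; verify against DataFlowAnalysis.hpp before relying on them):

```cpp
// The four intended extension points on the bit vector side, per the talk:
//
//   union DFSet analysis                  forward,  "any path"  (e.g. reaching definitions)
//   intersection DFSet analysis           forward,  "all paths" (e.g. availability-style analyses)
//   backward union DFSet analysis         backward, "any path"  (e.g. liveness)
//   backward intersection DFSet analysis  backward, "all paths"
//
// Extending anything higher up (say DFSetAnalysis itself) leaves the
// confluence operation unspecified and forfeits the inherited routines
// discussed next.
```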
B
You shouldn't be extending the superclasses above that, because it leaves things ambiguous, if you think about it: if you just extended DFSetAnalysis, for example, what is the operation? Are you saying it's a union, or are you saying it's an intersection? That stuff is left unclear, which means you fail to inherit a bunch of routines that you would have inherited otherwise and that would have made it convenient to interact with the core engine.
B
You could, I guess, extend one of those higher-level classes if you wanted to, but then you would have to re-implement some of the same routines that exist in union or intersection or backward union or backward intersection. There's really no good reason to do that; I don't think anybody does that. There are really those four that you should view as the extension points on that side of the hierarchy.
A
B
That's right, that's right. So, well, again, there are APIs available, I believe, for doing it a treetop at a time or doing it a node at a time. So it's up to you: some analyses may want to take matters into their own hands, analyze the whole treetop, and say, "I don't need to use the hook point that exists per node; I'm just going to take matters into my own hands and walk the treetop myself," for whatever reason.
B
So we offer that flexibility, but if you're a very conventional analysis that just doesn't have any special treetop-level processing and doesn't have any special block-level processing, where all of the logic is basically defined by what happens when you analyze a node, then all you need to do is extend the node-specific routine, because the tree walking and the block-to-treetop mapping and all that will be done automatically for you by the framework. You don't have to write that code.
B
That's right: you cannot run the data flow engine without structure, the way we have designed it. As I said, I got into the reasons earlier, so you do need structure. I don't think you need anything else, though; yeah, I don't think you need anything else. Whether you need other analyses in order to do the particular specific analysis that you're after, that's all left up to the specific analysis to request, and so on.
B
So, for example, if the particular data flow analysis that you're interested in prereqs value numbering, for example, then it should be your specific analysis that requests value numbering; the core engine itself doesn't require it. All it requires is structure, and it's good. The node-specific logic, the TR::Node-specific logic that you have, is probably where that information will surface and probably where it will be consulted.
A
Yeah, thank you; that answers a lot of the questions that I had, so I'd like to thank you, Vijay, for going over this. Hopefully we'll come up with another document describing this beautiful engine, and maybe next time we meet we could talk about how the use/def analysis uses this engine to compute uses and defs.
B
Yeah, no problem, I'll be happy to do that. One other point, just as I was paging through the file that I had here, I'll mention one last thing we talked about: there is this other notion that we have as well, of these container/node-number pairs. If you look in the BasicDFSetAnalysis class, there's a notion of a container-node-number pair; there's a class like that, and you may be wondering what that's for. The reason is that, to begin with...
B
...this analysis was based on the notion that you would calculate a solution for a particular block, but it could be different analysis info depending on which outgoing edge you're talking about. So imagine a forward analysis, for example: you may be flowing different information along an exception edge to a catch block...
B
...than you do along a regular edge, to a regular successor, through normal flow. So, because of the nature of the block definition, single entry and multiple exits, you actually may have multiple out-sets, and so that class that you see in the code, the container-node-number pair, tries to encapsulate that. It says: for this successor's node number...
B
...this is the solution. So it has the container in it as one of the two fields: it's a pair of the node number for the successor and the container, which is the solution for that successor. So, rather than allocate a large array that is almost always going to be very sparse (that particular block only has a few successors, generally), rather than have an entire array over all of the possible blocks that exist in the CFG and have it be sparse, you just keep these pairs.
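A sketch of that idea (invented names): instead of a dense array of out-sets indexed by block number, the block's analysis info keeps a small list of (successor node number, container) pairs, one per actual successor, for example a separate solution along an exception edge.

```cpp
#include <bitset>
#include <vector>

using Container = std::bitset<64>;

// One out-set, tagged with the CFG node number of the successor it flows
// to (mirrors the container/node-number pair described above).
struct ContainerNodeNumberPair
   {
   int nodeNumber;      // successor's number in the CFG
   Container solution;  // facts flowing along that specific edge
   };

// A block keeps only as many pairs as it has successors, which stays
// small and dense, instead of an array over every block in the CFG.
struct BlockAnalysisInfo
   {
   std::vector<ContainerNodeNumberPair> outSets;

   const Container *outSetFor(int successor) const
      {
      for (const ContainerNodeNumberPair &p : outSets)
         if (p.nodeNumber == successor)
            return &p.solution;
      return nullptr; // no distinct solution recorded for that edge
      }
   };
```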