Eclipse OMR Architecture, 20 Jun 2018

From YouTube: OMR Compiler Architecture Meeting 20180620

Description

Compiler Architecture Meeting agenda:

* Common Out-of-Line Instruction Designs [ 0xdaryl ]

Please add any comments/questions to the GitHub agenda issue: https://github.com/eclipse/omr/issues/2641

A

Okay, welcome everyone. So today we have a bit of a lighter topic. I wasn't expecting a large attendance today, but there's been some work happening lately on AArch64 that kind of motivated today's topic. I want to talk about commoning up the out-of-line instruction generation code that exists in the code generators.

A

For those of you who are not familiar with what out-of-line instructions are: they're basically a means by which a code generator can inject its own control flow into the normal instruction stream. The idea is that it's localized control flow, in the sense that it's intended to be used within a single basic block. It's not intended to branch to another block or anything like that. You're supposed to go out, do some work, and potentially come back if necessary, without changing the control flow graph.
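As a rough illustration of the idea described above, the following is a minimal sketch of a block whose mainline stream branches to a section that is laid out after it and then branches back. All names here (`Instr`, `OutOfLineSection`, `BlockCode`, `layout`) and the textual instructions are hypothetical and are not taken from OMR; a real code generator works on instruction objects, not strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model; none of these names are OMR's.
struct Instr { std::string text; };

struct OutOfLineSection {
    std::string entryLabel;    // the mainline branches here
    std::string restartLabel;  // the section branches back here when done
    std::vector<Instr> body;
};

struct BlockCode {
    std::vector<Instr> mainline;
    std::vector<OutOfLineSection> outOfLine;
};

// Emit the mainline first, then every out-of-line section after it, so the
// rarely executed code sits at the end and the mainline stays contiguous.
// The control flow graph is untouched: the detour starts and ends in one block.
inline std::vector<std::string> layout(const BlockCode &code) {
    std::vector<std::string> out;
    for (const Instr &i : code.mainline) out.push_back(i.text);
    for (const OutOfLineSection &s : code.outOfLine) {
        out.push_back(s.entryLabel + ":");
        for (const Instr &i : s.body) out.push_back(i.text);
        out.push_back("jmp " + s.restartLabel);
    }
    return out;
}
```

In this sketch a block with four mainline instructions and one section containing a single call is laid out as seven lines, with the section's label, body, and return branch at the end.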

A

So this is actually used, and it's actually useful, in a number of different places throughout the different code generators: for example, handling unresolved data references.

A

Or automatic instruction flows that need to fix up a little bit of code, that kind of thing. So this is implemented on all architectures; all code generators support this. On x86, it's actually called outlined instructions.

A

The initial design, many years ago, began on x86, and then all the other architectures followed. The other architectures are actually following sort of a different class structure than x86. At one point in the past, a commoning effort similar to what I'm proposing here was attempted, where all of those implementations derived from a common out-of-line code section base class. The only real issue with that, though, is that there isn't really very much code that's shared between those different implementations.

A

It's really just sort of the API, a very top-level API, that's shared, and everything else is pretty much repeated across the different code generators. So when you actually look at the functionality that this code provides, it really is pretty much identical across the different code generators, and it's really just the subtle nuances of each code generator that are the reason why we have duplicated implementations of it.

A

So what this sort of high-level epic will address is really an effort to begin to common up those implementations and to share much more functional logic than is currently being shared. When we're looking at the kinds of features to roll into this, there is actually another problem, in the sense that, because we have all these independent implementations, some architectures do more and some do less than others.

A

So I would say, for example, that Z and x86 have probably got the richest implementations. I don't think they are actually implementing exactly what the other does, either. So I think there is an opportunity to go through those implementations and understand what the best features are. I think Power and ARM are still very much minimalist implementations, and in fact, if you look at the ARM code generator, out-of-line instructions in general are used fairly sparingly.

A

I think that's not because the opportunities aren't there. I think it's just that it's not something that was ever really considered for use in that code generator, and I think that needs to change. Certainly for AArch64, as we go forward with that, we need to keep that in mind as it gets designed.

A

So looking at those architectures, Z and x86, I think there are a number of different features that would be useful to have in a common out-of-line instructions implementation.

A

The first is really the ability to handle an arbitrary stream of instructions and to have those instructions map back to a particular node. The way that's implemented right now is that the very first implementation of the out-of-line code sections really was just intended to handle call nodes, and if you look at the implementations, there's a lot of terminology being used there that points to the fact that it's intended to be used for call nodes.

A

Some of that is actually no longer the case, so that potentially needs to change.

A

But the fact that these instructions can map back to a particular node means that they can now also participate in metadata generation. You can keep the information that gets generated about those instructions, you can actually generate metadata for it, and other parts of the VM can potentially find that information and attribute it to those instructions if necessary. So really, being able to lay out whatever arbitrary sequence of instructions you want is, I think, important.

B

Just a quick question on that topic. Is it actually possible to lay out an arbitrary set of instructions? Because I remember doing this a couple of years back, where I wanted to change some snippet-based code to use out-of-line codegen, and I found that anything outside of a simple diamond was uncharted territory.

A

Well, I think the answer should be yes, but perhaps at the moment that's not the case, in that we don't have the right kind of infrastructure in place to allow that. I'll get to that in a second here, but aspirationally I think we want to be able to replace our handcrafted snippets with some kind of infrastructure like this. But I think it maybe depends on the architecture you're on. Were you doing this on PowerPC?

B

Yeah, it was PowerPC, and if I remember correctly, the snippets I wanted to replace had either multiple branches into the snippet and/or multiple branches out back to mainline.

A

Okay. So the way that things are currently architected, I think across the board, the expectation is that there is a single entry point to the out-of-line instructions and a single exit point. The reason for that is just the way that, when we're actually doing register assignment, it needs to be able to capture the register state at a couple of different points and restore it at a certain point as well.

A

So I think that having more involved control flow like that is going to take some thinking, but if that's really a worthwhile thing, then I think that's something we need to consider.

B

Yeah. So the snippet did this because, from what I could tell, it was actually worthwhile to do this.

B

There was efficiency to be had in entering the snippet below the entry point, and there was efficiency to be had in leaving the snippet early. Expressing those sorts of things out of line was kind of impossible for the reason you said, unless I went out of my way and made sure that, for example, if I was branching from multiple places into the snippet, nothing could happen in between.

B

The register allocator couldn't do anything between the branches. That way, the snapshot at one of the branches would be essentially equivalent to the one at the other branch.

A

Yes, and what would happen is we'd have to start teaching the local register assigner about when it's taking its snapshots or not. It would need to understand the control flow within the out-of-line code, which is getting fairly complicated. Do you remember what the snippet actually was?

B

I think it was in OpenJ9, or IBM Java: monitor enter and monitor exit, I think, possibly, and there were multiple paths. I don't remember it that well. You wanted to jump to the snippet to do something quick, and if you managed to do it, you wanted to jump back out. Otherwise you wanted to continue to do more things, and you could possibly jump out early. Basically it was not a diamond.

B

If I remember correctly, I think I left it as a snippet.

B

Or possibly I replaced it with out-of-line codegen that wasn't as good. I don't remember the exact outcome.

A

Do you know if there are other similarly complex snippets?

B

In my experience using out-of-line codegen, I want multiple branches into a section, because usually you want to test something and decide you have to go out of line. And then, if you can continue, you want to keep going; you have to test something else, and you want to go out of line again. So multiple branches into a snippet is probably one of the more common control flow patterns you'll see outside of the diamond.

B

The other variations, I don't know. I don't remember having to use them.

A

Okay. I mean, the use cases I'm familiar with on x86, and I think on Z, for outlined instructions mostly all revolve around wanting to be able to do some kind of a call, where that call has some kind of specialized linkage and you don't want to have to encode that by hand in the snippet. So you basically just cook up a fake call node and you evaluate it in the out-of-line instruction sequence, and yeah.

B

Power really does that a little.

A

Power does, a little bit, yeah. Okay, that's good. That's some good feedback there to think about.

A

I mentioned the creation of metadata and the ability to do that over arbitrary instruction sequences. Right now, certainly in the case of Java, that is tied to the ability to map those instructions back to a particular node. But you also have to make sure that exception ranges in the metadata are covering the right ranges of instructions in case they do throw an exception of some sort.

A

One thing I've got here that actually looks quite clever is some work that Victor has done recently. It was actually part of the previous point, extending it to generate more metadata, but he also added a sort of RAII feature that replaces the actual manual switching of the instruction stream. It's scope based: the switch happens automatically based on the lifetime of the outlined instructions object.

A

So it looks a little bit cleaner doing it that way, and that might be something that can be replicated across the code generators. I know Philip looked at that for Z and he was interested in doing it there as well.
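The scope-based switching described above can be sketched roughly as follows. This is only an illustration of the RAII idea under assumed names (`CodeGen`, `Stream`, `OutOfLineScope`); it is not the actual x86 implementation or its API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the OMR or OpenJ9 API.
using Stream = std::vector<std::string>;

struct CodeGen {
    Stream mainline;
    Stream *current = &mainline;  // emit() appends to whatever this points at
    void emit(const std::string &instr) { current->push_back(instr); }
};

// Redirects emission to an out-of-line stream for the lifetime of the object
// and restores the previous stream when the scope ends, including on an early
// return. Saving the previous pointer also lets scopes nest.
class OutOfLineScope {
public:
    OutOfLineScope(CodeGen &cg, Stream &target) : _cg(cg), _saved(cg.current) {
        _cg.current = &target;
    }
    ~OutOfLineScope() { _cg.current = _saved; }
    OutOfLineScope(const OutOfLineScope &) = delete;
    OutOfLineScope &operator=(const OutOfLineScope &) = delete;

private:
    CodeGen &_cg;
    Stream *_saved;
};
```

With this shape an evaluator opens a block, constructs the scope object, emits the out-of-line instructions, and lets the closing brace switch back, so there is no manual "swap back" call to forget.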

B

On Z, as I remember, they do something where they have recursive out-of-line codegen, so they'll go out of line to generate some code, and then that code will decide it wants to generate something out of line itself. Whether that's good or bad, I don't know, but that is something that they do. As I remember, the nested out-of-line sequence doesn't actually do much as far as changing register allocation; I think it's one or two instructions that may or may not even have any register dependencies.

B

So they don't change register state, and that just might be a special case that they got working. But it wouldn't have worked on Power because of the register snapshots: I think there's only one. You could really only have one snapshot out of line, so you wouldn't be able to do that.

A

x86 has certainly been bitten by the internal control flow regions, where they get nested and there's only one snapshot at the outer level, and then the register assignment is all screwed up inside. It'd be really nice if, in rearchitecting it, we could get some kind of assert or something to make it obvious when somebody has generated a sequence like that. Those sequences often work functionally just by fluke of the register assigner, but it would help if there was a way to make the code generator actually violently assert

A

when it's put into doing one of these strange things. I think it counts the depth, if I'm not mistaken: when you create a label for internal control flow, doesn't it increment a depth counter? I never found one, but that doesn't mean it doesn't exist. It's been a while since I've looked, or it may have existed at one point; maybe that's what I'm thinking of. If there is a counter, I don't know that

B

it kept the depth, at least not on Power. Okay.

A

But anyway, I guess I'm just suggesting that we could maybe try to structure things to have some more asserts about the invariants that we expect for these. Like, you know, if it's single entry,

A

make sure there's an assert for that. And we could, even if it's just scanning the instructions once they've been generated, look for branches into the middle of it or something, right? You may not enable it by default all the time, but it would be so helpful to be able to force the compiler to verify the invariants that we're expecting. Paranoid codegen, I guess we could call it.
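One way to picture the "scan the generated instructions" check suggested above is the sketch below. It assumes a hypothetical flattened form in which each instruction records the index of its branch target; `Instr` and `hasSingleEntry` are made-up names, and a real check would walk the code generator's instruction and label objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened instruction: branchTarget is the index of the
// instruction branched to, or -1 for an instruction that does not branch.
struct Instr { int branchTarget; };

// Returns true when no instruction outside [first, last] branches to any
// position in the section other than `first`, i.e. the section has a single
// entry point. Branches that stay inside the section are not examined.
inline bool hasSingleEntry(const std::vector<Instr> &code,
                           std::size_t first, std::size_t last) {
    for (std::size_t i = 0; i < code.size(); ++i) {
        bool insideSection = (i >= first && i <= last);
        int t = code[i].branchTarget;
        if (insideSection || t < 0) continue;
        std::size_t target = static_cast<std::size_t>(t);
        if (target > first && target <= last) return false;  // lands mid-section
    }
    return true;
}
```

A debug-only pass could call such a predicate for every out-of-line section after instruction selection and assert on the result, which matches the "not enabled by default" usage mentioned above.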

B

It didn't really make sense to me why you would want nested out-of-line code, because once you go out of line, what does it mean to go out of line from there? The out-of-line code section is already outside of normal control flow. It's already going to end up somewhere at the bottom of the method in cold code. What does it mean to go out of line again?

B

I don't know. From talking to the Z guys, I think it was more that you wanted to be able to reuse the same code

B

in an out-of-line context as well as an inline context. You wanted the code to not care whether it was being invoked out of line when it wants to generate something out of line itself. You could call the same code from an inline sequence and generate something out of line, and then you could also call it from an out-of-line sequence and it would work, and it wouldn't have to care. I think that was a motivation

B

I remember, which seems reasonable.

A

I mean, yeah, I think that with the right kind of design you can support that sort of thing. I guess we should have some ideas on whether that's something we really want to do. But if it's not hard to support, then

A

I don't see why we couldn't do that.

A

So, speaking of the snapshots, the way that it works right now is that, even though every architecture actually goes ahead and takes the snapshot and restores the snapshot, there isn't really a similar API, and the actual implementation of it is different on every architecture.

A

I think that managing perhaps a stack of these snapshots is possible. Even the logic within that, the actual process of taking the snapshot and restoring the snapshot, I think can perhaps be commoned into the machine class in some way so that it can be shared.

A

The big challenge with all that, and I'll get to it a little bit later, and one of the reasons why things are different across all the different architectures, is the way that we've tended to use the real register enumerations, right? So we're looking for spilled registers, we're looking for NoReg, we're looking for certain registers, and we're iterating over the range of known registers.

A

Those enums typically are tied to a specific code generator, and because of that, every code generator ends up with its own implementation of it. There is some specialty code that some architectures have to take care of. For example, on Z I think they have to take a bigger snapshot.

A

They need to take the HPRs, the high-word registers, and there might be some condition registers as well, that sort of thing. So there needs to be some kind of architecture specialization, but I still think that the bulk of it, or hopefully a lot of it, can still be commoned.
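To make the "stack of snapshots" idea above concrete, here is a minimal sketch under stated assumptions: register state is modelled as one integer per real register (the id of the assigned virtual register, or -1 when free), and the `Machine` class and its methods are hypothetical, not the existing OMR machine class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one slot per real register holding the id of the
// virtual register assigned to it, or -1 when the register is free.
using RegisterState = std::vector<int>;

class Machine {
public:
    explicit Machine(std::size_t numRealRegisters)
        : _state(numRealRegisters, -1) {}

    void assign(std::size_t realReg, int virtualReg) { _state.at(realReg) = virtualReg; }
    int assigned(std::size_t realReg) const { return _state.at(realReg); }

    // Push a copy of the current state; a nested section simply pushes again.
    void takeSnapshot() { _snapshots.push_back(_state); }

    // Pop and reinstate the most recent snapshot.
    void restoreSnapshot() {
        assert(!_snapshots.empty() && "restore without a matching snapshot");
        _state = _snapshots.back();
        _snapshots.pop_back();
    }

    std::size_t snapshotDepth() const { return _snapshots.size(); }

private:
    RegisterState _state;
    std::vector<RegisterState> _snapshots;
};
```

In this sketch the only architecture-specific input is the number of slots, so an architecture that must capture extra register kinds would, under this assumption, just size the state to include them. Whether the real specializations reduce to that is an open design question, not something the discussion settles.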

A

The management of the out-of-line instruction list, I think, is also something that should be shared. Again, everybody implements it in their own code generator: how to iterate over the different sections, how you swap the instruction streams. That's all sort of independent.

A

It's implemented exclusively in the code generator as well, and it shouldn't have to be that way. And then maybe a final feature, which is aspirational, as Eunice was saying initially, is for this to be something that's actually going to replace the handcrafted snippets that are there right now. I think that is really the ultimate goal of a lot of this, just so that we can deprecate that technology. I mean, those snippets were created a long time ago.

A

The snippets were around from the very beginning of the Testarossa architecture, when it wasn't really possible to do the kinds of things that we're doing with this technology right now, but that time is perhaps long past and it's probably worth deprecating those.

A

Are there any other requirements that you think would be useful, or any use cases of this instruction stream that you wish you'd had in the past, other than what you just mentioned, that you can think of?

A

Yeah, well, we can just add it to the issue later. It's a pretty good starting point here. Okay, so some challenges with really coming up with a common design. I think that it's sort of encroaching on something: there are a number of different parts of these code generators

A

that are kind of all intertwined with each other, and I mentioned here that it's going to have a bit of a viral effect. I think one of the things it's going to run into right away is the register dependency mechanism, the mechanism that we have for specifying a mapping between virtual registers and real registers on a per-instruction basis. At the moment, that is also implemented per code generator as well.

A

I think a big part of the reason for that is also because of these register enums, right, and the fact that it's mapping a virtual register to a real register that's only known to a particular architecture.

A

So over the years, when this was still a closed-source project, there have been a couple of attempts at rearchitecting the register dependency mechanism. I think it didn't succeed, and not for technical reasons; I think it really just didn't succeed for priority reasons. Again, there are a lot of places in the code that are actually using these register dependencies.

A

It just wasn't worth the effort to go about changing all that at that time, but that doesn't necessarily mean that it shouldn't be done. We've had almost 20 years of experience using the current mechanism.

A

We have a lot of experience with them right now: what works, what doesn't work, what are the features we want, what we don't want, that kind of thing. So I think there's a place for rearchitecting those in the future. I know that Philip has a few ideas on that as well. So what I think will end up happening there is that I'll create a new issue, an epic, for that, and we can have others weigh in on perhaps shaping the design for that going forward.

A

Yeah, so if we could start that discussion, because one of the things that Victor and I are going to be looking at is that there's a slew of new AVX instructions on x86 that have a three-register form, and the current x86 register assigner itself can't cope with three-register forms. Even how we're going to put them in is a bit of an unknown.

A

We need to add that capability first for the GPRs, but eventually we are also going to need to add support for the K flag registers on the new AVX vector instructions, so for the conditional registers on those with the EVEX encodings. So it's not an immediate priority, but it is looming on the list of things the x86 codegen is going to need to do to stay competitive, sooner rather than later. And in the worst case, you would just hack the register dependency mechanism

A

that's there on x86 right now. Yeah, Victor has hacked it, and, you know, you can make it do things.

A

The thing we found with our initial experiments with that was just that some of the three-register forms have a slightly higher execution cost, things like that, and some of the savings of the three-register forms come from allowing greater freedom in the register assignment, something

A

the code generator just isn't capable of exploiting at the moment. So it's a strategic thing that needs to be done in that code generator over the next while. Okay, so I think one of the things that's going to feed into that is what I mentioned earlier: the reason that we have a lot of these unique implementations is because of the real register enums, right?

A

So this is basically describing, in an enum, the registers that are available on a particular architecture, and they're typically PowerPC specific or Z specific or x86 specific. Really, in some cases, the only difference between the various architectures in terms of the implementation is just those enums and how you iterate over the registers in that enum, and there's a lot of very similar logic that should be shared.

A

So I think that we need to find a way of providing a generic form of those, basically taking the enum out of it and providing a generic form, so that we can share a lot more logic. We also need to be able to share things like the special enum values: is it a spilled register, is it a NoReg, things like that. Those should be shared, because they are also duplicated across the different architectures.
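One possible shape for "taking the enum out of it" is sketched below. Everything here is an assumption made for illustration: the generic `RegNum` type, the reserved `NoReg` and `SpilledReg` values, the `RegKind` categories, and the `RegisterFile` table are hypothetical and do not reflect an agreed design.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical generic register number shared by every code generator.
// Values 0 and 1 stand in for the special values each architecture
// duplicates today; real registers start at FirstReal.
using RegNum = std::uint16_t;
constexpr RegNum NoReg = 0;
constexpr RegNum SpilledReg = 1;
constexpr RegNum FirstReal = 2;

enum class RegKind : std::uint8_t { GPR, FPR, VRF };

// Each architecture would supply only a table describing its register file.
struct RegisterFile {
    struct Range { RegKind kind; RegNum first; RegNum last; };
    std::vector<Range> ranges;

    // Shared logic: visit every real register of one kind, in order.
    template <typename Fn>
    void forEach(RegKind kind, Fn fn) const {
        for (const Range &r : ranges)
            if (r.kind == kind)
                for (RegNum n = r.first; n <= r.last; ++n) fn(n);
    }
};

inline bool isRealRegister(RegNum n) { return n >= FirstReal; }
```

Under this assumption, loops such as "iterate over all GPRs" are written once against `RegisterFile`, and an architecture contributes only its ranges, for example sixteen GPRs numbered 2 through 17.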

A

So I think we need to think about how to do that, and that'll probably feed into the register dependency design and feed into this design as well.

B

One thing I'd like to mention

B

from a very old conversation or meeting we had on this is that a lot of times you'll find a lot of code dedicated to just figuring out how many dependencies you need. So you look at certain evaluators and half the code will be trying to figure out the number of dependencies that you need, checking this, that, and the other to come up with a number, and then the code will be repeated to actually generate the instructions that use it.

B

50% of the evaluator is code dedicated to coming up with a number, which is almost wasteful, right? And the other approach was to just allocate the biggest register dependency conditions you can, and on all the code generators where machines had a max number of registers, you could do that as well. I have seen both.

A

Yeah, there's certainly a trade-off in terms of logic. The initial design of these register dependencies, and I think it's still the same design, was to be very, very tight on the memory usage, right? So you have to tell it exactly how many you need, and it actually goes and expands: there's a struct that it uses, and it just allocates at the end of that exactly how much memory, or how many dependencies, you say you're going to need.

A

That was fine for very simple cases, but it really makes it complicated when you have some of the more complicated control flow to count the number of dependencies you need under all the different conditions. So it may make more sense if we relax that a little bit and really just allow you to add whatever you want.

A

Of course, we have to be able to control the memory consumption of it, but I think there are ways of doing that, and I think it would simplify a lot of the code for sure.
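The relaxed approach discussed above, adding dependencies as you go while still bounding memory, might look something like this minimal sketch. `RegisterDependency`, `DependencyList`, and the default cap of 64 are hypothetical choices for illustration, not the existing per-architecture classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types; the real mechanism lives in each code generator.
struct RegisterDependency {
    int virtualReg;  // virtual register id
    int realReg;     // real register it must be assigned to
};

// Grows on demand, so an evaluator needs no separate counting pass before it
// starts generating instructions. A cap keeps memory consumption bounded.
class DependencyList {
public:
    explicit DependencyList(std::size_t maxEntries = 64) : _max(maxEntries) {}

    // Returns false, and adds nothing, once the cap has been reached.
    bool add(int virtualReg, int realReg) {
        if (_deps.size() >= _max) return false;
        _deps.push_back(RegisterDependency{virtualReg, realReg});
        return true;
    }

    std::size_t size() const { return _deps.size(); }
    const RegisterDependency &operator[](std::size_t i) const { return _deps.at(i); }

private:
    std::size_t _max;
    std::vector<RegisterDependency> _deps;
};
```

The trade-off is the one raised in the discussion: a growable container spends some memory and allocation time that the exact-count struct avoids, in exchange for removing the duplicated counting logic from evaluators.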

A

I don't know if this is outside the scope, but one of the things that caused some consternation for a while on x86, and it's probably still suboptimal, is the handling of condition codes off of some of the instructions. A great example would be where you have some trees where you do an add, and then the result of the add is going to be compared against zero, and then you're going to branch conditionally. Now, on x86 the add will set the zero flag if the result is zero.

A

So you don't actually have to generate the compare, and you can then just branch conditionally directly on the add. At the moment, I believe there's a backwards walk to find the last instruction that set the condition flag: is it the instruction that I wanted? And if it is, then I can skip this thing, and blah blah blah. That always struck me as exceedingly gross.

A

I don't know if there's a better way of doing it, but it's one of those things that always made me cringe when I had to go and add another case of patterns and flags to get rid of redundant tests and redundant compares, because there was an operation just four instructions before, or whatever, that had set the flag based on exactly the thing that I needed. Yeah, and you also had to manage the insertion of instructions that potentially would mess up your flags.

B

If you had already calculated it, you've got a test.

A

Yes, and I remember we've had bugs like that in the past. It may be outside the scope of this particular discussion, but I just thought I'd throw it out there as another one of those things that was very surprising when I found it, and rather annoying. Okay.
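For readers unfamiliar with the backwards walk mentioned above, the following sketch shows the general idea on a deliberately simplified instruction model. The `Instr` fields and `compareIsRedundant` are hypothetical; this is not the x86 code generator's actual logic, which has to deal with many more flags and instruction forms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model: the register an instruction writes, whether
// it leaves the zero flag describing that result, and whether it leaves the
// flags in an unrelated state.
struct Instr {
    int targetReg;       // register written, or -1 if none
    bool setsZeroFlag;   // afterwards the zero flag reflects targetReg == 0
    bool clobbersFlags;  // afterwards the flags are unrelated to any register
};

// Walk backwards from `pos` to decide whether a compare of `reg` against zero
// placed at `pos` would be redundant because an earlier instruction already
// set the zero flag from `reg` and nothing in between disturbed it.
inline bool compareIsRedundant(const std::vector<Instr> &code,
                               std::size_t pos, int reg) {
    for (std::size_t i = pos; i-- > 0;) {
        const Instr &in = code[i];
        if (in.setsZeroFlag) return in.targetReg == reg;
        if (in.clobbersFlags || in.targetReg == reg) return false;
    }
    return false;
}
```

The fragility described in the discussion shows up directly here: inserting any flag-clobbering instruction between the add and the branch silently changes the answer, which is why later instruction insertion has to be managed carefully.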

A

Okay, so I think those are the biggest challenges that are facing this right now, and I think that if we can actually genericize some of the register concepts, it's going to make this effort a lot easier. I'm also thinking that at some point in the distant future there are some opportunities for actually sharing some of the register assignment logic between the different architectures.

A

One of the things you need to get there is to actually have a more genericized notion of what the register names are. So this is going to be a step in that direction as well.

A

But you know one thing at a time.

A

So anyway, I wanted to propose the rearchitecting of the out-of-line instruction mechanism. I've got an issue open for that. I certainly invite any feedback, any comments, any suggestions, any use cases that you've had in the past.

A

Yeah, by all means. Yeah, would you mind adding Victor on that, just so he's aware of it? Definitely. I think he may have some useful contributions to that discussion. He's

B

Got a lot of ideas about.

A

How things might be better? Okay, yep.

A

Right, I'm going to throw this out as a Help Wanted kind of task. It is a bigger piece of work, for sure.

B

Important, nonetheless,.

A

Okay, that's the only topic that I had for today. Is there any other discussion you want to have on that?

A

Or on a particular design?

A

All right, okay. Well, if there's no other discussion on that, then I will close the call, and we'll meet and talk again in a couple of weeks. Thanks, everyone.