Eclipse OMR Architecture, 24 Sep 2020

From YouTube: OMR Architecture Meeting 20200924

Description

Agenda:
* Power instruction fusion support structure (#5552) [ @rmnattas ]
* Introduce an ObjectFormat class to implement call encodings (#5569) [ @0xdaryl ]

A

Welcome, everyone, to this week's OMR architecture meeting. Today we have two topics; they're actually both compiler topics. The first one is a bit of an architectural discussion around Power instruction fusion support that Abdul Rahman will take us through. So why don't we start with that one, Abdul Rahman.

C

Okay, can everyone hear me? Okay. So for Power, starting with P10, we have this new instruction fusion feature, and this support structure hopefully allows OMR to take advantage of it by knowing which instructions can fuse and which cannot. If you or I can show the PR, maybe, just...

A

To go over it, if you can share, that might be better, just so you're in control. You can drive.

C

Yeah I'll share that.

C

So maybe I'll start with the example just to show the fusion, because it has two levels, kind of. So we have the normal case: if we have two adds, for example, the adds fuse if the target of the first instruction is one of the sources of the second instruction. But that fusion does have to be issued twice, because there are two target registers that need to be set. But if it uses the same target register, it's only issued once. So that's kind of the perfect goal: reusing the same register in the target and following the conditions to fuse.

C

So that's the perfect case. The other kind of fusion also fuses, but it still has to be issued twice. So I started with a simple design. Before the design, maybe we can go over the things that the design should have.
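The two levels of fusion described above can be sketched as a small classification function. This is a minimal C++ sketch, assuming a simplified instruction with one target and two source registers; `SimpleInstr`, `FusionKind`, and `classify` are hypothetical names rather than OMR code, and the check of which opcodes are eligible to fuse at all is left out.

```cpp
#include <cstdint>

// Hypothetical, simplified view of an instruction with one target and two sources.
struct SimpleInstr
   {
   uint8_t target;
   uint8_t source1;
   uint8_t source2;
   };

enum class FusionKind
   {
   None,        // the second instruction does not consume the first's result
   IssuedTwice, // fuses, but two different target registers must be set
   IssuedOnce   // fuses and both instructions write the same target register
   };

// Assumes both opcodes are already known to be an eligible pair (not checked here).
inline FusionKind classify(const SimpleInstr &first, const SimpleInstr &second)
   {
   bool dependent = (second.source1 == first.target) || (second.source2 == first.target);
   if (!dependent)
      return FusionKind::None;
   return (second.target == first.target) ? FusionKind::IssuedOnce : FusionKind::IssuedTwice;
   }
```

For example, under these assumptions a pair that writes r3 and then reads and rewrites r3 would classify as `IssuedOnce`, while a second instruction that writes r4 would classify as `IssuedTwice`.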

C

It has to support multiple instruction formats. For registers, either there's no condition or there's a condition on the register.

C

There are cases where general register zero is a specific condition, for example in addi and addis. As Ben mentioned, OMR doesn't support zero in RA for these and uses li instead, but that's something to consider, maybe in the future: do we want to allow specific registers to be a condition?

C

Maybe it should be in the design. For immediates, either there's no condition or there is a condition, because some have a condition of the immediate being 0, 1, -1, I think even 63 in one of the shift fusion conditions. Also for formats, there are some conditions where instruction one can be add or subtract, or multiple formats, and the same for instruction two. So you can have two instruction-one formats specified that can fuse with any of four instruction-two formats, which gives eight possibilities.

C

This is maybe a trade-off between the memory usage of this structure and maintainability, because if we split the formats into a different list and have the structure point to these formats, it's going to be harder to maintain, with an extra pointer to follow. So maybe the memory usage is still worth it.

C

Source registers are interchangeable. For instructions with one target and two source registers, most, if not all, of the pairs that I have seen allow fusion whichever source holds the register: as in the example, whether RX here was in RA or in RB. So it allows both in almost all cases.

C

So that's something where we don't want to register every fusion pair twice, because that would be a lot of memory. We can have maybe a boolean that tells us that it can be here or here and that it all works for the fusion. And then we're assuming two instructions. As for access to the structure: as Julia mentioned, the gain from instruction fusion is not big.

C

So access to the structure should be quick, so we don't cross the break-even point and end up slower after using the structure. That would defeat the purpose: we want better performance, not slower. Those are the main points of what we want from the design.

C

I have a very simple design, just to see what issues come with it. But maybe before that, I don't know if there are any questions or ideas about what to look for.

D

Well, I'm just trying to get my head around what it is that you're wanting to do. Are you proposing some kind of instruction scheduling and some constraints on the register assigner, or are you trying to change how you generate the instructions, or have multiple... I'm just trying to get my head around exactly what it is that this is proposing.

C

Yeah, so I started with a big overall idea, but before going into that, the peephole needs a structure that tells it which instructions can fuse together, and this is just the structure that stores whether this can fuse with that or not.

C

If I go to the code, for example, I have a structure, a fusion pair, that holds instruction one and instruction two, and instruction one and instruction two are each a format. If I go to an example: it just records that this instruction fuses with these. So it's just a structure to hold which instructions can fuse. Then whatever else in OMR can use this structure to know that these fuse, so we can do something with it.

C

So it's not optimizing anything yet. This is the structure that whatever feature will use to know which things are going to fuse.

A

You mentioned that you expect this to be a peephole pass. When do you want to do that peephole pass? Is this after register assignment or before?

C

So this structure can also be used in register assignment, to assign the best register for fusion. And then for the peephole, the idea I have is a reorder, because the fusion window is just consecutive instructions. So the basic idea was to reorder the instructions that can fuse so that they do fuse, but it can also be used in register assignment, to assign the best register for fusion.

D

Okay, but don't you need something bigger than a peephole in that case? Because if the register assignment is going to be impacted, then if you want to get maximum fusion, you need to schedule the instructions as well as the registers, right?

D

It becomes a larger optimization problem, because you have two variables, right? And if you just generate whatever sequence you would generate and then try to do these tweaks, you'll get some fusion, but there'd be a lot of fusion you'd miss out on because it wasn't considered as part of the overall sequence.

C

So mainly it should be an instruction scheduler, but as far as I know, there's no instruction scheduler. The next things in mind were a peephole and register assignment. I'm not sure if there's something else that can have a holistic look over it before that, because when a node is converted to instructions, it's set nodes to set instructions. Is there something else that can be done here?

A

Yeah, I mean, many years ago we used to have an instruction scheduler that ran on Power, but that didn't survive open sourcing, so it's not there. So we haven't had this kind of capability, or needed this kind of capability, in some time.

A

Do you know if this is the only fusion that we do right now in the backends? Does, for example, Z do anything, or is there anywhere else in Power that has fusion?

E

Yeah, we do it on Z. It's just literally the simplest possible peephole pass. It seems a bit overkill for fusion.

C

Because for Power, at least in P10, there are maybe 600 or 300; if we count the interchangeability as a separate pair, there are about 600 pairs, so it's a much larger set of pairs that can fuse. So it's about having a structure that holds all these, and then the peephole or whatever else can use the structure to optimize the code.

A

Okay, why don't you get into what your proposed design is?

C

My proposed design is similar to instruction properties, and that's what it starts with. So I have a fusion instruction format that holds the target, source one and source two. That's for one format; I'm just working on one format right now. Then I put two of these formats in one pair, and then I have a list of pairs.

C

This has many issues; this is the simplest idea. First, if you want to access a fusion pair, we have to iterate over everything, and it takes a lot of memory. Maintainability could be good for it, but it's not great.

A

But you have the registers stored as int. Is that because you're using the enum value of the real register? Is that what's being stored there?

C

It's int8 because, for a register here, if it's minus one, it means it's not a fusion condition.
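The simple design described here (a format per instruction, two formats per pair, a flat list of pairs, and -1 meaning "no condition") might look roughly like the following. This is a sketch under assumptions, not the code from the PR: the type names, field widths, the `sourcesInterchangeable` flag, and the opcode numbers in the table are all made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

const int8_t NoCondition = -1; // -1 means the register is not a fusion condition

struct FusionFormat
   {
   uint16_t opcode;  // placeholder opcode number, not a real OMR mnemonic value
   int8_t   target;  // required register number, or NoCondition
   int8_t   source1;
   int8_t   source2;
   };

struct FusionPair
   {
   FusionFormat first;
   FusionFormat second;
   bool sourcesInterchangeable; // first's result may sit in either source of second
   };

// Illustrative entries only; opcode numbers and conditions are invented.
static const FusionPair fusionPairs[] =
   {
   { { 10, NoCondition, NoCondition, NoCondition }, { 10, NoCondition, NoCondition, NoCondition }, true  },
   { { 10, NoCondition, NoCondition, NoCondition }, { 20, NoCondition, 0,           NoCondition }, false },
   };

inline bool registerMatches(int8_t condition, int8_t actual)
   {
   return condition == NoCondition || condition == actual;
   }

// Linear search: every lookup may walk the whole list, which is the cost
// the speaker points out for this simplest design.
inline const FusionPair *findPair(uint16_t firstOpcode, uint16_t secondOpcode)
   {
   for (std::size_t i = 0; i < sizeof(fusionPairs) / sizeof(fusionPairs[0]); ++i)
      {
      if (fusionPairs[i].first.opcode == firstOpcode && fusionPairs[i].second.opcode == secondOpcode)
         return &fusionPairs[i];
      }
   return nullptr;
   }
```

Storing one flag for interchangeable sources, rather than listing each pair twice, is the memory saving mentioned earlier in the discussion.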

C

Okay, yeah. So I'm hoping to add formats and whatever else is needed. A better idea that I have in my head is having a hash table, because that's much quicker. The hash key would be the opcodes of instruction one and instruction two, because we want this structure to be quick, since the gain from fusion itself is not that big.

C

We need the structure to be fast, so having a hash table to access, I'm trying to limit the search space for pairs, and then comparing and finding a pair that matches or doesn't match. If a peephole goes through the instructions, it takes an instruction and looks in the hash table: is there a pair that matches with the one after it, for example? Or it goes over a window of, for example, 20 instructions.

C

It goes on until it finds one that matches in a pair, and it reorders it, if possible, to be consecutive, so the hardware can fuse it.
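The hash-table idea, keyed on the opcodes of both instructions, could be sketched as below. This assumes 16-bit opcode numbers packed into one 32-bit key and uses `std::unordered_map`; `FusionTable` and `PairConditions` are hypothetical names, and the sketch says nothing about whether a hash table is the right structure for this.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-pair conditions; a real entry would carry register and
// immediate conditions as well.
struct PairConditions
   {
   bool sourcesInterchangeable;
   };

class FusionTable
   {
public:
   void add(uint16_t firstOpcode, uint16_t secondOpcode, PairConditions conditions)
      {
      _table[makeKey(firstOpcode, secondOpcode)].push_back(conditions);
      }

   // Returns the candidate condition sets for this opcode pair, or nullptr if
   // the two opcodes never fuse. Only these candidates need to be compared.
   const std::vector<PairConditions> *lookup(uint16_t firstOpcode, uint16_t secondOpcode) const
      {
      auto it = _table.find(makeKey(firstOpcode, secondOpcode));
      return it == _table.end() ? nullptr : &it->second;
      }

private:
   static uint32_t makeKey(uint16_t firstOpcode, uint16_t secondOpcode)
      {
      return (static_cast<uint32_t>(firstOpcode) << 16) | secondOpcode;
      }

   std::unordered_map<uint32_t, std::vector<PairConditions> > _table;
   };
```

The key is ordered, so a lookup for (first, second) does not match an entry added as (second, first).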

E

Is there any possibility of extending the instruction metadata, like the instruction properties?

C

That's one idea that I saw. We can add an extra property, but then there's an extra problem: we would have different lists. I haven't thought a lot about it, but it's possible to add an extra property. But how do we...

C

Because we can point from instruction one to many instruction twos, but if we want to go the reverse way, that could be an issue. We can have a structure that finds all the instruction twos possible for this instruction one that can fuse. But the other way, going from instruction two to instruction one, I'm not sure if it's something that we want. But having it general enough that it does would be better. But maybe we...

E

You list them in your current design anyway. I mean, in your current design, you need to list the pairs regardless.

C

Yeah, so the current design has a lot of issues. What I wanted from it was to raise the issues to tackle. So a hash table was one idea, I think. With instruction properties, we could do something with that. I have to look over how exactly we would find...

C

Because we can add a property on the instruction for the fusion group, for example, and even then we'd have a huge fusion group, because, for example, an add can fuse with a lot of things. With a hash table keyed on the opcode of instruction one and the opcode of instruction two, I think that would really reduce the search space.

F

I mean, if you're using something like a hash table, you might run into problems with trying to fit that whole thing in the CPU cache. So I don't know that using both opcodes and doing lots of lookups into that thing would really help performance.

C

Why? I feel like when we have both opcodes, there would be fewer lookups, because it's like how many you can do.

F

So if your key is both opcodes, then every time you want to check if a pair of instructions can be fused, you'd need to do a full lookup on the table.

F

So you'd need to do a completely separate lookup on the whole table every time, and that would tend to just fluctuate randomly around the table. It wouldn't have any nice caching properties.

C

Okay, I see. Yeah, that could be an issue with a hash table. I'm trying to find the best structure, because obviously a normal array is not the best, and a hash table has its issues. Yeah, honestly...

F

I think we might need to know more about what exactly it is you're proposing and how you're going to use this before we can decide on a design for the structure.

C

Yeah, I don't have a lot on how I'm going to use it, because I'm thinking of having it as a general structure for peepholes or others to use. But if we talk about the peephole, it goes over the instructions, and for each instruction it looks over a window, let's say 20 instructions down. It goes to the next instruction and checks if it's an instruction that could fuse with the first instruction and, if possible, reorders them so they fuse.
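The window scan described here can be sketched as a search for the first fusible partner within a fixed number of following instructions. This is only a sketch: instructions are reduced to integer opcodes, `findFusionCandidate` and the `canFuse` callback are hypothetical, and the part that decides whether the reordering is actually legal (no dependences on the instructions in between) is deliberately left out.

```cpp
#include <cstddef>
#include <vector>

// Scans up to `window` instructions after position `start` and returns the
// index of the first one whose opcode can fuse with the opcode at `start`,
// or -1 if there is none. Legality of moving the candidate is not checked.
template <typename CanFuse>
int findFusionCandidate(const std::vector<int> &opcodes, std::size_t start, std::size_t window, CanFuse canFuse)
   {
   for (std::size_t i = start + 1; i < opcodes.size() && i <= start + window; ++i)
      {
      if (canFuse(opcodes[start], opcodes[i]))
         return static_cast<int>(i);
      }
   return -1;
   }
```

A peephole built on this would call it for each instruction in turn, with `canFuse` backed by whatever fusion-pair structure is chosen, and then attempt the reorder only when a candidate is found.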

C

So that's kind of the basic idea of the first step.

E

I would say implement that. You'll discover a lot of implementation details that you might not have thought of, and it'll help the committers review the code better than just an arbitrary design that may get used in a bunch of places, but may only get used in just the peephole.

C

Okay. So have a simple structure, because the peephole would need the structure: have a simple structure, work on the peephole, and then come back to the structure and modify it.

D

Yeah, the comment I would make is that it would also be good to understand the benefit that it provides for the complexity and the compile time and memory increases that it causes in the compiler, right?

D

If the benefit is very marginal and the code is going to be extremely complex or make debugging the compiler a lot harder, I think that would be a concern, right?

D

I mean, if you can show that there's a good benefit, then complexity is warranted, but I think we just need to be a little careful on the trade-off there, because if we design something really complicated and the instructions get reordered and changed after they've been selected, that makes debugging harder and yet it may not give us anything. That's one concern that comes to my mind.

A

I was going to say much the same thing, because earlier you mentioned that you didn't expect to see much performance from this. So the question that I had in my mind, as you were going through this, was: how many opportunities do we think we're actually going to find with this kind of analysis? Because if the opportunities are going to be fairly few and the analysis is somewhat expensive, then it probably isn't a great trade-off, and you may want to save this for only the hottest pieces of code, or scorching methods, or something like that.

C

True, yeah. I don't believe it's going to give a huge gain, but I'm not sure if it's going to be less than break-even, or just costing more than it gains. So that should be something that I'll look into more: how much it's going to cost versus how much it's going to gain.

C

Yeah, I think that would help.

C

On top of the structure itself, we would have other things, like, for example, the peephole. So for the structure itself there's no cost per se, but there's the cost of accessing the structure by whatever else is going to use it, whether it's the peephole or the register assignment.

A

Yeah, I mean, as I had mentioned, we used to have an instruction scheduler on Power and on Z prior to open sourcing. One of the motivations behind not open sourcing it, and not consuming it in the open products, is that we didn't really measure a whole lot of benefit from it.

A

It was doing a lot of analysis, but in the end we could tolerate losing the very minimal amount of performance that we were getting from it versus the complexity of doing all that analysis. So I'm hoping that this feature isn't going to fall into that category. I'm hoping we might see more from it, but it certainly is something to keep in mind as you proceed with the design.

C

Yeah, with the design I tried as much as I can to keep it abstract, to allow future fusion pairs. Maybe the window is going to expand. Maybe, I don't know, there's going to be fusing of more than two instructions. I don't know what the future is going to bring. So I'm trying as much as possible to have it general enough and at the same time have it work.

D

Yeah, the danger with going too general is that it's going to be more complex than it needs to be and burn more time and space than it needs to in order to do what it's going to do.

C

Okay. And I'm guessing maintainability is a good thing to have, too. For memory usage, I think for this simple structure it's going to be around five or six kilobytes. I know OMR wants as little memory usage as possible, but would that be a big effect?

D

I think most of these come down to a cost-benefit trade-off, at least in my mind, but others may have other opinions, right? If you want to burn 5K but you give us 20% throughput, well, no duh, we'd do that, right? But if it's 5K and, you know, 0.1 percent throughput that's hard to measure, maybe it's not a good trade.

A

Okay. The other thing is, if there are certain kinds of code that would tend to produce more fusion opportunities, then perhaps limit it to that kind of thing. So, for example, I don't know if this applies to floating point or not, or if it's only just for GPRs, but if it did apply to floating point, perhaps restricting it to methods that had floating point, or to blocks that had floating point, would be the way to do it.

C

That's a good point, yeah, because even for the peephole, I was thinking that going over the whole code was not the best idea; probably some specific code would be better.

C

Were you thinking of this as a...

G

Sorry, go ahead, Daryl.

A

I was just going to follow up on this: were you thinking of this as a global analysis, or were you thinking of doing this just on a basic-block-by-basic-block basis, like the local register assigner would do?

C

I haven't thought much about choosing between the two; I won't say that I have a view. I don't know, maybe I'll ask: what do you think is better? I'm guessing block by block.

E

The instruction scheduler would work on extended basic blocks.

A

Yeah, again, it would come down to how many opportunities there are. You may find more opportunities by widening the field of search by considering all blocks, but from an implementation point of view, it may be a lot easier just to focus on one block, or extended basic block, at a time.

A

I don't know; it kind of comes down again to how many opportunities you expect to see and where they're coming from, in order to make that decision.

G

Yeah, limiting it to hot blocks only, or to blocks with opportunities only, helps with the compile time or resource usage question, but doesn't help with the implementation cost and maintenance cost aspects.

G

Those are two different types of concerns.

C

Okay. Maybe I'll start general, if it's easy to modify it later to a basic block, or go from one to the other, seeing what each gives.

A

It's probably easier to start with block by block, as opposed to starting general and then restricting it to a block.

D

Yeah, those two things end up looking very different most of the time, and if you're going to go across blocks, then you have to worry about things like the global register assignment that might have happened, and various control flow issues and all kinds of other things. So if you're trying to keep this simple, extended basic blocks would be where I would start.

C

Yeah, those are very good points.

C

Does anyone have anything else?

D

I mean, if you're considering designs, there are things other than peepholes that can achieve this kind of thing. I don't know if you've considered any of those or how well those would fit with what you're trying to do, but I mean things like bottom-up tree rewriting and various other kinds of techniques that can be used to produce optimal instruction sequences, if there's enough benefit to be had from the fusion, right?

D

If getting it right is that important, there are other ways of doing it that are less fragile.

C

Yeah, that's one of the things I tried: just writing simple Java code to see what the compiler already does, like reordering, and the trees kind of move things near each other in a good way. So yeah, that's a good idea; maybe I'll look into that too.

E

Yeah, I would say, for concrete things: implement the peephole pass in a very naive way, with as many fusions as you want, and then measure to see how much performance it provides and how much compile time it consumes.

F

The problem with doing that is that, as I understand it, we can't know how much we'll get right now, because P10 is not generally available, and even internally, I'm not sure what the status is of getting P10 machines to do performance testing on. I think this might be a bit premature.

C

Maybe. So if the issue is having the code, maybe just running it and seeing whether these would fuse or not, and how much fusion there is. Because maybe after the whole thing we find only one or two fuse, and then you can already see, even without measuring, that there won't be much. So yeah.

D

Right, but I mean the raw number of fusions is not enough. You could implement all this and simulate how many things might fuse, but the number of fusions does not determine the performance, right? Because the fusion may give a large performance benefit, or the fusion may provide only an incremental performance benefit, and if it's only incremental, even if you have millions of them, you may not even really be able to observe it.

D

So unless you can measure it and justify the complexity, and we can understand the trade-off... that would seem to be the key point, right?

C

Yeah, I think the count is more going to show whether the gain is small. If it's only a couple of fusions, then we can see that the gain is small. If it's more, then we need a machine.

D

I guess that comes down to how much development effort you want to invest in something that may or may not pay off. That's outside this discussion. It's just a pointer that, in making an argument for this kind of complexity and for this kind of thing, a raw number of fusions is probably not going to be sufficient for all of the different people who would be reviewing this and weighing up the pros and cons of the contribution.

D

The actual benefit of those fusions is also an important part of the equation.

D

If you feel otherwise, please speak up, but I think that would probably be a fairly general assertion for most people.

A

Has this feature been implemented in any static compilers yet? For example, does GCC have P10 instruction fusion support? Because presumably, if you could get access to a Power10 machine (they're probably fairly hard to find), you could possibly do some performance runs with and without that feature enabled on C and C++ code, just to get a general sense of what kind of performance you'd expect in general-purpose code. Not sure if that's possible or not.

C

Okay, that's a good point. I'll see if we can do that, but yeah, maybe before going and implementing anything, that would be a good thing to test. If that's everyone's points, I think I'll hand back. Thank you, everyone. I got a very good amount of points here, and I'll look over them and hopefully get back to you with a good answer.

B

Okay, thanks, Abdul Rahman. Okay, so I think I'm up next. Let me share my...

A

Can you see my browser window?

A

Yeah, okay. So no formal presentation, unfortunately, just a quick introduction to something that I'd like to propose. I'd like to get some feedback on the proposed design, perhaps alternatives, that kind of thing. The motivation for what I want to talk about is really to think about how the code that we are currently generating in the backend can be used in different kinds of compilation scenarios.

A

Those that are familiar with it will see that a lot of it has really been designed to work in a dynamic language context. The code that you end up producing lives in a code cache and it's expected to be executed dynamically. That works just fine for many dynamic languages. It works fine for Java and other applications that we've moved this code into. It's been a good solution. But when we start to think about using this in other contexts like static compilation, where you want to use it in a static compiler to, let's say, generate an object file, you want to produce something that's an ELF object file.

A

What you would really have to do is go around to a lot of the different places where we're generating calls or accessing data, that kind of thing, and generate a different instruction sequence, potentially, depending on the actual target that you're producing. It's certainly possible to go around to every single place that's emitting one of these pieces of code and do something different there. But what I was trying to think of was a cleaner way of representing this.

A

So what I came up with is something that I've been calling an object format, for lack of a better name. Really, the intention here is to look at places where we are generating... I'm just starting with calls for now. This certainly would extend to data as well at some point, but I'm just starting with calls, which is the main thing that I'm interested in to begin with.

A

How can we abstract what it is that we're doing at all the different places that we're generating calls, so that we can change the instruction sequence depending upon the target? If I'm generating an object file for ELF, it needs to generate stuff that cares about the PLT and perhaps a global offset table. If I'm generating stuff for a JIT compiler, maybe what we have right now works just fine. So I'm trying to hide a lot of that decision logic behind these things that I'm referring to as object formats. The other thing that it's going to do to our current code...

A

I didn't mention this before, but we also make assumptions; we impose decisions on consumers of OMR on things that they may not want to make a decision on. For example, the way that the code is written right now assumes you're generating code into a code cache, and it also assumes that you're going to need to care about trampolines when you're calling other methods or calling natives or things like that. That may not be something that is important to you or even applicable to you.

A

So it would be good to hide that away so that you don't have to deal with it.

A

So what I'm thinking of is what we can do for the whole call family of instructions. This doesn't just apply to calls that come from the IL. This could be any kind of call that's being generated from the code that you're generating. It could be to a native. It could be to an entry in another shared library. It could be to a VM method. It could be to a helper, something like that.

A

They're all different kinds of calls, and potentially they would each need a different kind of encoding depending upon the target environment. This isn't about the linkage. The linkage is really about the calling convention that's being used in order for one function to call another. This is really just about the encoding of the instruction and the things that the target environment requires that we do in order to encode that correctly. But certainly the pairing of a linkage and an object format is what you would need to emit the proper call sequence.

A

So what I am thinking of is introducing something called an object format, an extensible class to encapsulate the encoding of a function call. The idea is that the object format itself will be abstract.

A

It's going to provide APIs that must be implemented by specialized object formats, and, depending on whatever makes sense for your language environment, you install the right object format for that. Just some examples I'm giving here: a JIT code object format, again for lack of a better name, that we could use for dynamically generated code that's executed in place. This is pretty much what we have right now, and anything that we do right now would fit into that kind of an object format.

A

ELF is another possibility, Mach-O on Mac OS X, XCOFF on AIX, that kind of thing. So if you want to generate different kinds of calls, you install the right one for your target.

A

I expect the initial API, if we're just focusing on calls, to be quite light to begin with. At this point, I'm thinking of just a couple of different kinds of calls. One I've been calling a global function, again for lack of a better name; it's basically any kind of call to something that may not be in the code that you're generating.

A

So this would be a call to a native or something in the VM, or something like that. And then there's the notion of calling something within the code that you're generating, or within the code cache of something that you're generating. That would be what I call a code cache function call. It's certainly possible that you would get different encodings for these two different kinds of targets, but they could also be very much the same. They could generate exactly the same kind of code for both.
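The shape of the abstraction described so far, an abstract object format with one entry point per kind of call and a specialized format installed per target, might be sketched as follows. This is an illustration under assumptions, not the proposed OMR API: the class and method names are hypothetical, `CallSite` is a stand-in, and the returned strings are placeholders for real instruction encodings.

```cpp
#include <string>

// Hypothetical stand-in for what a call site hands to the object format.
struct CallSite
   {
   std::string targetName;
   };

// Abstract object format: specialized formats decide how each kind of call is encoded.
class ObjectFormat
   {
public:
   virtual ~ObjectFormat() {}
   virtual std::string emitGlobalFunctionCall(const CallSite &site) = 0;
   virtual std::string emitCodeCacheFunctionCall(const CallSite &site) = 0;
   };

// Placeholder for dynamically generated code that is executed in place.
class JitCodeObjectFormat : public ObjectFormat
   {
public:
   std::string emitGlobalFunctionCall(const CallSite &site) override
      { return "call, via trampoline if needed: " + site.targetName; }
   std::string emitCodeCacheFunctionCall(const CallSite &site) override
      { return "direct call: " + site.targetName; }
   };

// Placeholder for code destined for an ELF object file.
class ELFObjectFormat : public ObjectFormat
   {
public:
   std::string emitGlobalFunctionCall(const CallSite &site) override
      { return "call through PLT: " + site.targetName; }
   std::string emitCodeCacheFunctionCall(const CallSite &site) override
      { return "direct call: " + site.targetName; }
   };

// A call site asks whichever format is installed instead of deciding itself.
inline std::string emitCall(ObjectFormat &installed, const CallSite &site, bool targetIsInGeneratedCode)
   {
   return targetIsInGeneratedCode ? installed.emitCodeCacheFunctionCall(site)
                                  : installed.emitGlobalFunctionCall(site);
   }
```

In this sketch the two formats encode global calls differently but happen to encode calls within the generated code the same way, which matches the remark that the two kinds of call may or may not differ.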

A

So if we are able to provide that kind of abstraction, the work involved would be to pretty much visit every place where we do a call in generated code and change it to use an object format.

A

Now, one of the things that I discovered while thinking about this, and trying to think about how this could work across platforms, is that for every backend that OMR supports, even though they are very similar in terms of the logic, the way that they actually implement different kinds of function calls is very different, and the information that each of those sites requires to make the decision on what to do is quite different. There are similarities: they usually start with a symbol and they have a node. But beyond that, there are lots of other pieces of information that each backend uses in order to make the right decision on the call to do.

A

This is definitely unfortunate, and I don't think it necessarily has to be this way, but the way the different back ends have evolved over time means that there's an awful lot of information being consulted in order to make a decision.

A

The reason this is important is that, if I'm trying to come up with a single, nice, clean API that every back end just needs to implement, there isn't really a nice, easy way to do that because of all these different pieces of information.

A

So the way that I solved that was to introduce another kind of structure that these object format function call functions take as their parameter. I've just been calling it a function call data class. Every architecture can specialize that data structure the way it wants and populate it with the data it needs in order to emit its function calls at these different sites. So that in itself is going to be another extensible class.

A

I'm going to show you that code in just a sec, and you can see how kind of ugly it starts to get under the covers. Going forward, I don't necessarily think it has to be that way. I think that over time we can certainly clean up and reduce the amount of information that's required at each kind of call site, and perhaps even common up some of that across the different back ends. But just starting from the state of the world right now, there's a bit of.

A

They're definitely discontiguous. And, as I mentioned before, object format should apply to both functions and data, but what I'm really proposing here is just talking about function calls for now. Over time, I can see that as this code starts to target other static contexts where we need to deal with data, we're going to need some kind of solution in place for encoding the different kinds of memory references and that sort of thing. So, just to give you an idea of what some of this code might actually look like, I put together an example PR. I'm just going to start with object format itself.

A

Oops, here's my OMR object format. All right, so it's an extensible class, and it basically just contains a bunch of abstract APIs. I've got one for emit global function call that takes as its parameter a concrete function call data. I'll show you how that gets populated on x86 in just a sec.

A

So there are two different kinds of functions being introduced: one is an emit and one is an encode. The real difference between them is that emit will generate TR instructions; it'll add TR instructions to the instruction stream in order to emit the call. So this is useful prior to register assignment, like during tree evaluation. Then there's a corresponding one called encode, which is useful after register assignment, say during binary encoding or if you're generating snippets, something like that, where you want to generate a call but it's really just going to produce the actual binary encoding for that call. They're similar, but they have different kinds of outputs.

A

So I have two functions for that. I have one for asking how many bytes are going to be emitted in this sequence, which is useful for code sizing purposes. Then I have a corresponding set of functions for calling functions in the code cache itself. Like I said before, it's certainly possible that these functions map to the same kind of implementation as the ones up here; it really depends on your language environment.

A

These are all pure virtual functions as well, so any object format extended from here must implement those functions.
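
The interface described here can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the type names `Instruction` and `FunctionCallData`, the exact method names, and the `ToyObjectFormat` implementation are all assumptions made for illustration. It only shows the shape of the idea: an abstract class with emit, encode, and length queries for both global and code cache calls, where one concrete format may map both kinds of call to the same implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the names below are illustrative, not the PR's code.
struct Instruction { const char *mnemonic; };

struct FunctionCallData {
    std::uintptr_t targetAddress;       // address being called
    std::vector<Instruction> *stream;   // instruction stream, used by emit
    std::uint8_t *cursor;               // output position, used by encode
};

class ObjectFormat {
public:
    virtual ~ObjectFormat() = default;

    // Before register assignment: append instructions to the stream.
    virtual Instruction *emitGlobalFunctionCall(FunctionCallData &data) = 0;
    // After register assignment: write raw bytes and return the new cursor.
    virtual std::uint8_t *encodeGlobalFunctionCall(FunctionCallData &data) = 0;
    // Number of bytes the encoded sequence occupies, for code sizing.
    virtual std::int32_t globalFunctionCallBinaryLength() = 0;

    virtual Instruction *emitCodeCacheFunctionCall(FunctionCallData &data) = 0;
    virtual std::uint8_t *encodeCodeCacheFunctionCall(FunctionCallData &data) = 0;
    virtual std::int32_t codeCacheFunctionCallBinaryLength() = 0;
};

// A toy format in which both kinds of call share one implementation.
class ToyObjectFormat : public ObjectFormat {
public:
    Instruction *emitGlobalFunctionCall(FunctionCallData &data) override {
        data.stream->push_back(Instruction{"call"});
        return &data.stream->back();
    }
    std::uint8_t *encodeGlobalFunctionCall(FunctionCallData &data) override {
        *data.cursor = 0x00;  // placeholder byte, not a real encoding
        return data.cursor + globalFunctionCallBinaryLength();
    }
    std::int32_t globalFunctionCallBinaryLength() override { return 1; }

    Instruction *emitCodeCacheFunctionCall(FunctionCallData &data) override {
        return emitGlobalFunctionCall(data);
    }
    std::uint8_t *encodeCodeCacheFunctionCall(FunctionCallData &data) override {
        return encodeGlobalFunctionCall(data);
    }
    std::int32_t codeCacheFunctionCallBinaryLength() override {
        return globalFunctionCallBinaryLength();
    }
};
```

Because every method is pure virtual in the base, a format that forgets one of them cannot be instantiated, which matches the "must implement those functions" point above.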

A

So that's that. Did I add any? I don't think I actually specialized that. But once the concrete class is produced, so you have a TR::ObjectFormat, the expectation is that you can provide specialized versions of this. So here is a JIT code object format that extends the TR object format.

A

In this particular location, it doesn't actually provide anything other than the anchor point for that. The idea is that the JIT code object format would do exactly what the code we have today does, so whatever the implementation is, it's been repurposed into this format.

A

The other thing that I wanted to quickly mention, since I see it here, is that I've introduced a new directory called object format. It would be a sibling to codegen or optimizer or any of the other top-level directories.

A

The main reason I did that was, well, I guess it wasn't strictly required. Some of this could live in the codegen directory, but the number of object formats, if we're going to start to support a lot of these, is going to start to accumulate. Right now I've got ideas for a JIT code object format, an ELF object format, and a Mach-O object format.

A

I've got hybrid ones as well. Say you wanted an executable ELF format, where you generate ELF code but also want to be able to execute that ELF code dynamically while you're generating it. We need a different format for that. So those are starting to accumulate, and I put them in their own directory just to keep things a bit cleaner. But what I wanted to find was the x86 AMD64 object format. So here is an AMD64 implementation of the JIT code object format.

A

And what it really does is provide an emit global function call. Within its emit global function call, it takes this larger data structure, a function call data, and I'll show you how that gets populated in just a sec. What it really does is follow the logic that's currently there in the code to figure out what it is that I'm trying to call and how to call it. There are lots of pieces of information that it draws from in order to do that, and it can find all of that information in this data structure.

A

So if you've ever looked at the dispatch logic in different places, it's all been unified into this one blob of logic here, which I'm not really going to talk through because it's all code that's already there.

A

I certainly do think that some of this could be simplified and condensed, but at this point that isn't the purpose of this exercise. Because this is the emit function, it'll go and produce TR instructions and then return the final thing that it actually ended up generating. Encode is pretty much the same. In many cases it can actually be much simpler than this.

A

So for a JIT code object format, it's really just either a call or a jump that we end up producing directly to some address that we already know about. That's what it does currently. I didn't show you the function call data yet; I forgot to do that.
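
To make the "call or jump directly to a known address" case concrete, here is a small sketch of what such an encode step could look like on x86-64. The function name and signature are hypothetical and not taken from the PR. The encoding itself is the standard one: a near call is opcode `0xE8` and a near jump is `0xE9`, each followed by a 32-bit displacement measured from the end of the 5-byte instruction.

```cpp
#include <cassert>
#include <cstdint>

// Writes a 5-byte near call (0xE8 rel32) or near jump (0xE9 rel32) at
// `cursor`. `cursorAddress` is the address the instruction will execute
// from. Returns the cursor just past the instruction, or nullptr when the
// target is out of rel32 range.
inline std::uint8_t *encodeDirectBranch(std::uint8_t *cursor,
                                        std::uint64_t cursorAddress,
                                        std::uint64_t targetAddress,
                                        bool isCall) {
    // The displacement is relative to the address of the next instruction.
    const std::int64_t disp =
        static_cast<std::int64_t>(targetAddress - (cursorAddress + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return nullptr;

    *cursor++ = isCall ? 0xE8 : 0xE9;
    const std::uint32_t bits =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(disp));
    // x86 stores the displacement in little-endian byte order.
    for (int i = 0; i < 4; ++i)
        *cursor++ = static_cast<std::uint8_t>(bits >> (8 * i));
    return cursor;
}
```

In this sketch an unreachable target simply returns `nullptr`. A real object format would presumably fall back to another sequence in that case, such as a call through a register, which is one of the alternatives mentioned later in the talk.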

A

Function call data, which itself doesn't do anything. It's the specializations where things get interesting.

B

No. Oh, actually, there's got to be something in here.

B

And nothing in there. Sorry, I'm just getting oriented here.

A

Okay, so this is the x86 AMD64 version of function call data. There are a lot of pieces of information that may need to get passed in here in order to make the right decision. To try to simplify this a bit, it provides a number of different constructors, and a lot of the information is actually set by default; it isn't required at a particular point. As we start to spread this out throughout the code, you'll see that in some cases you have some of this information and in some cases you don't. So there are different constructors for those different kinds of applications, where, if you don't have the information, a sensible default option is provided. There are suitable APIs for non-global, non-helper calls, there are some for when I'm actually calling a helper, and if I'm calling something from a snippet I'd want to use something like that.
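
As a rough picture of "different constructors with sensible defaults", consider the sketch below. Every field name and constructor signature here is an assumption chosen for illustration; the actual AMD64 function call data in the PR carries different and more numerous fields. The point is only that each call-site flavor supplies what it has and everything else keeps its default.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical placeholder types for what a call site might have on hand.
struct Node {};
struct SymbolReference {};
struct RegisterDependencies {};

struct FunctionCallData {
    SymbolReference *methodSymRef = nullptr;
    Node *callNode = nullptr;
    std::uintptr_t targetAddress = 0;
    std::int32_t helperIndex = -1;          // -1 means "not a helper call"
    RegisterDependencies *deps = nullptr;
    std::uint8_t *bufferAddress = nullptr;  // only set when encoding from a snippet
    bool useCall = true;                    // false means emit a jump instead

    // Direct call found during tree evaluation.
    FunctionCallData(SymbolReference *symRef, Node *node,
                     std::uintptr_t target, RegisterDependencies *d = nullptr)
        : methodSymRef(symRef), callNode(node), targetAddress(target), deps(d) {}

    // Helper call: just the index and, optionally, some dependencies.
    FunctionCallData(std::int32_t helper, Node *node,
                     RegisterDependencies *d = nullptr)
        : callNode(node), helperIndex(helper), deps(d) {}

    // Call or jump encoded from a snippet after register assignment.
    FunctionCallData(std::uintptr_t target, std::uint8_t *buffer, bool call = true)
        : targetAddress(target), bufferAddress(buffer), useCall(call) {}
};
```

A helper call site would then need only something like `FunctionCallData data(helperIndex, node);`, leaving the remaining fields at their defaults.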

A

So here's an example of where... oh, I have Webex things in the way on my screen here. So the code generator is responsible for knowing what the target object format is for this particular compilation.

A

It stores the object format, so if you want to know what it is, you ask the code generator. I've got two examples of where I'm using this. The first is in system linkage, where we're actually doing the call to the native. This is in build direct dispatch.

A

So all of this other code has been removed, and what we do for all these different call sites is produce an instance of this function call data, passing in the various bits of information that are all being used up here to make this decision. Then a reference to that data is passed into emit global function call, which basically unpacks what it needs for this particular object format and does what it needs to do. So if you ever wanted to change this, going from JIT code to an ELF object format to a hybrid approach, whatever, all you would really need to do is change the implementation of that object format in the code generator, and it should just happen transparently to you. So it's really getting rid of all this extra logic and specialized code for knowing: oh, I need to call directly, or call through a register, or I'm calling this directly with a 32-bit displacement.

A

That kind of thing: it's getting rid of all that and hiding it underneath this global function call. That's one example. Here's another example within a tree evaluator. I think it's a helper call here, so the initialization to call a helper is much simpler: you just need the index and perhaps some dependencies, and then away you go. So I'm just going to stop there to see if there are any comments on this approach, any feedback. I will say, before I open it up for questions, that I do need to study the population of that data structure a little bit more, just to make sure that we're not incurring more of a compile-time performance overhead than we should be. Part of it comes down to reducing the number of things that we actually need to make a decision on, so that you don't have to store them into this data.

A

This is actually just a local object. It's stored on the stack here, so it gets cleaned up when it goes out of scope; it's not allocating any sort of scratch memory or anything like that. The data structure itself is pretty small, something like 64 to 80 bytes. But the actual writing of all these different things into that data structure could potentially be problematic, and other than reducing the number of those, I haven't quite thought of a better way of doing that yet. I'm pretty sure that this approach works on Z as well; I know that there's been some prototyping work happening there, so I know it will scale beyond just x86. Anyway, I'll stop there now and take any questions.
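
The call-site pattern described in this part of the talk (build a function call data on the stack, ask the code generator for its object format, and let the format decide how to emit the call) might be sketched like this. All class and function names are hypothetical, instructions are modeled as plain strings, and `RelocatableObjectFormat` is an invented second format that exists only to show the swap.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct FunctionCallData { std::uintptr_t targetAddress; };

class ObjectFormat {
public:
    virtual ~ObjectFormat() = default;
    virtual void emitGlobalFunctionCall(FunctionCallData &data,
                                        std::vector<std::string> &stream) = 0;
};

// Toy format: the target address is known, so call it directly.
class JitCodeObjectFormat : public ObjectFormat {
public:
    void emitGlobalFunctionCall(FunctionCallData &data,
                                std::vector<std::string> &stream) override {
        stream.push_back("call " + std::to_string(data.targetAddress));
    }
};

// Toy format: leave the target to be resolved later.
class RelocatableObjectFormat : public ObjectFormat {
public:
    void emitGlobalFunctionCall(FunctionCallData &data,
                                std::vector<std::string> &stream) override {
        stream.push_back("call <relocation:" + std::to_string(data.targetAddress) + ">");
    }
};

// The code generator owns the choice of object format for a compilation.
class CodeGenerator {
public:
    explicit CodeGenerator(ObjectFormat *format) : _objectFormat(format) {}
    ObjectFormat *getObjectFormat() const { return _objectFormat; }
    void setObjectFormat(ObjectFormat *format) { _objectFormat = format; }
    std::vector<std::string> instructions;
private:
    ObjectFormat *_objectFormat;
};

// A call site: no format-specific logic remains here.
inline void buildDirectDispatch(CodeGenerator &cg, std::uintptr_t target) {
    FunctionCallData data{target};  // local object, released at end of scope
    cg.getObjectFormat()->emitGlobalFunctionCall(data, cg.instructions);
}
```

Switching the format installed in the code generator changes what `buildDirectDispatch` produces without touching the call site, which is the transparency being described.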

D

So one question for you, Daryl. One of the things that OpenJ9 is interested in at the moment, as part of value types being added to the Java language, is linkage optimization: taking objects that in the general case would need to be boxed and passed as a reference, and optimizing the linkage by passing values in registers where they'll fit. I was just wondering how you might see that kind of thing playing into the design that you're proposing.

A

When you say linkage, do you mean the OpenJ9 definition of linkage, like the calling convention, or do you mean linkage in the extern and scope kind of sense?

D

I mean the OpenJ9 definition: the way that the arguments are being passed and what the convention is between the caller and callee.

A

So is the linkage now going to be specialized by call site, then? Is that what you're saying?

D

It's not necessarily by call site; it might be by target. So if we've compiled a particular target in the JIT, we have a JIT entry point that will accept the value type being passed in registers, and the interpreter entry point will have to accept it boxed. So you may want to do different things depending on whether you know you have a compiled target that you're going to, for example, and that would deal with some of that.

A

And would you always know the linkage at compile time?

D

I think with an optimized linkage you would be able to know at compile time, right? If you didn't know at compile time, you would go with some.

A

Linkages. Okay, so if you speculate on the target, you know what you're going to be calling, and you can therefore speculate on the linkage as well. Yeah, I think that this deals with the encoding of the actual call, but every call site is actually just a pairing of the linkage and the object format.

A

So we would almost need some kind of higher-level structure that actually paired them, or we'd have that information on something like a symbol reference for a particular target. We'd have to find the right structure for keeping them as a pair, but I think we could do that.

A

Not sure, if that answered your question or not.
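
One way to picture the "pairing" idea floated here is a small table keyed by target, with a default pair used when nothing more specific is known. This is purely a thought experiment: the names `CallEncoding` and `CallEncodingTable`, and the use of an integer as a stand-in for a symbol reference, are assumptions and do not describe anything in OMR or OpenJ9.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the two halves of a call site's contract.
struct Linkage { std::string name; };       // how arguments are passed
struct ObjectFormat { std::string name; };  // how the call itself is encoded

struct CallEncoding {
    const Linkage *linkage;
    const ObjectFormat *format;
};

// Maps a target (an integer standing in for a symbol reference) to its
// linkage/object-format pair, falling back to a default pair.
class CallEncodingTable {
public:
    explicit CallEncodingTable(CallEncoding fallback) : _default(fallback) {}

    void set(int symRefNumber, CallEncoding encoding) {
        _byTarget[symRefNumber] = encoding;
    }

    CallEncoding lookup(int symRefNumber) const {
        auto it = _byTarget.find(symRefNumber);
        return it == _byTarget.end() ? _default : it->second;
    }

private:
    CallEncoding _default;
    std::map<int, CallEncoding> _byTarget;
};
```

Under this sketch, a target known to accept values in registers would get its own entry, while every other target keeps the default (boxed) linkage.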

D

Sort of yes and no. I guess I'm just trying to figure out how we would avoid an explosion of combinations that may end up being poorly tested in some situations. The value types implementation is going to want to pass values in registers by preference when it's possible, but in general, because the interpreter treats everything as boxed, there are these different conventions that we have to have. I know this is very OpenJ9 specific, but I just thought it might play into some of.

A

I mean, well, from a testing point of view, one of the things that we could do at every call site is to not just assume the linkage is whatever the code generator tells it. We'd have to get a site-specific or target-specific linkage, but you could certainly change that linkage then and test out different combinations, I would think.

D

Okay, all right, thanks. Yeah, it isn't really an objection to your design. It was just something that the OpenJ9 community is looking at, and I was trying to figure out how it fitted with what you were saying.

A

Yeah, I'll give that a little bit more thought as well, just to clarify in my mind what you have in mind. But yeah, I think there are some extension opportunities even beyond global function calls. Extending this for linkages as well, I think, could happen.

H

Thanks. For what it's worth, off the top of my head, in the OpenJ9 case of value types, I suspect a lot of the work we will want to have happen in the linkage.

H

Most of the information we will need to do those optimized calls with value types, barring significant refactoring, will I believe be in the IL, not in the instruction stream. So my guess would be that all of that sort of handling would probably stay in the linkage.

H

That kind of handling would probably have to happen at a higher level than this, just because I suspect that by the time we get to this level, we won't have enough information to make the proper decisions, unless we somehow try to propagate all of the information all the way through. But that would be costly, I would think.

D

Yeah, I'm not quite clear yet in my head how it would fit together. I was just mentioning it because it is something that people are going to want to look at, and if we're playing in this space, it makes sense to think about it. But I think you could also be right. There are just some dimensions that we should all consider in these different extensions to what we have.

H

Yeah, no, great, it's definitely something we have to think about. My gut feeling is that it should happen at a higher level, but we'd have to think about it some more to really say for sure.

H

If I can try to briefly summarize the design you have here: you have object format, which is an extensible class, right?

H

Then, it's an extensible class, but it kind of acts as an interface, and the expectation is that we then have various realizations of that interface for the different kinds of object format, and each of those will also be an extensible class.

H

Yes. Okay, so we have an extensible class that extends another extensible class, then.

A

I mean this isn't the first time we've done that, but.

H

We're actually following here.

A

Yeah, I mean, as for the actual use of an extensible class, it's only the specialized classes at the moment that are making use of the fact that it's extensible. They're the ones that are providing the specialization for architecture or project or whatever.

A

From what I've seen so far with just the object format base class, I haven't yet, in the way that I've done it, found a need to actually provide any specialization beyond what's in the base class.

A

So perhaps the very base class doesn't need to be made extensible, but we're just trying to make it as flexible as possible, I suppose.
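
The layering under discussion, an extensible interface class with extensible realizations beneath it, can be approximated in plain C++ as below. This is a simplified sketch under stated assumptions: it uses ordinary inheritance and a made-up `describeCall` method, and it omits the build-time machinery the project uses to select which layer each `TR::` name resolves to. Only the `OMR::` and `TR::` namespace split and the class names mentioned in the talk are drawn from the source.

```cpp
#include <cassert>
#include <string>

namespace OMR {
// Base layer: acts as an interface, nothing specialized here yet.
class ObjectFormat {
public:
    virtual ~ObjectFormat() = default;
    virtual std::string describeCall() = 0;  // hypothetical method
};
}

namespace TR {
// The name the rest of the compiler refers to.
class ObjectFormat : public OMR::ObjectFormat {};
}

namespace OMR {
// Anchor point for the JIT code format, independent of architecture.
class JitCodeObjectFormat : public TR::ObjectFormat {
public:
    std::string describeCall() override { return "generic JIT call"; }
};

namespace X86 { namespace AMD64 {
// Architecture layer: where the specialization actually happens.
class JitCodeObjectFormat : public OMR::JitCodeObjectFormat {
public:
    std::string describeCall() override { return "AMD64 JIT call"; }
};
}}
}

namespace TR {
// Concrete class picks up the most specialized layer for this build.
class JitCodeObjectFormat : public OMR::X86::AMD64::JitCodeObjectFormat {};
}
```

In this picture `TR::ObjectFormat` adds nothing of its own, which is the situation being questioned: the specialized classes use the extension points, while the base so far does not.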

H

Okay, yeah, but.

A

I am worried about over-engineering.

H

That was going to be my... sorry, go ahead.

A

Well, I don't want to over-engineer it if I don't have to, but I am providing the flexibility points here, just in case there is some implementation that needs to provide some specialization to the base class. I'm not sure if that's needed yet. I don't know.

H

Yeah, okay. Yes, that was going to be my next question. It's not clear to me what the value is of having both of those classes be extensible.

A

Well, if a project wanted to extend the object format in some way and share that API with all the other specializations, I guess that could be one way of doing it. But again, that's maybe thinking about problems that don't exist yet, so perhaps the base class could be simplified.

H

Right, so the way I'm thinking about it is.

H

So I guess: when would a user want to extend the base class? Off the top of my head, an example would be if there's something else that they want to handle specially that's not a function call or data, or maybe a different kind of data. In that case, you would only need to extend the base class if there were parts of the code that need to know about this extra behavior but don't know which specific kind of object format they're dealing with.

H

Okay, I can imagine something like that happening.

A

Okay. I mean, the reason I left it as extensible is that it actually wasn't that much work to leave it as an extensible class. But like I said, I haven't really found a use for it yet. It's just providing a feature that may not be immediately useful, but it's not a huge amount of effort to provide.

A

So if the feeling is that we should just make it extensible later, should we need to extend it, I'm fine with that, especially if it's going to improve the understanding of the code. I'm open to it.

H

Yeah, I'm just trying to think it through right now. I'm probably going to need more time to think about this.

A

Okay. The one piece of feedback that I was hoping to get, if there are other ideas, was whether there is a different way of structuring the API. If you look at that function call data, it is a little bit ugly, and a lot of that is hidden from developers behind the different constructors.

A

But if you actually get into that code, it's kind of ugly looking at all the ways that you can compose that. So I wanted to ask if there are any other suggestions on how to provide the myriad of information that's needed at these different call sites to determine how and what to encode.

A

So that's one thing that I was hoping to get some feedback on, if there are any other suggestions. Otherwise, I'll go with this.

H

um I mean opposite.

H

I was just going to say, off the top of my head, it sounds like the structure you want is an extensible variant. So you have basically a variant of different tuples, where each variant corresponds to a different kind of call and only contains the information you need. There's no straightforward way of implementing that in C++, at least not in the versions of C++ we support. So I can't really think of a better approach right now.
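
For readers unfamiliar with the term, the "variant of different tuples" idea looks roughly like the following when written with `std::variant`. Note the assumptions: `std::variant` requires C++17, which, per the comment above, is newer than what the project could rely on at the time, and the three alternatives and their fields are invented for illustration. A `std::variant` is also closed, so a downstream project could not add alternatives without redefining the alias, which is why it falls short of an extensible variant.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Each alternative holds only the fields that kind of call needs.
struct DirectCall  { std::uintptr_t target; };
struct HelperCall  { int helperIndex; };
struct SnippetCall { std::uintptr_t target; std::uint8_t *buffer; };

using CallInfo = std::variant<DirectCall, HelperCall, SnippetCall>;

// Visitor with one overload per alternative; leaving out an overload for
// any alternative is a compile-time error.
struct Describe {
    std::string operator()(const DirectCall &c) const {
        return "direct call to " + std::to_string(c.target);
    }
    std::string operator()(const HelperCall &c) const {
        return "helper call #" + std::to_string(c.helperIndex);
    }
    std::string operator()(const SnippetCall &c) const {
        return "snippet call to " + std::to_string(c.target);
    }
};

inline std::string describe(const CallInfo &info) {
    return std::visit(Describe{}, info);
}
```

Compared with a single structure full of defaulted fields, each call site here constructs only the alternative it needs, for example `CallInfo info = HelperCall{3};`.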

A

Okay. Sorry, Philip, did you have a comment?

E

I was just going to mention, having experience porting some of this to Z, I think what you have here is actually pretty good, given what we're trying to achieve. There's quite a lot of ugliness, and I think we can get rid of some of that by partitioning. I saw some of the constructors have 13 or 14 different parameters, or something crazy like that. We can for sure trim down the number of overloads we have of the function call data and standardize that across the different codegens. But at the end of the day, all that the object format is doing is querying the function call data for, for example, what the function address is that I'm dispatching to, or something like that. It's not doing heroics.

B

Okay, all right! Well, thanks for that feedback, actually, that's good.

E

I personally found that the code looks a lot clearer, at least if you just ignore the ugly bits hidden behind the function call data.

A

Yeah, actually, once you get it down to just doing an emit or an encode function call, it does seem to simplify the code quite a bit. It makes it a bit more readable, that's for sure.

A

So my next step here, if there aren't any glaring objections to anything that's being done, is to start to roll this out in stages. It obviously needs a bit of documentation; I think I've provided some here, but I could do a little bit more. Then I'll start to make some of the changes to the code to introduce it. It turns out there actually aren't that many places, at least in OMR, that need these calls on x86-64 right now.

A

So that's a good sign. But no code generator absolutely has to implement this right away. It's not like we introduce something and it's going to break codegens that don't support it; codegens can introduce this as they need it. But I think that at some point it would be good to have all codegens converge on the same thing, and similarly in downstream projects like OpenJ9, changing the various calls that get emitted from the code cache to use global functions as well.

B

Any other comments before we wrap up.

A

Okay, if that's it, then that's the last item on our agenda for this week. So thank you, everyone, for participating, and we'll talk again in two weeks. Thanks.
