From YouTube: OMR Compiler Architecture Overview 20180308
Description
This is a quick tour of the compiler technology in Eclipse OMR in action. It provides an overview of the overall architecture, IL generation, high-level optimizer, and code generator all in the context of OpenJ9.
What I thought I could do is start by level-setting most people on the basic concepts of how a backend in OMR actually works and, more broadly, how the OMR compiler technology actually works. So what I thought I would do today is really start from the beginning and take you through a very high-level overview of the Testarossa compiler architecture that's available in OMR. Some of you will probably find this very familiar already, so I apologize for that, but I think that as we get into this more, and as we start to have a few more discussions around this technology, it's going to get a lot deeper and a lot more relevant to you. So let's get started here.
Okay, so let's start off with a very big picture of the compiler technology as it sits in a much bigger context. The bigger context in this particular diagram is OpenJ9, and as you can see, the compiler technology itself is just one part of this much, much bigger picture. The key thing here is that the compiler technology doesn't sit alone: it actually needs to communicate with the virtual machine that's running, regardless of whether or not that virtual machine is OpenJ9.
A
It
does
need
to
communicate
with
the
higher-level
virtual
machine
and
it
does
need
to
talk
to
the
runtime
in
some
way
as
well.
So
this
is
this
is
where
you
can
kind
of
get
into
native
code
when
you're
talking
about
Java
through
J&I
or
some
tooling
interfaces
with
JVM
ti.
One
thing
missing
on
here
as
well
is
you
know
several
conversations
of
the
compiler
might
need
to
have
with
the
operating
system
itself
and
the
means
that
we
have
for
going
through
there
and
there's
actually
an
arrow
in
this
diagram.
The way things work within Testarossa is that the core technology really sits behind an API that talks to a particular runtime system. So in this particular case, on the left-hand side here in blue, you have what we've traditionally called a front-end interface to the runtime technology, and it's that front-end interface that the compiler really queries.
A
It
really
needs
to
talk
to
these
to
these
front
ends,
and
there
is
sort
of
this
generic
front-end
interface
that
different
consumers
of
this
compiler
technology
can
go
and
and
specialize
for
its
own
for
its
own
use
and
I
mean
I've
got
a
few
examples
here
around
you
know:
we've
got
the
ahead
of
time
compiled
for
Java,
there's
like
an
embedded
Java
there
as
well.
We've
got
other
technologies
that
are
built
on
this
that
have
nothing
to
do
with
Java.
If you've been following the OMR project as it's been evolving since the compiler technology was open sourced in September of 2016, one of the advances that we've provided, and something we're very heavily invested in and putting a lot of work into improving, is something called JitBuilder. What JitBuilder really is is a much more simplified interface from your language runtime to the compiler technology.
It really simplifies the process of producing an IL generator; it takes that out of the equation and lets you map the things in your language runtime onto the compiler technology much more easily. That in itself is an entire presentation that we can talk about at some point in the future.
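To make the contrast concrete, here is a tiny Python sketch of the idea behind a builder-style interface: the runtime describes a method through builder calls and the library records IL on its behalf, so no hand-written IL generator is needed. This is purely illustrative; it is not JitBuilder's real (C++) API, and every name in it is made up.

```python
# Illustrative only: mimics the *shape* of a builder-style interface,
# not JitBuilder's actual API. Builder calls record IL as a side effect.
class MethodSketch:
    def __init__(self, name, params):
        self.name, self.params, self.il = name, list(params), []

    def load(self, var):
        node = ("load", var)
        self.il.append(node)          # record a load of a named variable
        return node

    def add(self, lhs, rhs):
        node = ("add", lhs, rhs)
        self.il.append(node)          # record an add of two recorded nodes
        return node

    def ret(self, value):
        self.il.append(("return", value))

# A language front end "describes" the computation a + b:
m = MethodSketch("sum", ["a", "b"])
m.ret(m.add(m.load("a"), m.load("b")))
```

The point is that the front end never manipulates trees directly; it just makes calls, and the recorded IL is what flows into the rest of the compiler.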
Okay, so once you get past the IL generator, we have a variety of high-level optimizations, based on the trees, that can be performed. The compiler technology actually has a very rich suite of optimizations, and they're fairly powerful optimizations as well. I can't remember exactly how many have been contributed to OMR, but I think between OMR and OpenJ9 there must be 120, if not more, optimizations that can be applied.
The way the optimizations work is that they consume trees and they also produce trees. In the most theoretical sense you can reorder these optimizations as you see fit; in practice, however (and this isn't necessarily the right answer), there are dependencies between some of the optimizations such that a certain ordering is required, and that's actually something we're working to get rid of. So the optimizer applies a whole suite of optimizations.
These are classical compiler optimizations: loop optimizations, dataflow optimizations, commoning, simplification; very bread-and-butter kinds of compiler optimizations. There are also more runtime-specific optimizations available, things like removing checks (array checks, for example) or, where there are ways to do it, removing synchronization.
The code generators that we have in OMR right now are for x86, PowerPC, System z, and 32-bit Arm. This particular chart shows even more, SH4 and MIPS; we used to have a deeper set of backends available, but at the moment in OMR it's just the four.
In terms of what we're talking about here, we're talking about supplementing the Arm code generator with 64-bit capabilities, and based on our earlier discussions it's looking like that might be a new code generator on its own. The code generator is basically responsible for taking trees and producing instructions from them, instructions that can be executed. So that's really the path that it follows.
Now, that's the core flow through the compiler. There are other components as well that may be consulted for information as the compilation proceeds. For example, some optimizations rely on profiling information, or will perform much better in the context of profiling information, so there is interaction with the profile manager to get data.
That data might be about a particular opcode or a particular bytecode, and whether there's something we can specialize better from it. The compiler also interacts with the runtime when the code is being produced: needing to call certain methods in the runtime from compiled code, that sort of thing. When we're looking at just OMR, though, you don't really deal with the profiler right now, and we're not really going to be dealing with the runtime right now either.
Okay, so what I thought I would do is start going through a fairly simple example of how a method actually gets compiled for Java, using OpenJ9 as an example; I think it's really going to touch on a lot of the concepts here. Given some virtual machine (I'm going to assume it's Java), what you start off with is Java bytecodes, and normally your virtual machine is simply going to interpret those bytecodes.
So what we're really talking about with this compiler technology is how we take those bytecodes and compile them, at runtime, into native instructions: binary code that can actually be executed. As I talked about before in terms of the architecture, the way this works is that we have a means by which the compiler can take Java bytecodes and run them through the IL generator.
The IL generator produces the Testarossa trees for consumption within Testarossa; that's what the ILGen does. Then those trees flow into the optimizer. Depending on the strategy with which you want to compile those methods, you can throw different packages of these optimizations at those trees to get them to do different things. It's a very iterative kind of process as well.
Code generation is really based on an infinite virtual register set, and then we go through a very platform-specific register assignment phase, where we take those virtual (pseudo) registers and map them into real machine registers. Then you end up going through a binary encoding phase, which does a lot of last-minute things: you take all of that and map it into a buffer that's executable, and you pass that compiled method back to the virtual machine and say, please go execute it.
So that's really the process at a very high level. I think what I'd like to do now is drill down deeper into each one of these different phases and talk a bit more about how each does what it actually does. The first step is the IL generator, colloquially called ILGen.
In general, Java bytecodes are not really great for analysis; they're not something that's easily consumed or worked on within the compiler technology. So what we end up doing is turning those bytecodes into an internal representation that we know is easy to work with, whose properties are reasonably well understood, and which can be used by multiple language front ends as well.
So what we end up doing is producing these trees, and the ILGen phase itself is actually just an abstract interpreter: it walks through the bytecodes just like the Java interpreter would, basically interpreting what it comes across. As an example, when we hit the first iload bytecode here, we end up creating what's called a node in our tree, and a node is really just a representation of a particular expression.
Then we come across a third bytecode that is going to consume the previous two bytecodes as its operands. The way we handle that is that we create a third node to represent the add expression, and then the way we associate the operands with this particular expression is by making the first two nodes children of the add node. That's what those little arrows are intended to represent there.
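That abstract-interpretation walk can be sketched in a few lines. This is a hypothetical mini-ILGen over a stack-based bytecode, not OMR's actual implementation: the operand stack holds expression nodes instead of values, and a side-effecting bytecode like a store anchors a finished statement.

```python
# Hypothetical mini IL generator: walk stack-based bytecodes the way an
# interpreter would, but build expression nodes instead of computing values.
class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

def ilgen(bytecodes):
    stack, treetops = [], []      # operand stack holds Nodes, not values
    for bc in bytecodes:
        op, *arg = bc.split()
        if op == "iload":         # push a load node for the named variable
            stack.append(Node(f"iload_{arg[0]}"))
        elif op == "iadd":        # pop the two operands, push an add node
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(Node("iadd", lhs, rhs))
        elif op == "istore":      # a store is a side effect: emit a statement
            treetops.append(Node(f"istore_{arg[0]}", stack.pop()))
    return treetops

trees = ilgen(["iload a", "iload b", "iadd", "istore a"])
```

Running it over `iload a; iload b; iadd; istore a` yields exactly one statement: a store whose child is the add, whose children are the two loads, matching the picture described above.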
What we're starting to see here is a complete expression, a complete statement, being formed. In this particular case, this is really the representation of adding b to a and then storing the result of that back into a. Each of these statements, internally within Testarossa, is anchored in a particular way.
What we refer to those anchors as is treetops: we actually create a special kind of node, called a TreeTop, that points to the root of one of these statements. Informally, what a treetop is really supposed to represent is an expression that's got a single side effect associated with it. So in this case, the major side effect is the store into a, so that alone should get its own treetop.
You would never associate a tree with a treetop that's got more than one side effect: for example, one that has two stores in it, or two calls, or a call and a store, that sort of thing. It always needs to have a single side effect associated with it, and that's going to make these trees a lot easier for the optimizations and other parts of the compiler to reason about.
So we carry on, and when we get to the next statement we end up producing yet another treetop for the next set of bytecodes there. What we do as we're producing all these treetops is link them all together in program order, so at the end of consuming all the bytecodes for a particular method, we end up with a very long linked list of these treetops, as we see here in the next step.
What we then tend to do with these treetops is start to group them into what we call basic blocks. These are single-entry, single-exit blocks, and there are special nodes that get created to delimit them: one that marks the basic block start and one that indicates the basic block end. We can go through and produce these basic blocks from those treetops.
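A sketch of how a flat list of statements might be partitioned into single-entry, single-exit blocks. This follows the standard scheme (a branch target begins a block, a control-flow statement ends one); it is not OMR's exact code, and the opcode names are illustrative.

```python
# Sketch: split a list of statements into basic blocks. A label starts a
# new block; a branch or return ends the current one.
def make_blocks(treetops):
    blocks, current = [], []
    for tt in treetops:
        if tt.startswith("label") and current:
            blocks.append(current)    # a branch target begins a new block
            current = []
        current.append(tt)
        if tt.startswith(("goto", "ificmp", "ireturn", "return")):
            blocks.append(current)    # control flow ends the block
            current = []
    if current:
        blocks.append(current)        # trailing fall-through block
    return blocks

blocks = make_blocks(
    ["istore a", "ificmplt L1", "istore b", "label L1", "ireturn"])
```

The five statements split into three blocks: the conditional branch ends the first, the fall-through store forms the second, and the branch target starts the third.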
What I just described was a Java example, but any language runtime that wants to use this architecture is going to have to end up producing trees at some point, and the same rules around treetops, the same rules around how edges are formed between blocks, and the semantics of basic blocks all have to apply across those different languages.
So once we have produced the initial set of intermediate language for a particular set of bytecodes, we can pass that into the high-level optimizer. Like I said before, there are probably at least 120 optimizations available; it would be absolutely impossible to go through each and every one of those here, but it's certainly possible to drill down into them, and if anybody has any specific interests we can do that at another time.
We can bring in the right people who understand each of those optimizations to talk about them. Suffice it to say, the optimizer does consume a fairly large part of the compiler technology. I'm just going to give a very quick overview of what it can do, to give you a taste of the sort of optimizations it's capable of and how it actually goes about manipulating the trees.
Basically, the optimizer consists of a variety of different optimizations that are each controlled by their own optimization manager, and an optimization behaves in more or less two phases. First, it does an analysis of what work is possible, what work needs to be done; so it's looking for opportunities, and it checks that any transformation it might want to do can be done correctly and safely as well.
Once it's done that analysis, it then goes through and does the actual transformation that manipulates the trees. The analysis phase doesn't actually change the trees or anything like that, because it can bail out at any moment; but once it's proven that there is an opportunity, that it's worthwhile doing this change, and that it can be done correctly, then it goes and transforms the trees, and hopefully we're better off for it, or we're uncovering further opportunities.
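That analyze-then-transform discipline can be sketched as two separate passes: the analysis only collects opportunities and may abandon the whole attempt without touching the IL, and mutation happens afterwards, once safety is established. The trivial "optimization" here (constant folding an add) is just a stand-in.

```python
# Sketch of the two-phase discipline: analysis never mutates the trees.
def analyze(trees):
    # Record the indices of trees we could constant-fold; touch nothing.
    return [i for i, (op, a, b) in enumerate(trees)
            if op == "iadd" and isinstance(a, int) and isinstance(b, int)]

def transform(trees, opportunities):
    # Only now, with the opportunities proven, do we mutate the trees.
    for i in opportunities:
        _, a, b = trees[i]
        trees[i] = ("iconst", a + b)
    return trees

trees = [("iadd", 2, 3), ("iadd", "a", 1)]
trees = transform(trees, analyze(trees))
```

If `analyze` bailed out and returned nothing, `transform` would be a no-op and the IL would be exactly as the ILGen left it, which is the safety property the split is buying.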
Okay, so just a quick example of some of the things we can do. If you look at a couple of treetops in order, one of the things that can get done is a strength reduction optimization: basically, by analyzing this code, we can figure out that we can replace the multiply with a shift instruction. So let's go ahead and do that. Then a very, very frequent optimization that occurs in this code is something called commoning, which is really a common subexpression elimination optimization.
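The strength reduction just mentioned is easy to show concretely: an integer multiply by a power of two can be rewritten as a cheaper left shift. A minimal sketch, with made-up tuple-shaped nodes rather than real Testarossa nodes:

```python
# Sketch of strength reduction: imul by a power of two becomes ishl.
def reduce_strength(node):
    op, child, k = node
    if op == "imul" and k > 0 and (k & (k - 1)) == 0:  # k is a power of two
        return ("ishl", child, k.bit_length() - 1)     # shift by log2(k)
    return node                                        # otherwise unchanged
```

So `x * 8` becomes `x << 3`, while `x * 6` is left alone because 6 is not a power of two.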
What we can do is assume that the operation is going to occur at the first tree, and then we simply say that we can "common up" the operation under the subtract node there. So, basically, that operation doesn't need to be repeated.
We already know that we've done it. There are other rules around how this can actually happen, but in this very simple example we're just performing it straight up. The resulting trees in this case are very likely going to be a DAG; we don't allow cycles in any of the trees that we form, which I guess I should have mentioned earlier.
Another optimization that we can do is something called copy propagation. Looking at the next set of optimizations that we can apply here: we are producing the result of some expression and storing it into a variable a, and then the next treetop comes along and actually loads that a. What we can basically see is that, instead of loading a from memory, we can just replace the load with the result of the iadd. It's a fairly straightforward copy propagation. Then we can use a simplification optimization as well, which looks to see if there are unnecessary trees or operations that we can fold into as small a unit as possible. In this particular case, as the text on the right says, we're doing a store that's immediately overwritten.
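The copy propagation step can be sketched like this: when a treetop stores a value into `a` and a later tree reloads `a`, the load can be replaced by the stored node itself, which is one way the trees grow into a DAG. This toy version assumes there is no intervening store to the same variable and ignores aliasing; it is a model of the idea, not OMR's pass.

```python
# Sketch of copy propagation over tuple-shaped trees: replace a load of a
# variable with the node last stored to it (no intervening store assumed).
def copy_propagate(treetops):
    last_store = {}                         # variable -> node stored to it
    for i, (op, *rest) in enumerate(treetops):
        if op == "istore":
            var, value = rest
            last_store[var] = value
        elif op == "iload" and rest[0] in last_store:
            treetops[i] = last_store[rest[0]]   # reuse the stored value node
    return treetops

add = ("iadd", ("iload", "b"), ("iload", "c"))
trees = copy_propagate([("istore", "a", add), ("iload", "a")])
```

After the pass, the second treetop is literally the same node object as the stored add, so the memory load has disappeared.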
So I don't really need to do the store at all in the first case, because the second one is going to overwrite it, so I simply replace it with a treetop. Then we have another phase, called dead trees elimination, that can go ahead and remove those dead treetops.
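The dead-store cleanup just described can be sketched with a backward walk, the usual direction for liveness-style reasoning: a store to `a` that is overwritten before any load of `a` is dead and gets dropped. This toy ignores loads buried inside stored values and is only a model of the mechanism.

```python
# Sketch of dead store elimination: walking backwards, a store to a variable
# that is not live (no load seen above the next store) is dropped.
def eliminate_dead_stores(treetops):
    live, kept = set(), []
    for tt in reversed(treetops):
        op, var = tt[0], tt[1]
        if op == "istore":
            if var not in live:
                continue              # overwritten before any use: drop it
            live.discard(var)         # this store satisfies the use above
        elif op == "iload":
            live.add(var)
        kept.append(tt)
    kept.reverse()
    return kept

trees = eliminate_dead_stores(
    [("istore", "a", "t1"), ("istore", "a", "t2"), ("iload", "a")])
```

The first store of `t1` into `a` is dead because the `t2` store overwrites it before the load, so only the second store and the load survive.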
All in all, we're going to simplify this down to as small a unit as possible, and we can keep going and going, applying as many optimizations as possible. Oftentimes applying one optimization actually uncovers opportunities for other optimizations, and then we iterate on that. The optimizer is a fairly big topic, and it's not really the focus of what we wanted to talk about with this working group, but if there's ever any interest in anything in particular there, we can drill into it.
So, moving on to the code generator: ultimately, its role is really to produce binary code from IL. A lot of steps happen between the IL being introduced and the binary code actually being produced, and what we've done is break the code generation up into a variety of different phases that can be customized by a particular language runtime or by the architecture that you're running on. On the far left here I happen to list all the phases that are available in the current OMR compiler; not every runtime is actually going to use all of these.
In the middle column I've got an example of how a particular architecture can provide its own phases that make sense for that architecture, and on the far right is an example of a package of phases that a particular runtime could use. In fact, this package here is the default one that an OMR code generator would use unless it was overridden by a project downstream. This is the default ordering of the phases, and one of these phases is tree evaluation.
A word that gets used a lot around here is "evaluation". The act of inspecting an IL node and turning it into instructions is called an evaluation, and, like the quote there says, the point is "to generate instructions which, when executed, will cause the tree's value to appear in some virtual register". So the result of evaluating a node is a sequence of virtual instructions whose result will be a virtual register that represents the value of that particular tree.
We'll see that in a sec. The virtual registers that are used come from an infinite set, so you can use as many virtual registers as you need in order to accomplish what you want; you don't have to worry about the limitations of the particular architecture that you're targeting. In this particular phase, you use an infinite set of registers.
Every node also carries a reference count. What that really means is how many uses of a node are seen in a particular program, and we can use that information to make decisions: is this the first use, is this the last use of a node, that kind of thing. From that we can make decisions about when registers die and when registers should go live.
So yeah, that's what it says there: we can make reuse decisions based on the reference count. If the reference count goes to one, you know this is the last reference you'll see of this node. An evaluator does not usually decrement its own reference count; the parent is responsible for decrementing the reference count on a particular node. Okay, so an example: here's a fairly simple tree representing the statement there.
We need to have some kind of pointer to the instruction stream that we're working on; this is the stream of virtual instructions, I guess you could call it. The way this works is that we basically do a post-order traversal: we evaluate the children before the parent. So we get to the bottom here, there are no more children to traverse, and now we actually have to do something.
What we're going to do here is what we call evaluating this node, and again we're assuming an infinite supply of registers. What this really turns into (it's basically a load; I'm just going to use a very generic kind of pseudo-assembly here, this isn't any particular architecture) is loading the contents of a into some virtual register that we happen to call r203.
So the evaluation of a load results in that particular virtual instruction, and since its result is stored into virtual register r203, we can actually set register r203 on that node. Any time in the future you look at that node, you can ask: has a virtual register been assigned to you yet? If the answer is yes, you know you've actually evaluated that node in the past, and you can just use the value that's stored in that virtual register.
In this case, it's loading b into register r204, and again it sets register r204 on that node. Since we're out of children for this particular node, we now go back up to the parent, the add. What the add is now able to do, since both of its children have been evaluated, is the operation it's supposed to do: it emits an add instruction that stores its result into a virtual register.
That's, in this case, r205, computed from its two children that have been evaluated; we just did r203 and r204. Then we keep going: we now get to the imul, which hasn't evaluated its isub child yet, so that's what we do here, the subtract itself. You can see by the arrows here that the children of the subtract are already r203 and r204. We don't have to re-evaluate those nodes, because they've already been evaluated into a virtual register, so the subtract node simply takes its children from r203 and r204, and it stores its result into a new register, r206. We carry on up the tree, and it's the same thing for the multiply: we've got r205 and we've got r206.
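The walk just described can be sketched end to end. Children are evaluated before parents, each node caches the virtual register holding its value, and an already-evaluated (commoned) node is simply reused instead of being re-evaluated. The register numbers are chosen to line up with the walkthrough above; this is a model of the mechanism, not OMR's evaluators.

```python
import itertools

# Sketch of post-order tree evaluation into virtual instructions.
class Node:
    def __init__(self, op, *children):
        self.op, self.children, self.reg = op, children, None

def evaluate(node, instructions, counter):
    if node.reg is not None:
        return node.reg               # commoned node: already in a register
    regs = [evaluate(c, instructions, counter) for c in node.children]
    node.reg = f"r{next(counter)}"    # draw from the infinite virtual set
    instructions.append((node.op, node.reg, *regs))
    return node.reg

a, b = Node("iload a"), Node("iload b")
add, sub = Node("iadd", a, b), Node("isub", a, b)   # a and b are shared
mul = Node("imul", add, sub)

instrs = []
evaluate(mul, instrs, itertools.count(203))
```

The shared `a` and `b` nodes are evaluated exactly once (into r203 and r204); the subtract reuses those registers rather than emitting fresh loads, so five instructions come out instead of seven.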
What this is really showing you is one particular treetop and the nodes underneath it: what it looks like before tree evaluation, what it looks like after we've gone through the process we just did, and what the actual underlying instructions look like. The way you can read this is that in the column on the left, every node has got a unique ID associated with it (n238n, for example).
So in this particular case you see that the ificmplt node has a reference count of 0; that's good, because it's a treetop, and treetops should not actually be commoned. Then underneath that you see it's got an iload with a reference count of 4, which means that besides this use of the iload, there are three other places that actually reference this particular iload.
What you can see is that after we've done tree evaluation, the reference counts on these nodes actually get decremented. So after we've gone through this, that iload has its reference count decremented from four to three, another load goes down to 0, and another goes down to 1; that's the natural result of this process. The result of these evaluations is a virtual register, and after we've done the evaluation we can report it.
We can now report which register has been set on each of these nodes. In the case of the iload, we know it was evaluated into GPR64, so we can now report that when we print these trees in the logs. A little bit later, you can actually look at the virtual instructions that were produced as a result of evaluating those trees, and indeed they are a very flat instruction representation here. The column on the left here actually refers to the instruction address.
Okay, so up until this point we've been dealing with virtual registers, and the problem is that when you actually do want to generate this for a real machine, like x86 or Arm or whatever, that real hardware doesn't have an unlimited set of registers. It's very handy during evaluation to assume an unlimited number of virtual registers, but at some point we have to make everything fit within the constraints of the target architecture.
So we go through a register assignment phase, often abbreviated as RA, and what we basically do there is map all these virtual registers to real machine registers. The way this particular process works is that it does a backward pass: it goes through all the instructions in reverse and does that mapping, and it deals with one block at a time.
We call this a local register assigner for that reason, because we only deal with a single basic block at a time. The other important thing about this kind of register assignment technique is that it doesn't really know anything about control flow, which is why it can't cross block boundaries: if it comes across some kind of control flow, it doesn't track register state in the different branches of that control flow.
The register assigner has to obey the register dependencies as well, and in the case of Java it also has another duty in building the register maps, because it understands whether a register holds what we call a collected reference or not. I'm not going to talk much about that right now, but it does have other tasks that it must perform during register assignment. So, just a quick example of how this actually works.
Let's assume for the moment that we have three real registers in this particular architecture, called x, y, and z, and that at this particular point in the register assignment (we're walking backwards, as I said this technique works) we've already gone through and assigned registers to r203 and r204. In this particular case, let's assume they've already been assigned to real register x for r203 and real register y for r204.
The way this will work is that we get to this particular instruction and we see that there's one register that needs to be assigned here, r207, and the question is: is there a register available for it? The answer is yes, because register z is still available, so we can give it register z, and we're happy. So we go on to the next instruction.
Here's an example of an instruction that has a destination (a target, the register definition) and also registers that it's using. The way this works is that we actually process the definitions first. So the first register that we're looking to assign is r207, and it turns out we've already got a register assigned to it, z. That's great, but the thing to notice here is that register r207 is no longer used if you keep looking up in the program: this is the last use of register r207. So after this point it doesn't need to hang on to the assignment of register z. After it's done this assignment, it basically says, I'm done with register z, somebody else can use it, and it frees it up for someone else to use.
It's a definition, so this is going to be its def point here. The register assignment then proceeds on to r205 and r206; neither of those has a register assigned to it yet. Let's start with r206: z is available now, so let's use register z. We now move on to r205, but the problem we have here is that x, y, and z, our entire register set, are all actually in use, so we don't have a free real register to assign to r205.
So what we end up having to do is use the spilling mechanism to make this work. The way that works is we say, okay, let's pick a register; most backends have got some heuristics to figure out which register is best to spill, but in this particular case we chose x, that's the register that we want. But because it was already being used, there's more to do.
What we have to end up doing here is insert a load from some spill slot into register x, because all the code beneath us had already assumed that that value is in real register x. So what we're actually inserting here is a load from a spill location that we haven't stored into yet. We now say that r203, which had x, is in a spill state; r205 is now given the real register x, and we're happy.
r203 here is a bit of a challenge because it's in a spill state; it actually isn't sitting in a register right now, so we're going to have to find a register for it. The other thing I wanted to point out (just going back here): this is actually the first use of r206, so it dies at this point, and after we've assigned z to it in this case, z is available for assignment again.
So in this particular case, r203, which we're looking for a register for, is going to get register z. But since r203 was in a spill state, we actually have to insert a store of z into that spill slot, because a little bit later on we're going to reload it back into x. This is where that symmetric spilling comes into play: we have the store to the spill slot matching the load from it, and we can carry on with our register assignment in this particular case.
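The backward local pass can be sketched as a toy model: walking the instructions in reverse, a virtual register gets a real register the first time the walk sees it (its last use), the real register is freed at the virtual's definition, and when the pool is empty a victim is spilled. This deliberately omits inserting the actual spill load/store instructions and uses a naive victim heuristic; it is a model of the mechanism, not OMR's register assigner.

```python
# Toy backward local register assigner. Instructions are (def, uses) pairs
# of virtual registers, in program order; we walk them in reverse.
def assign(instructions, reals=("x", "y", "z")):
    free, mapping, spills = list(reals), {}, []

    def real_for(virt):
        if virt in mapping:
            return mapping[virt]
        if not free:                       # pool exhausted: spill a victim
            victim = next(iter(mapping))   # naive choice; backends use heuristics
            spills.append(victim)
            free.append(mapping.pop(victim))
        mapping[virt] = free.pop()
        return mapping[virt]

    out = []
    for dest, uses in reversed(instructions):
        d = real_for(dest)                 # process the definition first...
        free.append(mapping.pop(dest))     # ...then free it: above its def,
                                           # the virtual does not exist
        out.append((d, [real_for(u) for u in uses]))
    out.reverse()
    return out, spills

instrs = [("r205", ["r203", "r204"]),
          ("r206", ["r203", "r204"]),
          ("r207", ["r205", "r206"]),
          ("r208", ["r207", "r203", "r204"])]
assigned, spills = assign(instrs)
```

With only three real registers, four virtuals are live at once between the second and third instructions, so one of them (here r203) ends up spilled, which mirrors the situation walked through above.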
So this particular example really demonstrated a lot of the real-world things that a register assigner has to deal with, and I guess that's it for register assignment. Alright, I can pause there if there happen to be any questions on that, because I did go through it fairly quickly.
[Attendee question, not transcribed.]

Oh yeah, I think you're right. I would say at least 90% of the DNA in the register assigners is the same. I think we have had some discussions in the past about commoning up a lot of the functionality between the different architectures, because they are basically behaving the same way. There are some architectural differences that do come into play, though.
Things like the way special implied registers happen to be used within certain instructions on certain architectures, or special ways of choosing a register, like being able to use the upper half of a register, that kind of thing. But even having said all that, I think a lot of that could be abstracted away, and we could actually do a much more general job of doing this now.
Having said that, though, it's going to be a big task, because each of these backends has invested quite a bit in its own implementation of these, even though they are basically doing the same thing. So reimplementing this in a generic way is going to be a tedious task, and perhaps an error-prone task as well, so we'd have to be super careful about doing that. But yeah, you're right that there's not much fundamental difference between them.
Okay, so there's one last little phase that we have to go through in producing code, and it's really where we take those assigned instructions and actually write them out into a buffer; in terms of the compiler technology here, we call that a code cache. So we go through this phase called binary encoding, and this is very architecture-specific: for each of those virtual instructions, depending on what opcode is in use, it produces the actual binary encoding of that instruction.
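A toy sketch of the mechanism: each instruction appends its encoding to the code buffer, and a forward branch records a fixup that gets patched once the target label's offset is known. The byte values are loosely x86-flavored (0xEB is a short jmp, 0x90 a nop), but the point is the displacement fixup, not any particular ISA; OMR's real encoders are far more involved.

```python
# Toy binary encoder with forward-branch displacement fixups.
def encode(instructions):
    buf, labels, fixups = bytearray(), {}, []
    for ins in instructions:
        op, *args = ins
        if op == "label":
            labels[args[0]] = len(buf)     # remember the label's offset
        elif op == "nop":
            buf.append(0x90)
        elif op == "jmp":
            buf.append(0xEB)
            fixups.append((len(buf), args[0]))
            buf.append(0x00)               # displacement patched later
    for offset, target in fixups:
        # rel8 displacement is relative to the end of the jmp instruction
        buf[offset] = (labels[target] - (offset + 1)) & 0xFF
    return bytes(buf)

code = encode([("jmp", "L1"), ("nop",), ("nop",), ("label", "L1"), ("nop",)])
```

The jump over the two nops encodes with a displacement of 2, filled in only after the encoder reaches the label and learns its offset.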
If you've got branches within a method, it needs to be able to figure out the branch displacements and that kind of thing. For the case of Java, it has to handle the ahead-of-time compilation relocations, that kind of thing, and it also handles things called snippets, which are really small out-of-line sections of code that may be needed in some exceptional cases and for some opcodes.
So that's really the whirlwind tour of the compiler technology. I really just assume this is going to be the start of a lot of discussions that we're going to have on this; it's just to give everybody a flavor of what's possible, with a bit more focus on the code generator, which is what people here are really going to be interested in.
Looking forward, the kinds of things I'd like to talk about next (I don't want to do that today; I'd like to maybe schedule something next week, when more people are available) are the conceptual integrity around some of these different components. So, things like: what does the Machine class do in the code generator? What is the CodeGenerator class supposed to do? What is a register? What is an instruction? How do register dependencies work?
No questions? Okay, well, if there aren't any questions, I think the next step here is that I'd like to convene another call, perhaps on Monday evening, Eastern Time, to start talking through some of the other components of the code generator in a little bit more detail; like I said before, things like the Machine class and the Linkage class and the CodeGenerator class, that sort of thing. I'd also like to talk a little bit about how the code itself is structured in the code generator.
It's
a
way
of
architecting
the
class
is
to
make
them
extensible
and
I'd
like
to,
if
you've
never
actually
seen
that
kind
of
code
before
it
may
strike
you
as
a
bit
odd.
There
are
good
reasons
for
doing
what
we
did
and
I'd
like
to
explain
those
hopefully
not
try
to
overly
justify
them,
but
but
try
to
explain
what
they're,
what
it's
actually
trying
to
accomplish
and
how
you
architect,
the
class
that
uses
extensible
class
in
in
that
in
that
format.
So
I
think
I'd
like
to
talk
about
that.