From YouTube: OMR Compiler Architecture Meeting 20190523
Description
Agenda:
* Concurrent scavenge read barrier patching (#3847) [ @yanluo7 ]
* Next steps for RISCV OMR compiler [ @shingarov ]
* Formalization of IL semantics [ @shingarov ]
B: To justify our design: for CS (concurrent scavenge) we already have a working implementation where the scavenge phase of the GC happens concurrently with the application threads. The idea is that instead of having one big pause in which we do all the GC tracing and copying, we amortize that effort and spread it out across the application threads while they are running. That eliminates the pause, and at the same time we have some background threads helping the application threads out, picking up any slack they leave.
B: We enforce a rule where, whenever a Java application thread tries to load a reference field from an object on the heap, we emit a sequence of instructions called a read barrier, where we will try to garbage collect the referenced object — the object that the field points to. So we perform a range check in the compiled code, and if the range check fires, it performs a call to the GC, where the referenced object gets collected or copied into the appropriate place on the heap.
B: Currently, the instructions we generate in the mainline of the JIT'ed code consist of a variant of a load of the referenced field, and then a range check — a series of compare instructions where we compare against the heap base and the heap top. If the address falls outside that range, we fall through to the next mainline code; if it falls within the range — the region being evacuated — we call out of line to the GC code to do the collection. We want to make that faster.

B: One way to make it faster is, we're thinking of basically patching out this piece of code, which is all over the generated code cache, by replacing it with no-ops.
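The mainline sequence just described can be sketched as plain C logic. This is a simulation with illustrative names (`heapBase`, `heapTop`, `gc_read_barrier`), not the actual generated instructions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative globals standing in for the region bounds that the
 * generated compare instructions check against. */
static uintptr_t heapBase = 0x1000;
static uintptr_t heapTop  = 0x2000;

static int barrierCalls = 0;

/* Stand-in for the out-of-line call into the GC that collects/copies
 * the referenced object to the appropriate place on the heap. */
static uintptr_t gc_read_barrier(uintptr_t ref) {
    barrierCalls++;
    return ref; /* a real barrier may return a forwarded address */
}

/* The inlined mainline sequence: load the reference field, range-check
 * it against the region bounds, and only call out of line when the
 * reference falls inside the region being evacuated. */
static uintptr_t read_reference(const uintptr_t *slot) {
    uintptr_t ref = *slot;                 /* load */
    if (ref >= heapBase && ref < heapTop)  /* range check */
        ref = gc_read_barrier(ref);        /* slow path: call the GC */
    return ref;                            /* fast path: fall through */
}
```

When CS is inactive the check never fires, which is exactly the 90%-of-the-time dead path the patching scheme wants to eliminate.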
B: We have some data that sort of supports this. One piece of data is from the runs that we did.

B: To be clear on that: the duration where the CS range check instructions are actually useful, versus the duration where they are not — the measurements we did on X and on Z point to a duty cycle where the range check was useful only about 10% of the overall time.
B: So 10% of the time the range check instructions are useful, and 90% of the time we're not in any CS mode — i.e. they're not useful, but we still execute the range check; it never fires and we always fall through. That's one piece of data that supports this. We are trying to optimize for that 90%: if we can somehow get rid of the cost of the extra path length and the extra code-cache waste for 90% of the time, that should give us a handsome win.

B: The other piece of data is that even within the 10% of the time when the range check instructions are useful, the check only fires — and we call out to the GC — about 10% of the time. So only 10% of that 10% of the time do we actually need to call out to the GC. That's what these two pieces of data really tell us.
A: There was another data point that's worth bringing up, which was: if you take a garbage collector that doesn't require these range checks and you forcibly insert them, so that they always fall through and you never have to call out — the throughput penalty of just spraying those checks into the generated code on x86, I believe you said, was on the order of about 5 or 6%.

B: Yeah. The throughput overhead of our current CS implementation in gencon is around 10% on the three platforms, and we did experiments where we took regular gencon and tried to emulate the effect of CS: we bring those instructions in, but they never do anything — they just cause the extra path length and the extra code cache. Just by doing that we were able to reproduce most of the 10% regression.

B: Something like a 6, 7, 8% regression, reproduced just by adding those checks into regular gencon. So that's another piece of data that points to: just by doing this useless check, we slow things down quite a bit — hence, 90% of the time we are not really doing much of anything.
A: The knobs on x86 were, I believe, within the noise margin of the measurement — it was sub-1% when Victor measured it previously. He had done an implementation where there was a single knob in the code where the check would be, and with that knob off it was in the noise margin.

A: It's the path length. I think there are two problems there. One problem is that the heap base has to be loaded: if the heap size is not fixed, you have to load the heap base from somewhere, and there's an overhead for that. If it is fixed, there's still a conditional branch. The processor can continue speculative execution, but there are limits on what it can do while in that speculative-execution mode because of that conditional branch, and that speculation is a big part of the overhead.
B: All good stuff supporting our case — those are the pieces of data we try to rely on in making our design decisions. So we came up with the idea of basically having amortization strategies again. Because we know there are so many of these loads all over the code cache, we do not want to patch them all at once within one huge pause — that would just kill our response time. So we want to do it lazily, and we want to amortize the cost.
B: At GC start pause time, the GC will walk every Java thread's stack frames and record all the collectible references on the stack, marking them as the root set to commence the tracing of the object reference tree. Since there is already a mechanism to do this stack walk, we want to piggyback on it: while the GC threads are doing the stack walk, for every JIT'ed frame they visit, we want to patch that frame — patch that method body — right there at the CS start pause.

B: That's one piece. When the pause ends, the Java threads pick up where they were and resume execution, and those method bodies will have been patched back to the correct sequence, where the range check is actually present, so those methods can continue to run correctly. What about all the other methods? That's where the lazy patching comes in: we want to patch the methods that we didn't execute during the GC stop-the-world pause but that are going to execute after it.
B: So what is the state of a method? As far as concurrent scavenge is concerned, a method can be in one of two states: patched or unpatched. Unpatched means the range check instructions are in there for every reference load; patched means those range check instructions have been replaced with no-ops. At any given time a method body can only be in one of these two states. Having that notion, we can then look at what the global state means.

B: The global state is the GC signalling the JVM: "I'm going to be in the CS-active period" or "I'm in the CS-inactive period" — the latter being the 90% of the time. Basically, at the GC start pause the GC sets some global variable to CS-active, then it starts running concurrently, until the GC end pause, where the global state is flipped to CS-inactive. So now you have a method state and a global state.
B: The patched method state matches the CS-inactive global state: if CS is inactive, I don't care — I can just run no-op instructions in lieu of the read barrier, so they match. And the unpatched state matches the CS-active state. So at any given time a method can be in one of two states, and, on the other hand, at any given time the JVM can only be in one of the CS-active or CS-inactive states.

A: You have to interrogate some part of the runtime system, but the statement being made here is that there is a state the JIT believes the methods need to be in: either they need to have the read barriers running or not. If you use a bit in a control word stored globally that says whether we need them or not, then checking the state of the method against it is how you check the method bodies.
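The state-matching rule just described can be sketched as follows. This is a simulation with illustrative names, not the actual JIT data structures:

```c
#include <assert.h>
#include <stdbool.h>

/* The two per-method states and the global CS state described above.
 * UNPATCHED = range checks present in the body;
 * PATCHED   = range checks replaced with no-ops. */
typedef enum { UNPATCHED, PATCHED } MethodState;
typedef enum { CS_ACTIVE, CS_INACTIVE } GlobalState;

/* A method body is consistent with the JVM when:
 *   UNPATCHED (checks live)  matches CS_ACTIVE, and
 *   PATCHED   (checks NOP'd) matches CS_INACTIVE. */
static bool states_match(MethodState m, GlobalState g) {
    return (m == UNPATCHED && g == CS_ACTIVE) ||
           (m == PATCHED   && g == CS_INACTIVE);
}
```

A mismatch between the two is exactly the condition that triggers (lazy) patching of the method body.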
A: The use case here is CS, yes, but the hope is to use it for other things. I have designs on trying to use lazy patching to let us toggle patching on for profiling — to patch profiling code, software profiling, into method bodies. There again you would want a state saying "yes, we want profiling on; we will profile any method that runs, or some subset of methods that run," and you want that state to be toggled on or off in a similar way to what we are proposing for CS.

A: Yes — so having a general mechanism where you just load a control word and compare it against the method's state: you pay an extra load, but you get a lot of flexibility in having multiple different features enabled or disabled, and in calling out if the body does not match the state you expect.
E: And for this extra reason: if CS is not active — 90% of the time — you're not doing any work you weren't already doing; you're just doing the same check you were doing before, it's not going to fire, and you're not going to call out. There is one more cost when it does fail, in that 10% of the time, which is why I wanted to put it out there as something to think about: whether it somehow works out to be a net positive.
C: I think what I'm hearing is two things: one is the mechanism for invoking the feature, and one is the feature itself. It seems to me that, from the OMR perspective at least, the feature is being able to patch instructions and revert them back — something we currently do not support — whereas the mechanism for invoking that logic is OpenJ9-specific. So I think we should design this as the feature itself, which is patching and unpatching instructions, with the mechanism to invoke it kept totally separate.
A: OMR may wish to give some consideration to it, because if OMR is going to have a notion of software profiling — which I certainly hope will be the case in the not-distant future — there's some persistent method information infrastructure that needs to be promoted into OMR before we can do that. But we have software profiling techniques with very low overhead, developed in the context of OpenJ9, that are not Java-specific, and being able to toggle that profiling on and off, at least with some default, would be valuable.
A: To me it depends on what design we settle on. For the control mechanism, if it ends up conflated with the stack overflow check and whatever happens in there in OpenJ9, then that may be the way OpenJ9 wants to do it. You may also wish to have a more generic implementation in OMR, so that it's supported out of the box — maybe not as efficiently as when you carefully conflate it with your language's stack overflow check or whatever other things you do on method entry.
C: I mean, the infrastructure is: you call a method in OMR which patches an entire JIT'ed body — yep — while the mechanism you folks use may be language-specific, and underneath there will be different kinds of things that need to be done. So, you mentioned you're going to patch multiple instructions — do we have concerns about atomicity? That's under discussion.
B: We have options; we don't want to patch too many instructions. Currently on x86 we have a load, then a compare, and then a jump — basically compare-and-jump to the out-of-line sequence where we do the rest of the read barrier. So that's two instructions minimum we need to patch on x86. On Z, all the read barrier sequences are inline in the mainline, so there are something like seven or eight instructions — we don't want to patch that many, so at least some of them would need to be outlined. But going back to x86: we could patch those, provided that whichever instruction we patch, the patch is atomic.

B: The reason I think it would otherwise be functionally incorrect: we're basically patching the compare and the jump. If we patch only the jump and leave the compare in place, I don't think it matters — the compare on its own is harmless.
A: You're going to run patching code that patches the instructions to a fixed sequence — whatever that site is going to be — and that thread will then mark the control word as having been updated: update the state of the method, basically attach it to this configuration, and then begin execution. On x86, where you have ordering guarantees, all threads check the control word, and the last thing you write is the method's control-word update.

A: Multiple threads could enter there, and as long as the patching is idempotent — it will always do the same thing — then if multiple threads come in and go patch, they're all going to patch it to the same thing, and execution can continue. So whether you're patching one instruction or multiple instructions, no thread is going to enter the method until all the patches have been written, right?
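The idempotence argument can be illustrated with a toy patcher. The instruction bytes here are fake and the racing is not simulated; the point is only that every patcher writes identical bytes, so any number of them produce the same final contents:

```c
#include <assert.h>
#include <string.h>

/* A fake patch site: pretend these four bytes are a conditional jump. */
static unsigned char site[4] = { 0x0F, 0x84, 0x10, 0x00 };

/* The fixed replacement every patching thread writes: four no-ops. */
static const unsigned char NOPS[4] = { 0x90, 0x90, 0x90, 0x90 };

/* Idempotent patch: always writes the exact same value, so it does
 * not matter how many threads run it or in what order. */
static void patch_site(void) {
    memcpy(site, NOPS, sizeof site);
}
```

In the real system the write would additionally have to be atomic (or guarded so no thread executes the site mid-write); the idempotence property is what makes it safe for several threads to race to the patching logic at all.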
A: So there are various choices. If you're going to do the check in the mainline, versus doing the check on entry and patching on entry — if you check and patch on entry, then you can rely on multiple threads going out to the patching logic, so you don't have to worry about aligning absolutely everything.

A: Now, if you're going to do it while the program is running, then you'd better make sure you align the patch site according to the requirements of that platform. But it's also possible to refactor the current read barrier code so it doesn't require patching multiple instructions.
A: For example, on x86, when Victor was working on this, we did an experiment where we moved all the read barrier checks out of line — both of the range checks — and did a call and a return. Because of call/return prediction, the extra overhead of doing that was essentially unmeasurable, and it meant we only had to patch one instruction, so we didn't even have this problem.

A: It's just a call to an address — an out-of-line code sequence — so it wasn't a problem. Now, on some platforms a call and return is no good; a jump might be better. But again, we're coming down to the specifics of what it is you're actually going to patch, and I don't think the mechanism necessarily needs to be that specific.
A: They all have to go through there. But in the case where you're dealing with the CS cycle at the end — we were talking about the cycle that is ending — you're not going to walk the stacks back at that point; somebody can enter the method and patch the checks out at that point.
A: However, one question is, say on Power: if we have the inline sequence — a load, then a compare, and a branch conditional — let's say we leave the load and compare on Power as they are, and all we're going to patch out is the branch conditional. Whether that's performant or not, I'm not going to argue about right now.

A: If all you are going to do is patch out the branch conditional, then you could run the nop or you could run the branch — it doesn't matter, because the branch will fall through once the condition no longer holds. As long as the write is a single instruction, you either see the new version or you don't, so you don't need the isync; and even if you see the wrong version at GC end, it's functionally correct — the load and the compare will still generate a valid result.
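The safety argument for the single-instruction patch can be simulated as follows. The names are hypothetical and this models control flow only, not real instruction encodings: after the CS cycle ends, the branch condition (reference in the evacuate region) is always false, so executing either the stale branch-conditional or the freshly patched-in nop reaches the same place:

```c
#include <assert.h>
#include <stdbool.h>

/* The two things a thread might observe at the patch site. */
typedef enum { INSN_BRANCH_COND, INSN_NOP } Insn;

/* Returns true iff execution falls through to the mainline code. */
static bool falls_through(Insn site, bool condition_true) {
    if (site == INSN_NOP)
        return true;            /* nop: always falls through */
    return !condition_true;     /* branch taken only if the condition holds */
}
```

With `condition_true == false` (the situation after GC end) the outcome is identical for both observed versions, which is why no instruction-stream synchronization is needed for correctness in that direction.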
A: Well, I think it's going to require some careful thought based on the memory models of the various platforms and the allowable patching — what is the sequence you actually want to patch in? On x86 we have a bit more flexibility in the patching than on Power, and we may choose to patch a couple of instructions, but at the very worst there's a one-instruction patch that gets us most of the performance we want, which is that call/return style.

A: Now, that may not work on other platforms; we'll have to study the sequence in the context of each platform, and the architecture may not make it possible to regain as much of the performance as we can on x86. But we'd like to put the tool into OMR to let us try to do the x86 one, and if it helps the other platforms, then fantastic.
A: But again, that's orthogonal to the infrastructure OMR would need to support doing the patching. It's a choice of where you employ the patching: one design point would be, if the patching takes too long for the use case you're looking at, you cut down the number of patch sites by sacrificing performance in places you don't expect to run very often — you just leave that code in the patched state or whatever. But that's a choice about correctness and the particular thing you're doing.
A: Well, I think it would probably end up looking like a runtime-assumption kind of thing, which has not yet been fully factored up into OMR. There is a runtime assumption table — the abstract sort of runtime assumption exists, while the concrete runtime assumptions are currently used down in OpenJ9 — but the notion that you have a location and need to keep track of these things against a certain kind of operation is there.
H: Depending on what you're patching: if you make sure that what you're patching is "replace this jump with a fixed offset to a nop target," then you could get away with storing just the offset, which could be smaller. But then you can't patch compares that way — right, yep.

A: You can get a similar effect with a call/pop pair — you can generate a RIP-relative address on 32-bit with a small instruction sequence, but it doesn't have to be RIP-relative. If you're going to store the control word in the prologue, that's one way of accessing it: on 64-bit it's highly efficient; on 32-bit you might want to do something else.
E: What I mentioned before was that the call/return scheme that you tried didn't have much of an overhead at all, right? Okay — so say you did the call/return everywhere, and then as a second step you patched all the methods. Every method would only have one call/return to the sequence that does whatever needs to be done and then returns, and you could just do all methods, because there are only a few thousand methods and every single one has only one call and one return.
A: So if we outline — at the moment there's an out-of-line sequence which consists of a range check, then a potential call to the garbage collector, then a return and a jump back; and in the mainline there's a load, a compare, and a jump conditional. There are actually two range checks that have to be done, against the collected-region base and the collected-region top, so we moved that first load/compare/jump-conditional — that first range check — out of line.

A: Deduplicating would complicate the out-of-line sequence, because it relies on the registers that are live at that read barrier: where you're going to get the reference from could be r8 or RAX or whatever, so the sequence we generate is specific to the site. Now, we could do a form of deduplication, but that wasn't done — we were already generating one of these stubs for every read barrier, so we just kept doing that and moved the mainline check out.
A: Right — the deduplication only saves footprint for the actual instructions that run for the read barrier; the number of sites that need to be patched remains unchanged, unless you patch each read barrier site to call a shared sequence. The number of sites I need to patch is the number of read barrier opcodes.

A: I would still worry that with a large number of compiled methods you could still have a problem where the amount of time it takes to patch them all is non-trivial. And given that there are other potential uses for this, such as software profiling, I don't know that the small extra engineering cost — a week or whatever — of making sure we can support the more general case is misspent. It provides a utility in OMR that could be very useful for a number of things.
A: Right, but I still worry that writing that many instructions across all the methods in the code cache adds up — that's my concern, because we're talking about a very short pause. People are actively moving path length out of that pause, so putting path length into it is a hard sell.

B: Yeah, we should do that — I agree. With CS we don't want to spend all this time doing all the CS work to bring us to where we are, and then shoot ourselves in the foot by increasing the pause time. If we take the just-in-time aspect out of it, we might have a problem.
C: It would be enough to satisfy me to have the numbers. An easy way would be to put a debug counter right before you call out to the GC read barrier and count how many methods you're actually seeing during a CS cycle. If, say, you have to touch 1,000 methods while you're only executing 100, the just-in-time approach would only patch 100 methods, whereas the eager approach would patch 10x or 100x more.
F: For figuring out when you have to patch a method — is it at method entry? We talked about possibly overloading the stack overflow check. Did you consider just changing the vtable entries for methods — changing them to an entry point that would do the patching?

A: So each VM thread has a control word that says "I want read barriers to run" or not. Each method has a state: "my read barriers are on" or "off." When you arrive at the method, you check: do I match the thread state? If yes, do nothing; if not, I need to go and call some patching code. So you only ever update the control word, either globally or one per thread — per-VM-thread is convenient for Power.
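A minimal sketch of the patch-on-entry check just described, assuming a per-method state word and a global control word. All names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-method record: whether the body currently has live
 * range checks, plus a counter so the sketch can show how often the
 * body actually gets rewritten. */
typedef struct {
    bool barriersOn;  /* method body currently has live range checks */
    int  patchCount;  /* how many times the body has been (re)patched */
} Method;

/* The control word: the state the runtime wants method bodies to be in. */
static bool globalBarriersWanted = false;

static void patch_body(Method *m, bool on) {
    m->barriersOn = on;  /* stands in for rewriting the instructions */
    m->patchCount++;
}

/* Run on method entry: patch only on a mismatch, otherwise do nothing. */
static void on_method_entry(Method *m) {
    if (m->barriersOn != globalBarriersWanted)
        patch_body(m, globalBarriersWanted);
}
```

A method that never runs while the control word is toggled is never touched: by the time it runs again, the control word may have flipped back and the states match, which is the lazy property discussed above.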
H: I suppose it's possible that patching every read barrier site is prohibitively expensive at the time at which we're thinking of doing this, while patching every prologue is at the very least less expensive, depending on what kind of scaling factor you're looking at. But I don't know that there's a consequence of that we want to consider right now — with all that, let's leave it there.

A: So their control word does not get updated. We update the VM thread to say — say we toggle it on, the bit is set. You start running some methods; they check their bits, see they don't match, and patch themselves. The cycle ends; we turn the control bit off. Then you call a method that never got patched: its control word matches the global control word, and it does nothing.
A: So the lazy aspect of this is that you literally only patch the methods that are going to run, and you only patch things when you're going to use them — because if the method doesn't run again until the next CS cycle, it doesn't need to be touched. In fact, the unpatching can be even more lazy, because in that case it's functionally correct to keep running the range check, so we can defer unpatching even further.
A: Following the agenda, I think we should maybe cap this in another couple of minutes and then move on to the other topics, because I want to give them a little time as well in the half hour we have left. I think one of the conclusions I see from this is that we'd like to separate the engineering of the mechanism aspect from the use aspect, so when you're looking at getting it done in OMR, we'll think of it that way.

D: Next: the RISC-V port — what's been done and what you plan on doing with it. I think that's Jan.
J: Okay, hello everyone. First of all, I would like to say that even though I did the last bit, Boris did the first bits, and those are always the most difficult ones. As I said on the general channel, we made public the first — let's say — version, which we would like to take as a start and then continue evolving. This version is based on the AArch64 port and so forth, and it's far from being complete.

J: But calls work — we know recursion works, Fibonacci works, Mandelbrot works, as I said the other day — so it's in quite OK shape, and what we would like now is to start the process of getting this into the OMR project. You know there is interest, and it's possible, and we are starting to talk about this.
A
So,
first
of
all,
I
think
that
some,
you
guys
have
made
some
great
progress,
pretty
much
more
or
less
on
your
own
I
think.
That's
that's
really
great
to
see
and
any
we
really
would
like
to
get
this
get.
This
contributed
so
I
think
my
just
given
the
experience
that
they
had
with
AR
64.
The
the
approach
that
we
took
with
that
was
to
essentially
not
drop.
The
entire
thing
in,
as
one
large
commits
one
logical
request.
A
A
Mean
you
don't
necessarily
have
to
carve
up
files,
I
think
that
I
think
you've
different
different
files.
Even
right,
like
you
could
do
the
like.
The
like
those
of
the
machine
class,
for
example
the
instruction
hierarchy,
the
the
opcode
tables.
Things
like
that
you
can
sort
of
deliver
those
in
little
batches
that
that
people
can
pour
over
and
and
make
sure
that
they're
that
they're
sound.
J: As far as the commits go, there's a bunch of them — I don't know, maybe 30 — and they're structured so that they show the progress of incrementally rewriting and removing the AArch64-specific stuff and making it pure RISC-V code. Now, I'm not really sure how we want to break that up.

C: Part of the problem we had with AArch64 was that we didn't have a configured build at that point. — Actually, we've had builds from the beginning: we do cross-compile builds. We don't run natively, but the cross-compile build works. We don't have the tril tests running, though. — The tril tests are not running? — No, they have to be run manually. — Right. So I think that's where the difficulty is going to be.
A
The
things
we're
gonna
have
to
discuss
as
well
is
the
infrastructure.
That's
going
to
be
able
to
tap,
build
and
perhaps
even
test
as
I
got
NCI
test
for
this
right
so
and
that
might
get
into
like
first
well.
Do
we
do
cross
compilation
to
begin
with,
or
is
there
do
we
run
this
on?
An
emulator
right
are.
A
J
J
It
does
work
to
some
extent,
but
then
there
is.
There
are
some
places
when,
when
you
build
the
innate
like
a
tools,
let's
generate
something
that
is
in
turns
and
compile
and
I
the
because
I
do
you
have
very
low
hardware.
I
did
in
order
to
fix
this.
He
makes
and
all
that's
due
to
build
this.
You
know
raise
general
courses
like
this
to
build
them
with
native
compiler
and
fill
the
rest
is
in
the
coastal
file,
but
it
it
says
what,
if
you
do
it
so.
A: So that's sort of the same situation we have with AArch64, where we can build it — cross-compile it — publicly, but right now, because of the lack of availability of AArch64 hardware in the open (which we're working on), we don't actually have a CI test that executes the tril tests on AArch64. I think we're going to be in a similar position with RISC-V: we can possibly build it, but we can't actually run the tril tests until we connect hardware.

J: You can do all of this in QEMU. I just do it on real hardware because, first of all, I have it, and second, it's faster than emulation — I don't have that powerful a machine. If you have powerful x86 hardware you might be fine, and for CI, whether it builds in an hour or in an hour twenty minutes doesn't matter much.
J: It works — I actually spent quite a lot of time preparing a set of scripts that build the whole environment in which Boris and I develop this stuff. They essentially generate a QEMU image, which you can then transfer onto real hardware. That is, I would say, pretty straightforward: there's a script that builds the whole QEMU machine, you just boot QEMU, and apart from the real execution machinery behind it there is no difference. So one could set up the CI inside QEMU.

A: Okay — I'm not sure what the answer there is yet. It wouldn't be a system that IBM would necessarily own; it would have to be something that's out in the public space, so that the public builds can actually run on it.
A
But
maybe
just
circling
back
to
the
original
question
about
how
this
is
going
to
land
and
whether
or
not
it
could
be
broken
up.
I
mean
until
I
mean
one
of
the
first
things
that
we
probably
should
get
set
up
is
some
sort
of
a
at
least
a
built
environment
where,
at
the
very
least
you
just
do
a
cross-compiled
bill
of
the
of
the
of
the
code.
Just
so
that
we
can
see
that
it
actually
works.
A
A
Like
and
thoroughly
review
each
in
the
different
sections,
then
it
would
be
if
we
could
land
individual
parts
of
it.
So
if
that's
acceptable,
then
then
that's
the
way
that
we
can
go
I.
Also
don't
want
you
to
spend
two
three
months
breaking
it
up.
If
it's,
if
it's
difficult
to
do,
because
that
would
just
be
the
amount
of
time
that
we
would
potentially
be
spending
reviewing
it
anyways.
I
A
Okay,
well,
why
don't
you
carry
on
the
way
you
were
thinking
and
we
will
we'll
make
do
with
with
with
what
we
get.
A
The
the
other
thing
that
I
mentioned
yesterday
was
I
think
that
there
is
an
eclipse
legal
process
that
we're
going
to
have
to
follow,
for
a
contribution
of
this
size
as
well
need
to
I
believe
create
something
called
a
commit.
Your
questionnaire,
it's
not
really
that
it's
not
a
terribly
onerous
process
it,
but
about
what
they.
A
F
A
F
A
J
A
Yeah,
if
we're
not
gonna,
be
able
to
go
back
so
if
there
isn't
much
detail
there
and
if
we're
not
going
to
be
able
to
go
back
and
try
out
different
parts
like
if
I
wanted
to
go
back
to
a
certain
version
of
the
machine
class
to
save
this,
certain
problem
occurred
whatever
I'm,
not
sure.
If
there's
value
in
that,
so
I
guess
I
would
argue
for
squashing.
I: Now, there are several branches this was developed in, and it's pretty clear what merged into what: there were some smaller ones, they got merged into the riscv devel branch, and finally that got merged into the main RISC-V branch this morning. I think there are about 35-40 commits on the way from where we started to where we are now, and for the actual master branch of OMR that is too much detail.

I: If somebody wants to see how a particular piece of code developed — through "fixing this bug" and "oh no, we tried this but then we scratched it" — well, for example, we already did half of this process, because the original implementation that we first started with was based on the PowerPC port, and there are another forty or so commits there. But right now, on the branch that we have, we don't have any of those experiments; we keep them separately.
J: In my clone — in my fork, right — we will certainly keep them, at least for ourselves. We can squash things, and if you for whatever reason want more detail or want to see the evolution, there will still be something we can point you to and say: look, this is how it developed.

A: So the suggestion we're going to make is that when you open your initial pull request, you can have all the individual commits there, but when we actually merge it, we're going to ask you to squash them into one. That way we see the history of your development on the initial pull request, but before we merge it, we'll squash.

I: The only thing here is: can we do the review on the full branch, then — you know, before the squash?
A: Okay — Boris, I don't know how much time you wanted for the IL specification; I'm assuming four minutes isn't going to do it justice. Do you mind if we move that to the next call, or do you have something quick?

A: All right, okay. So we look forward to that pull request and to getting that in, and we'll also start looking into what we need to do on the infrastructure side to get some sort of testing going for this — I think there are some parallels with AArch64 that we can draw on.

A: Okay — any questions for Boris or Jan? Or do you guys have anything else you want to talk about?