From YouTube: Code Generators and Much More (Part II)
Description
Thanks to @nbhuiyan who fixed the first 12 minutes of audio that were missing from the initial video! Here is the newly re-recorded and stitched together Compiler Vitality Talk -- Code Generators and Much More (Part II).
Four bytes, if the data item is aligned at an address that is a multiple of four, and so on and so forth. The reason I brought up this data atomicity topic is that Java has an idiom called array copy. Array copy semantics are defined in terms of the array elements, and you can imagine, for an integer array, the array copy semantics are defined as copying one array element at a time. Because the elements are naturally aligned, they are already guaranteed to be copied with data atomicity.
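As a concrete sketch of the idiom (the class and method names here are mine, not from the talk): for an int[], each 4-byte, naturally aligned element is read and written as a unit, so a concurrent reader never sees a torn element.

```java
import java.util.Arrays;

// Sketch of the array-copy idiom discussed above: System.arraycopy is
// defined element by element, so for an int[] each 4-byte, naturally
// aligned element is copied as a unit (never torn).
public class ArrayCopyDemo {
    static int[] copy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length); // element-wise semantics
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        System.out.println(Arrays.toString(copy(src))); // [10, 20, 30, 40]
    }
}
```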
All these concepts I will go into in detail in later slides; here I just briefly mention where the three main architectures differ or are the same in these properties. For example, the I-cache on all three platforms right now is coherent, and the D-cache is also coherent, but they differ in write-back versus write-through, write-allocate versus no write-allocate, and also in whether the outer-level caches, the L2 and L3, are inclusive or victim caches; they are different there as well. Going to the next slide.
Now we are going to describe coherent versus noncoherent. Here I have an example of instructions executing on processor 0 and processor 1, and we compare how coherent and noncoherent differ in behavior. I mark the processor, the cache, and the memory as the different layers of hardware in the system. The memory here, you can imagine, is an outer-level cache or real memory; as long as it is beyond the current cache, it's fine.
Processor 0 executes an add of 9 to x, and later on processor 1 executes an add of 99 to x. When it is coherent, you can see the state transition in processor 0's cache: the cache line holding x goes from the initial value 0 to 9, and then to "no". "No" here means the cache line is not going to be present in that cache; it is going to be invalidated. Why is it invalidated? Because it is invalidated by the later store to x on P1. That is what coherent means: the line is automatically invalidated in the neighboring P0's cache. On the memory side, the value of x transitions from the initial 0 to 9 and eventually to 108, and in P1's cache it goes from 0, through 9, to 108. So it's coherent. On the other hand, when it's noncoherent, things are going to be very absurd.
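The coherent outcome above, x going 0 to 9 to 108, is exactly what a Java programmer relies on. A minimal sketch with AtomicInteger (class and method names are mine): the final value is 108 whichever order the two additions reach the shared line.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The P0/P1 example as Java: two threads add 9 and 99 to a shared x.
// Cache coherence plus the atomic read-modify-write guarantees the final
// value 108, whichever order the two additions hit the cache line.
public class CoherenceDemo {
    static int run() throws InterruptedException {
        AtomicInteger x = new AtomicInteger(0);
        Thread p0 = new Thread(() -> x.addAndGet(9));   // P0: x = x + 9
        Thread p1 = new Thread(() -> x.addAndGet(99));  // P1: x = x + 99
        p0.start(); p1.start();
        p0.join(); p1.join();
        return x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 108 on every run
    }
}
```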
From now on I am assuming the data cache has to be coherent in a multiprocessor. A noncoherent data cache only exists in the past, or in a uniprocessor system: there you can have a noncoherent data cache and manage yourself when the data is pushed down to the memory; you can do that. But in a multiprocessor system, if the data cache is noncoherent, you cannot imagine how it's going to work. On the other hand, the instruction cache can be left incoherent, because the instruction cache is modified very rarely in a typical program.
You can see initially level 1 and level 2 are empty; then you read x, and the line is brought into level 2 and level 1. Then you need y, and now both x and y are brought into level 2 and level 1, so it's inclusive there. Then suppose, for some reason, x is evicted from level 1. It can still stay in level 2, because level 2 has a bigger capacity: evicted from level 1, the line can stay in level 2.
But you are still inclusive: level 2 contains x and y, level 1 only has y; that's fine. Then you have a back-invalidation. What happens is you have coherency traffic from the external world to level 2, and it invalidates, evicts, y from level 2. As long as you are an inclusive cache, because y is evicted from level 2 you need to do a back-invalidation to level 1. So it is going to do a back-invalidation of y in level 1, and y in level 1 is evicted, invalidated.
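The back-invalidation rule above can be sketched as a toy model (all names are mine; this illustrates the inclusion property only, not any real hardware):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of inclusive-cache back-invalidation: inclusion means L1 is a
// subset of L2, so evicting a line from L2 (e.g. due to external coherency
// traffic) must also invalidate it in L1.
public class InclusiveCacheModel {
    final Set<String> l1 = new HashSet<>();
    final Set<String> l2 = new HashSet<>();

    void read(String line) {        // a read fills both levels
        l2.add(line);
        l1.add(line);
    }

    void evictFromL2(String line) { // external invalidation hits L2...
        l2.remove(line);
        l1.remove(line);            // ...and back-invalidates L1 to keep L1 ⊆ L2
    }

    public static void main(String[] args) {
        InclusiveCacheModel c = new InclusiveCacheModel();
        c.read("x");
        c.read("y");
        c.evictFromL2("y");                     // coherency traffic evicts y
        System.out.println(c.l1.contains("y")); // false: y was back-invalidated
    }
}
```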
Now for the victim cache: initially, suppose level 1 is containing A, B, and C, and level 2 is the victim cache. Now the processor needs the content of cache line P, and it conflicts with these. What happens there is that P is brought into level 1, but not brought into level 2, because it is a victim cache. At the same time A is evicted. Where is A going? It is going to the victim cache. P replaces A, because A initially is the least recently used line: basically A is going to be evicted, so P will replace A, and level 1 becomes P, B, and C, with the LRU order updated, since you never re-used A and B. Then later on, the processor needs A again. What happens? You get an exchange: P needs to be kicked out, and A is brought into the cache. So what happens here is that, because P is kicked out, it goes to the victim cache, it is pushed out, and at the same time A is brought into level 1. So it is basically doing a swap. Okay, so these are the two kinds of cache hierarchy.
What the trade-off here is: it is basically visible in the capacity. For a victim cache, you can imagine the level 2 cache, or whatever the external-level cache is, as a capacity expansion of the inner-level cache. So if you have 512 kilobytes of level 2 and 32 kilobytes of level 1, your total cache capacity is pretty much 32 kilobytes plus 512 kilobytes. But on the inclusive side, your capacity is pretty much bounded by the external level of cache. Then, for a victim cache, you need to pay a cost in bandwidth: for the coherency you need to snoop along more pathways, because you don't know where a line is.
For example, you have external coherency traffic coming in to invalidate A. You don't know whether A is in the level 2 or the level 1, so typically it is going to snoop both caches in parallel. So there you will pay a higher cost in bandwidth. Okay, next slide: write-back versus write-through. Here is an example, and a short description, of write-back versus write-through.
Basically, it is about how your data reaches the external level: the external-level cache, or memory, gets the value either by eviction from the inner level, or as part of the write itself. If it is as part of the write itself, it is write-through; if it is later on, as part of the eviction, then it is write-back. So in the example here, on p0 you do x = x + 1; I am talking about the cache line containing x. Note that here I don't have y at all. You can see x becoming 1, because p0 requires x, so x is brought into p0's cache. And later on, p1 does its store, x = 99; then, as part of the store operation, the line in the p0 cache is not evicted but invalidated, and p1's cache will contain the x value of 99.
Going to the next slide: write-allocate versus no-write-allocate. The simple short description here is whether your cache allocates a line for the written cache line as part of the write. A cache is typically going to be populated when you do a read; but for a write, do you allocate a cache line when you do the write, or not?
Writes are typically things you write but don't need in the near future, and writes are relatively smaller in quantity compared to reads. So when you do write-allocate, the recently written data will remain closer to the CPU; and with no-write-allocate, your written data doesn't trash the cache. So that's the trade-off here: one is the written data remains closer; the other is that the written data doesn't compete for the capacity of the cache, so it doesn't occupy the inner level of cache and the capacity can be retained for the read data. Okay, so that's the trade-off here. Going to the next slide.
So here we are showing the cache architecture on the different CPUs. You can see for Skylake (there are a lot of flavors of Skylake; Skylake-SP is the bigger one among those flavors) the level 3 cache became a victim cache, not inclusive. On the other Skylake, the smaller client, laptop version of Skylake, the level 2 actually is 256 kilobytes.
And the level 3 actually is inclusive for that version of Skylake. You can understand why they became victim caches: if, in the bigger flavor, the level 3 cache were inclusive, then look at your level 3 cache capacity. You can imagine you have 28 cores, each core has one megabyte of level 2, and if it is inclusive, your 39 megabytes of level 3 need to include the 28 megabytes of level 2, because the level 3 is shared.
Your level 3 cache total capacity is not even twice the total level 2 cache capacity, so if it is kept inclusive, the value of your level 3 cache is reduced significantly in this configuration. The POWER side has always used a victim level 3: from its history, going back ten years, the level 3 has been a victim cache.
Z is always the inclusive one, because they have such a big level 3, so they are inclusive. And write-back versus write-allocate is a mix everywhere: x86/Skylake is write-back, write-allocate, and POWER9 is write-through, with no write-allocate at all, on the L1 D-cache.
So on P9, basically, if you only do a write, the data is never in the L1 D-cache; the write is only made to the level 2, and goes all the way through, so it is write-through.
So when you do a write, it is going to write through to the L2, and to the L3, all the way out in one go, not as part of an eviction. And this has implications for allocation, for the thread local heap clearing; later I will show you. Going to the next slide: now on to I-cache coherency and CMODX.
What is CMODX, actually? Basically, in a JIT runtime you do relatively frequent instruction modification, as opposed to other programs, which don't do much instruction modification. So, in order to do code patching, the instruction modification, you are certainly assuming that what you modify is aligned, so you get the data atomicity. If you don't have data atomicity, the instruction, written as a piece of data, will not be written integrally, and then your patching will not work. This is an assumption, for sure.
In addition to that data atomicity assumption, there are two more things here. One requirement for code patching to work is that you need to take care of I-cache coherency, because the I-cache on certain processors is not coherent. I-cache coherency basically governs when your modification will be seen by another processor: you do the modification; does your neighboring processor pick it up or not, and when is it going to pick it up? That is I-cache coherency. And then there is what CMODX actually is.
The best description of CMODX is concurrent modification and execution. Typically, in other languages, the instruction modification and the execution of that instruction do not happen concurrently: typically your compiler generates the code, and then you send that code to be executed; those are two separate points in time. But for JIT code patching that is not the case; it happens concurrently. You have multiple threads, and while one thread is doing the modification, the other threads are possibly, right now, actually executing that instruction. That is what is governed by this CMODX.
A
This
both
this
Akashi
coherency
and
Similac
spec,
a
processor
implementation
specific
and
our
one-time
need
to
know
so
if
I
cut
is
not
coherent
with
respect
to
data
cache,
what
you
need
to
do
one
timely
to
sync
it
up.
Basically,
when
you
do
the
modification,
because
here
we
nowadays,
we
always
assuming
Howard
architectures,
not
one
human
objection,
it.
A
We
really
need
to
have
a
clear
definition
of
what
really
have
an
undefined.
So
right
now
on
a
power
side,
the
halfway
relatively
clear
definition
in
actually
in
the
next
version
of
architecture
coming
out
and
on
X
and
V
right
now,
busy
is
either
bright,
chart
or
try
an
error.
You
try
something
up,
however,
it
motor
it
works.
Somehow, whatever modification you do still behaves. And this kind of concurrent modification and execution situation really only exists in a dynamic runtime and in a debugger, because a debugger has this situation as well: you have concurrent modification and execution. But in the debugger scenario, typically, the world is stopped when you do the modification. Okay, next slide.
Now here, basically, is the cache architecture relevant to our code generation. As I mentioned, in school they basically say the cache is transparent: you don't need to care about the cache, because everything is transparent and will just work. But it certainly is not transparent from a performance perspective. For example, the cache line size: you have the trade-off of memory bandwidth versus spatial locality there.
When you have a 256-byte cache line, you expect a lot of data there: you are using the first byte, and you probably expect to use the next bytes, which were brought into the cache, so you are going to use a lot from that cache line. But if you are not using a lot from that cache line, you basically waste a lot of memory bandwidth, because you brought in 256 bytes but you only used, for example, 4 bytes. Then you have basically wasted memory bandwidth against your locality.
A big cache line can also give rise to false contention, causing unnecessary contention: if you could break the data into different cache lines, there would be no contention there. That is false sharing, and it can easily be a ten-times difference in performance.
If you have false sharing going on, then even if you have what is called cache intervention, where you can hand the data from your processor's cache to your neighboring processor's cache, that is still, although faster than coming from memory, hundreds of cycles. So it is easily ten times as long.
And in terms of Java, I think we have the @Contended annotation to request that a piece of data needs to be on its own cache line; there is such an annotation to avoid false sharing, but I don't think J9 currently honors this annotation; we don't do that. This is also relevant to cache line sharing, and relevant to thread local heap clearing.
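A portable sketch of the padding idea (the @Contended annotation itself is a JDK-internal mechanism; the class names and padding sizes below are my own illustration, not J9 code): each hot counter gets trailing padding so the two cannot share a cache line.

```java
// Manual-padding sketch of the false-sharing fix discussed above. The
// trailing longs give ~120 bytes of separation after each hot field,
// covering common 64/128-byte lines (Z's 256-byte lines would need more).
public class PaddedCounters {
    static class Padded {
        volatile long value;
        long p01, p02, p03, p04, p05, p06, p07, p08,
             p09, p10, p11, p12, p13, p14, p15;    // padding, never read
    }

    static long[] run(int iterations) throws InterruptedException {
        Padded a = new Padded(), b = new Padded();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Exact totals: each counter has a single writer thread.
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = run(1_000_000);
        System.out.println(totals[0] + " " + totals[1]); // 1000000 1000000
    }
}
```

Note that a JVM is in principle free to rearrange or strip unused fields; the supported mechanism is the @Contended annotation, which is why it exists.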
Now, among our three code generators we have different configurations for whether the thread local heap is batch-cleared or not batch-cleared. The trade-off here for batch clearing is a path-length trade-off, because Java has the semantics that when you do a new object, the object needs to be initialized with zeros.
So batch clearing basically says the whole thread local heap is initialized to zero to begin with, and this is done in a more efficient way. For example, on POWER we are using the instruction called dcbz, data cache block zero, so with one instruction we are zeroing a whole cache line. So you can imagine you have a 128-kilobyte thread local heap.
You only need 1K instructions to do the whole clearing. This batch clearing is done by the GC: when you ask for a new TLH, the GC will create it for you, and then you have the TLH handed in, and when your thread is doing new objects, you don't need to do the zero-initialization of the object, because the TLH was cleared already. So the trade-off here is: the GC will do the batch clearing, using a wider instruction to clear the TLH, versus zeroing in the JITted code.
Basically, whether you are write-through or write-back, and write-allocate or not, is relevant here; it is mostly related to write-allocate or not. If your D-cache is write-allocate, then when you do the zeroing (because the batch clearing is also writes), write-allocate means the whole TLH will be brought into the D-cache, and the D-cache is trashed multiple times: the D-cache is 32 kilobytes and the TLH typically is 128 kilobytes, so when you do that clearing your D-cache is trashed four times.
So basically all your warm data sitting in the D-cache is evicted, and that has performance implications for your later run. With write-through, no-write-allocate on POWER, because the level 2 is much bigger (for example, on P9 the level 2 is 512 kilobytes), you do a 128-kilobyte clearing and that's fine: it is 1/4 of the size of the level 2, you still have 3/4 of the level 2 there for the other important data, and the level 1 D-cache is never touched when you do the clearing.
So there is something here for x86 and Z, because they are write-allocate; that is the difference here. There is also an SMT-level consideration: at a certain SMT level you have four threads doing the zeroing, and then you are really thrashing, because each one is 128 kilobytes, so the four of them total 512 kilobytes. I have had this experience on SPECjbb2015: you really need to tune the TLH maximum size to be smaller, and it actually improves performance, by not trashing the whole level 2.
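The zero-initialization semantics that all of this clearing machinery serves is directly visible at the Java level (a trivial sketch; the names are mine):

```java
// The Java semantics that TLH clearing implements: a new array (like a new
// object's fields) is guaranteed to start out zeroed, whether the zeroing
// happened eagerly in a GC batch clear (dcbz on POWER) or at allocation time.
public class ZeroInitDemo {
    static long sumOf(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumOf(new int[1024])); // 0: every element is zero
    }
}
```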
Okay, going to the next slide. The next slide is atomic update and locking, and inter-thread communication. For Java, and for any programming language, inter-thread communication typically goes through your atomic operations, and in Java also through volatile variables, because a Java volatile variable has sequential consistency; later I will talk about that. Atomic operations can carry inter-thread communication because they are going to be executed atomically, exclusively, anyway.
Their behavior is: you do an update, for example an atomic integer increment by one, and when you do the increment by one, you basically assume that atomic integer can later be handed to another thread to do another atomic update. But locking is different. For locking, although you are using compare-and-swap to grab the lock, the occupation of that lock pretty much means this cache line should not be passed to the other processors, because you are holding it.
Even if they get that cache line, they cannot do anything about it, because you hold the lock. So it is different from fetch-and-add or whatever atomic updates you do: there, you can toss the cache line to another processor to do a further fetch-and-add. That is the difference here. So how do you differentiate these two behaviors? Locking uses compare-and-swap, and atomic update is using compare-and-swap as well. I don't think x86 and Z can differentiate them; we can differentiate them on POWER.
We have an instruction hint: on POWER, what you have is the load-and-reserve instruction, and when you do the load-and-reserve you can provide a hint. You encode the instruction slightly differently; the hint tells the processor that this instruction is intended for atomic update instead of locking, and then the processor will manage the cache line differently.
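At the Java level, the two behaviors being contrasted look like this (a sketch with my own names; both counters end up exact, but the lock case is the one where the cache line is best kept local to the holder):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Atomic update (fetch-and-add style, where the cache line may bounce
// between processors) versus a lock (compare-and-swap acquire, where the
// line is held until release). Same count either way; they differ in how
// the hardware should treat the contended line.
public class AtomicVsLock {
    static final AtomicInteger atomicCount = new AtomicInteger();
    static int lockedCount = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                atomicCount.getAndIncrement();  // atomic update
                synchronized (lock) {           // lock-based update
                    lockedCount++;
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(atomicCount.get() + " " + lockedCount); // 200000 200000
    }
}
```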
The Java object lock has two states. One is the flat lock: basically there is a lock word in the object, indicating whether you are holding the lock or not. And then, when the lock is contended, when there are multiple threads contending for the same object's lock, it is going to be inflated.
If you have contention there, it is going to degenerate into the pthread mutex and the condition variable underneath; I didn't even talk about those details here. So the locking here, the Java monitor lock, basically has the two states: the flat lock and the inflated lock. Okay, so on to the next slide: memory consistency.
So what is memory consistency? You have a hardware memory consistency model and a software memory consistency model. A memory consistency model is a contract: the hardware memory consistency model is a contract between the hardware behavior and your program, and the software memory consistency model is a contract between the programmer and the programming language, governing what the behavior is. And it is all because of the cache: if there were no cache in the system, you pretty much would not have a memory consistency issue, and it would pretty much come down to sequential consistency, because you can imagine, with no cache, everything converges on a memory controller to do the memory operations on the memory.
Then you have a single funnel through to the memory. As long as your pipeline keeps the memory operations ordered, everything is funneled through the memory controller to the memory, and you don't have a memory consistency issue at all, because by definition that single funnel of the memory controller gives you sequential consistency. Sequential consistency is easy to understand, and that is the behavior then.
The other thing related to memory consistency is intra-thread ordering: within a single thread, what ordering behavior do people observe? That the observed behavior keeps the intra-thread ordering is an assumption, for sure, and implicitly true. This assumption is true on all processors I know, except, decades ago, a processor called Alpha, by DEC.
They had some behavior where the intra-thread ordering didn't conform to program order; they had some value speculation going on in their processor. But it can lead to very strange behavior when your intra-thread ordering doesn't conform to program order. Program order basically means the order of your instruction layout.
Then your behavior conforms to what you see in your instruction sequence; that is your program order, as observed from within the thread. But the memory consistency model is dictating the total ordering of the memory accesses in your system, and that can get into different models. So intra-thread ordering being program order is a standing assumption; otherwise you would have a lot of paradoxical, very strange behavior, and you would have causality problems there. But even when the intra-thread ordering is program order, there is more.
Intra-thread ordering is program order, but what is externalized is different: when your memory accesses go external, when they are put on the bus, if you like, their order doesn't have to conform to the program order. So internally you will observe your program ordering, but externally, other threads can observe that your accesses are not in your program order. That is the problem here.
So you have different hardware commitments for the memory consistency model (and, on the software side, a single model). Here I am talking about the hardware consistency model: x86 is TSO, total store ordering, and POWER is weak ordering. So what is the difference between them? I have an example here: x and y start off as 0, both variables 0, and two threads are doing things.
Basically, thread 1 will store 1 to x, then store 1 to y, and thread 2 will try to load y, and then load x, and it is going to put them into two different registers.
Now, after this program is done, what are you allowed to observe? For total store ordering it pretty much means that thread 2 seeing y and x as 1 and 0 respectively is impossible, because of this total store ordering memory model.
Observing y as 1 while at the same time observing x as 0 is impossible for total store ordering. Okay. But POWER is weak ordering, and weak ordering means these two stores, x = 1 and y = 1, can happen in any order, so any combination is possible on POWER.
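The litmus test from the slide, written out as a Java sketch (names are mine; with plain fields the Java memory model, like weakly ordered hardware, allows all four outcomes, including the y == 1, x == 0 result that TSO forbids at the hardware level):

```java
// Message-passing litmus test: thread 1 stores x then y; thread 2 loads
// y then x. With plain (non-volatile) fields, any of (0,0), (0,1), (1,1),
// and even (1,0) may be observed on a given run.
public class LitmusTest {
    static int x, y;
    static int r1, r2;

    static void run() throws InterruptedException {
        x = 0; y = 0;
        Thread t1 = new Thread(() -> { x = 1; y = 1; });   // store x, then store y
        Thread t2 = new Thread(() -> { r1 = y; r2 = x; }); // load y, then load x
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```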
For the second thread's reads, you can observe anything. Okay, that is the difference between TSO and weak ordering. And the next slide, the last slide here.
So now, going back to the language consistency model. Historically, C and C++ probably didn't even have a memory model; only in the respective 2011 standards did they add a formal memory model. So from that time you have a contract between the language and the program: when you write a program this way, you are guaranteed to see this behavior. In the past, if you wrote something concurrent in C,
good luck with your multithreaded program! The behavior was not guaranteed, though of course it typically worked out okay. Java, in its early days, already defined the Java memory model, really early, as Java Specification Request (JSR) 133. So given a program and an execution, you have a contract between your program and what actually happens when it runs: the execution needs to conform in a certain way.
So the Java language, to have a concise description of what the memory model means: it is basically sequential consistency of all volatile accesses, plus all lock regions being sequentially consistent as well, these two things put together, plus the intra-thread ordering conforming to your program order. These three things together govern the behavior of your program, and now you have the Java memory model.
In your hand you have the hardware behavior, TSO or weak ordering; from the Java runtime point of view, you need to guarantee that your Java program will behave as defined by the Java memory model. So, as I wrote here: no matter whether the underlying hardware is strongly or weakly ordered, the code your JIT compiler emits needs to behave as defined by the Java memory model. That is the contract. Okay!
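A minimal sketch of what that contract buys the programmer (names are mine): publish data, then set a volatile flag; a reader that sees the flag set is guaranteed to see the data, and the JIT must emit whatever barriers the particular hardware needs to make that true.

```java
// Sequential consistency of volatile accesses in action: the volatile store
// to flag orders the plain store to data before it, so a reader that
// observes flag == true must also observe data == 42.
public class VolatilePublish {
    static int data = 0;
    static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;    // plain store...
            flag = true;  // ...made visible by the volatile store
        });
        Thread reader = new Thread(() -> {
            while (!flag) { }         // spin until the volatile flag is seen
            System.out.println(data); // guaranteed 42 by the JMM
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
    }
}
```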
So in our JIT, the behavior is: you still need actual memory barriers for JVM safety. What I mean here is that you have implicit data in an object. For example, for an array object, in the object header you have the object type.
You have the array length; those two fields are implicit data in your Java object. And "for safety" means: if you didn't guarantee the ordering of these implicit data, you could crash the JVM. For example, you initialize your Java array to be length 100, but the other thread picks up, for that Java array object, a length of 1000, because the stores were observed out of order; then you pick up 1000, you are going to access something wrong, and you crash. Okay, that happens.
That happened not that infrequently on POWER, actually, and we need to insert the right memory barrier into the new-object instruction sequence, after the initialization instructions, to guarantee the ordering before you can publish your object reference to another thread. Otherwise, you can imagine, you initialize here and you publish it; that is the so-called publishing of your object reference.
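A sketch of safe publication at the Java level (names are mine): publishing the array through a reference with volatile semantics orders the header stores, including the length, before the reference becomes visible, which is exactly what the JIT's barrier enforces underneath.

```java
import java.util.concurrent.atomic.AtomicReference;

// The release/acquire pair on the published reference orders the array's
// implicit header data (type, length) before the reference itself, so a
// reader never sees a garbage length.
public class SafePublish {
    static final AtomicReference<int[]> published = new AtomicReference<>();

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> published.set(new int[100]));
        Thread reader = new Thread(() -> {
            int[] a;
            while ((a = published.get()) == null) { } // wait for publication
            System.out.println(a.length);             // guaranteed 100
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
    }
}
```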
So I want to thank Julian for giving an overview of all three architectures. Not that many people on the team are able to give such a good discussion of the three architectures, all the subtle differences, where they are similar, and how they actually matter in our code generators. So I wanted to thank you, and thank you all for coming to this talk. Stay tuned for another talk next month. So thanks.