From YouTube: .NET Design Review: ARM Intrinsics
My initial thought was that I didn't like the approach, but after looking at the C++ usages and realizing that it's doing byte indexing, not bit indexing, I think it makes more sense to use an element index here, and it will probably better match the common use case. And if people do want to do other indexing, they can convert to the appropriate type.
So here in C++ there are multiple overloads for fused multiply-add by selected scalar, and so on. Basically, we have overloads where the sizes for left, right, and addend are the same. But currently in C++, the right size and the addend and left sizes are independent, so they have basically three, or four, different overloads instead of two. And the reason why, just as Mark suggested, is that they don't have an easy way to go between Vector128 and Vector64; in our case we can downcast from Vector128.
It's not quite that simple! With most of the instructions the sizes of all operands are the same, but in the case of FMLA there's both a Q bit and an H bit: the Q bit controls the size of the result, the addend, and the left, but the H bit controls the size of the right. So you can actually have a Vector64 left and a Vector128 right.
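A sketch of what those overload shapes could look like on the .NET side (the method name and exact signatures here are my illustration, not necessarily the surface under review):

```csharp
// Illustrative only: overload shapes for an FMLA-style "by selected scalar"
// helper. The Q bit sizes the result/addend/left together, while the H bit
// independently sizes 'right', so a Vector64 left can pair with a
// Vector128 right.
public static Vector64<float> FusedMultiplyAddBySelectedScalar(
    Vector64<float> addend, Vector64<float> left,
    Vector64<float> right, byte rightIndex) => throw new PlatformNotSupportedException();

public static Vector64<float> FusedMultiplyAddBySelectedScalar(
    Vector64<float> addend, Vector64<float> left,
    Vector128<float> right, byte rightIndex) => throw new PlatformNotSupportedException();
```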
All the 128-bit ones also need one that takes a 64-bit overload as the third parameter. So, okay, for each Vector64 we'll have one additional one, and for each Vector128 we'll have one additional one. I don't know how trivial it would be, but another option is we just always take that third operand as a Vector128 and we look for a pattern around it.
Well, so one of the patterns where I frequently use such overloads is where I have to create a constant and just replicate it across all lanes. The idea is to use a smaller value set and then take the zero index from that, so that I only need a smaller load or a small MOVI or something.
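That pattern can be sketched as follows (hypothetical usage; `MultiplyBySelectedScalar` stands in here for whichever by-element overload applies):

```csharp
// Keep the constant in a small Vector64 and index element 0, instead of
// materializing a full Vector128 broadcast of the same value.
Vector64<float> scale = Vector64.Create(2.5f);             // small load / MOVI
Vector128<float> data = Vector128.Create(1f, 2f, 3f, 4f);
// Multiply every element of 'data' by element 0 of 'scale'.
Vector128<float> scaled = AdvSimd.MultiplyBySelectedScalar(data, scale, 0);
```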
So I'm going to introduce a couple of concepts here real quick. On Arm, many of the instructions have a few different overloads. There's the base instruction, which just does the normal operation, and then many of them also have versions that do rounding or saturation. That's where you see, for example, add high here and then rounded add high: they're essentially the same instruction, but there's a bit changed in the underlying instruction.
The closest concept we have is in System.Math: we've got BigMul and we've also got multiply high, and those are the closest existing concepts we have for some of these. BigMul is where you're taking, for example, two 32-bit values and producing a 64-bit result, and then multiply high is the same except you only return the upper half of the result. Okay.
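The System.Math analogy can be sketched like this (the `BigMul` shown is the long-standing `Math.BigMul(int, int)` overload; the "multiply high" step is just shifting off the lower half):

```csharp
// Two 32-bit values produce the full 64-bit product.
long full = Math.BigMul(0x12345678, 0x1000);
// "Multiply high" is the same operation, returning only the upper 32 bits.
int high = (int)(full >> 32);
```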
So there are basically two parts to this instruction, similar to some of the other ones we've reviewed, where this one in particular deals with the lower half of the vectors, and then there's another one which will do the same operation but write it to the upper half of the result. Because you're narrowing the result from short to byte, you end up with a vector that can contain twice as many elements.
…writes the vector to the lower half of the destination register and clears the upper half, while the upper variant writes the vector to the upper half of the destination without affecting the other bits of the register. So they both do an add high operation; they both do the operation described there in the comment, or in the summary, but it's a question of whether they write the result to the lower or the upper half of the destination vector.
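A hedged sketch of how the lower/upper pair might compose (the names are modeled on the underlying ADDHN/ADDHN2 instructions; the actual API names in the review may differ):

```csharp
// Narrowing ushort -> byte fills only half a 128-bit vector, so the pair
// lets callers fill the lower half of a destination first, then the upper
// half of the same destination.
Vector64<byte>  lo   = AdvSimd.AddHighNarrowingLower(leftA, rightA);
Vector128<byte> full = AdvSimd.AddHighNarrowingUpper(lo, leftB, rightB);
```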
This is because there are two parts to it. One is, if we took a scalar, the JIT would need significant work in order to understand that integer values can be held in SIMD registers. And the other part is that the actual thing being taken in is a Vector64; it's only operating on the lowest element.
So something that Igor and I had discussed on another one of the issues was, rather than using ValueTuple here, we could do what C++ does and define a custom type. For example, we would have Vector128 by itself for the single element, and then for the two-element tuple we would call it Vector128x2, which roughly matches the C++ naming, for example.
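A possible shape for that custom type (the name and members are illustrative, not a committed API):

```csharp
// Unlike ValueTuple, the elements are get-only, so the JIT could treat the
// struct as immutable when optimizing.
public readonly struct Vector128x2<T> where T : struct
{
    public Vector128<T> Value1 { get; }
    public Vector128<T> Value2 { get; }

    public Vector128x2(Vector128<T> value1, Vector128<T> value2) =>
        (Value1, Value2) = (value1, value2);
}
```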
I imagine, at least from the perspective of usability, it might be easier to define a custom type. One, it lets us extend custom functionality on it in the future, for example debugger display, things like that; but also we can make sure that it's immutable, plus some of the other niceties. ValueTuple has public fields, which means users are free to take the address of them and do all kinds of weird things with it, which might make it harder to do various optimizations.
I mean, taking the address of something? It's just… to try…
The fields are public, so users are free to take the address of individual fields, read and write individual fields, etc. I think it's just a potential concern: there are more places where a user could be doing something they think is clever, but that in actuality ends up hurting codegen.
Well, it's basically the case of, like, with the vector types we've got specific GetElement methods that are able to explicitly optimize, for example, getting the x value of a Vector128<float> down to a specific instruction. If we have a custom type, we can likewise have the property getters for, you know, elements 0, 1, 2, and 3 that are specifically optimized by the JIT, knowing that it's otherwise immutable, whereas with ValueTuple that will be much harder to recognize and specialize. Yeah.
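For reference, the element access being described looks like this with today's vector types (`GetElement` is the existing helper; the single-instruction lowering is the JIT behavior being described, not something the snippet itself proves):

```csharp
Vector128<float> v = Vector128.Create(1f, 2f, 3f, 4f);
// The JIT can recognize this and emit a single element-extract instruction.
float x = v.GetElement(0);
```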
Well, you know, I would love to. I don't know where that winds up relative to other stuff we're working on, because, you know, without actually getting to the point where we can register-allocate these things, I don't know how much progress we can make on it. Nothing? No? Okay!
Yeah, I mean, this is what it's called in the documentation, but they do point out the list can be a variable or semi-variable-length thing. But yeah, I don't have a better name. I don't know how we'd do it, but we could, if we wanted to. Just playing devil's advocate: we could expose the flattened versions of the one, two, three, and four, right, and then we can add a packed one later, if we add the packed register type.