From YouTube: oli-obk on miri and constant evaluation
Description
miri is an interpreter for MIR, Rust's internal representation. miri is the foundation for Rust's compile-time evaluation capabilities. Its design enables it to simulate the workings of the machine at a low level, meaning that it can interpret not only "safe Rust" but also a lot of unsafe Rust code, including complex and highly optimized libraries like the standard library HashMap. In this talk, compiler team member oli-obk will dig into how miri works, giving us some insight into its architecture, the way that it represents and reasons about memory, and what kinds of capabilities it offers for Rust itself.
A
Right, so I am going to dive into the constant evaluation stuff in a bit, but first I'm going to give you a small intro into some things that are necessary to understand it. First: we have the MIR in the compiler, and the constant evaluator is based on the MIR. It's essentially a virtual machine that can run MIR. So to understand what all the const eval stuff is doing, we'll first go a little bit into the MIR itself.
B
It's more sensitive, yes.
A
You can see the graph here, which makes up the three components that are relevant for const eval in the MIR. So if we start with the code — for example here, a variable x whose value is 42 divided by 2, plus 1; then we mutate x; and then we return x — this can be turned into the MIR.
A
So we start out with basic block 0, and we have a temporary _1 that computes the 42 divided by 2, which we have over here, and this is stored in a temporary variable. In the next step we are assigning to another variable the value of the previous computation plus 1. So we have this whole computation here. To understand where _1 and _2 are coming from: there's a locals table in the MIR which maps the variables you declare and the temporary variables into one big namespace — they're just numbers — and each of those has a type.
A
Each of these locals also has a type. The return place is kind of a very special variable: it doesn't actually exist as a variable, it's just a hook so the MIR knows where to write the return value, because it's not actually in your function — you return the value to outside of the function. So it's a very special thingy, and this is actually what we do: in the last step we assign to the return place, and then we execute a MIR return statement.
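The example from the slides might look roughly like this; the MIR shown in the comments is a simplified, hypothetical sketch (the mutation `x += 1` is an assumption, and real compiler output names and numbers locals differently):

```rust
fn foo() -> u32 {
    let mut x = 42 / 2 + 1;
    x += 1;
    x
}

// A simplified sketch of MIR for `foo` — one namespace of numbered locals,
// where _0 is the special "return place":
//   bb0: {
//       _2 = 42 / 2;   // temporary holding the division
//       _1 = _2 + 1;   // x
//       _1 = _1 + 1;   // the mutation
//       _0 = _1;       // assign to the return place...
//       return;        // ...then terminate with a return
//   }

fn main() {
    assert_eq!(foo(), 23);
}
```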
A
I'll come to those things later in more detail, but right now the important difference is: you're not actually returning a value with the return statement. What you're doing is assigning the value, and then you're terminating the evaluation of this function with a return statement.
A
And once you compute this value, at some point you're going to have to store the final constant — that is, the value of the constant foo — somewhere; you're not going to carry around the 42 divided by 2 plus 1 forever. What we do is we have a virtual memory representation for any kind of constant that you could ever have, and this isn't just a random list of bytes; it's a little bit more, it has a few additional features. One thing is, obviously, you have a list of bytes.
A
So if you look below, this could be the memory of a slightly more complex constant: it could just contain a lot of bytes, and these bytes would have, for example, zeros, or f3, or some random value. You can store anything in this kind of memory; this is literally just a Vec<u8>. Additionally, constant evaluation cares very much about whether a byte has actually ever been written. If there's padding inside of a struct, those bytes have never been written. So what we do is we store a second list.
A
You see the "def" and the "undef" here. This says: this byte is defined, so somebody actually wrote this byte into this virtual memory location. But, for example, these two bytes could be some padding, and they've never been written. So we have an extra piece of information saying those are undefined, and the rest of the bytes are defined again.
A
So even though technically we just have a list of bytes here, the constant also knows whether these bytes have ever been written to, and this is transitive. If you copy from one constant to another, even though you're writing an undefined byte somewhere, it will still know those bytes were undefined to begin with; the second layer is also copied whenever you use it. And then we have a third layer, and that third layer is for pointers!
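A minimal sketch of where such never-written padding bytes come from — the struct below is hypothetical, but its `#[repr(C)]` layout guarantees three padding bytes between the fields that the const evaluator's "undef" mask would track:

```rust
// A struct whose layout contains padding bytes that are never written.
#[repr(C)]
struct Padded {
    a: u8,  // 1 byte
    // 3 bytes of padding here: "undef" in the evaluator's second layer
    b: u32, // 4 bytes, aligned to 4
}

fn main() {
    // 1 byte + 3 padding bytes + 4 bytes = 8 bytes total.
    assert_eq!(std::mem::size_of::<Padded>(), 8);
}
```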
A
It's a little bit more than that, and Ralf Jung will tell you a lot about that. But technically, inside a computer you just have a list of bytes in your memory, and those bytes specify the address of something, and we interpret those bytes as a pointer. In constant evaluation, that's slightly problematic!
A
That is: if you allow that, you will also allow something like creating a reference by transmuting, I don't know, some arbitrary number (with an unsafe block around it). But if we're doing that and in the end try to, for example, use this to create an array with that number as its length — should the compiler look at the address 999 and find an integer there? This is a question that can't really be answered.
A
It never makes sense to use a number as a pointer during constant evaluation and then actually try to read from it. So in order to uniquely identify the memory that something like this points into, we use the third layer, which is essentially just a counter inside the compiler that increases every single time...
A
...we create a new allocation. So every time we create one of those big blocks, we also get a number for it, the so-called AllocId, and this ID might be anything — it can be any number. You can never see this number; it's abstracted away, so the user never knows about the actual number. In fact, between compilations these numbers actually vary, depending on your incremental compilation state and other things. So the concrete numbers are completely irrelevant; they are just for identifying one of those big allocations.
A
So now, if we have this `&1`, the third layer is set to the ID of the allocation where we're storing the one. And now, if we try to dereference this pointer, we're not even going to really look at those bytes. What we're going to do is look at the number in the third layer and fetch the allocation.
A
So now we get one of those big allocations, and then we're going to look at the bytes, because those bytes are the offset inside that allocation. So if it's, let's say, five, then we're going to go zero, one, two, three, four, five — it would point somewhere here, for example. With these two pieces of information, the offset and the AllocId, we can point to any other allocation and create basically any data structure we might want to have in memory.
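The three layers described above can be sketched as a data structure. This is a hypothetical, heavily simplified version — the real rustc types are richer — but it shows the idea: raw bytes, a per-byte "was this ever written?" mask, and relocations mapping byte offsets to AllocIds:

```rust
use std::collections::BTreeMap;

// Simplified sketch of the const evaluator's allocation representation.
type AllocId = u64;

struct Allocation {
    bytes: Vec<u8>,                        // layer 1: the raw bytes
    defined: Vec<bool>,                    // layer 2: was each byte ever written?
    relocations: BTreeMap<usize, AllocId>, // layer 3: offset -> pointee AllocId
}

// A pointer is just an AllocId plus an offset inside that allocation.
struct Pointer {
    alloc_id: AllocId,
    offset: usize,
}

fn main() {
    // An 8-byte allocation where bytes 1 and 2 are padding (never written)
    // and a pointer to allocation 7 starts at offset 4.
    let alloc = Allocation {
        bytes: vec![0x2a, 0, 0, 0xf3, 0, 0, 0, 0],
        defined: vec![true, false, false, true, true, true, true, true],
        relocations: BTreeMap::from([(4usize, 7u64)]),
    };
    assert_eq!(alloc.relocations[&4], 7);
    let _p = Pointer { alloc_id: 7, offset: 5 };
}
```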
A
AllocIds also serve a secondary purpose: they allow us to create cyclic data structures inside constants. For example, think of a graph data structure in memory: certain memory places would point directly or indirectly back to themselves. We could have chosen a different representation, where we wouldn't have those AllocIds but would just place a Box of an allocation in there.
A
Then we'd have this big tree of data structures, but at some point you'd get into cyclic data structures, and then you couldn't have an allocation that points back to itself. We could of course then start moving on to Arcs and RefCells and whatnot, but this is not going to get us to a happy place — we actually tried this in some situations and it never worked. So we have the separation between allocations and AllocIds.
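A concrete case where this matters: a static can (indirectly) refer to itself, which an owned tree of Boxes could never represent, but an AllocId-based indirection can. A minimal self-referential static:

```rust
// A static whose memory points back at itself; the evaluator represents
// the reference as (AllocId of N, offset 0) rather than owned nesting.
struct Node {
    next: &'static Node,
}

static N: Node = Node { next: &N };

fn main() {
    // The cycle closes: N.next points at N's own memory.
    assert!(std::ptr::eq(N.next, &N));
}
```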
A
Don't worry, I'm going to get to the actual constant evaluation stuff in a second, once we're done with this. So: allocations are the actual memory, and the AllocIds are just an ID referring to this memory, and you can freely convert between those two. If you want to get from an AllocId to an allocation — that's a bit of a mouthful — you can access the alloc map and then fetch the memory from that, and so on.
A
But you actually never have to do this yourself, because you never actually touch AllocIds while developing constant evaluation stuff. What you instead have are some abstract objects inside the constant evaluator, which you can ask to, by the way, resolve this for you. So you're not even looking at the allocations yourself, and you're never looking at AllocIds yourself; the constant evaluator is doing this in the background for you.
A
For most of these things in this presentation I made hyperlinks, so if you're interested in more details, you can actually click on the separate parts. Here, for example, the Memory type is where all the allocation accessing happens. But it's abstracting away a lot of things: you can ask it to actually read a u32 from memory, or read a C-like string from memory, and there are all kinds of features like this — so you should be using the abstraction and not directly access the memory.
A
Okay, so constant evaluation itself starts pretty simple. First of all, one thing you have to know: a constant is nothing else than a function that has zero arguments and a single return type. So to evaluate a constant, what we're doing is calling this kind of zero-argument function — by not passing any arguments, because it doesn't have any — and then starting evaluation. How this works is: we create an evaluation context. This is an object that is the entry point for any kind of constant evaluation.
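The "constant as a zero-argument function" view can be made concrete. Reusing the talk's arithmetic, the two declarations below are conceptually evaluated the same way:

```rust
// A constant is nothing else than a zero-argument function with a
// single return type; these two are conceptually interchangeable.
const FOO: u32 = 42 / 2 + 1;

fn foo() -> u32 {
    42 / 2 + 1
}

fn main() {
    assert_eq!(FOO, foo());
    assert_eq!(FOO, 22);
}
```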
A
What it does is store, internally, a stack of frames, and every frame itself again has a set of local variables, and those local variables in turn point back to allocations. So, looking at this at the bottom: the evaluation context is full of those frames. When you start off, you have a single frame — that's the frame for your constant. It contains the MIR of the constant, and it contains the locals inside that constant.
A
It contains a return place, which is the information on where to write the final result of the constant once it's computed. And you have two IDs: the basic block ID that we're currently evaluating, and a statement ID inside that basic block. In the MIR that you saw earlier, I had multiple basic blocks, and some basic blocks had multiple statements — so these two fields tell us where we are inside the MIR.
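A frame with those fields can be sketched like this; the stub types (`Mir`, `Pointer`) stand in for the real compiler types and are hypothetical simplifications:

```rust
// Stub types standing in for the real compiler types.
struct Mir; // the MIR body being evaluated

struct Pointer {
    alloc_id: u64,
    offset: usize,
}

// Simplified sketch of an interpreter frame.
struct Frame {
    mir: Mir,                     // the MIR of the constant or function
    locals: Vec<Option<Pointer>>, // each local's backing allocation, if any
    return_place: Pointer,        // where the final result gets written
    block: usize,                 // basic block currently being evaluated
    statement: usize,             // statement index inside that block
}

fn main() {
    let frame = Frame {
        mir: Mir,
        locals: vec![None, Some(Pointer { alloc_id: 1, offset: 0 })],
        return_place: Pointer { alloc_id: 0, offset: 0 },
        block: 0,
        statement: 0,
    };
    assert_eq!(frame.block, 0);
    assert_eq!(frame.locals.len(), 2);
}
```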
A
Once we have created this first frame, what we do is trigger the stepping, by having a loop that calls the step function on the evaluation context. The step function does all the logic internally; we'll go into the details of the step function later. Once the step function reports that it doesn't have anything more to evaluate, because there are no more frames left to evaluate, that means the final frame has returned. So no matter how many functions are called in between, at some point you're going to end up back at your original frame.
A
The one that you created with the MIR of the constant. At some point that frame is going to return, and then the evaluation is over — there are just no more frames to evaluate. So now we know the value of the constant: the final value is stored inside the return place we gave the evaluator. And what we do now is run so-called validity checks on the result.
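The driver loop described above can be sketched as follows. This is a hypothetical miniature — the real step function evaluates MIR statements, here a counter stands in for the remaining work in a frame:

```rust
// A frame with some pretend remaining statements.
struct Frame {
    work_left: usize,
}

struct EvalContext {
    frames: Vec<Frame>,
}

impl EvalContext {
    /// Evaluate one statement or terminator. Returns false once the
    /// last frame has returned, i.e. evaluation is finished.
    fn step(&mut self) -> bool {
        let remaining = match self.frames.last() {
            None => return false,
            Some(f) => f.work_left,
        };
        if remaining == 0 {
            self.frames.pop(); // the frame's `return` terminator ran
        } else {
            self.frames.last_mut().unwrap().work_left -= 1; // one statement
        }
        true
    }
}

fn main() {
    // One frame for the constant, with three statements, then a return.
    let mut ecx = EvalContext { frames: vec![Frame { work_left: 3 }] };
    let mut steps = 0;
    while ecx.step() {
        steps += 1;
    }
    assert_eq!(steps, 4); // 3 statements + 1 return
    assert!(ecx.frames.is_empty()); // no frames left: evaluation is over
}
```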
A
If you have an enum with three variants, for example, and they have discriminants 0, 1, and 2, and there's a value of 4 in there, then that's not legal — and the type's layout actually represents this: it tells you what kinds of values can be inside a certain type. Additionally, on top of that, we have some extra checks; for example, we enforce that there are no dangling pointers at all. So even if you put a dangling pointer inside the padding of a struct, the check will catch it: no dangling pointers.
A
You can't have dangling pointers, ever, in const eval. We also check characters: that they are actually a valid Unicode code point and not some arbitrary other value. There's a bunch more checks; if you're interested in this, there are links at the bottom of this presentation to all the different parts of constant evaluation, including validity. Once we are sure that our value is sane, what we do is move the memory of this allocation — the allocation objects — to the TyCtxt, so to the global interners.
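One of those validity checks can be sketched in a couple of lines. A `char` is only valid if its bit pattern is a Unicode scalar value — surrogates and out-of-range values are rejected (this helper is illustrative, not the compiler's actual check):

```rust
// A u32 bit pattern is only a valid `char` if it is a Unicode scalar value.
fn char_is_valid(bits: u32) -> bool {
    char::from_u32(bits).is_some()
}

fn main() {
    assert!(char_is_valid('x' as u32));
    assert!(!char_is_valid(0xD800));    // surrogate: invalid
    assert!(!char_is_valid(0x11_0000)); // beyond char::MAX: invalid
}
```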
A
What you have here is, for one, the layout of the local, and you have a pointer to an allocation that backs the memory of this local. So if you're writing to one of those variables, what you're actually writing to is the backing allocation of the local. The same thing goes for the return place: it also refers to some allocation, and that, internally, is where you write the final result of your computation to. And the allocation, as introduced earlier, has some bytes.
A
Then we have the basic block ID in the frame, which identifies the current basic block that we're evaluating — we fetch that from the MIR — and then we check, for that basic block, whether the statement ID points to an actual statement inside this MIR. So we always have this list of statements, and below the list of statements...
A
...we have one terminator, and we just reuse the statement ID: it either points to a statement, or, if it points one beyond the statements, we say, okay, we want to evaluate the terminator. The terminator is, for example, a goto to jump to another basic block, or it could be an if, or it could be a return. How those work internally exactly I'll get to later. So these steps basically just run in cycles until we run out of frames, and then we assume that our evaluation is finished.
A
We'll start with statements first. If we have a statement to interpret, all that we are really interested in are these two statements: assignment and SetDiscriminant. SetDiscriminant is essentially an assignment to the tag field of an enum, and it shows up very rarely — it's not very interesting for us right now. All the other statements are pretty much irrelevant for all of constant evaluation.
A
Two of the others just create and destroy local variables; they're very simple code that hasn't been touched in quite a while. But they're not very important for evaluation: we could just allocate all memory and never destroy it, and everything would work out just fine, too.
A
One statement that one might see is the Retag statement. Constant evaluation ignores it, but Miri uses it for doing the Stacked Borrows checking. We could have put this into constant evaluation too, but then all our constant evaluation would get very, very slow, so we are choosing not to do that.
A
We're kind of worried about that, but only semi-worried, because you can't do much stupid stuff during const eval — it's rather limited in the fancy stuff that it can do. Miri itself is more powerful; and while you still can do a few things, especially if you start transmuting and so on, the final result that you end up with needs to be a sane value.
A
So all you could do is have undefined behavior during constant evaluation, but not in the final value. That's not nice, but we kind of reserve the right to break your code if it does undefined behavior, so we're basically hoping we can get away with extending that to constant evaluation.
A
Definitely — these are absolutely deterministic undefined behaviors. So if we change something, or we add additional checks — even if at some point we decide we want to have Stacked Borrows in constant evaluation — we can easily add this as an extra feature that starts off with a warning phase and everything, so we're not going to get accidental breakage; we're going to have to add those explicitly.
A
So a statement consists of two parts: it has a destination where we want to write something — that's the thing on the left side of the equals sign — and we have a value that we want to assign there, the thing on the right side of the equals sign. On the left side we have a place, which can be a bunch of things.
A
I left a bunch of things out here, okay — this is not the full list, this is basically just the important ones; there's also, for example, a field projection. If you click the link, you'll see a bunch of enum variants, and those are all the projections that you can actually choose from.
A
A pointer is a simplification of this data structure: it's just a tuple of the AllocId and the offset, and this is much easier to work with. So we always convert such a place to a pointer and then continue working on that to write to it. In order to know what we want to write to it, we need an rvalue. The rvalue is the thing on the right side of the assignment — and it's not scrolling again... there we are.
A
There's a bunch of things that can be on the right-hand side of an assignment. For one thing, you could just have a use of a variable or a constant, or an access to a field of some local variable. What you also can have is different operators — so plus, multiplication, unary negation, and so on. You can have aggregates, so creating a tuple or an array or something, by gluing a bunch of objects together. There's also one rvalue for taking the address of a place and storing that in the destination place.
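The statement and rvalue shapes just described can be sketched as enums. This is a hypothetical, heavily simplified version — the real MIR enums have many more variants and place projections:

```rust
// A place: where to read from or write to (projections omitted).
struct Place {
    local: usize, // index into the frame's locals
}

// An operand is either a read of a place or an inline constant.
enum Operand {
    Copy(Place),   // e.g. `x`
    Constant(i64), // e.g. `3`
}

enum BinOp {
    Add,
    Mul,
}

// The right-hand side of an assignment.
enum Rvalue {
    Use(Operand),                      // _1 = _2
    BinaryOp(BinOp, Operand, Operand), // _1 = Add(_2, 3)
    Ref(Place),                        // _1 = &_2
    Aggregate(Vec<Operand>),           // _1 = (a, b, c)
}

// The two statements const eval actually cares about.
enum Statement {
    Assign(Place, Rvalue),
    SetDiscriminant { place: Place, variant: usize },
}

fn main() {
    // `_1 = _2 + 3`, as in the talk's `x + 3` example.
    let stmt = Statement::Assign(
        Place { local: 1 },
        Rvalue::BinaryOp(
            BinOp::Add,
            Operand::Copy(Place { local: 2 }),
            Operand::Constant(3),
        ),
    );
    assert!(matches!(stmt, Statement::Assign(..)));
}
```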
A
Instead of evaluating the rvalue and then storing the value at the address we got from the place, what we're doing is directly evaluating it into the target destination. For example, if we are evaluating an aggregate: instead of computing all the fields of the aggregate and then writing them to the destination allocation, we just immediately write into the destination allocation, to save us the trouble of having to allocate additional temporary memory just to store the value before writing it back.
A
These optimizations are very important for const eval performance. We tried different setups there, but basically this is, I think, pretty optimal already — writing directly to the destination. One thing: binary operations, for example, do actually create a temporary intermediate value; but since we can only use those on floating-point numbers or integers, we don't really have to worry about them — they're pretty small, they don't require big allocations. The surface plus operator can be applied to, for example, strings and so on, but the MIR binary operator only ever works on primitives.
A
So if we start out with such an rvalue — we had, for example, binary operations or aggregates — we need a bunch of values to operate on; we need something to do a binary operation on. These are the so-called operands (I didn't find a better name for it). For example, if you have a binary operation, you have two so-called operands, and an operand can either be a place or a constant.
A
So if you have `x + 3`, the first operand points to a place, to `x`, and the second one is just a constant in itself. In the case of, for example, taking an address — so the Ref rvalue — the argument is just a place: we're taking the address of a place. Any expression that the user could write that is more complex would automatically generate temporary variables.
A
Those are usually fine, and we don't really need to worry about them in constant evaluation, because they're just treated like normal variables, and we can then reuse the rvalue operations and so on on those temporary variables. So everything is split up at certain points when it gets too complex. So, back to assignments: we have the left side and the right side. The left side is evaluated to a pointer, so we know where to write our information, and the right side is evaluated directly into this address.
A
I did not include any links here — I'll do it later. There is one big entry point, which is "evaluate rvalue into destination", and this method on the evaluation context basically does the whole constant evaluation; this is like THE entry point that does all the logic behind it. So if you want to follow, top-down, how constant evaluation works, you would start with that function.
A
Once we have exhausted all the statements inside a basic block, we reach a terminator. There are some other terminators too, but these are basically the usual ones: we can have function calls; we can have assertions, which are basically function calls that either continue with a success case or throw a panic; we can have goto, which does literally nothing except go to the next block; we have switch, which, depending on an operand, decides which basic block to go to; and we have return, which just terminates the function.
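The terminators just listed can be sketched as an enum (hypothetical and simplified — the real MIR terminator carries operands, destinations, and more variants):

```rust
// Simplified sketch of the common terminators.
enum Terminator {
    Goto { target: usize },            // jump to another basic block
    SwitchInt { targets: Vec<usize> }, // pick a block based on an operand
    Call { ret_block: usize },         // push a frame; continue here on return
    Assert { success: usize },         // continue on success, or panic
    Return,                            // pop the current frame
}

fn main() {
    let t = Terminator::Goto { target: 1 };
    assert!(matches!(t, Terminator::Goto { target: 1 }));
}
```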
A
In order to know where we are, the frame contains a basic block index and the statement index, and these are incremented in different situations. The statement index is incremented after every single assignment is evaluated. Once we are pointing at a terminator — because there are no more statements inside this basic block — we change the block that's currently being evaluated and set the statement field back to zero, so we start off at the beginning of the new block. The only terminator that does something different is the function call.
A
Yep — so when we call a new function, which we see here below: we fetch the MIR of this function, we set the frame to point to the initial block, which is usually block 0, we set the statement to 0, we add the stack frame to the current stack — and then we don't do anything. What happens is, next time we call step, step will take the uppermost frame on the stack and start evaluating there, so the frame below simply stays where it was.
A
It's not even getting touched; the uppermost frame, our new frame that was created by the function call, gets evaluated. So a function call actually takes two steps inside the interpreter: it takes one step that evaluates the terminator for calling, and then it takes a step to actually be inside the function and do something with it. So pushing a stack frame is a single step, and then evaluating inside the stack frame takes its own steps.
A
Okay, so how do we do function calls? The easy case is if we're starting out with a function pointer. A function pointer is also an AllocId, but it has the invariants that the offset must be zero and that the AllocId doesn't actually point to an allocation. This is a little bit hacky, but what we're doing is allocating AllocIds for functions: we have a hash map somewhere that goes from an AllocId to an instance. So you can ask the global context to give you the instance that belongs to an AllocId.
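That side table can be sketched as follows — a hypothetical simplification in which a string stands in for the real `Instance` (a DefId plus substitutions):

```rust
use std::collections::HashMap;

// Function "pointers" are AllocIds that resolve through a side table
// instead of pointing at real bytes.
type AllocId = u64;
type Instance = &'static str; // stand-in for (DefId, substitutions)

struct GlobalContext {
    functions: HashMap<AllocId, Instance>,
}

impl GlobalContext {
    // A function pointer is (AllocId, offset 0); the id has no backing
    // memory, it only resolves to an instance through this map.
    fn instance_for(&self, id: AllocId) -> Option<Instance> {
        self.functions.get(&id).copied()
    }
}

fn main() {
    let gcx = GlobalContext {
        functions: HashMap::from([(1, "foo")]),
    };
    assert_eq!(gcx.instance_for(1), Some("foo"));
    assert_eq!(gcx.instance_for(2), None); // not a function pointer
}
```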
A
An instance is essentially a pair of a DefId and a substitution set, which is important if you have generic functions. For example, if you have a function that has generic parameters, but you now know all the generic parameters — it's basically monomorphized at this point — then you also have a set of substitutions that defines how this function's generic parameters are instantiated in this current call; in other words, monomorphized.
A
AllocIds and allocations: we have considered doing some different setups, but we have not been able to do this in a performant or scalable way, so right now we're working with this, and it seems to work. There are also other AllocIds — oh yeah, for example, there's an AllocId for referring to statics. If you refer to a static, you don't actually refer to the static's memory; you only refer to the static in an abstract way. So there's a map going from AllocId to DefId.
A
Otherwise, while evaluating one static, we would need to evaluate the other static to get hold of the allocation and the correct AllocId for the allocation — but that static might also want the first static's allocation, and then we'd be in a bind. So what we did is create this kind of shallow pointer, which is an AllocId that just refers to the DefId of another static and doesn't actually need to evaluate the other static.
A
So if we start out with a definition, we have the DefId of the function, usually from the type. If you call a function directly, you get a zero-sized value of a function type, which directly contains the DefId of the corresponding function. And we also use the substitutions of the current function to figure out how to call the next function.
A
This is required when you're, for example, monomorphizing inside a generic function, but the generic parameter has, for example, been set to String, and you're now calling clone on the generic parameter. You know it's String, but you don't know what the actual function is — the actual clone method that's on String. So that's what instance resolution basically does: it gives you the concrete function from the generic input.
A
They're a little bit more like, I don't know, Java stacks or something, where every variable itself is completely separated from every other variable. Each variable has its own memory, and you can't possibly go from one variable's memory to another variable's memory: a pointer into one variable, no matter how many times you call offset on it, will never ever end up in another variable's memory. They're completely logically separated, and the same thing works for heap allocations.
A
So if you create a heap allocation, you have a heap allocation of the size that you specified, but you can't actually get into other heap allocations, and you can't accidentally run into the stack — they're completely separated. This is possible because the AllocIds aren't one flat memory space like we know it from a computer: each AllocId has its own memory, and we can only offset inside that memory. In order to support heap allocations, what we did is we kind of hacked on some support to intercept function calls.
A
So if somebody tried to call the `__rust_allocate` function, what we did is we just did not call it, and simply returned a new AllocId with an offset of zero, and let the rest of the Rust code continue on as it was. So we're not actually interpreting the `__rust_allocate` function, because that would go into the system allocator or something. What we instead did was kind of implement our own allocator, which uses the same mechanism...
A
...that we used for stack allocations: simply allocate a piece of memory and return that pointer. Function interception is one area where const eval and Miri differ enormously. Miri has various functions implemented that it will actually intercept and do something useful with. For example, you can actually call malloc and free; there's a bunch of syscalls that you can do; we implemented quite a list of pthread functions; and the list is very long.
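The interception idea can be sketched in a few lines — dispatch on the callee's name instead of interpreting its body. Everything here is illustrative (the function names are the only ones taken from the talk; the shapes are hypothetical):

```rust
// Intercept "foreign" calls by name instead of interpreting their bodies.
// Returns Some((alloc_id, offset)) for an intercepted allocation call,
// or None if the function should be interpreted normally.
fn intercept(name: &str, heap: &mut Vec<Vec<u8>>, size: usize) -> Option<(usize, usize)> {
    match name {
        // Never run the real allocator: hand out a fresh allocation with
        // its own id, exactly like a stack allocation gets one.
        "__rust_allocate" | "malloc" => {
            heap.push(vec![0; size]);
            Some((heap.len() - 1, 0)) // (new alloc id, offset 0)
        }
        _ => None, // not intercepted
    }
}

fn main() {
    let mut heap = Vec::new();
    assert_eq!(intercept("malloc", &mut heap, 16), Some((0, 0)));
    assert_eq!(intercept("memcpy", &mut heap, 16), None);
    assert_eq!(heap[0].len(), 16);
}
```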
A
There's Miri-internal code that will kind of emulate what the actual function would do — for example, if it were translated to LLVM. We also have a bunch of lang items that we are intercepting. For example, the panic lang item is intercepted, so we're not going into the panic machinery; the moment we hit the panic lang item, we abort const evaluation and report an error about having hit a panic. And by now, as the question has already come up, it's clear there's a lot of code duplication between const eval and Miri. In order to reduce this...
A
...there's the Miri engine — that is, for example, the evaluation context, the memory inside of that, and so on — and these types are all generic; they are generic over the so-called Machine trait. The Machine trait has a bunch of methods, a bunch of associated types, and associated constants that you can use to configure how your evaluation is actually going to work.
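A much simplified, hypothetical sketch of that idea — the engine is generic over a Machine whose hooks and associated constants configure its behavior (the names below are illustrative, not the real trait's items):

```rust
// The engine would be generic over this trait: `EvalContext<M: Machine>`.
trait Machine {
    // An associated constant configuring the engine.
    const MUTABLE_STATICS_ALLOWED: bool;

    // Hook: called when interpreted code tries to mutate a static.
    fn mutate_static(&mut self) -> Result<(), String>;

    // Hook: intercept a function call by name; true means "handled".
    fn intercept(&mut self, name: &str) -> bool;
}

// The const eval machine: restrictive, intercepts almost nothing.
struct ConstEval;

impl Machine for ConstEval {
    const MUTABLE_STATICS_ALLOWED: bool = false;

    fn mutate_static(&mut self) -> Result<(), String> {
        Err("cannot mutate statics at compile time".into())
    }

    fn intercept(&mut self, _name: &str) -> bool {
        false
    }
}

fn main() {
    let mut m = ConstEval;
    assert!(!ConstEval::MUTABLE_STATICS_ALLOWED);
    assert!(m.mutate_static().is_err()); // const eval simply errors here
    assert!(!m.intercept("malloc"));     // a Miri machine would return true
}
```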
A
For example, there's a function in there for mutating a static. In const eval that function will simply return an error, and in Miri it will actually have some logic implemented. This is also the entry point for the function interception: there's a method on the Machine trait which intercepts arbitrary functions, and you can define which set of functions is going to get intercepted by overriding this method yourself. The functions that are intercepted by both const eval and Miri are directly inside the Miri engine and don't ever even hit the machine method for intercepting functions.
A
Once we have moved over the code, we make the intrinsic's wrapper a const fn and add the intrinsic to the whitelist inside this file. The whitelist is all the intrinsics that are allowed to be called at constant evaluation time. There's a bunch of intrinsics that don't make sense during constant evaluation, so we don't want to accidentally allow arbitrary intrinsics.
A
B
C
A
Ah — control flow, that was it, yes. So we don't even need to uplift it. The problem is, it's already inside the Miri engine, it's just disabled, because we can't currently prove certain things statically. The const qualification is actually — let me scroll down... I can't anymore, maybe.
B
A
Yeah, well, so we have this const qualification file, and this is a bunch of static analyses on constants that prove that we can actually say, with a certain level of certainty, that the constant is sane. This is important because some constants just can't be evaluated — if you have, for example, an associated constant of a trait, that trait constant may depend on other constants that are not defined.
A
If you have an associated constant FOO of type u32, and its value is `BAR - 1`, and `const BAR: u32` isn't actually defined yet — if a user supplies a zero there, then this will panic later, or it will break constant evaluation. This is one of the cases that we can't actually detect statically.
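The scenario can be sketched like this (the trait and names are hypothetical; the point is that the underflow only surfaces when a concrete impl's constant is actually evaluated, i.e. post-monomorphization):

```rust
trait Example {
    const BAR: u32;                 // declared but not defined here
    const FOO: u32 = Self::BAR - 1; // underflows if an impl picks BAR = 0
}

struct Fine;
impl Example for Fine {
    const BAR: u32 = 1; // FOO evaluates to 0: fine
}

struct Bad;
impl Example for Bad {
    const BAR: u32 = 0; // merely defining this is accepted...
}

fn main() {
    assert_eq!(Fine::FOO, 0);
    // ...but referencing `Bad::FOO` would be a post-monomorphization
    // error: the underflow is only detected when FOO is evaluated.
}
```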
But there are other situations where we can — so, for example, if we get to control flow, and we say...
C
A
In there — if you now had runtime code `let x = FOO;` and you do, for example, `x.unwrap().set(99)`, you would not want a later `let y = FOO;` to equal 99, because it would not only be very surprising if constants could change their value; additionally, it would be very problematic if you could use a cell to exchange data between threads, because that has race conditions.
A
So we kind of want to look at all the branches and figure out information from them, to declare whether this constant is actually a legal constant, or whether it does some things that we know are really problematic. I don't have the issue at hand where we have more examples — actually more real-world examples — that are problematic, but yeah: we are trying to statically prevent certain very problematic things, unless you want to read a number of post-monomorphization errors. So in this case...
B
Keep going. I was just thinking that this actually sounds like an interesting topic to dive into, perhaps in a follow-on session or something. It's a little off-field from what we've covered so far, but right — there's a lot of stuff; a lot of interesting questions are coming to mind from this.
A
It's a big topic, and so far we haven't discussed that much of it. It's basically a struggle between the people who are okay with post-monomorphization errors and the people who are not okay with them, because we're always trying to find a balance there. We already have post-monomorphization errors — for example, the one thing was subtracting one from another constant, so any integer arithmetic can actually cause post-monomorphization errors, but a lot of other things cannot — and it's about keeping this balance in a nice way.