From YouTube: Reliable optimizations for idiomatic Rust
Description
Optimizations are mainly known for
- making code fast,
- aggravating undefined behavior, and
- making developers suffer during debugging.
In the Rust compiler we have a scheme that allows us to optimize code without affecting debuggability while at the same time actually speeding up compilation. This talk first introduces the MIR, on which the Rust compiler does its optimizations. Then it explains various concepts that allow us to write idiomatic Rust and still get performance that hand-crafted low-level code can't beat. Finally, an outlook on cool-things-to-come ™️ shows how language-guaranteed optimizations can be leveraged in resource-constrained environments.
Okay, thanks. So, hi everyone. This is my other hat: I do lots of const stuff, but I also do optimizations in the Rust compiler. So today I'm going to talk to you about how we do fancy optimizations in Rust that make our lives easier in many, many ways that were unintended before we even started to do these optimizations. But as usual with Rust, things get really cool once you start digging into it.
So, just a quick reminder: I have this logo, so if you see it anywhere, it's very likely me, and yeah, just say hi if you see me. So, let's dig into Rust optimizations. Before we do that: what are we optimizing? I have to introduce a few terms. Let's start with the MIR. This is the mid-level intermediate representation in the Rust compiler, which is placed around here: you go from source code over the abstract syntax tree into some kind of higher intermediate representation.
Lots of optimizations happen in LLVM, but we have done a few optimizations on MIR, and these have been really, really high impact. So, let's look at MIR. First: MIR is basically Rust, but with all the syntax sugar you can think of removed. If anything happens with any kind of magic, it's not happening in MIR. MIR is really, really low-level Rust. So you have no macros.
There's no match: you have a switch similar to C, so you're basically just switching on integers. There's no type inference or trait resolution: all your function calls are fully qualified calls where you write the entire path. In MIR there's no auto-deref, so you have to write all the stars and reference operators that you normally take for granted in Rust. And there's no expression chaining, so every expression is a single operation. You can't do a + b + c.
You have to store a + b in a variable and then add c in a separate statement. So there are no complex things in MIR. The cool thing about that is that it makes optimizations much easier, and it makes lots of analyses much easier. So, let's look at an example of how MIR looks. This is Rust, not MIR yet, just a demo.
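As a hedged sketch of that single-operation rule (the function and variable names here are invented for illustration, not taken from the slides), the lowering of a chained expression looks roughly like this:

```rust
// Rust lets you chain expressions:
fn chained(a: i32, b: i32, c: i32) -> i32 {
    a + b + c
}

// MIR allows only one operation per statement, so the compiler
// lowers the chain into a temporary, roughly like this:
fn lowered(a: i32, b: i32, c: i32) -> i32 {
    let tmp = a + b; // first operation, stored in its own local
    tmp + c          // second operation
}
```

Both versions compute the same value; MIR just makes each step explicit.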
So if we have this piece of Rust code, we have a small assignment to x, with some operation during the assignment where we compute 42 to the power of three. We do a condition, and after the condition we either go into a loop or we finish our program. In MIR, this looks like this.
So what's going on here? At the beginning you still see your main function, this time with an explicit type, similar to C. You have to declare all local variables at the beginning, so you declare all your variables and then you write code. But in contrast to Rust code, where you write your code as expressions, it is organized in blocks.
At the end of each basic block you have a so-called terminator, which moves to another block or returns. Here you can already see a bunch of gotos at the bottom; these jump to other blocks, and you can also see the switchInt. But this representation is not really fun to explore, so we have a different, graphical one. This is the same piece of code that you just saw, but this time it's much easier to comprehend as a human. You start out with the power operation that we had.
As I already said, you have a fully qualified path: there's core::num, then there's the impl block for i32, in there there's a pow function, and then the arguments. After that's called, we store the result in the variable _1, whatever that means, and then we go to the next block. Here we compare the variable _1 with a number and get a result.
We switch on this number: if it's false, we return; otherwise we go to another block, and then we end up in the loop that we saw. So this is just a little bit of the basics of MIR that we're going to need when I show you optimizations later.
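As a rough, hedged reconstruction of the example being walked through (the exact condition and loop body are not visible in the transcript, so those details are assumptions):

```rust
fn example() -> i32 {
    // in MIR this becomes a fully qualified call stored into a local,
    // roughly: _1 = core::num::<impl i32>::pow(const 42_i32, const 3_u32)
    let x = 42i32.pow(3);
    if x > 0 {
        // the slide's version loops forever here; bounded so this
        // sketch actually terminates
        for _ in 0..1 {}
    }
    x
}
```

The interesting part is that even this tiny program already produces several basic blocks: one for the call, one for the comparison and switchInt, and one for the loop.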
So, let's talk about optimizations in general. You also don't want to use up a lot of program space, on your embedded device for example, so you want to shrink the code down to the smallest thing possible, and you want to reduce the memory footprint. If you call lots of functions, each function call needs memory, and you just want to reduce that so that, if you do lots of operations, you don't run out of stack space or otherwise blow up your memory.
I'm not even talking about memory leaks; it's literally just about the memory usage that a normal program has. The other thing about optimizations is that they make debugging basically a nightmare. If you have ever used a debugger on an optimized program, you probably went partially crazy before either going back to printf debugging or turning off lots of optimizations.
So you could nicely debug your code. The other thing that optimizations do is make undefined behavior symptoms much, much worse. Optimizations are the things that actually abuse undefined behavior to cause, well, all the fun behaviors that you've seen from undefined behavior. So I'm going to talk about all these points, but let's start out with one confusing thing about optimizations: optimizations can reduce compile time.
That seems contradictory, right? You do more work and then you do less work? The thing is, when you optimize out code early, later optimizations have to do less work. So you have some cheap, easy optimizations that you can do up front; the next optimization then has to do less work and will be faster. And this has super high impact if you are optimizing generic functions, because generic functions are instantiated many times, once for each generic parameter, when you're compiling down to assembly.
But if you optimize the generic function itself, each instantiation of this generic function will have to do less work. So, for example, vectors are used everywhere. If you can optimize a function inside Vec, it will reduce compile times, because it's used in all kinds of Rust crates and with all kinds of types. With each duplication you save a little bit, but if you have many duplications, well, you save a lot.
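A hedged sketch of why this pays off (the function and its names are invented for illustration, not from the talk):

```rust
// A generic function is monomorphized: one machine-code copy is
// generated per concrete T it is used with.
fn first_or<T: Copy>(v: &[T], default: T) -> T {
    v.first().copied().unwrap_or(default)
}

fn main() {
    // two instantiations: first_or::<i32> and first_or::<f64>.
    // LLVM must optimize each copy separately, so simplifying the
    // generic MIR once saves work in every instantiation.
    let a = first_or(&[1, 2, 3], 0);
    let b = first_or(&[1.5f64], 0.0);
    println!("{a} {b}");
}
```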
So the next point that I talked about was that optimizations make things faster. For example, take this for loop. A for loop in Rust does several things when it's being compiled. As a first step, this for loop gets converted into this beast: there's an into_iter function call, there's a loop, there's a match on the iterator.
There's a call to next(). So there are lots of optimizations here where we clean up this generated code and then simplify it down to the point where you literally have the perfect tight assembly loop that you want.
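A hedged sketch of that expansion (simplified from the real desugaring, with invented names):

```rust
// roughly what `for x in v { sum += x; }` expands to before the
// optimizations clean it up
fn sum_desugared(v: Vec<i32>) -> i32 {
    let mut sum = 0;
    // the into_iter call the talk mentions
    let mut iter = IntoIterator::into_iter(v);
    loop {
        // the loop with a match on the iterator and a call to next()
        match Iterator::next(&mut iter) {
            Some(x) => sum += x,
            None => break,
        }
    }
    sum
}
```

After optimization, all of this machinery should collapse back into a plain counting loop in the final assembly.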
Other things: you want code to be small. So if there are operations that you can remove without changing the program, for example here we could do the division at compile time and just store the result.
Then you save some assembly instructions that you're not going to need, because they're not going to happen at runtime, and this makes your binary smaller. For reducing the memory footprint there are optimizations like inlining, which takes the body of, for example, this do_something function and just copies that body verbatim into this for loop directly, so you don't have a function call, you don't have to store variables on the stack or move some registers around. You can literally just execute the code.
Okay, this was just a broad intro, so let's actually look at optimizations, because that's what you're here for, right? There's a very common thing that is used in Rust, which is the question mark operator, and the thing was, for a long time after it was stabilized, I think over a year, it was actually not optimized that well. LLVM just couldn't figure out a pattern that was in there. So I'm going to show you what pattern LLVM couldn't detect and what we did in Rust to, well, optimize the question mark operator better.
So if you look at this piece of code, we apply the question mark operator to the x variable, add one to the result and then wrap it again in an Ok. If we unroll this manually, we would get something like this: there's a match on the variable; if it's Ok, then we add one and put it back into an Ok; if it's an error, then we recreate the error.
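A hedged sketch of that unrolling (the exact signature from the slide isn't visible in the transcript, so the types here are assumptions):

```rust
// the idiomatic version with the question mark operator
fn add_one(x: Result<i32, String>) -> Result<i32, String> {
    Ok(x? + 1)
}

// roughly the manual unrolling the talk describes: match, rewrap
// the Ok, and recreate the error on the other path
fn add_one_manual(x: Result<i32, String>) -> Result<i32, String> {
    match x {
        Ok(a) => Ok(a + 1),
        Err(e) => Err(e),
    }
}
```

Note that on the Err path nothing is actually computed: the value is taken apart and rebuilt with the same type, which is exactly the pattern discussed next.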
So then we want to read the a field from the Ok variant. What we're actually doing is accessing the first element of the Ok tuple struct, which is the .0 operation, and reading that into the variable a. Then we do an addition, adding 1 to a and writing that into the first field of the Ok tuple struct, which is stored at the return place. So it's stored where the return command would return from the function. And the final thing we have to do is set the discriminant.
That's the part that says this enum is now an Ok. We set it to zero, because zero in this case means Ok and one would mean Err. So we have to do all these operations where we take apart this complex enum, pull out the fields, then do some operation, put it back into a field, change some things, and LLVM really, really has a lot of trouble figuring this out. LLVM just basically gives up.
It verbatim translates this code one-to-one into assembly. And if you look at the error path, it's even worse: we're not actually doing any operation. We're taking the e out of the x and then putting it back into the return value. So we basically didn't do anything; we just copied something out, copied it back in, and it was all the same types.
So if we're looking at this error path, what we actually want to do is simply return the original thing. We don't want to take it apart and build it back together; we just want to return the thing, and LLVM just could not figure this out. So we wrote a so-called peephole optimization, which is an optimization that is very, very targeted at a specific piece of code.
So we're literally looking for the pattern that you're seeing here. It looks for: take something out of an enum variant, put it back into an enum variant, and then set the discriminant to the same enum variant. And if this exact pattern happens somewhere, we just remove all the code and write an assignment. This optimization caused compile times to go down, caused runtime to speed up, and caused there to be fewer instructions in the assembly.
So this is one of these optimizations where we are using idiomatic Rust code, with for example the question mark operator, and we're getting the same performance that you would get if you hand-crafted your pointer arithmetic or whatever you're doing around your enum handling. Okay. So this is how we are building optimizations right now, but there are several things around optimizations that I already talked about. For example, debugging suddenly becomes hard. So let's dig into this question: what are common misconceptions about optimizations in general? The first one: optimizations make debugging harder.
If you have this instruction somewhere in your program code, and you have nothing else, you're not even using x or anything, you literally have just let x = 42, and then you're debugging your program, the debugger will at best tell you that x is a local variable that's optimized out, and that's it.
It won't tell you anything about x. But it's actually very easy to change the system to figure out that the value is 42, because you knew at compile time that the value is 42. So you can change your debug information to just contain this 42: instead of pointing to some memory location that doesn't exist anymore, you're just pointing into the debug information and reading that value.
So that's the easy part: storing some constants and, well, reading them again while debugging. But this kind of debug information can do a lot more. You can literally do any kind of computation that you can do in the debugging console and just repeat it during the reading of such a variable. So look at this example; it's a little bit more complex. You have an x variable that's of type SubStruct and has a bunch of fields.
This is one thing that the Rust compiler does in the background when it's doing optimizations, and you can turn it off if you want to. You can nuke all the debug information and just go ahead and optimize everything, but if you're not changing anything in the default settings, this is what you're getting. Now, the other misconception is that optimizations use UB to just, like, mess up your program, format your hard drive, and, I don't know, nasal demons, and, well, yeah, okay, they do, sorry. There's no way around that, because that's actually what we want from optimizations. Optimizations are supposed to take assumptions that are guaranteed and use them to make your code fast.
But what if optimizations emitted warnings when they were doing something that they found fishy, became less aggressive, for example around certain unsafe code constructs, and could be configured from the surface language? So you could say: this function here should be treated differently by certain optimizations.
One way that you might already know about is inline(always) or inline(never), which are attributes that you can add to functions to change the inlining behavior. But there are avenues for adding more of these annotations, so you can fine-tune your functions without changing your source code.
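A minimal sketch of those attributes (the function bodies are invented examples):

```rust
// forces the inliner to paste this body at every call site
#[inline(always)]
fn square(x: u64) -> u64 {
    x * x
}

// keeps this as a real call, for example to limit code size on a
// path that is rarely taken
#[inline(never)]
fn rare_fallback(x: u64) -> u64 {
    x + 1
}
```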
So you keep writing idiomatic Rust, but you're changing the way it gets optimized to fit your use case. And the one thing that maybe got a little bit lost here is the emit-warnings part: if you're emitting warnings during optimizations when something weird is happening, you can turn them off after having reviewed the situation and saying, okay, I've actually looked at the assembly, the assembly is totally fine, and I'm fine with this optimization doing something fishy here. But you get notified about each site where something weird is happening.
So if you're reviewing your source code for any kind of security or safety issues, this would be a really powerful tool that you can use. And there's some preliminary work in the Rust compiler where you can already do this for some const evaluation things, where you get warnings if your code, for example, would panic at runtime, or if your code would be guaranteed to cause undefined behavior at runtime.
So, summarizing: optimizations are totally awesome, because they not only make your code faster, they actually make your compilation times faster, and they're super fun, because they're basically puzzle games. So for me, when I'm writing optimizations, I'm basically a programmer that programs programs for other programmers, and it's like you move pieces around and try to put them together in a way that makes them faster. And yeah, this is a really fun experience, and it's also super accessible, because MIR is very close to the surface language, Rust.
So you basically have your cut-down language that you can work on, and you look at how the code is before, how it is afterwards, and you adjust your optimization until it does exactly what you want. Then you run it on this really big test suite that we have in the Rust compiler and check.
When you replace one function argument with a constant, because for example all call sites of a function are using a constant for that function argument, you can just replace all the uses of this argument inside the function with this constant and then optimize the function. So you're basically creating specialized instances of a function for your use sites.
While this increases your binary size, if all the users have used the same constant there's no loss here, and depending on the complexity of the function this might actually be a speedup in your runtime, or it might actually shrink your memory usage, because a lot of code inside the function can get optimized out.
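A hedged sketch of what that specialization conceptually does (the specialized copy below is a hypothetical illustration; the compiler would create it internally, not as source code):

```rust
// the original function: suppose every call site passes factor == 2
fn scale(x: u32, factor: u32) -> u32 {
    x * factor
}

// the specialized instance the optimization would conceptually create;
// with the constant substituted, the multiply can fold into a shift
fn scale_by_2(x: u32) -> u32 {
    x * 2
}
```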
The other thing that we are actively working on right now is called MIR inlining. So we are trying to create an inliner that does not have the downsides that lots of heuristic inliners have. Basically, you get a deterministic inliner that looks at the function body and knows: either inlining this function will be an improvement, or it won't. It's not that it makes a heuristic guess, like statistically this should be an improvement; it tells you this is guaranteed to be an improvement, and that's why we're inlining.
Of course, you can still fine-tune it manually by using inline(always) or inline(never), but the default will become a guaranteed optimization. There's also some work going on on polymorphization.
This is a long, confusing word that I'm slowly getting into my dictionary, but it basically means that you have a function that is generic, but instead of compiling the function down to assembly for each generic parameter that it's being used with, you look at the function and realize: oh, this function just doesn't really care about the generic parameter. For example, if you take len, the len function on vectors, it doesn't really care what the generic parameter is.
It literally just reads a field out of the Vec struct, so there's no reason to ever instantiate multiple versions of Vec::len. But right now that's what the Rust compiler does: it looks at the function, and if you have a Vec<i32> it will create a len function for that; if you have a vector of String it will create another len function for that. So polymorphization would prevent this kind of duplication.
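A hedged sketch of the kind of function polymorphization targets (the wrapper is invented for illustration; Vec::len itself behaves the same way):

```rust
// this function never touches a value of type T: it only reads the
// length field, so a single machine-code copy would serve every
// instantiation, yet today one copy is generated per T
fn len_of<T>(v: &Vec<T>) -> usize {
    v.len()
}
```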
Then we have value range analysis. This is a really neat feature where you look at the values and how they are used. So if you, for example, have a condition somewhere checking that the slice length is at least 10, then from then on any index that is smaller than 10 is known to never panic with an out-of-bounds error, because you just checked the slice length. So from then on you can optimize the out-of-bounds checks out of all index operations where the index is known to be less than 10.
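A minimal sketch of the pattern (function name and the constant four are invented for illustration):

```rust
fn sum_first_four(s: &[u32]) -> u32 {
    if s.len() >= 4 {
        // after this check the compiler knows indices 0..4 are in
        // bounds, so value range analysis can remove the four
        // per-access bounds checks below
        s[0] + s[1] + s[2] + s[3]
    } else {
        0
    }
}
```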
Another optimization is NRVO, named return value optimization. It is a really cool thing for recursive functions, or simply functions that return big objects. So for a function that returns a large thing, what you sometimes might want to do is put a pointer into the argument list of your function and write through that pointer.
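A hedged sketch of the transformation (the second function shows what the compiler conceptually rewrites the first into; you wouldn't write it yourself):

```rust
// returning a large value by value: without NRVO the buffer may be
// built in a local and copied into the caller's slot on return
fn make_big() -> [u8; 4096] {
    let mut buf = [0u8; 4096];
    buf[0] = 1;
    buf
}

// what NRVO conceptually turns it into: the caller passes a pointer
// to the return slot and the function writes through it directly
fn make_big_into(out: &mut [u8; 4096]) {
    out[0] = 1;
}
```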
Copy propagation is basically the opposite of NRVO. Instead of looking at where you can return things, like looking from the end at where things are written to, you're looking from the beginning: this variable is only ever assigned once, so maybe all users of it should just get the value of the variable directly. So if you have, like, x = 5 somewhere, the uses of x can simply be replaced by the number five, so you can optimize out this assignment. Another fun one is devirtualization.
But basically, if you are really in a critical environment, you would rather take a false positive of a lint and turn it off, saying: I checked, this lint is triggering falsely, than have it not trigger on a situation where, well, you're doing undefined behavior and, I don't know, throwing away some secret data, or worse, like shouting it out to the world. Yep.
So this is it from me. I'm going to be working on MIR optimizations and const eval, so talk to me in the next two days if you want to know more about these topics.