Description
My simple embedded Rust code was broken, and I couldn't figure out why. Let me walk you through what I did to find & fix a bug in the compiler. We'll take a whirlwind tour, pausing at each step for a gentle introduction to the tools and ideas, until we finally learn to speak directly to the machine.
More at https://rustfest.global/session/14-tier-3-means-getting-your-hands-dirty/
So, what's AVR? AVR is a series of 8-bit microcontrollers. These are tiny computers. They're an older series, so they're cheap and very hacker-friendly. Support for AVR was mainlined into the Rust compiler this past summer, into both Rust and LLVM, thanks to a significant amount of effort by a gentleman named Dylan McKay. So thanks very much, Dylan. As you can see here, these are the AVR lines. My apologies (no problem: my cat chewed through my power cord, and so it was not charging).
We'll also be talking about things that I'm relatively an amateur about, so bear with me. Most importantly, I'm going to be oversimplifying, because I'd like to cover a lot of material, and due to those technical issues we're of course now a few minutes behind our plan. All right, so with that in mind, what are these simplified models that I'm going to be talking about? The first example is the black box model: we'll be talking about some complex system.
Take this Rube Goldberg self-operating napkin. We're going to put this system inside of a box, and the reason we do that is that we want to analyze this complex system not in terms of the details of its insides, the implementation, but rather in terms of its inputs and its outputs: its relationship to the outside world. Now, that's a bit abstract, so let's look at a couple of examples.
We can think of a computer as being a box. The input to our computer is some kind of program, and the output (we're hand waving a bit here) we'll say is two outputs: computation and effects on the world. All right, this is a technical conference, so let's look inside the box, and we find that there are four additional boxes.
We'll talk about each of these four in turn a little bit later, but right now I'll give you just a brief sense of what they are. On the left we have a set of registers and some memory, and on the right our two boxes are called the processor and the peripherals. So, just to get a vague idea of what each of these is.
And finally, the peripherals in this metaphor are something like an intercom sitting on top of the desk: if we need to connect with the outside world, interface with it, ask for a cup of coffee, we can push the button on the intercom and request support. All right, so let's connect our arrows from earlier.
We said that the processor is the brains of the computer, so the computation can be thought of as coming from there. Likewise, we said the peripherals are the interface to the outside world, so we can think of our effects on the world as coming from those peripherals. And then the program: we will load that into memory and then go ahead and start executing from there.
All
right,
let's
look
at
another
example
of
a
black
box.
This
one
is
a
compiler
and
we
can
think
of
a
compiler
as
a
box,
and
the
program
is
the
input
to
the
compiler,
and
since
this
is
a
rust
conference,
our
compiler
presumably
is
rust.
C.
The
rust
compiler
and
our
program
is
the
rust
source
code
for
the
rest
source
code
for
our
program
all
right
and
we
can
look
at
the
output
of
our
compiler
and
well.
Now,
we've
reached
a
really
interesting
point:
what
what
is
the
output
of
this
compiler?
B
B
Now, this is a bit confusing, so we'll be more specific. We have two different representations of the same program. The input representation we already called source code, and the output representation is the one that's meant for the machine, the computing machine, and so we call it machine code. All right, so again, this is a technical conference: let's look inside this compiler, and we see two more boxes.
LLVM is a library that contains all of the common parts of compilers, the things that are shared among compilers no matter what language they're for. It has things like optimizations, code generation for the machine code, and so forth. As suggested by that previous slide, the back end for the Rust compiler is LLVM, but the LLVM library serves as the back end for many other languages, including Julia and Swift and many others. Notably,
the clang compiler, which is part of the LLVM project, is the system C compiler on the Macintosh. Okay, so looking at our compiler model again, let's once again connect our arrows for our input and output. Here we just extend them inwards: the source code is the input to the rustc front end, and the output of our LLVM back end is our machine code.
So first we'll look at the processor. Inside our processor we have (again, hand waving quite a bit here) some math-related stuff that's going to be doing arithmetic, perhaps, and some program-control stuff that lets us do loops and conditional jumps and move around in our program. The big difference on embedded is that it's slower, and in some cases less capable. For instance, the AVR devices that I'm generally developing for don't have floating point numbers, so all arithmetic has to be done on integers.
Let's look at a couple of examples. Consider a video streaming application, where you might need access to a networking peripheral that you can use to fetch a video from some service; then, once you have the video, you want to show it to a user, so you use some sort of video hardware, which would be provided by a peripheral. Now, in an embedded context...
For instance, you could ask the computer to count to a number and then let you know when it gets to that number. This is a bit like when my kids are in the other room while I'm working on the dishes, and my kids want me to come in and play with them, and I tell them I need to finish up what I'm doing, so I ask them to count to 20 and let me know when they get to 20.
We can do the same thing with our microcontroller, our little computer. If we need to delay and perform some computation at a later point in time, we can ask the computer to count to a number for us and let us know when it gets to it.
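That "count for me and let me know" idea is what a hardware timer with a compare interrupt does. Here's a rough host-side sketch of the concept (my own illustration, not code from the talk; real AVR timers are hardware peripherals configured through registers):

```rust
// A toy model of a timer peripheral with a compare-match "interrupt".
struct Timer {
    count: u8,
    compare: u8, // the number we asked the computer to count to
}

impl Timer {
    // One hardware tick; returns true when the compare value is reached,
    // which on real hardware would fire the interrupt.
    fn tick(&mut self) -> bool {
        self.count = self.count.wrapping_add(1);
        self.count == self.compare
    }
}

fn main() {
    let mut timer = Timer { count: 0, compare: 20 };
    let mut ticks_before_fire = 0;
    while !timer.tick() {
        ticks_before_fire += 1;
    }
    // The "interrupt" fires on the 20th tick.
    assert_eq!(ticks_before_fire, 19);
}
```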
All right, let's take a look at the memory real briefly. We're going to be talking a bit more later about how the stack works, but let's get a big-picture idea of what's inside this memory.
So I have this little photo collage that I made up, and hopefully it is not completely misleading, but broadly speaking we have three parts of our memory. As we said previously, we're going to load the program into memory, and that's what we see at the bottom: the brick wall at the bottom is our program.
That's been loaded, and any static variables in the program would be there. Then we're potentially going to have a heap. This is where the program stores things that it may need later but doesn't need right now: it can throw them on the heap and then go back and look for them later.
And finally we have a stack, and there are really two models that might be useful for thinking about the stack. The first is more of an operational model, where you think of the stack as a sort of nested context for the program. So we have, perhaps, a pile of file folders on our desk. Now, we're departing a bit from our desk metaphor from earlier, because here the desk is not the registers; we're using a slightly different metaphor.
Say my boss comes in with a client's file folder that's more critical: we open that up and set it down on top of the pile of papers on my desk, and we work on that one. Then the boss's boss comes in with another client's file folder, and we open that up and put it at the very top of the stack. When we finish the boss's boss's client, we close that file folder up and take it away, and we resume work on my boss's client; and when we finish that up, we close that folder, my boss leaves, and I can resume whatever it was I was doing before I got interrupted. That's the nested-context view of the stack. The somewhat more physical view of the stack I like to think of as a stalactite growing from the roof of a cavern, and the reason I think of it
this way is that, on most computers, the stack begins at the very top of memory, at the very highest address, and grows downwards into memory. So it's a bit like a stalactite hanging from the roof of a cavern.
Yeah, so it's a bit like a stalactite hanging from the roof of a cavern: as we add additional context, our stalactite grows down, and as we resume previous contexts, it shrinks back up. The most important difference for memory in an embedded context is that it's significantly restricted. You have much less memory on most embedded devices. For instance, the ATtiny10 that you saw in the very first picture of the AVR slides has one kilobyte of program memory.
There are similar devices in that line that have only 512 bytes of program memory, so you must write your program to fit entirely in 512 bytes. That's a very significant limitation you need to keep in mind when you're developing. All right, for completeness, let's look inside the registers and talk about what the types of registers are.
The main class of registers is the general purpose registers, and this is where we store the state that we're currently working on. As I mentioned previously in the desk metaphor, we can think of the general purpose registers as a pile of Etch A Sketches on a table: any time we need to ask the processor to do work for us, we have to write the numbers down on the Etch A Sketches and give them to the processor.
This is a process that's referred to as clobbering the registers: when you use a register that's already in use, you clobber it, and if it's important to you to maintain what was in it previously, then you have to save and restore that register. All right, we also have a few special purpose registers. There are lots of these, but only two that we will talk about today, and the first one is called the status register.
In addition to the no_std context, we are frequently working in a panic-abort model. This is because unwinding the stack in the case of a panic is very expensive in terms of memory use and processor time, and so in general, when developing for an embedded context, we abort on panic rather than unwinding the stack.
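In Cargo, abort-on-panic is typically selected per profile; a minimal Cargo.toml fragment looks like this (standard Cargo configuration, not specific to the talk's project):

```toml
# Abort on panic instead of unwinding, in both dev and release builds.
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```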
That leads into the third difference, which is limitations on debugging and observability.
I'm vaguely aware that there are professional in-circuit debugger systems (there's a standard called JTAG that I'm aware exists), but I've never used them myself. My impression, which might be a naive impression, is that they're probably expensive, hard to set up, and probably Windows-only, so I've never really looked into them.
A third option that I've recently started looking at for debugging in an embedded context is simulating. Because these embedded devices are so limited, it's actually quite easy to run a program on your desktop computer that simulates the entirety of the embedded device.
There's a wonderful one for AVR, an open source project called simavr, which lets you run a simulation of your AVR program. It can produce trace files as output, which you can load into GTKWave or other trace visualizers, and here we can see a trace from a program I was simulating a few weeks ago, running on an AVR device.
One of the pins is a reset pin, so we have five general purpose pins to use, and you can see the state of those five pins; if one of them is an input, you can provide a trace file that supplies those inputs. All right, so that's debugging and observability. Somewhat related to that: in an embedded context,
your compiler is a cross compiler. Normally you're running the Rust compiler on the same device that the program you're compiling runs on, but in an embedded context you compile on a host machine, like your desktop computer, then send the result to the embedded device and run it there. There's a whole host of nuance to this, but it makes things a little trickier. Okay, so let's get back to my project: a simple real-world example for a button handling library.
It doesn't work. Okay, it's a very simple example, it's not much code, but still, there's a 95 percent chance that it's my fault: a 95 percent chance the bug is in the code you just wrote. And in fact I'm using unsafe in this code, because I'm using static mutable variables.
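As a rough illustration (my own sketch, not the talk's actual library code): sharing state between main code and an interrupt handler is commonly done with a static mutable variable, and every access to it needs an unsafe block:

```rust
// A counter shared between "main" code and an interrupt handler.
// Accessing a `static mut` is unsafe because nothing stops the ISR and the
// main code from racing on it.
static mut PRESS_COUNT: u8 = 0;

// Stand-in for an interrupt service routine body.
fn on_button_press() {
    unsafe {
        PRESS_COUNT = PRESS_COUNT.wrapping_add(1);
    }
}

fn main() {
    on_button_press();
    on_button_press();
    let count = unsafe { PRESS_COUNT };
    assert_eq!(count, 2);
}
```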
One other thing to keep in mind is that AVR interrupt service routines are explicitly experimental. The code that I've written uses an interrupt service routine, and this is experimental; there's a tracking issue in Rust that has had effectively no progress made on it. That means I have to compile with Rust nightly and add a feature flag, and all of those things add confounding factors to the debugging process.
So, a function call: we have a function on the left, a, and a function on the right, b. a does some things and then calls into b, and when b is done, it returns back to wherever in a it was running. What does this actually look like, in terms of what we were talking about with the stack?
Well, there's this thing called a calling convention. The calling convention says some of the working registers need to be saved by the function that is doing the calling, so we do that first: we save the working registers that we need to save, and then we jump over into the function. Then the first thing inside the function is that we may need to save other working registers; these are the callee-saved registers. Which registers are saved in step
one versus step three is determined by the calling convention, but the important thing to note here is that some are saved in step one and some in step three. Okay, so let's look at the same thing for an interrupt. We have function a on the left and an interrupt service routine on the right. Function a is just doing some local things, but the world says: oh, wait, let's call into our interrupt service routine.
Perhaps the computer finished counting to 20, or perhaps it finished counting to 256, or perhaps we received a byte on a network interface; something in the world, something in a peripheral, has determined that we need to service this interrupt. So we jump into the interrupt service routine, and when it's done, we return back to wherever it was that we were interrupted. All right, so what's different about the calling convention for an interrupt service routine, as opposed to a regular function?
Well, the first thing is that in step two we're not jumping into a function, we're jumping into an interrupt service routine. The second thing is that, because the function that's getting interrupted doesn't know it's going to be interrupted, it can't perform step one first: the function being interrupted doesn't know that it needs to save the working registers.
So we need to move step one down into step two, and now the first thing we do inside our interrupt service routine needs to be saving those working registers that would otherwise be saved by the caller. Okay. So I have some code, and it's not working, and I think it should, and so I'm making a bunch of changes to try to get it to work or not work. Some of the things that make the bug appear and disappear: moving my interrupt service routine code into main. Well,
this maybe is a clue, but it doesn't provide a lot of info, because we know our ISR is experimental, and interrupts are just a bit hard to reason about anyway. Also, if I change an inline annotation to inline(never), that makes the bug disappear. Now, this is curious. I've heard of there being compiler bugs related to inlining, but that would mean it's not my bug, not a bug in code I wrote; that implies it's a bug in the compiler, and it's never a bug in the compiler.
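For context, the annotation in question looks like this (a hedged sketch with made-up functions, not the talk's real code). The two functions below are semantically identical; only the inlining hint differs, which is exactly why a behavior difference between them points at the compiler rather than at the source:

```rust
// Hint that the compiler may inline this at call sites.
#[inline]
fn increment_inlined(x: u8) -> u8 {
    x.wrapping_add(1)
}

// Forbid inlining: always emit a real call, with the full calling convention.
#[inline(never)]
fn increment_outlined(x: u8) -> u8 {
    x.wrapping_add(1)
}

fn main() {
    // Same inputs, same outputs; any observable difference between the two
    // would be a compiler problem, not a source problem.
    assert_eq!(increment_inlined(41), increment_outlined(41));
}
```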
So that's very curious. Another thing that can make the bug disappear is adding a map_err to unit before I unwrap. This is basically: I've built a conveyor belt, and any error coming down it I just throw into the trash. But it's important to note that this map_err never actually gets called. I'm adding code that doesn't get called, and that changes the observable behavior of the program. All right.
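Concretely, the workaround looks something like this (a minimal sketch with a stand-in function; the talk's real call site is a pin-setting operation in embedded code):

```rust
// Stand-in for an operation whose error case can never actually occur.
fn set_pin() -> Result<(), core::convert::Infallible> {
    Ok(())
}

fn main() {
    // Original form: unwrap directly.
    set_pin().unwrap();

    // The workaround from the talk: replace the error with unit first.
    // The closure is dead code, since the error never happens, and yet
    // adding it changed the observable behavior of the miscompiled program.
    set_pin().map_err(|_| ()).unwrap();
}
```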
This goes back to the embedded-hal I was talking about. embedded-hal has a number of traits that different pieces of hardware can implement, and in this case we're talking about an output pin trait. It returns a Result, because on some platforms attempting to set an output pin can fail. But not on AVR: on AVR, the error type used for this trait is Infallible, it's void, and so this Result can never be the error case. We know statically
that the Result is never the error case, so the unwrap, or rather the error case of the unwrap, never actually gets called. All right. So, like I said earlier, we can make this work by mapping our error to unit. Now, it's not entirely clear why throwing away the original error and replacing it with an empty error would make this broken code work, particularly because I know statically that that error case never actually happens.
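A simplified sketch of the shape of that trait (modeled loosely on embedded-hal's 0.2-era OutputPin; the names here are abbreviated, not the exact upstream definitions):

```rust
use core::convert::Infallible;

// Loosely modeled on embedded-hal's OutputPin: setting a pin returns a
// Result because on some platforms it can fail.
trait OutputPin {
    type Error;
    fn set_high(&mut self) -> Result<(), Self::Error>;
}

// A toy AVR-style pin where setting the level cannot fail.
struct Pin {
    high: bool,
}

impl OutputPin for Pin {
    // Infallible has no values, so the Err case is statically impossible.
    type Error = Infallible;

    fn set_high(&mut self) -> Result<(), Self::Error> {
        self.high = true;
        Ok(())
    }
}

fn main() {
    let mut pin = Pin { high: false };
    // unwrap() here can never panic: there is no error value to hit.
    pin.set_high().unwrap();
    assert!(pin.high);
}
```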
To minimize the reproduction, we're going to remove, of course, the crate that I was writing example code for; we're going to remove any other crates that I make use of; I'm going to remove references to the core library where possible, because I'm trying to eliminate anything non-essential; and finally I'm going to remove the memory shenanigans around my unsafe static mutable, because I want to eliminate anything that could possibly distract from the bug. And I do this by copying.
B
I
have
a
minimal
repro,
so
I
file
a
rest
issue
and
I
sit
for
a
little
bit
and
nobody
comments
on
it.
I
guess
people
have
other
things
to
do
so.
I
say
I'm
going
to
go
ahead
and
dig
into
this
now.
I'm
vaguely
aware
that
rust
uses
llvm
and
I've
done
some
messing
around
with
llvm
in
the
past.
So
I
think,
let's
take
a
look
at
the
llvm
ir.
That's
emit.
That's
that's
that's
generated
for
this
code,
and
so
I
use
this
incantation.
So we have an alloca, and that makes the code fail. Well, what is this alloca that we see? We're reserving space on the stack for a local variable. We said previously that our stack has a stack frame for each function, and if that function has a local variable, we need to reserve space in the stack frame for that local variable. That's what an LLVM alloca instruction does.
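For instance, a function like this hypothetical one (my example, not the talk's code) has a local that, in the unoptimized LLVM IR, shows up as an alloca reserving a slot in the function's stack frame:

```rust
// `saved` is a local variable; in the unoptimized IR it is backed by an
// `alloca`, i.e. a reserved slot in this function's stack frame.
fn swap_out(slot: &mut u8, new: u8) -> u8 {
    let saved: u8 = *slot;
    *slot = new;
    saved
}

fn main() {
    let mut value = 7;
    let old = swap_out(&mut value, 9);
    assert_eq!(old, 7);
    assert_eq!(value, 9);
}
```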
So let's take a look at some assembler. Assembler is a textual representation of the machine code we were talking about earlier; it's as close as you can get to really understanding exactly what the machine sees, without reading the binary itself. The Rust compiler has a flag to emit assembler as well, which looks very similar to the LLVM IR flag. We ask it to emit assembler, the Rust compiler dutifully complies, and we get this working code, which is significantly different from our broken code.
B
That's
a
lot
more,
a
lot,
more
significant
differences
and
they
come
in
three
main
sections.
So
this
first
section
where
we
push
some
things
onto
the
stack
and
then
we
do
some
in
and
outs
and
we'll
go
into
exactly
what
this
is
doing
a
little
bit
later.
And
then
we
have
a
second
and
third
section
where
we're
doing
some
pops
and
some
more
outs,
all
right.
Let's
walk
through
this,
but
first
we
need
to
get
a
little
bit
more
information
about
how
to
read
this
avr
assembler.
B
B
We can also clear a register, and we can disable interrupts with the cli instruction. Disabling interrupts tells the machine not to interrupt us, so that we can run some code without being interrupted; we can also call that clearing the interrupt flag, and it's worth noting that on AVR that interrupt flag lives in the status register we were talking about earlier. All right, one other important concept
B
Before
we
dig
in
here
we
have
a
prologue
and
epilogue
for
every
function
and
these
bookend
the
body
of
the
function,
and
they
provide
that
calling
convention
that
I
described
earlier
and
it's
important
that
these
fragments
mirror
each
other
because
they
tend
to
use
the
stack
to
implement
their
their
the
saving
and
restoring
of
registers.
They
need
to
mirror
each
other.
B
B
So, starting from the top of the function: on line two we push register 0 onto our stack, and then we push register 1 onto the stack, and then we perform this sequence on line 4. The constant 63 refers to a special register, the status register, so we read in the status register and push it onto the stack.
So now our stack has register 0's prior value, register 1's prior value, and the status register's prior value, and then we push register 24 onto the stack. All right. The reason that r24 is down below while r0, r1, and the status register are up above is that registers 0 and 1 are caller-saved registers, and the only reason we're saving them here is that we're in an interrupt.
If we were in a regular function, this prologue would start at line seven. Okay. Then we do lines eight through ten, which are the body of the function, and then we perform the epilogue: we pop r24 off the stack; we pop the status value off the stack and use an out instruction to put it back, so that our status special register has its prior value; and then we pop register 1 and register 0, such that at the end of this interrupt everything is restored.
First we read in from 61 and 62, and we store that in registers 28 and 29. We said previously that we clobber 28 and 29, and here is where we do that: 61 and 62 are the addresses of the special registers for the frame pointer. So we read in the frame pointer, and, well, this is what we're supposed to do here.
We read the frame pointer into registers 28 and 29, and then on line 13 we subtract one from it, we subtract one from the frame pointer, and we go ahead and send that updated version of the frame pointer back out. That subtracting of one is what allocates space on the stack. Then, on lines 16 and 18, we put our updated version of the frame pointer into the frame pointer special register.
All right. We note that that little sequence is interrupted by this section, lines 14, 15, and 17, and this is a miniature version of saving and restoring the status register. On 14 we save the status register into register 0; on 15 we clear interrupts, so that we can perform our updating of the frame pointer without being interrupted; and then on 17 we restore the status register.
In the epilogue, we go ahead and add one to our updated frame pointer in registers 28 and 29, and that restores the value to what it was prior to entering our interrupt service routine; then we output that value into our frame pointer, and at the end our frame pointer has its original value.
All right, that's what it's supposed to do. But we note that we break symmetry here: in the prologue we push 28 and 29 and then read in our frame pointer; in the epilogue we pop 28 and 29 and then send out to our frame pointer. So we have a push and an in, followed by a pop and an out. But if this were symmetric, if this mirrored properly, it should be push, in, out, pop.
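To see why that ordering matters, here's a toy simulation (entirely my own sketch: a Vec standing in for the hardware stack, u8 values for the registers) of the mirrored epilogue versus the broken pop-then-out epilogue:

```rust
// The caller's r28 holds 7 and the frame pointer starts at 100; both must
// be restored by the time the function returns.
fn simulate(pop_before_out: bool) -> (u8, u8) {
    let mut stack: Vec<u8> = Vec::new();
    let mut fp: u8 = 100; // frame pointer special register
    let mut r28: u8 = 7;  // caller's value, must survive the call

    // Prologue: push r28, read the frame pointer in, allocate one byte,
    // write the frame pointer back out.
    stack.push(r28);
    r28 = fp;  // in
    r28 -= 1;  // allocate
    fp = r28;  // out

    // ... function body would run here ...

    if pop_before_out {
        // Broken epilogue: the pop clobbers r28 *before* we rebuild the
        // frame pointer from it, so the wrong value gets written out.
        r28 = stack.pop().unwrap();   // pop (r28 is now 7, not 99)
        let restored = r28.wrapping_add(1);
        fp = restored;                // out: writes 8, not 100
    } else {
        // Correct, mirrored epilogue: out first, then pop.
        r28 += 1;
        fp = r28;                     // out
        r28 = stack.pop().unwrap();   // pop
    }
    (fp, r28)
}

fn main() {
    assert_eq!(simulate(false), (100, 7)); // mirrored: everything restored
    assert_ne!(simulate(true).0, 100);     // broken: frame pointer corrupted
}
```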
So the epilogue is in the wrong order. Let's see what that actually means; what is it actually doing? Well, as we saw previously, first we push the 28 and 29 registers onto the stack, and on lines 11 and 12 we read in the frame pointer, subtract one from it, and send that back out to the frame pointer.
So our prologue is fine. But then in our epilogue: first, on 22 and 23, we pop values off the stack into registers 28 and 29; then on line 28 we add one to that value; and then on 31 and 32 we put that value into our frame pointer register. And we note that's not the prior value of the frame pointer register.
So we've now confirmed we have a bug in LLVM. I file an issue (here's a screenshot of the issue in LLVM's bug repository), and I sit on it for a bit and wonder who's going to fix it. Well, Hermes Conrad, one of my favorite characters from Futurama, said: if you want a box hurled into the sun, you have to do it yourself. So let's dig into the guts of LLVM. I'm running low on time, so I'm going to breeze through this, but don't get overwhelmed.
This is C++ code, but I'm mostly concerned with the comments. We see that we have special epilogue code to restore register 1, register 0, and the status register. That sounds familiar. And then we see this early exit if there's no need to restore the frame pointer, and I recall: if we don't need to restore a frame pointer, the code works; if we do need to restore the frame pointer, the code doesn't work.
So now we get to this bit in the middle, the question of: where do we insert the frame pointer restoration? Well, there's a loop here. MBBI starts at the end of the function, and as long as we haven't reached the beginning of the function, we step backwards through it and check the opcode: if the current instruction is a pop, we continue; if it's not a pop, we break out of the loop.
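The scan itself (the real thing is C++ inside LLVM's AVR backend) can be re-sketched in Rust roughly like this; the instruction names and types here are made up for illustration:

```rust
// Toy instruction set: just enough to model the scan.
#[derive(Debug, PartialEq)]
enum Insn {
    Pop,
    Other,
}

// Walk backwards from the end of the function, skipping pops; return the
// index at which the frame-pointer restoration gets inserted. This mirrors
// the heuristic described in the talk.
fn restoration_index(body: &[Insn]) -> usize {
    let mut i = body.len();
    while i > 0 {
        if body[i - 1] == Insn::Pop {
            i -= 1; // still in the trailing run of pops: keep going
        } else {
            break; // first non-pop from the end: insert here
        }
    }
    i
}

fn main() {
    // Model of the broken epilogue: pops of r28/r29 come first, then another
    // instruction, then the final run of pops. The scan only skips the
    // trailing pops, so the restoration lands at index 3, *after* the
    // r28/r29 pops at the front, which is exactly the bug.
    let body = [Insn::Pop, Insn::Pop, Insn::Other, Insn::Pop, Insn::Pop];
    assert_eq!(restoration_index(&body), 3);
}
```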
Well, what does this look like in our broken code? Here's the broken code before we insert our frame pointer restoration. We start on line 29, and that's a pop, so we keep going. We see 28; that's still a pop; we keep going. We see 27; 27 is not a pop, so we insert our frame pointer restoration code there, and that's what leads to our frame pointer being restored later than it should be. We can see that lines 22 and 23 really need to be after the frame pointer restoration.
Then I submit the patch to LLVM; here's a screenshot of the Phabricator interface that LLVM uses. Dylan McKay fortunately had the time to review my patch and committed it, and I appreciate that: thanks again, Dylan. So it's fixed; the bug has been fixed in LLVM, and now I want to contribute it to Rust. Rust keeps a fork of LLVM, so we cherry-pick the fix into that fork and then need to update the Rust compiler, and after a couple of PRs get landed...
There are several other outstanding AVR issues, including, as you can see, several that relate to AVR interrupts. Now that I've worked through stepping through the generated assembler and working through the code that generates that assembler, I feel a little bit of a responsibility to take a look at these bugs. I haven't had time to yet, but I hope to soon.
Andrew, thank you for that incredible talk, which can only be called epic. (Thanks.) That was epic. Wow. So AVR may be tier three, but your patience, man, is godlike. Not just for
all you've done, but also for handling all the tech issues while talking; I can't believe it. So thank you so much for your patience. Actually, for us this is gravy. So I do have a few questions. First: AVR is quite a new target, right? (That's right.) How are you finding it, and have you tried things like STM32 targets?
I have; I've messed around a little bit with some of the other embedded targets. I haven't done the STM32. For anyone in the audience not familiar, that's the target that the Rust Embedded book, the intro book, works through, using a board called the Discovery board. I have, on my list of too many things to do, the goal of picking up one of those Discovery boards and working through that, but I haven't had the opportunity to do that.
Yeah, I have done a little bit of ARM development; ARM is another embedded target, another embedded platform. I've done a little of it with Rust, but honestly very little; I've done very little Rust development on it at this point. Most of my embedded experience is with C, programming in C,
which I've never liked; it's always been frustrating. So I'm very grateful that the Rust embedded community is working so hard, and that the Rust compiler contributors are working so hard, to make Rust a viable option for embedded, because there's a lot of potential there.
The AVR documentation, generally, is pretty good. It's often hard to find the right PDFs, but once you find them, they tend to be pretty solid. AVR is actually a very limited platform; it's much more limited than, for instance, ARM, and so the documentation, the assembler reference, is quite complete.
So if you search for the AVR assembler reference guide (and I can drop a link to it in the slides when I release those), that guide is quite complete. The other resource that I found to be incredibly helpful is a forum called AVR Freaks, where a bunch of people who love programming for AVR answer all kinds of questions. Almost any question that I have has already been answered on that platform,
on that forum I mean, in one post or another, and so coming up with search results on AVR Freaks is fantastic. And then the third resource I would suggest is the AVR...
Certainly, yeah, that's a great point. I have not yet experimented with using an allocator. You can: previously in my talk I was saying that you don't have access to the standard library, but you do have access to the core library; and there's a middle ground, where you have liballoc, which can give you access to collections like vectors and hash maps and so forth, and you can theoretically compile that for an embedded context. I've never really experimented with that.
The devices that I tend to work on are the ATtinys, which are extremely limited, and it's almost always worth doing analysis ahead of time to make sure that you don't run out of memory; that analysis is significantly harder to do if you're using the heap. So in my programs on embedded, I almost never even think about reaching for the heap, because it seems like it would create a lot more problems than it would solve.
I think on other embedded devices it's probably much more relevant; for instance on ARM, obviously, and on other more capable platforms, I think using an allocator makes a lot of sense.
But again, I'm getting into the weeds on something not related to this talk. I do see that the user-experience benefit of having the standard libraries based on a global allocator probably outweighs the technical benefits for these niche use cases.
Cool, cool. All right, there are a few more questions, but we're really running out of time, so maybe you could answer them in chat later on. So, once again: thank you so much, Andrew, for that epic talk.