Description
This is a BrownBag Session (https://gitlab.com/gitlab-org/secure/brown-bag-sessions/-/issues/33) about creating a snapshot-based, feedback-guided fuzzer that uses perf events for feedback. Project with example code: https://gitlab.com/gitlab-org/vulnerability-research/kb/presentations/creating_a_snapshot_feedback_guided_fuzzer
So, I am James Johnson. I am a staff security engineer on the vulnerability research team at GitLab, and a lot of my background has to do with fuzzing. It's something I find really fun to develop tools for — you get tossed into a lot of interesting situations and problems to solve, and, yeah, it's just interesting to me, and I have done it a lot at past jobs. And this is a link to all of the example material in the slides.
All right, so this is what we will be talking about today. We'll start at a relatively high level with debugging: what we want to cover, why we want debugging, and the types of information we might want to capture. Then we'll cover mutation, snapshots, and feedback, and then we'll touch briefly on using a corpus of inputs for fuzzing.
All right, so you could call this a very basic debugger. It's a bash script — and, oh, here we go, that's better, all right. It runs a process, you get the exit code, and if the exit code isn't zero, it wasn't successful: you could say it crashed, or it operated in a way that wasn't intended, but you have no idea what's going on. And bash, I will say, is probably not the best way to implement a fuzzer or a debugger.
So here's roughly the same thing, except written in Rust. A lot of the source code in this is in Rust. It's something that I've enjoyed learning. I don't really use it much here at GitLab, but a lot of the source material and research for this presentation comes from things I've done on the side — just-for-fun programming — and that has been in Rust, so it's been easiest to do a lot of the code in this presentation in Rust. But this is doing pretty much the same thing: it spawns a new process and checks to see if it's successful, and if it's not successful, we'll say it crashed. All right.
All right, so if we run this locally — you can just compile it with gcc or clang if you want — and you run it like so, we'll see... oh, I did not have — oh, here we go. We'll see something like this: we'll see a segfault, and if we check the status, or the exit code, of the process that we just ran, it is non-zero, so our basic bash, or super basic Rust, debugger will see that it didn't exit cleanly.
Now let's go back, though — and this is... actually, yeah, that slide didn't need to be there. But we need more than this. This isn't enough information to really have a robust debugger, or to figure out what's going on, or whether the crash was interesting.
So just looking at the output from here, we can tell that the exit code was 139, but we don't really know much else. We know this is the code that caused the crash — we wrote it, and it's very simple — but we don't really know much more, besides that it had a non-zero exit code and that the exit code is 139 (which is how the shell reports a fatal signal: 128 plus 11, the signal number for SIGSEGV). So we need more information.
Now, we're not going to automate gdb. In the past I've done that type of debugging quite a bit, where I just wrap an existing debugger. I've used that a lot from Python, both on Windows and on Linux — I've written generic debugger wrappers that wrap either gdb or cdb, the command-line WinDbg — and it works pretty well, but for really fast, higher-performance fuzzing it does not work that well. And when I was using it, it was mostly for things like Adobe Reader and browsers, that type of thing, and your performance on those is pretty small anyway.
That's just a segmentation fault — I'm not sure what the V stands for (it's "violation": segmentation violation) — but the SIGSEGV signal is sent by the kernel when memory errors occur. You can think of signals this way: in the same way that interrupts relate to the kernel and hardware, signals relate to processes and the kernel. So with hardware, an interrupt can be sent to the kernel that absolutely needs to be handled; if it isn't handled, then the computer itself will just crash.
The same relationship exists with processes and signals: the kernel can send a signal to a process, and the process may handle it or not. That is one of the big differences — most handlers are optional, or they have default actions. SIGKILL and SIGSTOP can't be handled, though. If SIGKILL is sent to a process, the process just dies; there's no way to stop it. All right.
So let's go through a few more examples of contrived crashes in programs, just so that we can look at the signals they generate. Now, this one is a double free: we allocate some data and we free it twice in a row. This does come up in complicated C-based codebases — you can have double frees — and this does not generate a segfault; you get a SIGABRT, so slightly different behavior.
So if we are looking at trying to glean as much information as we can from the target as it's running, knowing which signal was sent to the process can be very useful — or more useful than not having it, let me say that. All right, so let's look at another one.
This is a stack overflow — so we're not overflowing a buffer, just a stack overflow: infinite recursion, and we run out of stack space. This one is also a segfault, and again we're doing the same thing: we compile it and we run it with gdb.
So we can see this signal. And here's another one — this one is a little more interesting. I have seen this before, especially if you're fuzzing a target that has some aspect of JIT to it.
You can have it execute invalid instructions, or, if you have a use-after-free and you make the instruction pointer jump to some random place in memory that happens to be executable, you can see this error: an invalid instruction. So 0x90 is a NOP — that's here — and then 0xCC is an int3, then a NOP and an int3, and this one, 0x06, is an invalid instruction on x64. What I'm doing here is: I allocate some data, I set it to be read-write-execute, and then I cast that data location to be a function — a callback — and then I call it to make the instruction pointer go to that data. Then we start executing these raw opcodes directly, and this is where it crashes; this also happens to be a segfault. All right, and this is actually the full list of signals.
One of the ones that I recently had to deal with was — oh, it's not going away — all the way at the bottom: there's SIGWINCH, and I'm pretty sure it means "window change". When the window resizes, that signal is sent to the program.
So if you are just generically capturing all signals sent to the process — and I actually haven't fixed this in the code — if you run the sample fuzzers in the examples directory and resize the terminal window, it will think a SIGWINCH signal is a crash. So yeah, there are a lot of signals going on, and it's kind of interesting having that insight into what's happening with the process. All right. So we've got signals, we know the exit code, but we really want more information than that.
So next up, we really need to have our own debugger. On Linux, this is where I would start using ptrace, and that's what we'll be talking about. We can use it to fully debug and control a process: we can read and write registers, we can look at the process state, read memory from it, change register values, monitor the signals — everything. And actually, this is what gdb uses: gdb uses ptrace. All right, and so I'll be quoting the man pages
a lot in here; they're the very succinct, clear definitions of what all these functions are. So, ptrace: there's a ptrace function, and you send it a request, and the request values — like PTRACE_TRACEME — are the main way to use ptrace.
So the way you start using ptrace is: you have a child process, and in the child process you have to call ptrace with the request PTRACE_TRACEME. Once that is done, then you can start debugging the process and tracing it.
So, in the examples directory there's the b_spawn_with_ptrace example, very similar to the a example, and here we are doing exactly what I said: we spawn a new process — that's what we're doing here — and when the process is spawned it's initially stopped, so we execute some code first in that context, and that's where we do the PTRACE_TRACEME request, and then, after that is done, the process is created.
How are we doing on time? Okay, we're doing just fine, all right. So if we run this with each of the different example targets that crash in different ways, we can see that we are capturing the different signals: all of them are segfaults except for the double free, which is the SIGABRT. So yeah — our debugger is working.
So knowing the signal still isn't enough. We want the registers; we can also look at the last instruction; we can start analyzing the stack and the heap. Having a debugger in place where you can actively inspect a process is very critical in automating this type of thing.
All right, so another way that we can gain additional information about how a process is crashing, or how these errors are occurring, is using sanitizers and clang.
So, sanitizers. Clang is part of the LLVM project, and part of how the LLVM project works is that it takes source input and transforms it into LLVM's intermediate language — LLVM IR — and then transformations are performed on top of the IR, and after that the IR is transformed into architecture-specific machine code.
So the sanitizers operate on the IR, and they insert instrumentation in different ways so that we can get extra feedback about how things are occurring. We may get more insight into how something crashed, but they can also ensure that things do crash when they go wrong. Certain types of errors — use-after-free is a good one. If you're not running with one of the sanitizers, you could free an object, and the memory of the freed object is still floating around on the heap; it isn't cleared out.
So if you have a stale pointer pointing to this freed object, you could still use it. It may have been overwritten by something else, or partially overwritten, and then, when that stale pointer is actually used, it is going to use a corrupted object.
So some of the sanitizers add checks into the code at compile time to make sure that never happens — it will crash if there is ever a use-after-free. So here's a list of some of the main sanitizers; there are a few other ones I'm not that familiar with. The most important, or most used, ones to me are AddressSanitizer — that's the one I pretty much always use — and there's also MemorySanitizer, which detects use of uninitialized memory. So AddressSanitizer is — I think I quote it on the... oh.
I didn't add it, all right. So AddressSanitizer itself is pretty interesting. It does operate on the code at compile time and, if I remember correctly, every heap allocation gets surrounded by poisoned redzones in memory, so that if anything is read beyond the scope of the allocation, it causes a crash. So if you allocate a buffer on the heap and you try to read or write beyond the bounds of the buffer, it crashes.
So for the double free, AddressSanitizer will output additional information and context on why something crashed, and we'll go through the different sample — or contrived — targets and how they look with AddressSanitizer.
All right, and this one is the stack overflow. It does give you very nice error messages, saying a little more clearly exactly what happened, and maybe a little more context about what caused it. And this one is the very basic one that crashes if you give it "gitlab" and it tries to dereference null. So here we go: a hint that the address points to the zero page, so the signal is caused by a write memory access.
So, to summarize so far: we can launch a process, we can monitor it and debug the process, and we can use sanitizers. But now we need to actually start sending inputs to the target process. All right, so with mutation we'll keep it pretty straightforward: we will just mutate random bytes in existing data. We're not going to worry about changing the size of the data being sent or anything.
So here's example c — it's another Rust project. This takes an existing array of bytes — a Vec&lt;u8&gt;, so each element is just a character, a byte — and a Rand object, which knows how to generate random numbers. For n number of times, we choose a random index and we set it to some random value in the character set, and that's it: a very simple mutation. And here we are in the main function, actually using it.
So we are spawning the process every single time, and we have this scratch buffer that we just keep reusing — we keep overwriting it. We copy the original input in, and then we mutate the scratch buffer, passing in the Rand object and the scratch-space buffer; then we spawn the process, and then we monitor it to see how it crashed. Now, I let this run for a while, and it never found "gitlab". Yeah.
It just never found it, and that's actually expected. We've got six characters here, so if we do — one, two, three, four, five, six — that many options, then one in that number are the odds of randomly generating the correct value. And actually the odds would be even lower, because we're not mutating all six bytes every single time; we're mutating a random number of bytes. So the odds are even lower that we will find "gitlab" with this method, all right.
So, moving on to the next phase: snapshotting. It's a little bit of a gear shift, but why would we want to snapshot? Process creation is very slow: if we are creating a new process every single time, that's a lot of setup time and teardown time for that process.
After all the setup has occurred, you fuzz only the interesting part, and you also have options to make it more deterministic. It's not always the case, but that is something that applies in general to the concept of snapshot-based fuzzing: it can be more deterministic.
Actually, let's go back to this and look at the iterations per second. Creating a new process every single time, we're getting about 700 iterations per second — so it's faster than some other things, but in general that's not too fast.
All right, we're doing good. So what does a snapshot actually mean to me? It means you're recording the state and restoring it. It should be that straightforward. It does get a bit complex, and there are shortcuts you can take — you don't have to do the full thing — but if you're fuzzing, or wanting to snapshot, really complicated targets, it gets a lot more complicated than that.
If you take a snapshot and then the process starts allocating things on the heap, or starts making state changes on the stack or wherever, you are going to want to be able to reset all of those changes back to their original state from when you took the snapshot. So: register values — there are standard and floating-point registers, and you need to snapshot both of those sets and save them so they can be restored. File descriptors — let's say a process has 10 files open.
How are you going to handle those? Let's say you snapshot the process and, you know, you continue doing the fuzzing, and it closes five of the file handles — how do you restore it back to its original state? Or let's say that a file was mapped into memory, or there are existing maps that were then closed — how do you deal with those types of things? It does get very complicated.
For the examples in this presentation, we won't deal with file descriptors or network sockets or mapped data; we're really only going to focus on memory and register values. But the other ones are definitely things to think about. All right, so procfs. Procfs is a pseudo-filesystem, and it gives you access to kernel data structures.
It's not the fastest thing in the world, but it does give you the information you might need if you're going to implement some sort of snapshot system. So in general, procfs is our friend — but really it's our frenemy. We like it because it gives us information we need, but we actually really don't like it, and there are reasons for that; we'll get to those later, though.
So here are some important procfs files for snapshotting. There's /proc/&lt;pid&gt;/maps — oh, and for each of these, if you want to test them out or look at them on your own system, you go to /proc/self/ and then the file name; "self" refers to the current process. So you could do /proc/self/maps, and that will be the maps file for that process. All right, so first off, /proc/&lt;pid&gt;/maps: it contains a list of all of the mapped regions for that process.
So if you cat /proc/self/maps, you'll see something like this, and you can see /usr/bin/cat is actually loaded multiple times with different sets of permissions. I'm not going to go into that, but this is really interesting to look at, to get a feel or an understanding for what's going on in the kernel when a process runs. Down here, you can actually see this is the stack region; as functions are called, they leave stack frames on the stack.
The stack would definitely be a memory region that you would want to restore. So if you're thinking about snapshotting, this gives you a lot of information, and you can probably see ways where you don't need to capture all of this stuff in memory in order to restore it: if you can't write to it, it's probably never going to change, so maybe you don't need to save it, for example. All right, so this one is another interesting file.
This is the memory of the process. It's not actually just a file with all of the process's memory — remember, these are an interface to kernel data structures. So if you open /proc/&lt;pid&gt;/mem and seek to an address in memory — let's say you know of an address in a process and you want to read that value using the mem procfs file — you would open it, seek to the address, and then read however many bytes, and that would be the memory from that process. One of the examples in the examples directory is procfs_readmem.c, and it does exactly that; this is what it looks like. Here we go: we're reading /proc/self/mem, and "self" will always be the current pid.
So what should occur is that we will print out "hello world", except it will be from new_data, and that will be a value that we read through the mem file in procfs — and that's exactly what happens. So I don't know why you would actually want to do that — maybe there are reasons — but for normal programming I've never needed to do it.
All right, so clear_refs. If we go back to this: this is a lot of data that's loaded into memory. These regions can be fairly large, and so, if we're restoring each of these on every single iteration during fuzzing, that's going to be a serious bottleneck. So keep that in mind as we go through these next steps.
So, clear_refs: it's a write-only file. Again, you're setting some value in kernel memory — you're accessing kernel data structures through procfs — and what this does is it clears all of the dirty flags on all of the pages for the process.
What you can do with this becomes clear when we look at the next one: pagemap. There's a 64-bit value in this file for every single page loaded into the process, and the 55th bit of that value, for each page in the process's memory, indicates whether or not that page is dirty — that is, written to since the last time clear_refs was written. So now, if we write the value 4 to /proc/&lt;pid&gt;/clear_refs, we can later check pagemap to see exactly which pages have changed since.
So that's where this concept originated: it was created in order to do snapshotting, or to make that easier, and it is actually used a lot with Docker containers. Yeah, there's a lot of research around how this applies to Docker. All right. So why do we really care which pages are dirty? It's pretty much performance.
If we only have to restore one page in memory, versus all of the regions that we know about for the process, we will be much faster. Also, we're operating on x64: the address space here is incredibly huge, and we do not want to be trying to save all of the process's possible memory for every snapshot. All right — now remember, procfs is our frenemy. It's not our friend; it's useful, but it's very, very slow.
So this — I'm not sure if this is his handle, I think, but his blog talks about how procfs is not that fast, and he had a fork of the Linux kernel that was trying to expose data structures instead of using a file-based system, where you have to use multiple syscalls just to get at the data. His fork of the Linux kernel was using a different method in an effort to speed it up, and if you search around you'll see a lot of links talking about how procfs is really not that fast. We'll talk a little bit more about that on these slides: reading and writing memory through /proc/&lt;pid&gt;/mem requires a few syscalls.
You have to open the file, you have to seek, then you have to read or write the values, and then you have to close the file. Now, you could leave the file descriptor open and then just seek and read or write as you want, closing it when you're done — so you could kind of rule out the first two — but even then, for every single value you want to grab from the process's memory, you have to do at least two syscalls: you have to seek, and you have to do the operation.
So there are better APIs for doing this type of thing: there's process_vm_writev, and also process_vm_readv. These both take two arrays of what are called I/O vectors: a local iovec array and a remote iovec array. For writes, the local iovecs indicate what data to write, and the remote iovecs indicate where to write it to, so you can pass a whole list of data-and-address pairs.
You can send those lists to process_vm_writev and it will do all of them in one syscall — I actually thought it took more than that, but it does it all in one shot, instead of having to do two syscalls for every operation. And it's the inverse for readv.
So for readv, you have an array of buffers with known sizes that data from the remote process is going to be read into, and then you have an array of addresses in the remote process that indicate where the data will be read from. All right, so we have a lot of the building blocks in place now; let's start putting them together a bit more.
If we want to record a snapshot, the process must be stopped, and we need to record the registers — those parts are done with ptrace. Then, with procfs and the process_vm functions, we need to copy all of the writable regions indicated in maps, and then we write 4 to clear_refs, which will clear the dirty-bit flag on all the pages in the process's memory.
And I mentioned this before: this does ignore all of these things. I didn't mention child processes — what if it spawned a child process and we're resetting things? If the target that you're fuzzing is very complex, you will probably have to deal with these and figure out how you want to approach it. Maybe it's fine, when you restore, to just kill all the child processes that didn't exist at the time the snapshot was taken.
Maybe that's all you need to do, but you still have to deal with multiple threads, memory-mapped regions, new file descriptors that were opened — lots to consider. All right, so snapshotting is super useful; there is a lot of research going on in that realm. And I did not add a link to something else that I wanted to: AFL, a very popular fuzzer. There's AFL++, and some of the AFL++ folks have been working on this AFL snapshot.
That's a Linux kernel module that takes care of all the snapshotting on the kernel side, so you can have high-performance snapshot-based fuzzing without having to do multiple syscalls. Oh — one of the other things I was going to mention here is that there is also research going on in emulated fuzzing, where you emulate a different architecture — possibly a simpler architecture — and its memory, in an emulator.
In the emulator, you have full insight into everything that's going on in the process, or whatever you're emulating, so you're able to capture everything you need without needing to do anything with the kernel. It's been emulated, so you have absolute control over everything, and that's another very interesting area of snapshot fuzzing.
So the main function calls do_something, and this intentionally takes maybe a second, or maybe a little less. So if you were to fuzz this program end to end, you would not have a fast loop that you're iterating through — you would have like five iterations per second or something.
Okay, so this data is coming directly from the command line of the process — and I'm realizing I did not put the screenshot that I wanted to show in there. All right, so let's go back to here: we've changed the target process to have a slow section, with the interesting bit coming after the slow section, and the data that is being provided to the program just comes from a command-line argument. Now, if we're using a snapshot-based system, we'll create the process once and then keep resetting it.
The intention here is that you could have an automated way to find these locations where you want to set a breakpoint, and I'll talk about some ideas on that in a little bit. But in general, the example fuzzers work like this: the target flags the data that will be fuzzed, then it triggers the snapshot — when to take it — and then it triggers the snapshot restore. So the target itself has been manually instrumented to have these interactions with the fuzzer.
So this is how we're flagging the data inside the target process: we're using some raw assembly, and we are triggering an int3, which generates a SIGTRAP signal.
But before that occurs, we put a special value into rcx — 0xf00dfeed. If the fuzzer sees 0xf00dfeed, then we know that SIGTRAP is at the location where the memory address was tagged, and when that happens we grab rax from the registers — that's the address — and rbx indicates the size of the data. Now we know the address in the target process and the length of the data that will be fuzzed.
something
else
here.
You
might
notice
at
the
bottom.
A
We
say
set
watch
point,
and
that
is
what
we
do
on
the
next
slide.
So
taking
the
snapshot,
this
is
actually
setting
the
watchpoint
or
a
hardware
breakpoint.
That is done on the address of the data to be fuzzed. So in the target process, argv[1] is passed in, and that gets put into rax, which becomes this overwrite-data address, and then we set a watchpoint on it. So the next time that data address is accessed, a SIGTRAP will be triggered — sent to the process — and what that gets us is this.
Let's see — what that gets us is: we don't need to manually figure out the best place to insert the breakpoint that triggers the fuzzing. So now we take the snapshot as soon as the data of interest — the input to the program — is actually used, and in this contrived example that gets us past the slow section. And where that actually takes us — let's go all the way back to here, one more, all right — where that actually takes us is inside of strlen.
This is the first time the data is accessed, and that's where the hardware breakpoint triggers the SIGTRAP, and at that point — that is where the snapshot is taken, and this is completely after the slow section. During the development of the snapshot fuzzer code that I did in my free time, I actually had a problem where I wasn't saving the floating-point registers.
A lot of the string functions in the standard library use floating-point registers to, you know, increase performance, and I wasn't saving those when I was doing the snapshotting, and it kept crashing — sometimes here, inside of strlen, after I restored — and it took me a while to realize that I wasn't restoring the floating-point registers, because those are what strlen was actually using.
All right, okay: so we have the hardware breakpoint, and it sends a SIGTRAP inside of the strlen function, and that's the point at which the snapshot is taken. So at this point we need to — actually, that's right: this slide is talking specifically about how the hardware breakpoint is set. It's pretty interesting.
We are starting to run out of time, so I'm going to kind of gloss over it a little bit, but there are debug registers on Intel processors, and DR0, DR1, DR2, and DR3 each track a specific location in memory that can be set. So you can have four hardware breakpoints going at a time with those four registers.
The DR7 register is a debug control register, and you set values inside of that register to enable or disable each of those DR0-DR3 registers. It's a little more complicated than that, but that's the gist of it. And you set these registers specifically with PTRACE_POKEUSER — you can't use the normal GETREGS request.
All right, so, restoring the snapshot: there's another breakpoint triggered at the end of the target program, and this is watched for in the fuzzer.
If a SIGTRAP occurs after we've already started the fuzzing loop, then we know it's ours, and we just break there. Now, really, a SIGTRAP could occur in the process for other reasons — maybe the developers of an actual real application left in an assert that would trigger a SIGTRAP.
More logic would need to be added here to handle that type of thing. All right, so if we run this, we'll see that we receive the tagged memory address, we have the max data length, and we have the address of the data to be fuzzed.
And here — this is the watchpoint, or the hardware breakpoint, being hit. This is inside of the strlen function, and so we took the new snapshot, we copied all of the writable regions from memory, and then we start the fuzzing loop.
So at the start of every fuzzing iteration we restore the snapshot and then do the fuzzing — and this thing keeps showing up, but if we look at this, we're getting 32,000 iterations per second. If you remember, when we were creating a new process for every iteration, we were only getting about 600 iterations per second. So snapshot fuzzing does have huge potential, and it's really not even so much the snapshotting itself that is getting us that.
All right, let's see. So let's say we're generating inputs to send to the program: some inputs will be more interesting than others. The general theory behind a corpus is that interesting inputs will need to be tracked, and maybe prioritized based on how interesting they are, and non-interesting inputs — maybe ones that you've seen before — will just be discarded. So this corpus works hand in hand with a feedback metric, or a fitness function, and here — this is all I need for a corpus.
All right. Usually, using feedback in a fuzzer looks something like this: you might revert the snapshot, start recording — whatever that means for the type of feedback — run the target, and then you get your metrics from the feedback; then you check in the corpus whether you've seen that feedback metric or not, and decide if you want to save that input. So, types of feedback: coverage is the obvious one, but there are other ones — performance counters, manual breakpoints, anything meaningful that can indicate progress in the fuzzing process.
So coverage is the default. Clang supports coverage as a sanitizer, so it gets inserted into the code at compile time, and you can access those coverage metrics directly in your fuzzer — except you have to know about it, and you have to access them, so you need to read the clang documentation to be able to work with clang's SanitizerCoverage. The feedback metric that I'm using in these examples is actually performance counters.
A
It was very interesting to me, and that's why I used it. It was different, and I wanted to use a different feedback than coverage, because coverage is what everything uses, and I wanted to see if something like this would even work.
A
So these are all of the types of performance counters that are tracked by the perf Linux subsystem. There's quite a bit, and these are for the user or system-wide, as well as the kernel; you can specify what type of counter you want to have for each of these. Part of the problem with using performance counters is that they are non-deterministic.
A
The perf subsystem is sample-based, so it will record samples of each of these counters throughout the recording process. So two consecutive runs of a program will end up with different counter values.
A
So if we run this command, all we're doing is echoing hello, and we're recording the number of cycles, the number of instructions, and bus cycles; I could have added branches in there too.
A
They have pretty largely different numbers on these. Now, does that actually matter? It turns out it doesn't, or not as much as you might think it does. So cycles definitely matters: there's a huge variance in cycles, like 4,000 different. Instructions vary a lot less, and branches actually vary very little.
A
A
Will cause you to save extra inputs that look interesting just because of the jitter, but that could help you in the fuzzing process to not give up on certain paths too early. All right.
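One hedged way to keep that jitter from flooding the corpus (my suggestion for illustration, not necessarily what the talk's fuzzer does) is to bucket the raw counter value before comparing it, for example by dropping the low bits, so two runs that differ only by noise map to the same feedback value:

```rust
// Bucket a raw perf-counter reading: keep the magnitude, drop noisy low bits.
fn bucket(counter: u64) -> u64 {
    counter >> 6
}

fn main() {
    // Two runs differing by ~40 cycles of jitter land in the same bucket...
    assert_eq!(bucket(10_000), bucket(10_040));
    // ...while a genuinely different path (thousands of cycles apart) does not.
    assert_ne!(bucket(10_000), bucket(14_000));
}
```

The shift width trades sensitivity for stability: too small and jitter still looks novel, too large and real new behavior gets ignored.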
So this is an example. This is the side project that kind of spawned all of this for me, of doing snapshot-based fuzzing in Rust with performance counters. So, resmack-fuzz-test: I call the fuzzer resmack, and this is just my experimental fuzzer for that.
A
So this is that running on a target. It's a pretty much identical setup, except it's looking for the word resmack, and this is the full thing working with perf counters as its feedback mechanism.
A
So if you remember, we had run 18 million iterations with the snapshot-based fuzzer and we still didn't see the crash. This is looking for an even longer input, and we get the crash in about 60 or 70 thousand iterations, so it took like two seconds to find it, and then we got it. Now.
A
We are very, very close to being out of time. Does anybody have questions? Oh, I've got two slides left, so think of your questions and I'll cover these real quick. So perf events definitely has pros and cons. A huge pro is that I didn't need to instrument the process at all; all I needed was to have the perf subsystem in Linux working, or functional, on my system, and that was it. The problem, though, is that there's a 4x overhead to recording performance counters while you're running a process.
A
A
So this here is how long it took to run a very basic function with no perf, and this is with perf counters turned on. So there are downsides to this. If you spend a lot of time in the target process, this will have a much bigger impact than on our trivial targets, which, you know, spend just a very, very small fraction of their time actually running their own code. And yeah, that is the end of this presentation. Did anyone have any questions?
B
Yeah, thank you very much. It was very cool, super interesting. I just have, like, two questions on the agenda. The first one was about what, basically, the best point of creating a snapshot is. So if you have, like, inputs that are used multiple times in your program, how do you know what the best points in the program's execution are to take the snapshot?
A
Yeah, so that kind of comes down to knowing the target, right? So, let's see, there is a thought that...
A
A
Those do tend to be more successful than just plugging and playing, right? So, knowing where to do the snapshot: it would have to be targeted. You could try to figure out where the process spends most of its time; maybe that could work. You could use perf to figure that out.
A
You could look at the functions that have the most time spent in them, but really, if you're looking for bugs, you're not really looking for where it spends most of its time, right? You want to try and leverage, or cause, the most code in the program to be executed, basically. Or, if it's more of a state-machine type thing, you want to cover all the different possible state transitions, right? Yeah. So there isn't an easy answer to that.
B
Yeah, yeah. The second question is about something I think I read in a paper a while ago; I'll have to dig it up somewhere to link it. But they were storing the state of a process by using, like, fork calls. So they were forking a process, which basically creates, essentially, like, you have the memory mappings and everything available in the child process, and then they were, like, running it, and it would fail.
B
They were restoring the parent process, so they were basically using fork calls to create, like, a copy of the parent process, and I was wondering if this could be also useful for fuzzing, or if this is something that wouldn't... I mean, if you have access to the source code, you could inject fork calls into the source.
A
That is actually a method that a lot of fuzzers use: forking, and using that to track the child process.
A
A
All right, I think... did anybody else have questions?