Description
A presentation about symbolic/concolic execution engines, existing tools, and their applications.
A
All right, so this brown bag is about symbolic and concolic execution and SMT solvers. I initially thought I was going to go heavier into specifics about implementing symbolic or concolic execution engines. It's going to be partly on that, partly on existing tools, and partly on a proof of concept I made that ended up being pretty cool. So yeah, that's what we will be talking about.
A
Alright, so symbolic execution. It's a technique used to model the execution of a program using symbolic variables. To really understand that: the opposite of symbolic execution is concrete execution, which executes the program with concrete, real values — basically just running the program. Symbolically, you model the variables and the actions within the program, and then you can perform introspection on them and develop a constraint space describing how the program executes. I do also want to say, I have used these tools before.
A
I would not call myself as much of an expert as Julian is, so Julian, please — if you notice something wrong that I say, please correct me. If you didn't know, Julian did his doctoral thesis on exactly this topic, so yeah, odds are I will probably get something wrong. Alright, so suppose we have this function here.
A
It's got a few branches — three separate branches — and one variable. If we wanted to explore all the code paths in this function, what would we have to do? First, we would start with the input to the function, the x parameter, and we would have to model this specific type of variable — a signed 32-bit integer — in our symbolic execution engine. The reason I'm specific about saying it's a signed 32-bit integer is because with integer math you could wrap the integer around, and you may be performing bitwise operations. To have a fully functioning symbolic execution engine, you'd want to model it fully. Alright.
So next, we execute the code on our symbolic variables until we reach a branch instruction, and the first one is this x == 0. For each branch, we determine which values need to exist for that branch to be viable — and this is where SMT solvers come in. Often — actually, I guess you can have a symbolic execution engine without an SMT solver, but all the ones I've seen always have one.
A
Yeah, I know Klee uses one. Basically, all the ones I've looked at, they're paired together. So as you reach a branch condition, you can update your current set of constraints that get you to that path. We're modeling the code, we know what the comparison operations are, and we build our constraints that way. One thing I didn't add in here: say there's an x += 1 somewhere. Then you would also have to model that — add it to the constraints and model it in your execution engine as well. Alright.
A
So we have a constraint, and we know it's satisfiable: x can equal zero, there's nothing preventing that. So we fork — that's usually the method I've seen engines use once they want to branch and explore a new branch. They fork the current state, and if the path to the new branch is satisfiable, they set the new values and continue down the new path. And so now we still have these current constraints, namely x == 0.
A
So the X's mean we've executed those lines, and now we're on to the next one. We have the constraint x == 0 — that's what got us here — and now we see a new constraint, x < 0, which is unsatisfiable: you can't have x == 0 and x < 0. So we don't fork to explore this first one, which takes us to the second one, where we use the inverse of this condition, x < 0, and with that we've fully explored down that path.
A
We start back at the top — and I tried to make this as simple as possible; there are optimizations you can do, but I think for demo purposes this gets the point across. So we go back to the top, and we have the set of constraints we used before, except we negate one of them. That skips this branch and takes us to the next one; we already added this constraint, so it's fine, and we end up going in here and skipping this one.
A
So then, again, you go back to the top, and you basically perform all permutations of negating the clauses — or no, the expressions in your constraints, the conjuncts? I always forget the specific term for a piece of the equation. And so now we've got the negated ones: we skipped this part, this one will not go into here, and now we'll go into this one. Now we've fully explored this function. This is glossing over a lot of things.
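The path-enumeration walkthrough above can be sketched in a few lines. This is a toy illustration only: get_sign is a hypothetical stand-in for the sample function on the slide, and a brute-force search over a small range stands in for the SMT solver.

```python
# Explore every path of get_sign(x) by collecting branch predicates and
# checking whether each conjunction is satisfiable. A real engine would
# hand the constraints to an SMT solver; brute force stands in here.

def get_sign(x):
    if x == 0:
        return 0
    if x < 0:
        return -1
    return 1

# Branch predicates, in the order execution meets them.
branches = [lambda x: x == 0, lambda x: x < 0]

def satisfiable(constraints, search=range(-1000, 1000)):
    """Stand-in for an SMT query: find any x meeting all constraints."""
    for x in search:
        if all(c(x) for c in constraints):
            return x
    return None  # unsatisfiable within the search range

# The three paths: first branch taken; first negated and second taken;
# both negated (the fall-through "return 1").
paths = [
    [branches[0]],
    [lambda x: not branches[0](x), branches[1]],
    [lambda x: not branches[0](x), lambda x: not branches[1](x)],
]

inputs = [satisfiable(p) for p in paths]
print(inputs)  # → [0, -1000, 1] — one concrete input per path
```

Note that the unsatisfiable combination from the talk (x == 0 together with x < 0) would come back as None from `satisfiable`, which is exactly why the engine skips forking there.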
A
Our symbolic execution engine will have to know what the source code is and what it's doing — whether it's operating on the AST itself, or you compile it to an intermediate language and then operate on that, your symbolic execution engine will have to understand the code. Usually it operates on an intermediate language of some sort. All right, any questions about that? No?
A
All right, so concolic execution, also known as dynamic symbolic execution. It's similar to symbolic execution, except that concrete values are used to gather the path constraints. The concrete values won't change throughout the execution of the program while you're exploring your code, and concrete and symbolic values for variables may be mixed.
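A minimal sketch of that idea — this is a simplified assumption about how concolic engines work in general, not any particular tool's algorithm: run the function on a concrete input, record which way the branch went, then find a concrete input that flips the recorded condition to reach the other path.

```python
# Concolic-style exploration in miniature: concrete run first, then
# negate the recorded branch condition to derive a new concrete input.

def classify(x, trace=None):
    taken = x < 10              # the branch under test
    if trace is not None:
        trace.append(('x < 10', taken))   # record the path constraint
    return 'small' if taken else 'big'

# 1. Concrete run with an arbitrary seed input, recording the path.
trace = []
classify(0, trace)
assert trace == [('x < 10', True)]

# 2. Negate the recorded condition and find an input for the other
#    path. Brute force stands in for the SMT solver here too.
flipped = next(x for x in range(-100, 100) if not (x < 10))
assert classify(flipped) == 'big'   # new path reached
print(flipped)  # → 10
```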
A
All right, symbolic execution tools. These are the few that I've known of and that I've known people using. Klee is one of the best ones — a lot of people use it. I don't know what Klee stands for, but BAP stands for Binary Analysis Platform. That one operates on its own intermediate language, the BAP intermediate language, and Klee operates on LLVM bitcode. Triton, I believe, also has its own intermediate language.
A
This,
the
guy,
who
makes
this
is
I've,
been
following
him
on
Twitter
for
a
long
time,
and
he
does
a
lot
of
really
interesting
research.
I,
really
like
his
tools.
Anger
is
a
somewhat
recent
one
from
UCSB
and
Manticore
is
another
one
from
trellis
and
I
think
ranked
in
how
old
I
think
they
are.
B
A
I didn't mean to say that — if I did say that I've used it, personally I have not; I've used it at previous companies. Oh my god, yeah. Alright, so Klee is one of the foremost tools for symbolic execution. It forks at each branch to explore different paths, and it uses the STP SMT solver — I think I may have gotten that name wrong — but it operates on LLVM bitcode.
A
Alright, so the sample function: if we wanted to explore it with Klee, we need to instrument it. A lot of these — I guess what I would say more advanced — analysis tools require instrumentation. They require a test harness to be set up; it's usually not so much plug-and-play. So with Klee, you need to instrument it. This is us calling that function, and we need to say that this variable — the parameter to the get_sign function — is symbolic. Julian went over this last time.
C
A
So this is running clang — compiling it with clang into LLVM bitcode. This flag here, -emit-llvm, tells it to emit the bitcode instead of actually compiling it to an executable. So, symbolic... oops — there we go.
A
So it chose to use a very large integer for the second one, and for the third one it used a negative integer. If we go back to the example here: we had x == 0, which gets us to this one; a negative integer goes down this branch, and a very large integer goes down that branch. So it explored all the different branches in the code — 100% code coverage on that function.
A
There's a lot going on here: we're using SMT solving and symbolic execution, and lifting code from its source language into an intermediate language so we can operate on it. There's a lot of ramp-up time to know what everything is doing and how to use these tools — especially if you want to do this from the perspective of GitLab, possibly adding it to the Secure stage or an extra CI pipeline. It would be a lot to try and do automatically.
A
So this came up — let's see — using unit tests as a starting point ended up being a really cool idea that came out of conversations I had with Julian a while back. I don't remember exactly what we were talking about, but it was in the context of using SAST and DAST on programs. Sometimes you need to know how to run the program first before you can do more interesting things with the code; you can only get so far looking at the code statically. That also applies to fuzzers.
A
You need to know how to run and build the program. If you use unit tests, the developer has already taken care of setting up everything for the project: you know how to run the code, you know all the dependencies, and the environment is already set up because it's running the unit tests. Now, if you run symbolic execution on top of the unit tests, you get a few added benefits.
A
You can boost code coverage, and it would be relatively straightforward to instrument: most projects use some sort of unit testing framework, most frameworks support plugins, and you can add hooks for framework-specific conditions. Say it's a Ruby on Rails application or a Python Flask web application — you could check for things specific to the framework, like not allowing cross-site scripting, or maybe not writing to files outside of a certain temporary directory.
A
So all of that is building up to this. This is the proof of concept I made. I called it pytest Auto Explorer. It is for Python, and there's a reason for that, which I will talk about in a minute, but it is a pytest plugin that automatically explores the code, using the defined unit tests as known-good starting points. And the whole experiment with pytest —
A
— where I talk about that. But yes, using the unit tests this way — I like it a lot, and so far it's panned out. Again, keep in mind that the goal is for the developer not to have to learn the details of SMT solving or fuzzing or symbolic execution. All right, so PyExZ3 was a project I'd run across around the same time that we were talking about using unit tests to bootstrap analysis. PyExZ3 is basically a concolic execution engine for Python programs, and the way it does
A
things is pretty cool. Actually, I'm going to show you some code I didn't add in the slides, but I meant to, and it's super cool. It's a small execution engine, it uses an SMT solver — specifically the Z3 solver — and it explores all the branches in the Python code. Now, the way it does that is, I think, pretty interesting.
A
The way it knows when a branch is hit — or what constraints have been placed on the code or variables — is that it overloads the comparison operators of its symbolic variables. I thought that was a super awesome approach: very simple, and it works great. So if you have a symbolic variable — say r is a symbolic variable — and then you write "if r == 10", say you have this code with this operation,
A
this function will be called, and it records the constraint that was checked against the variable. It actually performs the comparison, but it also logs what was being compared against. So as the Python code is executing, it can build these constraints based on the current execution of the code. Yeah.
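A minimal sketch of that operator-overloading trick — this is my own simplified assumption about the approach, not PyExZ3's actual code: a symbolic value overloads __eq__ so that every comparison both answers with the current concrete value and records the constraint that was checked.

```python
# A symbolic integer that logs every equality comparison made against
# it while still letting execution proceed with a concrete value.

class SymbolicInt:
    def __init__(self, name, concrete):
        self.name = name
        self.concrete = concrete   # value used for the real execution
        self.constraints = []      # what the code compared against

    def __eq__(self, other):
        result = self.concrete == other
        # Record (variable, operator, operand, outcome) for the solver.
        self.constraints.append((self.name, '==', other, result))
        return result              # execution continues normally

r = SymbolicInt('r', 5)
if r == 10:                        # comparison is logged either way
    print('equal branch')
print(r.constraints)               # → [('r', '==', 10, False)]
```

A full engine would overload the other rich-comparison and arithmetic operators the same way, then hand the recorded constraints to the SMT solver to derive inputs for the unexplored branch.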
D
C
D
D
A
That would be totally possible. I'm not sure — let's see — actually, you know, you could do it. All right.
A
Remind me when I start talking about instrumenting the functions — that's where I would do it. I think you could store that information on the function object itself. In Python, everything is an object, so a function is a function object, and you can add additional attributes to the function object. So if you fully explored a function, you could store its — I guess — constraint tree on the function itself, so you don't have to re-explore it; you get that up front. That's what you're talking about, right? Yeah.
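That idea can be sketched directly: since a Python function is an object, an explored constraint tree can be cached as an attribute on the function itself, so later runs skip re-exploration. The explore() helper, the attribute name, and the tree shape here are all illustrative assumptions.

```python
# Cache exploration results on the function object itself.

def explore(func):
    """Pretend exploration: returns a constraint tree for func."""
    if hasattr(func, '_constraint_tree'):      # cached by a prior run
        return func._constraint_tree
    # Stand-in for real path exploration of the function's branches.
    tree = {'paths': [['x == 0'],
                      ['x != 0', 'x < 0'],
                      ['x != 0', 'x >= 0']]}
    func._constraint_tree = tree               # stash it on the object
    return tree

def get_sign(x):
    if x == 0:
        return 0
    return -1 if x < 0 else 1

first = explore(get_sign)
second = explore(get_sign)    # served from the cached attribute
assert first is second
```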
D
Basically. And because I know that this is somewhat of an unsolved problem: at the moment, what usually happens is that you're going into libraries — this is one of the reasons why we usually run into this path explosion problem — you're basically descending all the time into other called code that lives somewhere, and then you try to extract the constraints, and so your constraint set is getting bigger and bigger.
B
But if you can represent the path as sort of a key value, then each path exploration can be cached at some level, and you can avoid having to explore the full tree each time — especially if a lot of shared libraries are in use. Once we've analyzed one project and we sort of understand the path exploration for that library, being able to reuse everything from a certain layer down is probably an optimization.
D
Yeah, that's one way of thinking about it — caching. But another way: across libraries you also have a lot of common functionality. Think about string libraries, for example, which you have in different languages. If you have some domain knowledge up front about how certain operations work — say, string concatenation — you could basically have a template for that already prepared and just pull it in.
A
Well, and suppose you had that for, like, everything — then you wouldn't necessarily need to do framework-specific things, because you'd know all the details about the framework instead.
B
There was something you said, James — I just want to make sure I understand. From the developer's perspective, they're able to write code as they would, and the tooling we provide here would be sort of transparent to them — minimal in terms of being deployed and checked — so that it can then explore the different paths that the developer may not have considered?

A
Yes.
B
The benefit to the developer is that you don't have to do the tedious work — you get this knowledge of "here are the hotspots you may want to focus on" — and it requires very little investment. The investment in this case was that pytest Auto Explorer is an additional package to include, and it was unobtrusive for the most part. Yep.
A
Yeah, I'll get a bit more into that. So, taking a step back a little bit: the reason I used Python for my proof of concept is because I'm building on top of PyExZ3, and so some of my proof of concept's limitations come from limitations within PyExZ3. They are overcome-able — or they can be overcome; I don't think "overcome-able" is a word.
A
They can be overcome, but they are there right now — it's a proof of concept. The big one is that PyExZ3 only operates on integers. It uses SMT solvers and works with symbolic values, but only integers. Now, SMT solvers support string theories — Julian has talked about the different theories in SMT solvers.
A
They support strings and regular expressions, but that would have to be added in. I don't think it would be too much work; it's just not there right now. All right, so on to pytest. pytest is extremely straightforward to write tests for — it's one of my favorite testing frameworks in any language I've ever used. You don't even have to import the pytest module; you just have to have a test function.
A
So now we've covered 100% of our test code, which makes sense because we ran the tests — it's as simple as that. pytest also has a very robust plugin API that lets you hook most aspects of running tests, so pytest Auto Explorer hooks functions within the test cases themselves and performs either concolic execution or fuzzing on the identified function calls. It's kind of a bonus: since I'm instrumenting the code and hooking all of the functions, we know the input variable or parameter types and their initial values, so we could also do fuzzing in the same pass — which is why I called the plugin Auto Explorer and not something SMT- or symbolic-execution-specific. It's just automatically exploring your code.
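The plugin-API idea can be sketched with real pytest hook names (pytest_addoption and pytest_collection_modifyitems are genuine pytest hooks); the option name and the tagging logic are placeholders for whatever pytest Auto Explorer actually does internally.

```python
# Sketch of a conftest.py / plugin module: register a command-line
# flag, then act on each collected test item when the flag is set.

def pytest_addoption(parser):
    parser.addoption('--auto-solve', action='store_true',
                     help='concolically explore functions called by tests')

def pytest_collection_modifyitems(config, items):
    if not config.getoption('--auto-solve'):
        return
    for item in items:
        # Placeholder: here the real plugin would rewrite the test's
        # bytecode so project function calls go through its callback.
        item.user_properties.append(('auto_explore', True))
```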
A
Alright, so I made a very simple test project — it's in the examples directory in pytest Auto Explorer — and it does some basic math operations, only adding and division; we don't care about multiplication here. Again, very simple, but they are buggy; they do weird things. If the number is 0xdeadbeef, we're writing to a file for some reason. Otherwise, we just return this. Same with division.
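For concreteness, here is a hypothetical reconstruction of what such buggy math functions might look like, based only on the description in the talk (magic values like 0xdeadbeef triggering a file write); the names and exact behavior are assumptions, not the actual example-project code.

```python
# Deliberately buggy math functions of the kind the test project uses.
import os
import tempfile

def add(num1, num2):
    if num1 == 0xdeadbeef:            # bug: math function writing a file
        path = os.path.join(tempfile.gettempdir(), 'debug.log')
        with open(path, 'w') as f:
            f.write('added %d + %d' % (num1, num2))
    return num1 + num2

def abs_divide(num1, num2):
    if num1 == 1337:                  # bug: non-numeric return value
        return 'leet'
    return abs(num1 / num2)           # ZeroDivisionError if num2 == 0

print(add(1, 2))           # → 3
print(abs_divide(-6, 3))   # → 2.0
```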
A
We're
doing
unnecessary
things
everywhere,
manually,
raising
an
error
returning
a
non
numeric
value
of
a
number
is
1
3,
3,
7,
weird
things
about
the
code.
Maybe
this
is
old
legacy
code
and
somebody
inherited
it
right
all
right
and
the
test
project
also
has
some
very
basic
tests.
So
this
one
just
calls
the
function.
If
there's
an
exception
that
was
raised,
then
the
test
would
fail.
A
So this is basically just making sure no exceptions are raised while calling the functions, and this one performs some minimal checks on the return values of the functions. So abs_divide needs to return a float, and this one tests that no files should be written when using the add operation — which should never happen. So the developer has the test there, but it's actually not catching the bug.
A
All
right
flight
test
code
best
project,
so
this
is
telling
the
coverage
plugin
to
only
record
code
coverage
for
the
test
project
module.
If
we
don't
do
that,
it'll
do
test
coverage,
I
mean
code
coverage
for
all
of
the
built-in
Python
modules.
Everything
and
we
don't
care
about
that.
So
we'll
say
Auto
solve
and
that's
poop,
that's
project
tests,
a
Rio
and
suddenly
we've
got
code
coverage
boosted
up
to
94%.
Instead
of
running
it.
Without
it,
we've
got
80%
which
doesn't
seem
right.
Oh
I
modified
something
dang
it
test
project.
C
A
It didn't have it — okay, so I think I added a new test after I made that slide, so 68 percent was the wrong number; it's 80% right now. All right, so let's do this. Oh, okay, I know where that number came from — that's the project.
A
I'm sorry about that. "Oh no, you're good, you're good." And just the tests — there we go, there's the 68%. All right, so the boost for adding --auto-solve: now we're at a hundred percent. There we go — 100% code coverage just by installing the plugin and saying auto-solve. Now, all this extra output — I don't know how to get rid of it or where it's coming from, so that would be a to-do. But you can see we are actually getting a bunch of errors; I don't know if you noticed.
A
All right — this is actually 0xdeadbeef. So if num1 is 0xdeadbeef, then it raises this exception: math functions should not be opening files. That was the original intention of the developer — to make sure it's not opening files — except he wasn't passing in 0xdeadbeef, so the test could never catch it. So we found that one. Now, these other ones: you get the arguments — the parameters the function was called with — and "abs_divide should always return a float."
A
That was right here, and again, this is just a slightly different code path to get to the same thing — it hit an extra branch getting there, which is why that one showed up — and this one is a different variant of the divide-by-zero exception. But yeah, you can see all of these are awesome, and you get it just by installing the plugin. Before I go into details about its implementation, does anybody have any questions about this so far?
A
So, for example, within this function it will instrument this function call specifically, and then here it will instrument that function call — it's not instrumenting this entire thing. That's the difference: it's wrapping the function call versus instrumenting all of it. All right, so Python bytecode is something I need to explain before I can really talk about how I'm instrumenting it. Python bytecode is similar to other forms of bytecode.
A
It's very simple: it operates on a stack — it has the concept of a stack — and it has variable names that it stores. Most of the instructions are pushing to the stack, popping from the stack, or grabbing a variable from the variable store based on an index. It's not too bad at all. You can see what Python bytecode looks like: there's a built-in Python module called dis, for disassembly. You can give it a code object or just a raw string, and it will show the disassembly of the code.
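The dis module mentioned here is real and part of the standard library; shown on a tiny function, it prints the stack-oriented instructions the interpreter actually executes (the exact opcode names vary between Python versions).

```python
# Disassemble a small function with the built-in dis module.
import dis

def add_one(x):
    return x + 1

dis.dis(add_one)                       # human-readable disassembly

# The instructions are also available programmatically:
ops = [ins.opname for ins in dis.get_instructions(add_one)]
print(ops)                             # e.g. ['LOAD_FAST', 'LOAD_CONST', ...]
```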
I mentioned before that Python functions are objects — function objects. Every function object has a dunder code attribute ("dunder" means double underscore, so __code__), and that is the actual code object, the compiled bytecode of that function. This is what that function looks like, and these indices are arguments to the instruction itself. Sometimes they reference an index into the variable store, or a number of parameters — it's specific to the instruction.
A
Alright. For simple test functions that do not declare keyword arguments, each function call that calls into the current project is replaced with a custom callback, and the callback captures the original function, the positional arguments, and the keyword arguments that were being sent to the function. At that point, we have full control to do whatever we want: we can run it as originally intended — just run the function and pass in the provided arguments — or we can do whatever we want with it, like fuzz.
A
So, for example, this is a function call: it loads the name some_function, pushes the two parameters, one and two, and then calls the function. On the call-function instruction, the argument is the number of parameters, so it pops that many parameters off the stack; what's left on the stack is the some_function value, and then it calls that function with those parameters. After instrumenting it, this is what we have — a very similar setup.
A
It loads the function and puts that value on the stack, then puts these two constants — the parameters — on the stack. But then we cache the positional arguments and the keyword arguments — there are none, so it makes an empty dictionary — and it stores the original function itself in this temp function variable. Then it does some things here to load the callbacks object, uses this key to look up the callback, and then calls that callback from the callbacks object with the cached, or temp, arguments.
A
And at that point, we are fully in our code, and we can do whatever we want with the function. There's no magic value here — I didn't have this in the slides — when it does the hooking of the function calls, it assigns a unique identifier, and the callbacks object is a dictionary: the keys of the dictionary are the identifiers, and the values are the callbacks. It's just an easy way to access the callback, because there's not much state persisted in here.
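The callback-dictionary mechanism can be sketched like this. One caveat: the real plugin patches the call sites in the bytecode itself, whereas this sketch simulates the rewrite with a plain Python wrapper; the hook() helper and the registry name are my own illustrative assumptions.

```python
# Each hooked call site gets a unique key; the "rewritten" call looks
# the key up in a module-level dict and invokes the callback with the
# original function and the captured arguments.
import itertools

callbacks = {}                      # call-site id -> callback
_ids = itertools.count()

def hook(func):
    key = next(_ids)
    def callback(orig, args, kwargs):
        # Full control here: fuzz, solve, or just run it as intended.
        print('intercepted', orig.__name__, args)
        return orig(*args, **kwargs)
    callbacks[key] = callback
    def rewritten(*args, **kwargs):   # stands in for patched bytecode
        return callbacks[key](func, args, kwargs)
    return rewritten

def some_function(a, b):
    return a + b

some_function = hook(some_function)
print(some_function(1, 2))   # intercepted, then → 3
```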
A
Declaring keyword arguments is not normally something you do in pytest test functions, but if the developer does that, then we can just wrap the function itself and solve over the whole test function. What that gets us is that the developer can add custom error-condition checks. Then it becomes more of a real unit test: we're running the full unit test with all the different possible values for those declared arguments.
A
Okay, that's a good point. One of the things I liked about using this mechanism — keyword arguments with default values — is that the default values become the defaults, and if you're doing SMT solving, it knows the type of the value. You could also use Python type annotations, like int, but I didn't add that in here. It gives you the type of the parameter and a default initial value, and it's a good starting point.
A
No problem. All right, a quick bit about fuzzing. I kept this super simple. The codebase for pytest Auto Explorer only operates on integers because of the PyExZ3 limitations. The fuzzing aspect is not limited by that, but I just kept it simple, so it's also only integers, and I didn't go crazy implementing the fuzzer. All it does is, for all known integer arguments to whatever is being instrumented, test all permutations of those arguments within the range -20 to 20 — and this could very easily be expanded.
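That permutation fuzzing maps directly onto itertools.product. Here divide is a hypothetical target function, and the fuzz() helper is a sketch of the described behavior rather than the plugin's actual code.

```python
# Try every combination of values in [-20, 20] for each integer
# argument and collect the inputs that raise exceptions.
import itertools

def divide(a, b):
    return a / b

def fuzz(func, nargs, lo=-20, hi=20):
    failures = []
    for args in itertools.product(range(lo, hi + 1), repeat=nargs):
        try:
            func(*args)
        except Exception as exc:
            failures.append((args, type(exc).__name__))
    return failures

failures = fuzz(divide, 2)
print(len(failures))          # → 41: every (a, 0) pair divides by zero
```

With two arguments that is 41 × 41 = 1681 calls, which illustrates both why this brute-force approach is fast for small ranges and why it can never find magic values like 1337 or 0xdeadbeef outside the range.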
A
Where is it... ah, right here — here we go. So this is the fuzzing function, and this is actually from the instrumented code: whether it's wrapped or directly swapped out in the Python bytecode, you end up getting the original function, the args, and the keyword arguments, plus a few extra things. Oh yeah, this is where it's doing the fuzzing: here it's actually calling it, and here it is doing all combinations of this range for this number of arguments, and then it assigns them and does its thing.
A
Let's just comment that out so we don't have to deal with it. All right, so now if we run it, we're not going to have that error. Now we're hitting divide-by-zero exceptions and different things: eight hundred passed, twenty-three failed. This is more what I was expecting — and again, we didn't have to instrument the code at all. We are close to running out of time, so let's keep moving. If you did want to explicitly say, "I want to auto-explore this function,"
A
I did add a decorator to the pytest Auto Explorer code. On a test function, you can explicitly say "auto-explore this," and it will do it — you don't have to add the --auto-solve or --auto-fuzz flags. What this gets you is that auto-solving or auto-fuzzing all tests — maybe you don't want to do that; it could be very expensive.
A
Maybe there are only certain areas you want to focus on, and with the decorator you can also set certain options and be way more specific about what you're doing. But it does require setup by the user, which is what I was trying to avoid with this whole proof of concept. So this was an afterthought, but it is useful.
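An opt-in decorator like the one described can be sketched as follows — the name auto_explore, its options, and the attribute it sets are assumptions about the design, not the plugin's actual API: it simply tags the test function with exploration settings that the plugin can read later, instead of requiring a global command-line flag.

```python
# A decorator factory that marks a test for per-test exploration.

def auto_explore(solve=True, fuzz=False, **fixed):
    def mark(test_func):
        test_func._auto_explore = {
            'solve': solve,
            'fuzz': fuzz,
            'concrete': fixed,   # values held constant while exploring
        }
        return test_func
    return mark

@auto_explore(fuzz=True, b=3)
def test_divide():
    pass

print(test_divide._auto_explore)
# → {'solve': True, 'fuzz': True, 'concrete': {'b': 3}}
```

Keyword arguments passed to the decorator stand in for the concrete values mentioned next: they are pinned while the remaining parameters are treated symbolically.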
A
Now, we talked about mixing symbolic and concrete values — you can do that with the auto-explore decorator. If you use the decorator as a function — you call it first — then you can set the symbolic values and the concrete values. Concrete values are never changed; they are passed verbatim throughout the program while executing. So that lets you do more complicated setups. pytest Auto Explorer also does some smarter detection there.
A
Another thing you can do with this is explicitly turn off solving and only fuzz, or you could say fuzz=True — the default is to solve — and then you'll be doing both fuzzing and solving on your test function. But yeah, I think we are just about out of time — yep, that's about it. Any questions? Want me to go back to anything, explain something else?
A
The Hypothesis library — I have seen it; it's been a while since I've looked at it. That's the one where you decorate your functions with what they're expecting — basically the function contract — and then it tests that, right?
C
One thing that I really liked about Hypothesis's approach is that they actually store failed examples in a directory, so that when you fix a bug or whatnot, you have that regression test. That's pretty useful too. And — I don't actually know, this is probably a really basic question — what is the difference between fuzz testing and property testing? Is there a difference, or is it just two names for kind of the same thing?
A
You know, I don't know. Fuzzing is not security-specific — you're generating inputs and sending them through the program. How smart you get with it depends, I think.
A
That was another thing I wanted to mention. So here in this example — am I still sharing my screen? Yeah — we're making sure it doesn't open any files; math functions should not have to open a file. I didn't add this into the example, but since we're instrumenting the whole function, we could add our own mocks to test for certain conditions, whether they're security-related or not.
A
You could build that into the plugin itself, so you don't have to add it in the test code — you'd just have this assert isinstance, and then the plugin itself would have a whole suite of additional checks. So if there's a Flask application, and we were using string theories with the SMT solver and everything supported strings, then we could test for cross-site scripting or something.
A
So having the initial code instrumented, and having yourself plugged in, gives you a lot of flexibility in what you can potentially do in the future. Again, this is just a proof of concept — it's not super robust, it's only numbers — but it's interesting.
C
You know that last 20% of test coverage, and how it can be really hard-won? A big part of that is dealing with — you know, code has side effects, and it's like, well, now I have to set up the world, and I feel like I have to do it in this very fragile way. One of the things I think is really cool about symbolic execution and some of these other approaches is that you don't necessarily have that same side-effect problem: it can look at the code,
C
it can think through it — it's not necessarily executing your code directly, so you don't have those side effects to worry about, you know, rm -rf-ing the root of your drive or something. But in your case, I think — are you actually executing these test functions, or the functions within the test functions?
A
I didn't worry about that — it is actually executing it. It is catching exceptions, but — and this is where Julian probably has a lot more insight — it depends on the implementation of the execution engine itself. Klee, say, operates on LLVM bitcode; this PyExZ3 execution engine that I'm basically wrapping actually runs the Python code itself, but uses symbolic values during execution. So it's concolic execution, yeah — in my case it's not emulating it, it is actually running it. Yeah.
C
A
That makes it concolic execution. Now, I will say: I've done a very proof-of-concept implementation of a symbolic execution engine in the past, but my experience with them is mostly knowing them decently enough to use them — knowing how I can use them and when I can't apply them.
A
But I wouldn't call myself an expert — I know how to use them, though. So yeah, my terminology was, I think, on point. Hey, thanks. Yep. And yeah, that's about it — any other questions? Does anybody want to see any other examples, specific ways things were implemented, anything?
D
A
That is a very good point. Fuzzing is only as good as your mutation engine and your feedback — so, feedback-driven fuzzing. In the general case, you should consider all fuzzers feedback-driven, and really dumb fuzzers just have a no-op feedback mechanism. But really, a fuzzer is only as good as those two things combined.
A
So if your fuzzer knows that a new test case has explored a new code path, then it can mutate on top of that test case to hopefully explore further into those code paths, or new code paths past that one. The sample fuzzer I have in this does not do that. So, Julian, you were asking about overlap: because the mutation engine I gave the fuzzer is extremely basic and only tests the range -20 to 20,
A
it will not find those values where it's checking for, like, 1337 or 0xdeadbeef. It just won't find them, because it's not even coded to do that. So, auto-fuzz — here we go. It did find divide-by-zero exceptions; we went from 68 percent up to 84 percent, but again, we're not getting those other higher values because we're just not testing them. Yeah, and that is a big difference.
A
That's where the overlap is. Fuzzing can potentially go a lot faster: we're doing over 800 test cases in 0.13 seconds, which for Python is not bad. And let's see — this won't really be a good example, there's a lot of setup time, but just for doing 20 test cases we're already at 0.2 seconds. So that's one of the big distinctions, one of the benefits of using a fuzzer: they can be really fast, and a lot of times extreme performance will beat any amount of smarts — yeah, especially with interpreted languages.
A
He wrote his own RISC emulator that uses vector instructions on Intel processors, and he's getting like a billion — no, it was 100 million — test cases per second fuzzing the ctags program. A hundred million per second is insane; for anything interpreted, there's no way you can get there. So yeah, that's just a sample of the difference between the two: fuzzing can be very high performance, and you just brute-force your way through it.
A
He dropped off. It was mentioned that the test cases could be saved off into a file. I did think about doing that — I've seen another pytest fuzzing plugin do that — but I didn't do it for this. That's always an interesting one: do you want to add that into CI, and add the test cases back into the code? It just seems weird committing back to your own repository from CI, but yeah, that's something you could do.