GitLab Fuzzing 101, 4 Aug 2020

From YouTube: Brief overview of using a corpus with coverage-guided fuzz testing

Description

GitLab Principal PM Sam Kerr walks through an overview of what a corpus is, how it relates to coverage-guided fuzz testing, and why you might use it.

If you've not seen the high-level overview of coverage-guided fuzz testing yet, check it out at: https://www.youtube.com/watch?v=K3sX_dwyvqQ&list=PL05JrBw4t0KoYzW1CR-g1rMc9Xgmnhjfe&index=2&t=0s

Fuzz testing documentation: https://docs.gitlab.com/ee/user/application_security/coverage_fuzzing/#coverage-guided-fuzz-testing-ultimate

Fuzz testing direction page: https://about.gitlab.com/direction/secure/fuzz-testing/fuzz-testing/

A

Hi, I'm Sam Kerr. I'm a principal product manager here at GitLab, and in this video I'm going to be talking about coverage-guided fuzz testing: what a corpus is, how it relates to coverage-guided fuzz testing, and how you would use it.

A

So, in a previous video, we talked about how fuzz testing is really all about your application. Your application is composed of multiple functions, and fuzz testing is all about identifying one function to really look at specifically and passing it many different inputs to try and find bugs and vulnerabilities. The example that we're going to be talking about as part of this video is a function that loads PDF documents and outputs a JPEG image of each individual page.

A

If you remember from our other video, the way that a fuzz tester will approach testing this function is that the fuzzer will generate thousands or even millions of PDF documents to pass as input to this function, to try and find bugs, security vulnerabilities, or edge cases that traditional QA testing missed.

A

But an interesting question that you might ask is: where do all of these PDF documents come from? How does the fuzzing engine know how to generate them? How does it know what makes a good test and what makes a bad test? That's really the core of what we're going to talk about in this video, and what we're going to be answering with a corpus.

A

So if we think about how the fuzzer is working, the fuzz engine is going to be creating all these different documents for all those different tests.

A

To make good testing inputs, to make all of those PDFs, the fuzz engine uses what's called a corpus. You can think of a corpus as a collection of files, a collection of inputs that tell the fuzz engine: this is what a good input to the function looks like, one that you can make small changes or mutations on to generate all of these other PDFs that might exhibit a crash or a fault in the application under test. Notably, a corpus can contain no files.
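
The idea of making small changes or mutations to a known good input can be sketched in a few lines of Python. This is an illustrative sketch only, not how any particular fuzz engine is implemented: the `mutate` helper, the three mutation types, and the tiny in-memory `corpus` are all assumptions made for the example, and a real corpus would hold real PDF files.

```python
import random


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of seed with at most one small random change applied."""
    data = bytearray(seed)
    choice = rng.choice(["flip", "insert", "delete"])
    if choice == "flip" and data:
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)        # flip a single bit
    elif choice == "insert":
        pos = rng.randrange(len(data) + 1)
        data.insert(pos, rng.randrange(256))      # add one random byte
    elif choice == "delete" and data:
        del data[rng.randrange(len(data))]        # drop one byte
    return bytes(data)


# Hypothetical stand-in for a corpus; in practice these would be real PDF files.
corpus = [b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"]

rng = random.Random(0)  # fixed seed so the run is repeatable
seed = rng.choice(corpus)
variants = [mutate(seed, rng) for _ in range(5)]

for variant in variants:
    print(variant)
```

Each variant stays very close to the original file, which is what lets mutated inputs get past the early validity checks in the function under test.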

A

You can simply tell the fuzz engine, you know, generate random data and see what happens. The corpus can contain one file, or it can contain as many files as you have available. The more you work with coverage-guided fuzz testing, the better you'll get at finding a good balance between providing many different files as input to the corpus and not providing too many.

A

While working with a few different projects, I've found that adding as few as three input files to my corpus allows me to find many different sorts of bugs in an application under test. The reason that providing these corpus files is so important is that it allows the fuzz engine to make intelligent mutations.

A

If we were to not provide any sort of input to the corpus, the only thing that the fuzz engine would be able to do is create random strings of ones and zeros.

A

The inputs that are being tested will possibly eventually find some bugs, but it's either going to take a lot longer or it's never going to find them, especially if the program under test has very good error-handling capabilities, where it's looking for individual pieces of structure inside the data. For example, if the PDF engine is always checking that a checksum value or some signature in a header is present, it's going to be very difficult, and it's going to take a long time, for the fuzz engine to pick up on that without some input files in the corpus to help it know what works and what doesn't work. So that's really a high-level view of what a corpus is and why you would want to use it. Now let's look at how it interacts with coverage-guided fuzz testing as a whole.
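
To see why a header or signature check is so hard to get past with purely random data, here is a small Python experiment. It is a sketch under stated assumptions: `passes_header_check` is a hypothetical stand-in for the first validation step of a PDF loader (real PDF files do begin with the bytes `%PDF-`), and `mutate_one_byte` is a deliberately simple mutation invented for this example. A random input matches a 5-byte signature with probability 1 in 256^5, roughly 1 in 1.1 trillion.

```python
import random

MAGIC = b"%PDF-"  # the signature that real PDF files start with


def passes_header_check(data: bytes) -> bool:
    """Hypothetical stand-in for a parser's first validation step."""
    return data.startswith(MAGIC)


def mutate_one_byte(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000

# Strategy 1: empty corpus, so every input is 32 purely random bytes.
random_hits = sum(
    passes_header_check(bytes(rng.randrange(256) for _ in range(32)))
    for _ in range(trials)
)

# Strategy 2: start from a valid 32-byte seed and change a single byte.
seed = b"%PDF-1.4\n" + bytes(23)
seeded_hits = sum(
    passes_header_check(mutate_one_byte(seed, rng)) for _ in range(trials)
)

print(f"random inputs past the header check: {random_hits} of {trials}")
print(f"mutated seeds past the header check: {seeded_hits} of {trials}")
```

Most mutated seeds still carry the signature, so they reach the code behind the check, while random inputs are almost always rejected at the first step.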

A

If we start in the upper left, we have this corpus of known good input files, so these would be PDFs that you might have from an email, from writing a document, what have you. You pass those to the fuzz engine.

A

The fuzz engine then uses those input corpus files to make thousands to millions of new PDF documents and passes those to the load PDF function that we're testing. It's going to be monitoring that function to see if we get any crashes or if we find any vulnerabilities that traditional QA and testing did not find. That's really how you'll use a corpus to test these individual functions that are inside of your application and improve the overall security and quality of your app.
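
The flow just described (corpus, then fuzz engine, then the function under test, then crash monitoring) can be sketched as a short Python loop. Everything here is an assumption made for illustration: `load_pdf` is a toy function with a planted bug rather than a real PDF loader, the one-byte "page count" format is invented, and `mutate` and `fuzz` are hypothetical helpers. The sketch also leaves out the coverage feedback that a coverage-guided engine uses to decide which inputs to keep.

```python
import random


def load_pdf(data: bytes) -> int:
    """Toy stand-in for the function under test (not a real PDF parser)."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing PDF header")   # input rejected cleanly
    page_count = data[5]                         # invented one-byte page count
    # Planted bug: trusts page_count without checking the data is long enough.
    pages = [data[6 + i] for i in range(page_count)]
    return len(pages)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite or delete one byte of a non-empty seed."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    if rng.random() < 0.5:
        buf[pos] = rng.randrange(256)
    else:
        del buf[pos]
    return bytes(buf)


def fuzz(corpus, iterations, rng):
    """Mutate corpus files, run the target, and collect crashing inputs."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            load_pdf(candidate)
        except ValueError:
            pass                                 # expected, handled rejection
        except Exception as exc:                 # anything else counts as a crash
            crashes.append((candidate, exc))
    return crashes


corpus = [b"%PDF-\x03abc"]                       # one known good input
crashes = fuzz(corpus, 2000, random.Random(0))
print(f"found {len(crashes)} crashing inputs")
```

Each entry in `crashes` pairs the input that triggered the fault with the exception it raised, so a failure can be replayed later by passing the saved input back to `load_pdf`.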

A

If you'd like to find out more, our product documentation is always available online at gitlab.com. We also invite you to take a look at our direction page. This direction page covers where we're going with fuzz testing and what sort of problems and use cases we're focusing on, and it also gives you a lot more information. We'd love for you to check it out; we think you'd find it really interesting. You can also always create an issue and talk to us directly. My handle is @stkerr if you want to reach out to me.

A

Or you can ping the rest of the fuzzing team. Thank you very much; I appreciate the time.