Description
GitLab Principal PM Sam Kerr gives an overview of what a corpus is, how it relates to coverage-guided fuzz testing, and why you might use one.
If you've not seen the high-level overview of coverage-guided fuzz testing yet, check it out at: https://www.youtube.com/watch?v=K3sX_dwyvqQ&list=PL05JrBw4t0KoYzW1CR-g1rMc9Xgmnhjfe&index=2&t=0s
Fuzz testing documentation: https://docs.gitlab.com/ee/user/application_security/coverage_fuzzing/#coverage-guided-fuzz-testing-ultimate
Fuzz testing direction page: https://about.gitlab.com/direction/secure/fuzz-testing/fuzz-testing/
A
So, in a previous video, we talked about how fuzz testing is really all about your application. Your application is composed of multiple functions, and fuzz testing is all about identifying one function to look at specifically and passing it many different inputs to try and find bugs and vulnerabilities. The example that we're going to be talking about in this video is a function that loads PDF documents and outputs a JPEG image of each individual page.
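To make the idea of "one function, many inputs" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `render_pages` is a hypothetical toy stand-in for the PDF-to-JPEG function (it is not a real PDF renderer and not a GitLab API), and the "bug" in it is planted on purpose so the harness has something to find.

```python
from typing import Callable, List


def render_pages(pdf_bytes: bytes) -> List[bytes]:
    """Hypothetical toy stand-in for the function under test."""
    if not pdf_bytes.startswith(b"%PDF-"):
        # Graceful rejection of malformed input: this is good error handling.
        raise ValueError("not a PDF")
    # Toy format assumption: the byte after the header is the page count.
    page_count = pdf_bytes[5] if len(pdf_bytes) > 5 else 0
    if page_count == 0xFF:
        # Planted bug: an unhandled failure the fuzzer should discover.
        raise RuntimeError("crash: page table overflow")
    return [b"JPEG-page-%d" % i for i in range(page_count)]


def fuzz_one(target: Callable[[bytes], object], data: bytes) -> bool:
    """Run the target on one input and report whether it crashed."""
    try:
        target(data)
    except ValueError:
        return False  # expected rejection, not a finding
    except Exception:
        return True   # anything else counts as a crash
    return False


if __name__ == "__main__":
    for candidate in (b"garbage", b"%PDF-\x02", b"%PDF-\xff"):
        print(candidate, "crashed" if fuzz_one(render_pages, candidate) else "ok")
```

A real fuzz engine runs a loop like `fuzz_one` a very large number of times; the open question is where the candidate inputs come from.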
A
But an interesting question that you might ask is: where do all of these PDF documents come from? How does the fuzzing engine know how to generate them? How does it know what makes a good test and what makes a bad test? That's really the core of what we're going to talk about in this video, and it's what we're going to be answering with a corpus.
A
The fuzz engine uses what's called a corpus. You can think of a corpus as a collection of files, a collection of inputs, that tells the fuzz engine: this is what a good input to the function looks like, and you can make small changes, or mutations, to it to generate all of these other PDFs that might exhibit a crash or a fault in the application under test. Notably, a corpus can contain no files.
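The "small changes or mutations" can be sketched in a few lines of Python. This is a simplified illustration, not how any particular fuzz engine implements mutation: the `mutate` helper and the three operations it chooses from (flip a bit, insert a byte, delete a byte) are assumptions picked for clarity.

```python
import random


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one small random change to a corpus entry (illustrative only)."""
    data = bytearray(seed)
    choice = rng.choice(["flip", "insert", "delete"])
    if not data:
        choice = "insert"  # an empty input can only grow
    if choice == "flip":
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)      # flip a single bit
    elif choice == "insert":
        pos = rng.randrange(len(data) + 1)
        data.insert(pos, rng.randrange(256))    # add one random byte
    else:
        pos = rng.randrange(len(data))
        del data[pos]                           # drop one byte
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = b"%PDF-1.4 example"   # stand-in for a real file from the corpus
    for _ in range(5):
        print(mutate(seed, rng))
```

Each variant stays very close to the seed, so most of the structure that made the seed a "good input" is preserved while one detail changes.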
A
You can simply tell the fuzz engine to generate random data and see what happens. The corpus can contain one file, or it can contain as many files as you have available. As you work more with coverage-guided fuzz testing across a few different projects, you'll find a good balance between providing many different files as input to the corpus and not providing too many.
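As a rough sketch of the three cases (no files, one file, many files), the Python below treats a corpus as a plain directory of files. The helper names `load_corpus` and `next_input` are hypothetical, and falling back to purely random bytes for an empty corpus is a simplification of what a real engine does.

```python
import random
import tempfile
from pathlib import Path
from typing import List


def load_corpus(corpus_dir: Path) -> List[bytes]:
    """Read every regular file in the directory as one corpus entry."""
    return [p.read_bytes() for p in sorted(corpus_dir.iterdir()) if p.is_file()]


def next_input(corpus: List[bytes], rng: random.Random, max_len: int = 64) -> bytes:
    """Pick a starting input: a corpus entry if any exist, else random bytes."""
    if not corpus:
        length = rng.randrange(1, max_len + 1)
        return bytes(rng.randrange(256) for _ in range(length))
    return rng.choice(corpus)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        corpus_dir = Path(tmp)
        (corpus_dir / "one_page.pdf").write_bytes(b"%PDF-1.4 toy seed")
        corpus = load_corpus(corpus_dir)
        rng = random.Random(0)
        print(len(corpus), "seed file(s) loaded")
        print("seeded start:", next_input(corpus, rng))
        print("empty-corpus start:", next_input([], rng))
```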
A
The inputs that are being tested will possibly eventually find some bugs, but it's either going to take a lot longer or it's never going to find them, especially if the program under test has very good error-handling capabilities where it's looking for individual pieces of structure inside the data. For example, if the PDF engine is always checking that a checksum value or some signature in a header is present, it's going to be very difficult, and it's going to take a long time, for the fuzz engine to pick up on that without some input files in the corpus to help it know what works and what doesn't work. So that's really a high-level view of what a corpus is and why you would want to use it. Now let's look again at how it interacts with coverage-guided fuzz testing as a whole.
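The point about header signatures can be illustrated with a small experiment. The sketch below assumes a toy parser whose only check is that the input begins with the five bytes `%PDF-` (real PDF files do begin with this marker, but real parsers check far more). It compares how many purely random inputs get past that check with how many single-byte mutations of a valid seed do. All helper names are hypothetical.

```python
import random

MAGIC = b"%PDF-"  # signature the toy parser insists on


def passes_header_check(data: bytes) -> bool:
    """Toy stand-in for the parser's first validation step."""
    return data.startswith(MAGIC)


def random_input(rng: random.Random, length: int) -> bytes:
    """What an engine with an empty corpus might try: arbitrary bytes."""
    return bytes(rng.randrange(256) for _ in range(length))


def change_one_byte(seed: bytes, rng: random.Random) -> bytes:
    """Mutate a valid seed by altering exactly one byte."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] ^= rng.randrange(1, 256)  # non-zero XOR always changes the byte
    return bytes(data)


def count_passing(inputs) -> int:
    return sum(passes_header_check(d) for d in inputs)


rng = random.Random(1234)
seed = b"%PDF-1.4 toy body bytes"
trials = 1000
random_hits = count_passing(random_input(rng, len(seed)) for _ in range(trials))
seeded_hits = count_passing(change_one_byte(seed, rng) for _ in range(trials))

print(f"random inputs past the header check: {random_hits}/{trials}")
print(f"mutated seed inputs past the check:  {seeded_hits}/{trials}")
```

A random input matches a five-byte signature with probability 1 in 256^5, so random data almost never reaches the code behind the check. A mutated seed passes whenever the changed byte falls outside the signature, which is most of the time.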
A
If you'd like to find out more, our product documentation is always available online at gitlab.com. We also invite you to take a look at our direction page. This direction page covers where we're going with fuzz testing and what sort of problems and use cases we're focusing on, and also gives you a lot more information. We'd love for you to check it out; we think you'd find it really interesting. You can also always create an issue and talk to us directly. My handle is @stkerr.