From YouTube: 2020.07.21 - Brown Bag: Unit-test Derived Fuzzing
Description
This BrownBag session discusses problems and solutions for deriving fuzzing harnesses from existing unit tests.
BrownBag issue: https://gitlab.com/gitlab-org/secure/brown-bag-sessions/-/issues/28
A
All right, can everybody hear me okay? Yep, awesome. So this is a brown bag session about unit-test derived fuzzing. There's a lot to talk about here, so we won't go too much into code and seeing this run with Python, Ruby, and Go like I had originally planned, but we will cover everything we need to do to get there. All right, so let's get started. Who am I? My name is James Johnson.
A
Basically, it renders markdown slides in the terminal so that I can mix code and bash prompts. All right. So the problem that we'll talk about in this brown bag session is making it easier for developers to start using fuzzing. There are a lot of things you have to do to be able to use it, and use it well, with a project.
A
You have to set up environments, copy and paste a lot of code, and create fuzzing harnesses to do all of the work. Then, once you get everything set up, you have to actually run it, monitor for results, and tweak the settings of your fuzzer to dial it in so that it's very targeted and does what you're expecting.
A
All right, so in standard fuzzing situations, often the fuzzer will modify a single input, and the fuzzing harness will forward that input to the target binary or target program, doing whatever setup needs to be done in order for that input to actually be processed and looked at. This is a very simple, straightforward case. It works very well for image processors and other things that process raw data. Now, for deriving fuzzing harnesses or fuzzing code directly from unit tests, things won't be quite that simple.
A
So here is an example of a very straightforward, classic fuzzing harness. This is taken from the Chromium source code itself. It fuzzes their time parser library, and again it just operates directly on raw data. So as a developer, this is all I would need to set up. And so LLVMFuzzer, that's all part of libFuzzer, so libFuzzer itself takes care of the mutations.
A
You can override it and provide a custom mutator, but by default all you have to provide is the harness to forward the data to the correct, targeted location within your project. So as a developer, this is all I would have to do. I wouldn't have to worry about mutation or anything, just forward the data on. And again, it is only raw data, so an array of bytes. A few things to note about this, and I did already mention it: the data is already mutated, and this is called once per mutation.
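As an editorial illustration of the harness shape described above, here is a minimal Python sketch. It is not the Chromium harness from the slide: `parse_hhmm`, `fuzz_one_input`, and `run` are hypothetical names, and the random loop in `run` is only a toy stand-in for the mutation engine that a real fuzzer such as libFuzzer provides.

```python
import random


def parse_hhmm(data: bytes) -> tuple:
    """Hypothetical target: parse b"HH:MM" into (hours, minutes)."""
    text = data.decode("ascii")        # may raise UnicodeDecodeError
    hours, minutes = text.split(":")   # may raise ValueError
    return int(hours), int(minutes)    # may raise ValueError


def fuzz_one_input(data: bytes) -> None:
    """Harness: forward the already-mutated bytes to the target."""
    try:
        parse_hhmm(data)
    except (UnicodeDecodeError, ValueError):
        pass  # rejecting malformed input is expected, not a bug


def run(iterations: int, seed: int = 0) -> int:
    """Toy engine (assumption): call the harness once per generated input."""
    rng = random.Random(seed)
    for _ in range(iterations):
        size = rng.randrange(8)
        fuzz_one_input(bytes(rng.randrange(256) for _ in range(size)))
    return iterations
```

The harness is called once per input, so any expensive setup placed inside `fuzz_one_input` would be paid on every iteration.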
A
So if you have very heavy setup processes, as a developer you're going to have to be aware of that and set everything up in a way that won't slow down your fuzzing. There's a lot to be aware of as you write fuzzers. Basically, the faster your fuzzers run, the more success you'll have. If you can only fuzz a few iterations per second, it's better than nothing, but if you can make it exponentially faster, it's obviously much, much better.
A
Now, it does look very similar to the fuzzing harness here. It's a single function, and inside of it, it sends data on to the target library or project in a targeted manner. The target is the sum function here, except now we have two variables and it's not raw bytes, so that right there would make it harder to make it work with libFuzzer or other types of fuzzing libraries.
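To make the mismatch concrete, the sketch below shows a unit test of the kind discussed, next to one possible way to bridge raw fuzzer bytes to two typed arguments. The names `sum_two`, `test_sum`, and `fuzz_sum` are assumptions for illustration, and decoding with `struct` is just one option, not the approach taken in the talk.

```python
import struct


def sum_two(a: int, b: int) -> int:
    """Stand-in for the sum function targeted by the unit test."""
    return a + b


def test_sum() -> None:
    """The kind of unit test a developer already has."""
    total = sum_two(5, 5)
    assert total == 10


def fuzz_sum(data: bytes):
    """Decode two signed 32-bit integers from raw bytes, then call the target."""
    if len(data) < 8:
        return None  # not enough bytes to build both arguments
    a, b = struct.unpack("<ii", data[:8])
    return sum_two(a, b)
```

Even in this small case the harness has to know the number and types of the arguments, which is information a byte-oriented fuzzer does not have on its own.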
A
So the rest of this brown bag is mostly going to talk about how we can take these unit tests and map them into the same type of setup that fuzzing usually uses today. All right, so let's look at this one more time. This is called once per iteration. There's mutation that happens. It's taken care of by libFuzzer, though; we don't have to do it ourselves.
A
There is a corpus of data. We will talk about that, but it is basically a way of seeding the fuzzer so that you have known-good starting points to start from. Otherwise, you start with empty data and you have to guess the correct format, or the fuzzer does. It also keeps track of whatever the fuzzer deems interesting, and we'll get to that as well. These are also usually feedback driven, and there's only one parameter fuzzed at a time.
A
Now, if we look at this, though, we could easily put this in a loop and call it once per iteration. We'd still have to modify some of the parameters, so we have to take care of mutation; it's not easily supported. The corpus: you have to think about it totally differently. It is not just a collection of raw sets of bytes in the corpus. It's a lot more complicated when you're fuzzing function calls, or multiple function calls, from a unit test. Feedback driven: this should be very easy to map, and we'll talk about that. One input parameter fuzzed at a time: I did want to put this in its own category to talk about because it does pose...
A
Now, if you look at, say, a libpng function for libFuzzer, there's maybe 50 lines of setup code before you actually forward the data to libpng, so that concept still applies. But with unit tests, you have to be aware of the testing framework and still maintain that setup and teardown. This was something that I ran into with pytest auto explorer. I ran into database problems because I wasn't running the setup and teardown every fuzz iteration. A unit test would try to create a new user and then do operations on it, but primary key constraints failed, and I was getting all of these other errors that didn't have anything to do with actual bugs in the code. So with calling a test case multiple times, there's more to be aware of than just putting it in a loop.
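The primary-key problem described above can be reproduced in miniature. The sketch below is an assumption-laden illustration using the standard `unittest` module rather than pytest: `UserTest` and `fuzz_iteration` are hypothetical, and a class-level dictionary stands in for the database table.

```python
import unittest


class UserTest(unittest.TestCase):
    """Hypothetical test whose fixture must be rebuilt for every run."""

    users = {}  # stands in for a database table keyed by primary key

    def setUp(self):
        type(self).users = {}  # fresh, empty "table"

    def tearDown(self):
        type(self).users.clear()

    def test_create_user(self, user_id=1):
        if user_id in self.users:
            raise KeyError("primary key constraint failed")
        self.users[user_id] = "new user"


def fuzz_iteration(user_id):
    """Run one fuzz iteration wrapped in the framework's setup and teardown."""
    case = UserTest("test_create_user")
    case.setUp()
    try:
        case.test_create_user(user_id)
    finally:
        case.tearDown()
```

Calling `fuzz_iteration(1)` repeatedly succeeds, whereas calling `test_create_user(1)` twice on the same fixture raises the spurious constraint error.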
A
Let's say the developer didn't want the sum function to ever return the value 10. Instead of checking that 5 plus 5 equals 10, the developer actually cared about the total never being equal to 10, for some reason. We could maintain that logic from the developer during fuzzing and shake out cases where that still might be the case.
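A small sketch of keeping the developer's assertion while varying the inputs, under the assumption that the fuzzer supplies candidate argument pairs. `sum_two`, `check_total`, and `find_violations` are illustrative names only.

```python
def sum_two(a, b):
    """Stand-in for the sum function under test."""
    return a + b


def check_total(a, b):
    """Developer's invariant, kept as-is: the total must never be 10."""
    total = sum_two(a, b)
    assert total != 10, f"sum_two({a}, {b}) returned 10"


def find_violations(candidates):
    """Replay the assertion over fuzzed argument pairs and collect failures."""
    failures = []
    for a, b in candidates:
        try:
            check_total(a, b)
        except AssertionError:
            failures.append((a, b))
    return failures
```

The point is that the assertion itself becomes the bug oracle, so the fuzzer can report more than crashes.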
A
All right, so mutating binary data is very straightforward. It's very simple to change a byte, add a new byte, or mix things up, and you can add all types of logic on top of it. For example, this is the go-fuzz mutator. There are actually 19 different cases in the source code for this; this is just as many as would fit on the slide.
A
Often it has to do with these types of things: removing bytes, treating, say, four bytes in a row as an unsigned integer and incrementing, decrementing, or shifting left or right, that type of thing, and flipping bits. And again, every fuzzing framework has its own set of mutators, and they're kind of derived from the experience of the person who wrote the fuzzer.
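The mutation strategies listed above can be sketched in a few lines of Python. This is not the go-fuzz mutator from the slide; the four functions below are a hypothetical subset written for illustration.

```python
import random
import struct


def remove_byte(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    i = rng.randrange(len(data))
    return data[:i] + data[i + 1:]


def insert_byte(data: bytes, rng: random.Random) -> bytes:
    i = rng.randrange(len(data) + 1)
    return data[:i] + bytes([rng.randrange(256)]) + data[i:]


def flip_bit(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    out = bytearray(data)
    out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)


def increment_uint32(data: bytes, rng: random.Random) -> bytes:
    """Treat four consecutive bytes as a little-endian uint32 and add one."""
    if len(data) < 4:
        return data
    i = rng.randrange(len(data) - 3)
    (value,) = struct.unpack_from("<I", data, i)
    out = bytearray(data)
    struct.pack_into("<I", out, i, (value + 1) & 0xFFFFFFFF)  # wrap on overflow
    return bytes(out)


MUTATORS = [remove_byte, insert_byte, flip_bit, increment_uint32]


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one randomly chosen mutator."""
    return rng.choice(MUTATORS)(data, rng)
```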
A
Now again, this is directly operating on raw bytes. So how do we mutate built-in types? There are collections, strings, integers, booleans. Booleans are a very simple one: the only thing you can do is make it true or false, flip it the other way, and that's about it. But what if you have a dictionary, or a list, or a nested set of lists?
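One possible answer to the question above is a type-aware mutator that recurses into collections. The sketch below is an assumption, not the talk's implementation: `mutate_value` and its specific mutation choices are invented for illustration.

```python
import random


def mutate_value(value, rng: random.Random):
    """Return a mutated copy of a built-in value, recursing into collections."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return not value
    if isinstance(value, int):
        return value + rng.choice([-1, 1])
    if isinstance(value, str):
        i = rng.randrange(len(value) + 1)
        return value[:i] + chr(rng.randrange(32, 127)) + value[i:]
    if isinstance(value, list):
        if not value:
            return [0]
        i = rng.randrange(len(value))
        return value[:i] + [mutate_value(value[i], rng)] + value[i + 1:]
    if isinstance(value, dict):
        if not value:
            return {"key": 0}
        key = rng.choice(sorted(value, key=repr))
        return {**value, key: mutate_value(value[key], rng)}
    return value  # unknown types are left untouched
```

Each call changes exactly one leaf of a nested structure and leaves the original object unmodified.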
A
All right, so crossover. I have a slide that talks about it, just a second, let me find it. Oh, okay, it's in a different section, so we will circle back to what crossover means in this setting, but basically it is merging two sets of inputs. So say we have these two inputs, a and b, two integer values in hex. If you wanted to mix them, these are ways you could do it. Now, whether this actually makes sense and gets you results, I have no idea, but this is a concept that is used when you're fuzzing binary data. You have two sets of inputs that were previously deemed important, and you cross them over to create a new input that's blended from both of them. So trying to map fuzzing these native data types in different programming languages to that concept may not always make sense. In this case maybe it gets you something, but maybe not.
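For readers without the slide, here is a hedged sketch of what crossover could look like for two integers and for two byte strings. The slide's actual mixing strategies are not in the transcript, so `crossover_ints` and `crossover_bytes` are assumptions.

```python
def crossover_ints(a: int, b: int, cut_bits: int, width: int = 32) -> int:
    """Keep the high bits of `a` and the low `cut_bits` bits of `b`."""
    low_mask = (1 << cut_bits) - 1
    full_mask = (1 << width) - 1
    return ((a & ~low_mask) | (b & low_mask)) & full_mask


def crossover_bytes(a: bytes, b: bytes, cut: int) -> bytes:
    """Single-point crossover: prefix of `a`, suffix of `b`."""
    return a[:cut] + b[cut:]
```

For example, `crossover_ints(0xAABBCCDD, 0x11223344, 16)` yields `0xAABB3344`, the upper half of the first value joined to the lower half of the second.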
All right. So the other part of this is that you also have non-built-in types. You have data structures, you have classes, you have whatever the developer dreamed up, and these will be passed into the functions that are being targeted in the unit test.
A
One example of a project that already modifies these at runtime is gofuzz, and that is different from go-fuzz. go-fuzz is the libFuzzer implementation for Go, or basically a fuzzing library that uses libFuzzer made specifically for Go, to phrase it that way. gofuzz specifically modifies fields randomly on Go objects.
A
And if we wanted to do something similar in other languages, we would have to use runtime introspection, and I will get to the place where that introspection would occur.
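In Python, runtime introspection of this kind could look like the sketch below, which uses the standard `dataclasses` module to discover fields and mutate one of them. The `User` type and `mutate_fields` function are hypothetical and are not taken from gofuzz or from the speaker's project.

```python
import dataclasses
import random


@dataclasses.dataclass
class User:
    """Hypothetical developer-defined type passed to a target function."""
    name: str
    age: int
    admin: bool


def mutate_fields(obj, rng: random.Random):
    """Pick one dataclass field by introspection and return a mutated copy."""
    field = rng.choice(dataclasses.fields(obj))
    value = getattr(obj, field.name)
    if isinstance(value, bool):  # bool before int: bool is a subclass of int
        new_value = not value
    elif isinstance(value, int):
        new_value = value + rng.choice([-1, 1])
    elif isinstance(value, str):
        new_value = value + chr(rng.randrange(32, 127))
    else:
        new_value = value  # unsupported field types stay unchanged
    return dataclasses.replace(obj, **{field.name: new_value})
```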
All right. So let's talk about a corpus of inputs. As I mentioned, it seeds your fuzzer. It gives you a known-good starting place, something that gets you relatively deep into the code without having to start from nothing, and it helps a lot with genetic algorithms and hill climbing algorithms.
A
So, for example, libpng. Just during its normal development, they have collected a series of test PNGs that you could very easily use as a corpus if you wanted to fuzz libpng. These are intentionally made to cause libpng to go down different code paths; they're for testing. So there is a lot of correlation between fuzzing corpuses and data that you might collect just throughout your normal process of writing unit tests.
A
So if we want to track that in a corpus, we'll have to think about it completely differently. We'll want to save function call invocations, and we'll need to track the variables that are used during the unit test. Not all variables that are in the unit test will be relevant, or maybe even passed directly to the target function, so we'll have to do some analysis of the code to make that work.
A
So, instead of having a corpus of unique function calls, we may even need to track sequences of function calls. If we look back at this one, suppose the unit test calls sum and then calls sum again with the return value as another argument, or it chains a few function calls together. We may need to track that as a series in order to have a corpus that makes sense for the unit test.
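A corpus of call sequences could be modeled roughly as follows. Everything here is an assumption for illustration: the `Call` record, the `PREV` marker meaning "use the previous return value", and the `CallCorpus` container are not from the speaker's code.

```python
from dataclasses import dataclass, field

PREV = "<previous result>"  # hypothetical marker: substitute the prior return value


@dataclass(frozen=True)
class Call:
    function: str
    args: tuple


@dataclass
class CallCorpus:
    sequences: list = field(default_factory=list)

    def add(self, sequence) -> bool:
        """Store a sequence of calls unless an identical one is present."""
        entry = tuple(sequence)
        if entry in self.sequences:
            return False
        self.sequences.append(entry)
        return True


def replay(sequence, functions):
    """Execute the calls in order, chaining return values through PREV."""
    result = None
    for call in sequence:
        args = [result if arg == PREV else arg for arg in call.args]
        result = functions[call.function](*args)
    return result
```

A chained test such as `sum(sum(5, 5), 3)` is then stored as two `Call` entries, the second of which refers to the first through `PREV`.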
A
All right, so this is something that I took from pytest auto explorer. This is the type of data that I was saving for every new crash or error that it detected. There's the file, here's the source, and these are all of the inputs that were passed to the function. This is the type of data that is being mutated by pytest auto explorer, and all of this is saved in memory. None of this was ever written to disk.
A
So if we wanted to make unit-test derived fuzzing that operates on function calls work, we'll have to be able to serialize this to disk so that we can persist the corpus and reuse it in later fuzzing sessions. And again, it just gets more complicated because we're dealing with non-straightforward data types.
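For entries made only of basic types, persisting the corpus can be as simple as the JSON round trip sketched below. The entry layout (`file`, `function`, `args`) is an assumption loosely based on the description above, and the helper names are hypothetical.

```python
import json
import os
import tempfile


def save_corpus(entries, path):
    """Write corpus entries to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2)


def load_corpus(path):
    """Read corpus entries back from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def round_trip(entries):
    """Write entries to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "corpus.json")
        save_corpus(entries, path)
        return load_corpus(path)


# Hypothetical entry: one recorded call with its arguments.
EXAMPLE = [{"file": "tests/test_math.py", "function": "sum", "args": [5, 5]}]
```

JSON covers strings, numbers, booleans, lists, and dictionaries. Tuples come back as lists, and developer-defined classes need their own encoding, which is where the extra complexity mentioned above comes in.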
A
All right, so any questions before I talk more about feedback-driven fuzzing and how that would play out with unit-test derived test cases? No? Okay. All right, so feedback-driven fuzzing. I've got a few links here if anybody needs them for reference. So, genetic algorithms: they mutate data, and, I didn't add this in here, there's a concept of a fitness function, something that says one input or one thing is better than another thing: more ideal or performant.
A
So this is exactly what most of the fuzzers are using now. If a fuzzer is not explicitly using that, it may be a variant of a hill climbing algorithm, where one solution is found and then incremental changes are made to the input, again with some sort of fitness function to tell if you're making incremental progress.
A
All right, so if we look at these concepts with unit testing in mind, we need a fitness function. Code coverage is something that is already used by most testing frameworks, and most projects have that set up, or, I will say, a lot of them; it's very common. So getting that type of feedback for fuzzing should be relatively straightforward.
A
Unit tests and testing frameworks are set up to already deal with that. Now, this is an example of how feedback-driven fuzzing would work, and we'll walk through it. So let's say we have this function, handle data. It takes two inputs: an array of bytes (chars) and the length of it. If length is less than two it returns; otherwise it does some processing. Now, say we call this function with an empty array of bytes with length zero, and nothing in our corpus.
A
So now, if we run this function again with these inputs, "A" and one, we run through the same lines and we don't cover any new code. No new code coverage, and we add nothing to the corpus. Again, if we do it with "B", it doesn't help us. Now, let's say we use the previous input "B", or we decided to randomly put together two bytes, and we have "BA".
A
Now we got past this first check and we're here, which does mean that we had new coverage. So now we have two items in our corpus. We choose the last item we had and we mutate it, and we come up with "HA". So now we keep proceeding a little further into the code. Here we have "BA" and "HA", and now suppose we had mutated "HA" to have an H and a lowercase a. Now we've made it even further.
A
Now we're processing "Ha", and again the same process would occur since we have these items in our corpus, and these tend to be prioritized; that logic is found in the fuzzing framework itself. Recent items in the corpus are sometimes prioritized over old items in the corpus, so the odds of this being randomly created are pretty high. Then the process continues: we take another item from the corpus, randomly mutate it, and try to get further into the code.
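The walkthrough above can be approximated with a small coverage-guided loop. This is a sketch under stated assumptions: the body of `handle_data` is invented (the slide's code is not in the transcript), line coverage is collected with `sys.settrace`, and corpus selection is uniform rather than biased toward recent items.

```python
import random
import sys


def handle_data(data: bytes) -> str:
    """Hypothetical nested target, loosely modeled on the walkthrough."""
    if len(data) < 2:
        return "too short"
    if data[0] == ord("H"):
        if data[1] == ord("a"):
            return "deep"
        return "H"
    return "other"


def lines_covered(data: bytes) -> frozenset:
    """Run the target once and record which of its lines executed."""
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is handle_data.__code__:
            if event == "line":
                seen.add(frame.f_lineno)
            return tracer  # keep tracing lines inside the target
        return None

    sys.settrace(tracer)
    try:
        handle_data(data)
    finally:
        sys.settrace(None)
    return frozenset(seen)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one to three random byte replacements or insertions."""
    out = bytearray(data)
    for _ in range(rng.randint(1, 3)):
        if out and rng.random() < 0.5:
            out[rng.randrange(len(out))] = rng.randrange(256)
        else:
            out.insert(rng.randrange(len(out) + 1), rng.randrange(256))
    return bytes(out)


def fuzz(iterations: int, seed: int = 0) -> list:
    """Keep every input that reaches a line no earlier input reached."""
    rng = random.Random(seed)
    corpus = [b""]
    covered = set(lines_covered(b""))
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        new_lines = lines_covered(child) - covered
        if new_lines:
            covered |= new_lines
            corpus.append(child)
    return corpus
```

Inputs such as a single byte add nothing because they execute the same lines as the empty input, while the first two-byte input is kept because it gets past the length check.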
A
So I wrote something for fun, and it doesn't use code coverage for feedback. It actually uses performance events: instruction counts and branch counts are its feedback mechanism. So you don't have to instrument the code. All you have to do is run the target binary with these performance measurements in place.
A
Actually, let's look at the target. So this is what the target looks like: very similar to the test example, except it's very deeply nested, and it should be very obvious if the fuzzer is more performant than just randomly generating, what, nine bytes with the correct value.
A
All right, so if we run this, it does occur pretty quickly, and this is using performance events instead of direct code coverage. The point that I often bring up is that code coverage is just one feedback mechanism, and this is one example of that. So if code coverage is unavailable during unit testing, there are other ways that we can figure out if we are progressing further into the code.
A
All right, so one input parameter is fuzzed at a time. I've talked about this a lot of times; I don't think we need to talk about it again. So now, this is where the actual implementation would come into play, and I'm saying it that way because I have not made it as far on the code for this as I wanted to. So that's why the brown bag is kind of ending around here.
A
But this is my approach, and I am in the middle of this; I'm in here somewhere. The methodology that I am using right now is to parse the existing unit test into an AST, an abstract syntax tree, and then rewrite all the function calls. Previously, I had rewritten the Python bytecode to do this hooking, this instrumentation, to capture known-good values from the original unit test, but that's really not that sustainable and it's very Python specific. Rewriting the source code should work, roughly, between major versions of the language, unless it's a very new language, and it would work across languages. You would still have to implement it, but it is a bit more generic and, I think, a bit more sustainable. You would also need to monitor for code coverage or whatever feedback mechanism you would use.
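To illustrate the AST-rewriting idea in Python, the sketch below wraps every call to a plain function name in a recording hook using the standard `ast` module. It is not the speaker's implementation: `record_call`, `CallRewriter`, `instrument`, and `capture` are hypothetical names, and method calls such as `obj.f()` are deliberately left untouched to keep the example short.

```python
import ast

RECORDED = []  # (function name, positional args) captured at runtime


def record_call(name, func, *args, **kwargs):
    """Hypothetical hook: log the call, then forward it unchanged."""
    RECORDED.append((name, args))
    return func(*args, **kwargs)


class CallRewriter(ast.NodeTransformer):
    """Rewrite `f(x, y)` into `record_call("f", f, x, y)` for plain names."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if not isinstance(node.func, ast.Name):
            return node
        hook = ast.Name(id="record_call", ctx=ast.Load())
        new_args = [ast.Constant(value=node.func.id), node.func] + node.args
        new_call = ast.Call(func=hook, args=new_args, keywords=node.keywords)
        return ast.copy_location(new_call, node)


def instrument(source: str):
    """Parse, rewrite, and compile test source code."""
    tree = CallRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<instrumented test>", "exec")


def capture(source: str, test_name: str) -> list:
    """Run one instrumented test and return the calls it made."""
    RECORDED.clear()
    namespace = {"record_call": record_call}
    exec(instrument(source), namespace)
    namespace[test_name]()
    return list(RECORDED)


TEST_SOURCE = """
def sum_two(a, b):
    return a + b

def test_sum():
    total = sum_two(5, 5)
    assert total == 10
"""
```

Running `capture(TEST_SOURCE, "test_sum")` yields the known-good arguments from the original test, which could then seed a corpus of function calls.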
A
You could have a thousand unit tests, and you wouldn't create a thousand separate targets; that wouldn't really make sense to me. I think you would probably round-robin through each of them on each fuzzing iteration. But yeah, that is about it. Preserving the imports, the environment, and the module hierarchy of everything that you need for the unit test is something that you would have to do when you parse the AST and create the standalone fuzzing harnesses.
A
That's what the rest of this would be: draw the rest of the owl. There is a pretty clear path to it. I did not want to put off the brown bag yet again just so that I could have code in place, or in the state that I want, just so we could talk about it. But yeah, that is the process, and so far everything does seem to be panning out. Does anyone have any questions? Anything you'd like me to go over a bit more?
B
Hey James, thanks for the talk. I do have a question, just a pretty generic one. You mentioned that you're currently working on this. I'd be curious to check it out. I noticed that the one performance fitness test that you had was written in Rust. Is that this one?
A
Yeah, that one. So, let's see, Rust isn't very high on the priority list as far as work is concerned. It's just more of a new language that I wanted to learn, and so this was something that I did for fun. I brought it up because it uses a different feedback mechanism than code coverage. But the same concepts would apply with Rust as well. Is that where you were going with that question?
A
All right, so, here we go. This is the pytest auto explorer project. This is where I'm starting, because I've got a lot of code in place for this. I ran into some snags trying to make it more than an MVC, to make it work very nicely with pytest instead of just raw rewriting the source code and then implementing it.
A
So currently, pytest auto explorer, the main branch of it, does do the instrumentation and captures the function calls. It doesn't create standalone files to do the fuzzing, and that's the aspect that I'm currently working on. Once I figured out those topics with pytest auto explorer, I was going to basically re-implement them in Ruby and Go. They each have kind of different tool sets that could help you with it, but pytest auto explorer would be the one to look at right now. It's the one I've been focusing on.
B
Cool, nice. So would this be kind of in the same vein as the generic SAST idea, where the fuzzer is language agnostic, or would we be implementing specific...
A
To me, I think the easiest, most sustainable way would be to rewrite the source code by parsing it into an AST, so that would have to be language specific. But maybe there's a way to wrap it into a common library where you can abstract away a lot of the language-specific things. So, if you wanted to add a new language, then maybe all you have to do is implement the specific pieces: implement the source code rewriting and a way to understand the testing framework.
A
No? All right, well, cool. Then I will stop the recording here. I did add a "part one" on this; I will have a part two once the code is in place and in a state to show. In the past, when I've merged a lot of technical topics with wanting to talk about the code, it tended to get really messy presenting anyways, so I'm kind of liking having it split up. So next time we'll be talking specifically about the code. All right.