From YouTube: Annotating diffs - Git Merge 2018
Description
Presented by Grant Mathews, Software Engineer, Atlassian
About GitMerge
Git Merge is the pre-eminent Git-focused conference: a full-day offering technical content and user case studies, plus a day of workshops for Git users of all levels. Git Merge is dedicated to amplifying new voices in the Git community and to showcasing the most thought-provoking projects from contributors, maintainers and community managers around the world. Find out more at git-merge.com
A
All right, hello. My name is Grant. I'm a developer with Bitbucket Cloud, and today I want to talk about annotating diffs. I spend a lot of my time reviewing code, and that typically means looking at changes in the form of diffs. So here's a diff. This is a minor variant on the unified diff format. It has a diff header, which has some header lines that we don't really care about, and it has header lines that tell you which file is changing.
A
Those we do care about. Diffs normally have one or more diff hunks. A diff hunk has all the changes that you would expect to see in a diff, and it has a hunk header, which tells you where those changes are occurring. And, of course, it has all the added lines and removed lines that you would expect to see, and it often has context lines, which are just unchanged lines. So hopefully none of that is too new for anybody, but that's the unified diff format.
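The parts of a unified diff named above can be seen in a small example. This is only a sketch using Python's standard difflib module; the file name and the line contents are made up for illustration, and the classifier is deliberately naive.

```python
import difflib


def classify(line):
    """Name the part of a unified diff that a single line belongs to.

    Naive on purpose: a removed line whose text starts with "-- " would be
    mistaken for a file header.
    """
    if line.startswith(("--- ", "+++ ")):
        return "file header"
    if line.startswith("@@"):
        return "hunk header"
    if line.startswith("+"):
        return "added"
    if line.startswith("-"):
        return "removed"
    return "context"


# Hypothetical before/after contents of one file.
old = ["foo\n", "bar\n"]
new = ["bar\n", "baz\n"]

diff = list(
    difflib.unified_diff(old, new, fromfile="a/example.txt", tofile="b/example.txt")
)
labels = [classify(line) for line in diff]

for label, line in zip(labels, diff):
    print(f"{label:12} {line}", end="")
```

Each printed row pairs a label with the raw diff line, so the file header, the hunk header, and the added, removed and context lines can be told apart at a glance.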
A
For example, I might want to know what the commit message is for a particular set of changes, and I don't want to have to go through every commit to find that. Oftentimes it's important to see if changes happened simultaneously, all in the same commit; that can be very important sometimes. And oftentimes there's just a lot of detail, a lot of information associated with a particular line. This one-line change has 137 lines in the commit message.
A
Most projects aren't like the Linux kernel, though. They'll typically put that level of detail in a pull request on a website somewhere. So when I review code, it looks more like this. This is just part of a diff in a PR on Bitbucket, and even in this view I often want more context. There's a drop-down menu that has extra stuff, and I think it would be cool if we could add a "Show commits" item to the drop-down menu, like that. And this isn't a product announcement or anything.
A
I'm not promising anything. I just think it'd be cool to have the ability to show the commits that are associated with the changes that I'm looking at. And if we're being really ambitious, we could put those commits right in line with those changes, like that. Just have a link straight from "oh, this change is in this commit," and that link would, of course, just take you to the commit page, where you may or may not have a useful commit message. But what the Bitbucket team tends to have installed on their repositories is very useful.
A
A
That might be useful, and if we're again being really ambitious, we could put the links to the PRs right next to the lines changed. I think that would be very useful functionality. So the question is: how do we tie the commits to the changes that we're looking at? What do we do? How would we represent that? I want to suggest an annotated diff format. This is what I think an annotated diff should look like. It kind of looks like a unified diff.
A
A
Each line gets prefixed with a commit, and that's the annotation, of course. Also useful: I think it would be nice to have the original line number from when that change was introduced. That would be very useful if you're in a PR on a website and you want to attach a comment to a particular changed line, and maybe have that comment show up in different contexts, maybe in subsequent PRs or when somebody's just browsing commits. I think that'd be cool. There's one other piece of information.
A
We need it to really make that work, though, and that's the file name, which can change during a diff. So we need to track the renames, and we just throw that in the header that already has the name information. A detail that will become important later: I feel that removed lines should show the commit that removed them. That just seems like the most logical thing to do for that annotation. Less important is how context lines get annotated. For my use cases, I don't particularly care about context. This could be all zeros.
A
Space is not important here. I just gave it the same hash as everything else, because that was literally the easiest thing to do. Not critical, but I think that would be a useful way to represent this information.
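To make the idea concrete, here is one way an annotated line could be modelled and printed. The slide itself is not part of the transcript, so the field names, the seven-character hash prefix and the column layout below are all assumptions made for illustration, not the format shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedLine:
    commit: str     # hash of the commit that added (or, for "-", removed) the line
    orig_line: int  # line number at the time the change was introduced
    tag: str        # "+", "-" or " ", as in a unified diff
    text: str       # the line contents


def render(line):
    """Prefix a diff line with its commit and original line number (hypothetical layout)."""
    return f"{line.commit[:7]} {line.orig_line:>4} {line.tag}{line.text}"


# Context lines are not interesting for this use case, so an all-zero hash is
# used here as a stand-in annotation.
context = AnnotatedLine("0" * 40, 1, " ", "bar")
added = AnnotatedLine("abcdef0123456789abcdef0123456789abcdef01", 12, "+", "baz")

print(render(context))
print(render(added))
```

Keeping the commit and the original line number as separate fields, rather than baking them into the text, is what would let a review tool attach a comment to "line 12 as introduced by this commit" and find it again in another diff.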
Let's look at a slightly bigger example. The previous annotated diff only had one commit. This one has four, and you can see that all four show up in the rename list, which again just feels useful. Then you can look at the line numbers, and they do jump around.
A
We are tracking the original line numbers. Small detail, but again, I think that would be really cool to be able to generate and use in displaying diffs, for PRs in particular. So then the question is: how do we generate these annotations? What tools can we use? And Git helpfully provides an annotate command already.
A
So let's explore that and see if we can use it for generating annotated diffs. What git annotate does is look at a file and go through history. In Git, history is represented, of course, as a directed acyclic graph, where each commit is a node in that graph. You don't really need to know too many details about that, except that it's often called the DAG, and going through history is often called walking the DAG, so I'll probably say that a lot. So git annotate starts with a file and walks the DAG.
A
It looks at lines that are introduced and tries to match those lines to the lines that it has in the file. Eventually it will spit out an annotated version of that file, and here we see that there are hashes, there are line numbers. This is literally half of what we need. This is great. This is every added line, and even every context line, which we don't really care about, but still cool, for a diff. So let's see if we can really squeeze diff annotation in, given what Git already gives us.
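As a rough illustration of what a tool can pull out of that output, the sketch below parses the machine-readable form that git blame prints with its --porcelain option: a header line with the commit hash, the original line number and the final line number, then metadata lines, then the line contents prefixed by a tab. The sample text is hand-written for the example, not captured from a real repository, and the parser ignores all metadata.

```python
import re

# "<40-hex-sha> <original line> <final line> [<lines in group>]"
HEADER_RE = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")

# Hand-written sample in the shape of porcelain output (illustrative only).
SAMPLE = (
    "1111111111111111111111111111111111111111 1 1 1\n"
    "author Example Author\n"
    "\tbar\n"
    "2222222222222222222222222222222222222222 5 2 1\n"
    "author Example Author\n"
    "\tbaz\n"
)


def parse_porcelain(output):
    """Return (commit, original_line, final_line, text) for each annotated line."""
    annotated = []
    current = None
    for line in output.split("\n"):
        match = HEADER_RE.match(line)
        if match:
            current = (match.group(1), int(match.group(2)), int(match.group(3)))
        elif line.startswith("\t") and current is not None:
            commit, orig_no, final_no = current
            annotated.append((commit, orig_no, final_no, line[1:]))
    return annotated


for commit, orig_no, final_no, text in parse_porcelain(SAMPLE):
    print(commit[:7], orig_no, final_no, text)
```

The hash and the original line number are exactly the two annotations the proposed format wants for added lines, which is why this output counts as "half of what we need."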
A
A
The only thing left to do is get the removed lines, and git annotate has a reverse option, which will walk the DAG backwards, which is forwards through history. If you give it a file and a place to start, it will start at that place, look at the lines of that file, and then track the last time that it saw each line as it goes forwards through history. So it's not quite what we want. It doesn't annotate when the line was removed; it annotates when it was last seen.
A
So it's off by one in terms of what we want, but it's close. We can't quite get what I really want out of the built-in Git commands and maybe a Perl script wrapped around them. There are two problems. One, we would have to call git annotate twice for each file that shows up in the diff, so it'd be slow. Not too bad, but it's also not the annotations that we want.
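The off-by-one can be shown on a toy linear history. This sketch does not call Git at all; the three commits and their file contents are assumed for illustration, and lines are treated as unique strings.

```python
# Hypothetical linear history, oldest first: (commit, file contents).
HISTORY = [
    ("A", ["foo", "bar"]),
    ("B", ["bar"]),
    ("C", ["bar", "baz"]),
]


def last_seen(history, text):
    """Walk forwards from the start and report the last commit that still has the line.

    This mirrors what the talk says a reverse annotate reports.
    """
    last = None
    for commit, lines in history:
        if text not in lines:
            break
        last = commit
    return last


def removed_by(history, text):
    """Report the commit whose change actually dropped the line."""
    for (_, before), (commit, after) in zip(history, history[1:]):
        if text in before and text not in after:
            return commit
    return None


print(last_seen(HISTORY, "foo"))   # last commit where "foo" was present
print(removed_by(HISTORY, "foo"))  # the commit the annotated diff wants
```

On a linear history the wanted commit is simply the child of the last-seen commit, but with merges a commit can have several children, so the answer is not a trivial lookup in general.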
A
A
Just like git annotate does, and we know that git annotate does exactly what we want for added lines, so that shouldn't be contentious. Oh, and when we implement this ourselves, at each step in the walk we can look at every file that we care about instead of having to do multiple walks per file, so it's slightly more efficient. And we can walk backwards through the DAG, just like git annotate reverse does, but instead of comparing to a file, we can compare to the diff, and take the diff at each point and see:
A
"Oh, this removed line matches up with that removed line." And in theory we can annotate a diff. So that seems like a reasonably obvious approach to take, and when I was looking at doing this I thought, okay, I can probably be clever and be faster about it. So I worked on an optimized approach, and one of the requirements to be faster is to walk the DAG once. DAG walks can be slow, so I wanted to minimize the number of DAG walks. And, of course, any real implementation wouldn't actually walk the DAG twice.
A
It would generate a cache of things that it had seen on the first walk, and I didn't want to use a cache, because that is differently expensive. For example, I think two weeks ago there was a Bitbucket Cloud support ticket about somebody who had an 11-million-line diff that wasn't rendering properly for some reason, and I don't want to store 11 million lines, especially multiple times, in a cache somewhere where I'm blocking other things. So: one walk, no cache trickery.
A
A
Everything gets the same annotation, and you can do that for every commit that you see. Then all we have to do is figure out a way to combine those as we walk through to build up a diff, and that should, without using a cache, allow us to get away with one walk. And if we're building up an annotated diff as we go, and we have a diff at each step, we don't even need that initial diff.
A
That becomes extraneous, and getting rid of it is kind of in line with another performance constraint: never keeping more data than we need. So that's no cache, no extra diff, one DAG walk: the super clever optimized approach. I came up with that approach without really digging into the problem, and Donald Knuth has a quote that comes around from time to time. He says, "Premature optimization is the root of all evil." So when I talk about my optimized approach for the rest of the talk, you should keep that quote in mind.
A
That is important. But moving on, let's look at the simplified single-file case and see if we can actually make this work. Is it actually going to do what I want it to do? Here we have three commits, C on top of B on top of A, and then the state of a file at each commit. We want the diff from A to C, just dropping foo and adding baz, and we want to know: can we annotate that incrementally, step by step? So let's try it.
A
A
That would be a completely valid annotated diff, but we're going all the way to A, so we have to step to B and generate the diff from B to A. We know that those changes were only introduced by commit B, so it's trivial to annotate. Cool. Now all we have to do is figure out how to combine these two annotated diffs. You can kind of look at it and stare and say, all right, foo with foo, baz with baz; it should look something like that.
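The per-commit step described here can be sketched in a few lines: diff one commit against its parent and give every changed line that commit as its annotation. The file contents at A, B and C are assumed for illustration (the slide is not in the transcript), and Python's difflib stands in for whatever diff implementation the speaker used. Combining the per-commit results is the hard part and is not attempted here.

```python
import difflib

# Hypothetical state of one file at each commit of the linear history A <- B <- C.
STATES = {
    "A": ["foo", "bar"],
    "B": ["bar"],
    "C": ["bar", "baz"],
}


def annotate_step(commit, parent_lines, lines):
    """Diff a commit against its parent; every changed line is annotated with `commit`."""
    annotated = []
    for entry in difflib.ndiff(parent_lines, lines):
        tag, text = entry[0], entry[2:]
        if tag in ("+", "-"):  # skip context ("  ") and hint ("? ") entries
            annotated.append((commit, tag, text))
    return annotated


# Walk backwards from C, one commit at a time.
step_c = annotate_step("C", STATES["B"], STATES["C"])
step_b = annotate_step("B", STATES["A"], STATES["B"])

print(step_c)
print(step_b)
```

With these assumed contents, the removed line ends up annotated with the commit that removed it and the added line with the commit that added it, which matches the convention proposed earlier for the annotated format.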
A
That seems doable; that's a reasonable sequence of steps to take. So this looks promising. This looks like, oh, I can actually maybe code this up. But before I do that, let's see if I can ignore anything to make it easier. So, a list of things to not worry about. First off, multiple files. Am I going to have a weird situation where I have a ton of files to track, and they overlap, and weird stuff happens? And it turns out
A
A
What about renames? Renames get weird; all sorts of things happen. It's kind of difficult to detect renames. And again, it's the same case: just generating the diff, yeah, it's hard for that code, but it spits out the rename, so I never have to worry about that. And since each file is entirely independent, it turns out all of the interesting cases happen just in one file. So I don't have to worry about that case at all. Easy.
A
What about merges? Now, merges are very important. That was my release pull request case, where that's basically just a series of merges, and I really want that to be annotated, and annotated correctly. So let's quickly look at that. If you have a merge, and there's your DAG, it's not clear that that can work just by going incrementally.
A
One by one, step by step, you kind of have that left part that looks like, okay, yeah, I know how to annotate that part, and maybe that right part I could also annotate, and they start and end at the same place. So I'll assume that I can handle that later. Just to get started, I will ignore it.
A
Hopefully I'll come back to it and get that working. Branches, on the other hand, are where you have the heads of two branches of development, and you want to compare those and annotate the diff. It's very easy to generate a diff between two heads; you do it all the time. But what would the annotation for that look like? At first, it kind of seems like the merge case. You have that left side where, oh yeah, I can generate an annotation for that, and then you have the right side.
A
Oh yeah, I could build up something there. But they don't share the same starting point, so it's not clear that they could be easily combined. You might be able to get around that, say, by flipping the direction of one of the walks, as in, oh, you go down, then up. But that gets weird where you're going backwards. Are you swapping your added and removed lines as you go along? If you annotate a diff and you get a removed line with a commit,
A
If you go to the commit, will it be an added line? Thinking about this one hurt my head, so I just said no. Forbidden. My code will just crash. Well, it will crash with a nice message: "Yeah, don't do that." But that really simplifies the problem. That means, okay, we only care for now about the linear single-file case.
A
Great. All that's left to do is line up the lines, and what that means is that we have to track changes to line numbers through history. We'll be adding and removing lines; well, we'll be coming across added and removed lines as we walk the DAG. We just need to know how things shift, and we need to keep track of how things shift so that we can match up lines: "Oh, these lines collide." And when that happens, we have to decide what to do with those lines.
A
A
We know that it's actually from line 2 as well, so we track all the line numbers we possibly can for this. And from-line 2 matches up directly with to-line 2 in that next change. So, oh, that actually matches up nicely. We know how to combine those. Great. Let's look at a slightly different case. What if there's a bunch of lines coming in, shifting things down? What happens if we have this annotated diff and then we run across these changes later?
A
You might say, "Oh, D3, baz, but there is no line 3 in that other diff." Well, it turns out this is actually the same case. D3 is to-line 3 in commit D, but it's actually from-line 1, and we know that with no magic, just by reading the hunk header. So we know that D3, or rather from-line 1 in D, matches directly with the to-line 1 changes from C in that next diff.
A
Great. So there are two things to note here. First off, we only have to track line numbers. Well, I mean, I have to look at line contents to verify things, but the computer can get away with only tracking line numbers. Nice, efficient. And second off, you can't just ram these two together; you'll end up with a removed line in the middle of a bunch of added lines. That's bad diff etiquette!
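Reading from-line and to-line numbers off a hunk header needs no content matching at all, which is the point being made. The sketch below numbers every line of a single hunk; the hunk itself is invented to mirror the "two lines come in and shift things down" situation and is not the one from the slides.

```python
import re

# "@@ -<from start>[,<from count>] +<to start>[,<to count>] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def number_lines(hunk):
    """Yield (tag, from_line, to_line, text) for each body line of one hunk.

    Added lines have no from-line and removed lines have no to-line.
    """
    header, *body = hunk
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    from_no, to_no = int(match.group(1)), int(match.group(3))
    for line in body:
        tag, text = line[0], line[1:]
        if tag == " ":
            yield tag, from_no, to_no, text
            from_no += 1
            to_no += 1
        elif tag == "-":
            yield tag, from_no, None, text
            from_no += 1
        elif tag == "+":
            yield tag, None, to_no, text
            to_no += 1


# Hypothetical hunk: two new lines are added above an existing line.
HUNK = ["@@ -1,1 +1,3 @@", "+one", "+two", " baz"]

numbered = list(number_lines(HUNK))
for row in numbered:
    print(row)
```

The context line is at to-line 3 after the change but at from-line 1 before it, so an older diff that talks about line 1 can be matched to it using the numbers alone.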
A
So this seems like something where it's easy to come up with a finite number of cases and figure it out, maybe a write-it-on-the-back-of-a-napkin type of scenario. So I did that. I went through every case: plus meets plus, minus meets context, that sort of thing. I came up with "these cases are obvious, these cases are forbidden," coded it up, ran it through some tests, and found out that, of course, the forbidden cases aren't so forbidden. That's fine! You just go back.
A
You re-examine your model; it's still relatively simple. Change all the code, run it through some more tests, and find out, oh no, this isn't quite tracking line numbers correctly. So that's fine! You figure out those details. I only rewrote this code about three or four times; it kind of felt like thirty or forty. But at the end of this I got a good set of code that, hey, will annotate diffs, and it works almost every time.
A
I say almost because there was a problem, and that problem would show up if, going through the DAG, you run across a change and then later, in that same file, you run into the opposite change. If somebody makes a change, then undoes that change, you would expect that to show up in a diff either as context or to just not show up at all, because nothing changed. Of course, what my code was doing was ramming all the lines together, which is technically what happened.
A
I mean, that's technically accurate, but nobody cares that it's technically correct. They just want it to look like they expect it to. So I needed a way to fix this problem, to make it go away. So, fixing the problem. Nobody cares that it's not technically broken, but I just needed to figure out which lines were duplicates. Really simple, and computers have a great way of doing that: it's diff. So, in generating this diff, after running and calling diff all these times,
A
I then had to re-diff the results, and I had to do that after each step in the walk to avoid these sorts of things snowballing, because if you have an error and you keep doing stuff, it will build. What that looks like is, if I have a suspicious set of changes, I look at that and say, okay, there might be some repeated lines. Pow, run it through diff, and okay,
A
There we are: out pops a better diff. The important thing here is that, besides calling diff (not the command, but still the algorithm) multiple times, which slows things down, we're now looking at the contents of the lines. We can no longer get away with just tracking line numbers. So for an approach that was supposed to be optimal, or at least optimized and fast, this is very bad. It slows things down quite a bit. That was a depressing discovery, but at this point the code was almost feature complete.
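A minimal sketch of that re-diff step, under stated assumptions: after ramming two steps together, a block holds some removed lines and some added lines, each with its annotation, and the removed contents are diffed against the added contents so that a line which was taken out and put back collapses into context. The example data is invented, difflib's SequenceMatcher stands in for the diff algorithm, and a line that exists in neither endpoint would need to be dropped rather than kept as context, which this sketch does not handle.

```python
import difflib


def rediff(removed, added):
    """Collapse lines that appear in both lists into unannotated context.

    `removed` and `added` are lists of (commit, text); the result is a list of
    (commit, tag, text) with commit set to None for context lines.
    """
    matcher = difflib.SequenceMatcher(
        a=[text for _, text in removed],
        b=[text for _, text in added],
        autojunk=False,
    )
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result += [(None, " ", text) for _, text in removed[i1:i2]]
        else:
            result += [(commit, "-", text) for commit, text in removed[i1:i2]]
            result += [(commit, "+", text) for commit, text in added[j1:j2]]
    return result


# Hypothetical collision: commit B removed "foo" and "bar", then commit C
# put "bar" back and added "baz".
removed = [("B", "foo"), ("B", "bar")]
added = [("C", "bar"), ("C", "baz")]

cleaned = rediff(removed, added)
for row in cleaned:
    print(row)
```

Note that the matcher has to compare line contents, which is exactly the cost the talk laments: line numbers alone are no longer enough.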
A
It handles linear paths just fine. What about merges? Briefly, if you're walking through the DAG, you've got some changes that you're tracking, and you run across a merge, what do you do? A merge, when you're going backwards through history, kind of looks like a branch: you get two separate paths that pop up. All right, at this point I had given up on being clever, and I had given up on being efficient.
A
I just did the easiest possible thing: copy the entire annotated diff once for each parent. Great, now I have multiple annotated diffs. For each one, you just track the parent that it expects to see next, so that when you walk the DAG and run into a commit, you feed it to the correct annotated diff, annotate, and combine as usual. Great, keep walking. Get another commit, feed it to the correct annotated diff, annotate, combine. Cool, keep walking.
A
So the interesting part here comes when you run across that common ancestor, the original branch point, and you have to recombine the annotated diffs that you've been building up so carefully. But it turns out that's just line matching and decision logic. I've done a lot of that already. This is slightly different, but not terribly different. You need to keep track of things like which commits you've run across that were merge commits, so that you can decide the proper annotation for every case, but it ends up being relatively straightforward.
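The bookkeeping described here (copy once per parent, remember which commit each copy expects next, recombine at the common ancestor) can be sketched without any real diffing. Everything below is a stand-in: the four-commit history is hypothetical, a "partial annotated diff" is reduced to the list of commits it has absorbed, and recombining is reduced to merging those lists, where real code would do the line matching.

```python
# Hypothetical history: M merges L and R, which both branch from "base".
PARENTS = {"M": ["L", "R"], "L": ["base"], "R": ["base"], "base": []}


def walk(parents, order):
    """Walk `order` (newest first), copying the partial result once per parent.

    Returns the final combined result and the commits where copies met again.
    """
    waiting = {order[0]: [[]]}  # commit -> partial results that expect it next
    recombined_at = []
    final = None
    for commit in order:
        partials = waiting.pop(commit)
        if len(partials) > 1:
            recombined_at.append(commit)  # several copies arrive: recombine here
        state = []
        for partial in partials:
            # Stand-in for line matching: merge the histories, dropping repeats.
            state += [c for c in partial if c not in state]
        state.append(commit)  # stand-in for "annotate and combine as usual"
        if not parents[commit]:
            final = state
        for parent in parents[commit]:
            waiting.setdefault(parent, []).append(list(state))  # one copy per parent
    return final, recombined_at


result, meeting_points = walk(PARENTS, ["M", "L", "R", "base"])
print(result)
print(meeting_points)
```

The walk order is assumed to list every commit after all of its children, which is what a newest-first topological walk provides.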
A
A
One issue, though, is that it generates an unexpected diff. There's no uniqueness guarantee in diffs. You can represent the same changes many different ways, and since I was generating diffs in an entirely different way than Git generates diffs, I was probably going to end up generating slightly different ones: which line is removed versus which is context. So that's not great. That's not what users expect.
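A tiny example of that non-uniqueness: when a file contains repeated lines, two different edit scripts can describe the same change. The helper below is written for this illustration only; it applies a list of tagged lines to the old contents and checks that they line up.

```python
def apply_script(old, script):
    """Apply a list of (tag, text) diff lines to `old` and return the new contents."""
    new = []
    position = 0
    for tag, text in script:
        if tag in (" ", "-"):  # context and removed lines must match the old file
            if position >= len(old) or old[position] != text:
                raise ValueError(f"script does not apply at line {position + 1}")
            position += 1
        if tag in (" ", "+"):  # context and added lines appear in the new file
            new.append(text)
    if position != len(old):
        raise ValueError("script does not cover the whole file")
    return new


old = ["a", "b", "a"]

# Two different ways to say "only one 'a' is left".
keep_first = [(" ", "a"), ("-", "b"), ("-", "a")]
keep_last = [("-", "a"), ("-", "b"), (" ", "a")]

print(apply_script(old, keep_first))
print(apply_script(old, keep_last))
```

Both scripts are valid, but they disagree about which "a" was removed and which is context, so an annotation tool that builds its own diff can easily disagree with the one the user already saw.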
A
A
So overall, this is clearly not optimal, but it was functional. At some point after writing this code and coming to this conclusion, I got roped into going to a Mercurial sprint, and at some point during the Mercurial sprint I got roped into talking about annotated diffs. So I presented my "oh hey, cool, there's this incremental thing that I'm doing," and I got some feedback on my approach, and I said, yeah, no, you're right that this is pretty bad, but you guys are DAG experts. You live and breathe DAGs.
A
Tell me the correct way to do this. And they said, okay, so you start with a diff. I'm like, okay, yeah, sure, sounds reasonable. And so you walk the DAG forwards. I'm like, okay, I'm with you, sure. And they said, then you walk it back, backwards. I'm like, whoa, backwards? DAG walk, too slow, can't do that. And they said, of course you wouldn't walk the DAG backwards; you would build a cache on the first walk. And, whoa, a cache?
A
No. And I was really digging in my heels about what they were recommending being the slow way, but I realized afterwards that I hadn't actually gone back and asked myself, wait, why is it so slow? I hadn't re-examined it in depth. So let's do that now. This is the slow way, the obvious way, the initial, theoretical, straightforward approach to annotating diffs. So, starting with the diff: what does that cost us? It turns out it limits the number of lines you examine.
A
So if you have the diff, all the changes that you care about, and you step through, walking the DAG, and you find a set of changes that aren't in that diff, you don't care about them. You have to track line numbers, but you can throw everything else out. Okay, cool, skip it. And that works at the file level too.
A
If you have a file that you run across and that file is not in the diff, throw the whole thing out. So that's occasionally a nice optimization that you can do. You can also skip context during the walk. The only reason that we tracked context in the incremental approach is that we need to spit context out at the end.
A
We have to keep that and move it along with everything else so that we generate a reasonable diff for the user. But generating that initial diff gives us all the context we'll ever need, and since we don't really care about annotating context, or at least I didn't, you can just skip it when you're looking at everything else. And since some diffs are like 80 or 85 percent context lines, if you have a ton of single-line changes, that's not a bad optimization. But, most importantly, it's consistent with user expectations.
A
It generates the diff that the user expects to see. So you can do things like generate a diff, then say, oh, I want to annotate that, and get the same diff, annotated. Way more useful than what I was doing. So that turns out to be a very important step not to skip. And then let's look at the real performance killer, the double DAG walk. Of course, any real implementation would not do a double DAG walk.
A
It would use a cache generated on the first walk, but it turns out that cache is actually really efficient. It only needs to store line numbers, and since we're only looking at removed lines at that point, we only need the removed line numbers. And we don't need it per line; we can just store ranges of lines. So that's a really efficient cache, and we know that we'll never need line contents, because we already have all the line contents we would ever care about in that initial diff that we generated.
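Storing ranges instead of individual line numbers is simple to sketch. The cache layout below (a dictionary from commit to a list of inclusive ranges) and the sample numbers are assumptions for illustration; the talk does not describe an actual data structure.

```python
def to_ranges(line_numbers):
    """Compress line numbers into sorted, inclusive (start, end) ranges."""
    ranges = []
    for number in sorted(set(line_numbers)):
        if ranges and number == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], number)  # extend the current run
        else:
            ranges.append((number, number))       # start a new run
    return ranges


# Hypothetical cache built on the first walk: commit -> removed line ranges.
cache = {
    "B": to_ranges([3, 4, 5, 9, 10, 20]),
    "C": to_ranges([1]),
}

print(cache)
```

A run of a thousand removed lines costs one pair of integers here, and no line contents are stored at all, since those are already available in the initial diff.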
A
So this turns out to be kind of a stupid thing to skip; it's a very important step. And this isn't so much the slow way. It's probably the fast way, and I say probably because I haven't gotten around to implementing it yet. I don't know how I can screw this one up yet, but it seems like the correct thing to do. So hopefully I've convinced you that annotated diffs are interesting and useful and can be generated efficiently, even if I have yet to do so myself. So thank you.