From YouTube: Build-aware sparse checkouts - Git Merge 2022
Description
Presented by Waleed Khan + David Bernadett
Twitter has developed a tool called focus which manages sparse checkouts as defined by targets in Bazel. By carefully defining the dependency model, we can precompute dependency queries such that users can create the sparse checkout without necessarily having to invoke Bazel first in a dense checkout.
About Git Merge:
Git Merge is dedicated to amplifying new voices in the Git community and showcasing the most thought-provoking projects from developers, maintainers, and teams around the world. Git Merge 2022 took place at Morgan Manufacturing in Chicago, IL on September 14th and 15th.
A
Hello, everyone. We're from Twitter, and we're here for the second of three presentations on Bazel and sparse checkouts. Yeah, it's a common problem. So Twitter Focus is our open source software tool for managing build-aware sparse checkouts.
A
My name is David Bernadett. I'm a software engineer on the source team, and this is my co-presenter, Waleed. Waleed is on the build team at Twitter and is our senior software engineer who really brought this whole project together.
A
So our agenda today: first we're going to talk about the difficulties of the monorepo at Twitter, and then we're going to talk about our solution for these difficulties through our build-aware sparse checkouts. Then Waleed will talk about our different Bazel caching strategies, and then finally we'll talk about adopting Focus for other teams using Bazel and monorepos.
A
The fundamental problem with sparse checkouts is that, when you're trying to build a particular target with a handmade sparse checkout, it's really, really common that you run into file-not-found errors. In an ideal world, we would have a sparse checkout that is aligned exactly with our build graph, so that when you do run bazel build, you always build a successful package and avoid any missing files. So how do we get to this aligned sparse checkout?
A
In the Bazel world, the answer is bazel query. bazel query can calculate the sparse checkout for us; it just needs an outlining tree. We consider an outlining tree to be this minimal sparse checkout with only the files needed by bazel query. Concretely, this is mainly BUILD files and .bzl files.
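As a rough illustration of what "bazel query calculates the sparse checkout" could look like, here is a minimal Python sketch. It is not Focus's implementation. It assumes the standard `deps(...)` query function and the `--output=package` format, and the helper names `bazel_query_argv` and `packages_to_directories` are made up for this example.

```python
from typing import Iterable, List


def bazel_query_argv(targets: Iterable[str]) -> List[str]:
    """Build a `bazel query` command line that lists the packages the targets depend on."""
    # `+` is the union operator in the Bazel query language.
    expression = "deps(" + " + ".join(sorted(targets)) + ")"
    return ["bazel", "query", expression, "--output=package"]


def packages_to_directories(query_output: str) -> List[str]:
    """Turn `--output=package` lines into repository-relative directories."""
    directories = set()
    for line in query_output.splitlines():
        package = line.strip()
        # Skip blank lines, and skip packages from external repositories,
        # which are printed as @repo//path and are not part of this checkout.
        if not package or package.startswith("@"):
            continue
        directories.add(package)
    return sorted(directories)
```

The resulting directory list is the raw material for a sparse checkout: every directory in it has to be present in the working tree for the build to find its files.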
A
Unfortunately, it isn't just one big win. We find that bazel query is extremely slow, and, ideally, users run bazel query on every checkout to align with the build graph. Five minutes to just do a checkout makes the tool kind of unusable. So now Waleed will talk about our solution to this problem, including caching Bazel.
B
Thanks, David. So, caching bazel query: like David was saying, we need to cache this, because five minutes to check out is pretty much unusable. I'm going to be discussing two methods today that we use to cache bazel query. The first is a coarse-grained cache, and the second is a more fine-grained cache.
B
So normally, when you want to build a sparse checkout profile, you have a set of targets that you want to build, and then you query Bazel to see what the dependencies are for these targets, which requires you to have your working tree ready to service Bazel queries. With these two things, you can create a sparse checkout profile, which Git will accept and use to create your sparse checkout.
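To make "a profile which Git will accept" concrete, here is a small Python sketch that turns a list of dependency directories into a `git sparse-checkout set` command line. It assumes cone mode and a Git version whose `set` subcommand accepts `--cone`. The function names are hypothetical, and this is not how Focus itself writes its profile.

```python
from typing import Iterable, List


def minimal_cone_directories(directories: Iterable[str]) -> List[str]:
    """Drop directories that are already covered by an ancestor in the set."""
    kept: List[str] = []
    # Sorting puts every parent directory before its children.
    for directory in sorted(set(d.strip("/") for d in directories)):
        if directory and not any(directory.startswith(parent + "/") for parent in kept):
            kept.append(directory)
    return kept


def sparse_checkout_argv(directories: Iterable[str]) -> List[str]:
    """Build the git command that applies the directories as a cone-mode sparse checkout."""
    return ["git", "sparse-checkout", "set", "--cone", *minimal_cone_directories(directories)]
```

Pruning nested directories is only a tidiness step here; cone mode already includes everything below a listed directory.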
B
So yeah, the key here is the targets and the commit hash, and the value is the sparse checkout profile. The advantage of this is that it's pretty simple to implement. You only need to do one lookup in your cache to find the necessary data, so that's extremely fast. But the disadvantage is that even a small change to your working tree can cause a cache miss. So if I make a change to a single source file, then your entire cache key will change and you won't be able to access the cache and get your sparse checkout. So this approach is much better when you have a set of commits that are well known and a set of targets that you know ahead of time, that you can build these caches for. So, for example, if you are pushing commits to your main branch, those are good candidates for commits to build caches for. And the second approach that I'll cover is a fine-grained approach.
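The coarse-grained scheme described above (key = commit hash plus targets, value = sparse checkout profile) can be sketched in a few lines of Python. This is an illustration under assumptions, not Focus's code: the in-memory dictionary stands in for whatever storage is really used, and `coarse_cache_key` and `CoarseCache` are names invented here.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def coarse_cache_key(commit_hash: str, targets: Iterable[str]) -> str:
    """Derive one key from the commit and the (order-independent) set of targets."""
    hasher = hashlib.sha256()
    hasher.update(commit_hash.encode())
    for target in sorted(set(targets)):
        hasher.update(b"\0")
        hasher.update(target.encode())
    return hasher.hexdigest()


class CoarseCache:
    """Maps (commit, targets) to a sparse checkout profile with a single lookup."""

    def __init__(self) -> None:
        self._profiles: Dict[str, List[str]] = {}

    def store(self, commit_hash: str, targets: Iterable[str], profile: List[str]) -> None:
        self._profiles[coarse_cache_key(commit_hash, targets)] = list(profile)

    def lookup(self, commit_hash: str, targets: Iterable[str]) -> Optional[List[str]]:
        # Any different commit hash produces a different key, hence a miss.
        return self._profiles.get(coarse_cache_key(commit_hash, targets))
```

The trade-off from the talk shows up directly: a lookup is one dictionary access, but a working tree that does not correspond to a cached commit has nothing to look up.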
B
So, instead of caching the sparse checkout profile for the entire commit or the entire working tree state, we instead cache data at the granularity of a single target at a time. So pretend I have this target called edit tweet, which implements the Tweet edit button, and it has some dependencies: for example, a dependency on the button target, on the Tweet target, and then maybe there's a non-Bazel dependency that's just a bunch of boilerplate files.
B
So, in order to analyze this and produce a fine-grained cache, we look at the BUILD file that's associated with the edit tweet target. This BUILD file declares the dependencies for this target, and it also loads some .bzl files, which contain definitions that are used to determine the dependencies. We access this through a parse function, which takes the BUILD file and traverses the .bzl dependency tree to get all the loaded files, and from this it produces a cache key by combining all of this data into a single hash. After we have this cache key, we can then just query Bazel with a function called bazel analysis that just takes a target and returns the actual dependencies for that target, and this together is enough to make our cache. The critical thing about the parse function is that it doesn't require you to actually evaluate a Bazel query. You can calculate this just using textual content from the BUILD files and .bzl files, and this forms the cache key.
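The idea of a cache key computed purely from the text of BUILD and .bzl files can be sketched as follows. This is a simplified Python illustration, not the parse function from Focus. It assumes `load()` labels of the form `//pkg:file.bzl` or `:file.bzl`, assumes each .bzl file sits directly in its package directory, ignores loads from external repositories, and finds `load()` calls with a naive regular expression rather than a real Starlark parser. The `files` dictionary stands in for reading the working tree.

```python
import hashlib
import re
from typing import Dict, List, Optional

# Naive matcher for the first argument of load("...", ...); not a real Starlark parser.
LOAD_PATTERN = re.compile(r'\bload\(\s*"([^"]+)"')


def resolve_label(label: str, package: str) -> Optional[str]:
    """Map a load() label to a repository-relative path, or None for external repositories."""
    if label.startswith("@"):
        return None
    if label.startswith("//"):
        pkg, _, name = label[2:].partition(":")
        return f"{pkg}/{name}" if pkg else name
    name = label.lstrip(":")
    return f"{package}/{name}" if package else name


def fine_grained_cache_key(build_file: str, files: Dict[str, str]) -> str:
    """Hash a BUILD file together with every .bzl file it loads, directly or transitively."""
    visited: List[str] = []
    pending = [build_file]
    while pending:
        path = pending.pop()
        if path in visited:
            continue
        visited.append(path)
        package = path.rpartition("/")[0]
        for label in LOAD_PATTERN.findall(files[path]):
            loaded = resolve_label(label, package)
            if loaded is not None:
                pending.append(loaded)
    hasher = hashlib.sha256()
    for path in sorted(visited):
        hasher.update(path.encode() + b"\0" + files[path].encode() + b"\0")
    return hasher.hexdigest()
```

Because only BUILD and .bzl text feeds the hash, editing an ordinary source file leaves the key unchanged, while editing any loaded .bzl file changes it. That is the property that lets the key be computed without evaluating a Bazel query.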
B
So here are some performance numbers for my tool. The first command is using the fine-grained cache, and the second one is without any cache whatsoever: we clear the cache before we actually try to synchronize the working tree with the sparse checkout profile.
B
So in the first case, you can see that it took an average of a little under five seconds to synchronize a working tree on a typical project in our monorepo. In the second case, where we actually have to query Bazel because we've cleared out the cache ahead of time, it takes a little over 30 seconds. So that's a performance improvement of a factor of about six, which is pretty significant, and we do expect to be able to optimize further on the time it takes with the cache.
B
So, for example, by default it does shallow clones to reduce load times and reduce load on your servers. It has facilities for managing old branches in your shallow clones. You can upload and download caches, so for these caches we were talking about, maybe your CI machine will generate them and upload them, and your clients will actually download them so that they can have a warm and ready cache. We also have a UI for discovering projects that you may want to use, so you can easily find them and add them to your sparse checkout.
B
So we'd like to invite everyone here with a big Git monorepo and a Bazel build graph to consider adopting Focus. It is open source at github.com/twitter/focus. It is written in Rust. It includes a tutorial to build Bazel using Focus: you use Focus to check out a part of Bazel so that you can build Bazel using Bazel with Focus.
B
It's also extensible to other build systems, so not just Bazel. If you have a different one, you can add support for that. And I'd like to extend an amazing thanks to all the people on our team who helped us get Focus to where it is today over the last year. Yeah, so thank you all. They're not here, but they did a lot of work to get us to this state. And thank you all as well for coming to listen to our talk, and we hope that you'll consider using Focus.
B
Yeah, so with monorepos, you have all your dependencies in one repository. Some examples of things you can do are codebase-wide refactorings, which are a lot harder to orchestrate across a lot of different, smaller repositories.
B
Ideally, you would have just one version of every dependency; that just simplifies deployments and builds. That might not always be possible: at Twitter, we have multiple versions of various dependencies in our monorepo. And, generally speaking, it kind of shifts a lot of the tooling pain that you would have from dealing with many different repos and centralizes it, to where a single team for source control can start to address those problems for everyone, rather than everyone having to use some other tools to deal with it.
A
And I think most companies also find that they have more than just one repo. There are always little edge cases where people want to mirror an external repo, or maintain a fork, or maintain some open source project like this. You will not get access to our monorepo if you want to use this tool, so it's separate on GitHub.
B
So the question is: how approachable is Focus to new developers? You've been a developer for a little over two months, is that right, and you work with a monorepo at your company? Yes? Okay. So we're certainly hoping it will be approachable.
B
The documentation is in early stages, but we do have a tutorial, and you're absolutely welcome to post on the discussion board on GitHub. We will try to work with you to, for one thing, get you where you need to be with Focus, and for another, improve the documentation to a point where everyone can successfully use it on their monorepos. So we're very happy to work with you on that.
A
I would say if you have a fairly passing familiarity with Bazel to start, and with sparse checkouts in Git, you'll be in pretty good shape. And then, if you wanted to extend it to some particular build system, it's going to require just a little bit more Rust knowledge, but again, we'd be happy to help with whatever you're trying to extend it to.
C
Hi, thank you. I don't know how far adoption of Focus is within Twitter, but have you found that dev teams are optimizing their build graph, or their dependency graph, to take more advantage of Focus? Is that something you've seen happen?
A
We have not seen that happen, and probably the first thing we'll do is try to optimize our initial sparse checkout. There's a bunch of mandatory things you need to make sure that the build system works, and just minimizing that will also improve things, probably more dramatically than having any individual teams try to self-manage their build graph. Thank you.
B
There's actually just a slide here, which maybe we can look at. This is just some other miscellaneous data. At the top are some targets: there's this thing called Strata that does code generation, there's GraphQL, and these change every 10 to 50 commits, and then the projects that people are actually working on might change every 100 or 200 commits. So we want to optimize that stuff at the top, the core infrastructure that everyone depends on, to improve the Bazel build graph.