From YouTube: Git at Google: Making Big Projects (and Everyone Else) Happy, Dave Borowitz - Git Merge 2015
Description
Google likes to push the boundaries of what's possible with Git. With big projects like Android and Chrome, we need some pretty big hosting infrastructure: we built a custom globally replicated storage layer, added bundles and bitmaps to reduce server load, and built specialized tools like Gerrit Code Review. But working on large Git-based projects is still not as nice as we'd like it to be. In this talk I'll discuss what we've done to make large projects happy at Google, and what we hope to do to make large projects happy everywhere.
At the contributor summit yesterday, there was a really interesting discussion about all the ways in which Git is slow. I'm going to talk about some of them, but I'm only scratching the surface; there's a lot of pain that people are feeling that is not exactly the same pain that we're feeling. But I do want to keep this kind of positive. Not everything is terrible.
There are some small improvements that we've made, and that we can make in the future, to make life on a big project easier, and we'll talk about some of those. And the last thing I want to say before I really get started is: your project may or may not be as big as Google's, but everybody in this room is going to feel some kind of pain working with Git at scale at some point in your career, so hopefully something I describe here will be relevant to you.
A quick history of the stuff we've been doing at Google with Git. In 2008 we did the initial release of something you may have heard of called the Android Open Source Project. This was originally hosted at Oregon State University, using the same servers as kernel.org. In 2011 there was a little bit of a security and scalability issue with kernel.org, so we launched a new Git service on Google's infrastructure called googlesource.com. That's grown a little bit over the years; most notably, in 2014 the Chromium project completed a big migration from Subversion to Git, which we'll talk about a little later. Some numbers here, the first of several slides that may make your eyes glaze over: we don't serve a whole lot of repositories, we're not as broad a base as something like GitHub, but we do serve some pretty big repositories. Android has a few hundred, Chrome has a few hundred, tens of gigabytes of data, and we're serving several terabytes of traffic a day.
Android is kind of big as far as Git projects go. There are 800 repositories, and the default checkout gives you about 400 or 500 of those. It includes everything that you need to make a phone operating system: the core OS, all the bundled apps (there's a whole copy of Chrome in there), device images for Nexus devices, compilers, testing tools, third-party dependencies; you name it, it's in Android. These range in size from really, really tiny repositories to pretty big ones, over 10 gigs in size.
There's also a complete internal fork of all of that, which is where we do our quasi-closed-source development before our periodic releases to the open source project, and we also have to collaborate with a lot of partners: hardware manufacturers, OEMs, chipset manufacturers, things like that. That introduces some more challenges.
So why is Android special? I have a little asterisk here, because Android is not actually that special. It's not that unlike a lot of projects, maybe the ones that you work on, or the size that yours will grow to be in the future. It's pretty big: the total checkout, just what you get if you follow the instructions for downloading the Android source, is about 17 gigabytes. And we have to work with lots of partners.
These are pretty big companies, and when you work with big companies you have to sign contracts with them. We have a really complicated permission system, because you sign a contract that says these assets can only be shared with this group of people, and these other assets can only be shared with this other group of people. So this is sometimes a source of confusion. And we also really want to strive, in the Android ecosystem, to make sure all the tools we use are open, for open source contributors and partners alike.
They want to use the same stuff internally in their organizations that Google is using in our organization. So it doesn't do us any good if we build something that can only be used by Google; we want to help everybody who participates in Android equally. The other big project I'm going to touch on briefly here is Chrome, the browser project that many of you probably use. There are a couple of big repositories: there's chromium, the core browser project.
Blink is a fork of the WebKit rendering engine, which was started by Apple and forked by Google, and Chromium OS is a sort of packaged version of Chrome that runs on a laptop or a desktop machine, and therefore has some of the same problems you might have with Android in terms of working with partners and things like that.
B
So
why
is
Chrome
special
again?
Not
actually
that
special,
it's
kind
of
big,
but
it's
not
completely
unreasonable.
They
just
completed
a
giant
migration
from
all
this
history,
a
subversion
data
into
get,
which
is
something
that
a
lot
of
people
go
through.
Taking
lots
of
old
development
history
and
the
developers
of
chrome
were
actually
like
very
set
in
their
ways.
They
built
all
these
workflows
around
this,
like
web
front-end
to
subversion.
B
B
These repositories are a little bit big: chromium is about three gigabytes in size, Blink is about five gigabytes in size. And sometimes I think they're really just trying to make my life difficult, because they decided: we have two giant repos, and what's better than two giant repos? Combining them together. But this is something that Git supports very well, and there are definitely benefits to having a large monolithic codebase. You may have heard some talks from Facebook about how they're scaling Mercurial to work with a codebase that is actually much larger than this in size.
B
So
those
benefits
in
terms
of
atomic
refactoring
or
something
we
want
to
support
for
our
developers
to
make
them
more
productive
and
not
let
the
tools
get
in
their
way,
and
the
last
thing
that
I
previously
mentioned
is
Chrome.
Os
is
like
basically
like
Android.
It
has
all
the
same
problems
with
partners
with
heart,
unreleased
hardware,
with
contracts
and
access
controls,
so
they
they
feel
some
of
the
same
pain
that
the
Android
team
is
feeling
as
well.
So a theme I'll hit on a few times in this talk is that being big is hard. It's hard when you're a Git client: just running Git commands on big repositories becomes hard, and this is where I'm only going to touch on a few things; it's actually hard in way more ways than I'm going to talk about. Just the first thing you do when you download Android is you need to get this 17 gigabytes of data from our servers to your desktop somehow.
If you're lucky enough to have a gigabit Ethernet connection that goes to a machine in your office that you're not sharing with anybody else, you can download this in two minutes. That seems pretty reasonable. But more likely you have a broadband internet connection where you get something like 20 megabits, and then all of a sudden we're talking about three hours.
Actually, the average broadband speed here in France is about seven megabits a second, so take those numbers and multiply them by three. And you should really hope that your internet connection is not flaky at all, because, as we all know, Git clones are not resumable. So if you start to download two gigabytes of repository and then it craps out 1.8 gigabytes in, you have to start from scratch, which is not super great when we're talking about repositories of these sizes.
Once you get all this data, you have to actually do some delta resolution, which, depending on your knowledge of Git internals, you may or may not understand, but you do understand that it is a painful step you have to go through. Checking out files, copying them from the packed representation onto disk, can take a really long time. Android has about half a million files of varying sizes, and if your operating system is something like Windows, where opening a file for writing takes a long time, this can be kind of painful.
Before I go too much further, I want to make this kind of meta point about this pain, which is that as tool developers and server administrators, you can only get so far by saying: oh, don't do that, you're holding it wrong. Don't put your giant binaries in Git repositories; don't import a million commits of history. But it's really hard to tell somebody that when really they're just trying to get their job done.
B
At
the
end
of
the
day
we
built
a
service
that
can
like
handle
these
repositories,
so
you
can
bet
that
somebody
is
going
to
try
to
push
a
repository
that
reaches
this
maximum
size
back
some
size
or
goes
over
that
and
I
don't
want
to
be
the
person
telling
these
people.
No,
you
can't
do
that.
I'd
rather
build
a
service
and
work
on
tools
that
are
able
to
handle
something
of
this
scale.
We also found that one reason you can't tell people "don't do that" is that they really don't like learning multiple tools. It was bad enough trying to get the Chromium developers to migrate from Subversion to Git; asking them to use a brand new tool to handle some of their source control, just because we've run into scalability limits that we should really be able to work around, can cause them a little bit of pain.
B
So
one
thing
we
can
do
to
to
reduce
the
size.
The
size
causing
pain
for
users
is
just
fetching
less
data
that
would
really
be
nice.
Github
just
had
a
really
exciting
announcement
yesterday
about
storing
large
files
in
a
way
that
you
don't
have
to
download
them
all
the
time.
That
would
actually
be
really
nice,
but
obviously
we
have
not
yet
integrated
that
into
Android
one
trick
you
can
use
today
if
you're
not
familiar
with
it,
is
a
shallow
clone
which
only
gets
a
little
slice
of
history.
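As a minimal sketch of what a shallow clone looks like, using a throwaway local repository (a file:// URL is used because `--depth` is ignored for plain local-path clones):

```shell
set -e
rm -rf /tmp/shallow-demo && mkdir /tmp/shallow-demo && cd /tmp/shallow-demo
git init -q src
for i in 1 2 3; do
  git -C src -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "commit $i"
done

# --depth 1 fetches only the tip commit of each branch:
git clone -q --depth 1 "file:///tmp/shallow-demo/src" shallow

# Only one commit of history is present in the shallow clone:
git -C shallow rev-list --count HEAD
```

The history can later be deepened in place with `git fetch --deepen <n>`, or completed entirely with `git fetch --unshallow`.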
It saves you maybe fifty to ninety percent of the data transfer, which actually, if you think about it, is not as much savings as you might expect: one commit has half the data of a hundred thousand commits. It turns out that Git has really, really, really good delta compression. But shallow clones have their own problems. Some commands, like log, just don't work like you would expect.
You can't see all the history if there's only one commit of history, and they're not really good about aggressively garbage collecting old data as you fetch new data, so that could be improved a little bit. Narrow clones are a topic that comes up periodically on the Git mailing list; we had a nice long talk about this yesterday.
B
Also,
this
is
the
idea
that
you
can
check
out
only
instead
of
just
a
small
slice
of
history,
you
check
out
only
a
small
slice
of
the
trees
and
download
only
the
files
you
need
to
check
those
out
best.
I
can
say
about
this
problem.
Is
it's
turned
out
to
be
non-trivial
which,
as
we
all
know,
is
programmer
slang,
for
we
have
no
idea
when
this
is
going
to
happen,
but
we
actually
do
have
some
like
stopgap
measures
that
we
can
implement.
In
the
meantime,
we
split
repositories
up
into
multiple
repositories.
You already knew this, because I told you Android checks out 500 repositories when you download it. The way that we do this in the Android project is we wrote just a little Python wrapper around Git called repo. If you've ever used repo, you know that it's a little more than just a little Python wrapper; it's got a whole bunch of extra features. Every software project of any size grows until it contains an XML parser.
B
But
it
actually
does
a
lot
of
useful
things.
This
is
a
little
bit
of
XML.
That
I
hope
makes
your
eyes
glaze
over,
because
if
I
were
sitting
there
at
9:00
in
the
morning
10:00
in
the
morning,
my
eyes
would
glaze
over
looking
at
XML,
but
it's
a
simple
way
to
represent
a
list
of
projects
that
can
all
be
checked
out
at
once.
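A manifest of the shape he's describing might look something like this (an illustrative subset; the real Android manifest lists hundreds of projects):

```shell
# Write a tiny repo manifest. <remote> names a server, <default> sets
# shared settings (including sync parallelism via sync-j), and each
# <project> maps a checkout path to a repository on the remote.
cat > /tmp/default.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote name="aosp" fetch="https://android.googlesource.com"/>
  <default remote="aosp" revision="master" sync-j="4"/>
  <project path="build" name="platform/build"/>
  <project path="frameworks/base" name="platform/frameworks/base"/>
</manifest>
EOF

# With a manifest repository containing this file, a checkout is:
#   repo init -u <manifest-repo-url>
#   repo sync
```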
It has some defaults; you can set parallelism; you can set which repositories are going to be checked out. And this is very representative of the kind of approach Android has taken to solving these giant scalability problems: they actually need to be able to store this number of repositories today, and waiting for some solution that's going to magically provide us with narrow clones is not going to work. That would have been nice eight years ago when Android was originally released, and it will be great in the future, but we're just not quite there yet, so we have to have these little stopgap measures.
The one thing I'll say about repo is that it's this Python wrapper that's grown a lot over time; it's kind of crufty, but there's really nothing in repo that couldn't just be implemented in Git. You might ask what we're going to replace it with, or maybe some of you already know, and the answer, which I think will cause some groans, is git submodule. If you've ever worked with git submodule: it's not great.
Instead of having a little XML-based configuration, you have a little git-config-based configuration, but there are a lot of pain points working with it right now. I like to say it's not that it's painful, it just needs a little bit of love. We need to handle recursively checking out all of the sub-repositories. Sorry, I should have said: if you don't know, git submodule is a way of embedding one repository, or many repositories, inside of another repository.
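A minimal sketch of that embedding, using throwaway local repositories (the `protocol.file.allow` override is needed on recent Git versions, which otherwise refuse local-path submodule URLs):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
rm -rf /tmp/sm-demo && mkdir /tmp/sm-demo && cd /tmp/sm-demo

# A small repository to embed:
git init -q lib
git -C lib commit -q --allow-empty -m "lib: initial commit"

# The superproject that embeds it as a submodule:
git init -q super
cd super
git commit -q --allow-empty -m "super: initial commit"
git -c protocol.file.allow=always submodule add /tmp/sm-demo/lib third_party/lib
git commit -q -m "Add lib as a submodule"

# A fresh checkout needs the recursive step the talk describes;
# without --recursive the submodule directory comes back empty:
cd /tmp/sm-demo
git -c protocol.file.allow=always clone -q --recursive super super-clone
```

After a plain (non-recursive) clone, the equivalent is `git submodule update --init --recursive`.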
So you have the superproject, which has a bunch of submodules inside of it, and whenever you're operating on the top-level project, you really kind of want to operate recursively. If you make one commit in the superproject, you also need to make commits in all the subprojects that have been modified, and the Git commands don't handle this very well right now. We don't have a really easy way to say: only check out this subset of repositories.
B
You
can
sort
of
do
like
all-or-nothing
or
manually
check
out
some,
but
Android
has
eight
hundred
and
we
wanted
to
check
out
five
hundred
by
default.
There's
no
easy
configuration
way
to
do
that.
We
also
really
need
to
paralyze
this.
We
can
squeeze
a
lot
more
Network
and
CPU
performance
out
of
repo
by
just
simply
running
a
bunch
of
git
commands
in
parallel,
but
there's
no
reason
that
gets
sub-module
has
to
do
everything
in
serial.
We
just
haven't
gotten
there
yet,
and
a
lot
of
people
really
like
to
hate
on
get
sub-module.
I
know.
B
I've
certainly
been
like
that
in
the
past,
but
again
I
want
to
keep
this
talk
kind
of
optimistic
make.
Man
maybe
make
a
humble
request.
If
you
were
thinking
about
doing
some
horrible
workaround
for
the
fact
that
git
sub-module
doesn't
know,
doesn't
do
what
you
want
to
do
like,
for
example,
writing
a
completely
different
application
in
Python
to
replicate
some
of
the
features
of
git
sub-module.
Consider
whether
it
might
be
more
useful
to
contribute
back
to
the
get
upstream
project
and
make
sub-module
nicer
for
everybody.
It's
really.
It's
really
there's
a
bunch
of
low-hanging
fruit.
It doesn't have to be as bad as it is today; it could be way more fun to use. That's about all I have to say about the client side. Being big is also hard for servers. I spend a lot of my time running Git servers and doing DevOps-y kind of stuff, so this is a topic that's near and dear to my heart. You may remember from earlier versions of Git this phase, when you did a clone of something like the Linux kernel, where it says "Counting objects" for a really long time.
For Linux, just counting the number of objects that need to be sent in a pack file over the wire would take about 60 seconds, which is not good for users, but it's also not good for server administrators, as this is taking up an entire CPU for one minute every time you clone the Linux kernel. So if you're serving hundreds of clones concurrently, that's a lot of CPUs that you could be using for other things, and Linux is not even big compared to some of the projects we're talking about here.
Now, that traffic can come from a lot of sources. It can come from users, which are the people that you care about. That's not true: we care about everybody. But our end users, the people who are downloading the code, we want them to be happy. There are also a lot of automated tools, continuous integration, things like that, that can put tons of load on a server. A fun story about Chrome one time: they have this giant farm of build bots, and in a previous iteration of their software, you had all these build bots constantly deleting their three-gigabyte repositories and running git clone again. And this was particularly bad for us, because we actually have this system in place for setting a limit on how much data you can download, but we had turned that off for Chrome, because, you know, we trust you guys.
B
We
trusted
them,
and
then
we
had
to
turn
the
quota
back
on
and
tell
them
to
change
the
architecture
a
little
bit,
but
I
mean
this
had
worked
for
them
because
we
built
this
service
where,
under
normal
operation,
it
was
like
ok
for
them
to
build
or
to
clone
3
gigabytes
at
once,
so
they
just
naturally
used
the
thing
that
worked.
It's
really
hard
to
blame
them,
for
that
garbage
collection
is
another
thing
that
can
really
take
up
a
lot
of
CPUs
in.
B
If
you
want
to
serve
a
git
repository
efficiently,
you
periodically,
you
have
to
just
go
through
and
compact
all
the
stuff
together
to
do
a
bunch
of
Delta
compression
and
get
the
really
good
compression
ratios
that
get
is
used
to.
But
you
don't
want
to
share
the
same
CPUs
that
are
doing
user-facing
traffic
and
making
your
users
happy
with
this,
like
grungy
background
work,
but
that's
a
problem
with
the
classical
guitar
kotecha.
B
So
one
way
we've
cut
down.
Cpu
usage
is
a
wonderful
new
feature
and
get
2.0
called
reach
ability
bitmaps
I
shamelessly
stole
on
this
slide
from
Sean
Pierce,
who
shame.
We
stole
this
diagram
from
Scott's
book
and
the
general
idea
of
a
reach
ability.
Bitmap
is
instead
of
just
naively
walking
through
an
entire
git
repository
tree
and
Counting
all
the
objects.
We
store
this
optimized
data
structure
where
you
can
for
each
up
commit
in
the
repository
you
store
the
list
of
all
objects,
reachable
from
that
commit
and
by
organizing
them
in
a
particular
way.
B
You
can
make
the
counting
objects
phase
like
really
really
fast.
It's just a cute little bit of algorithmic magic, and I'd like to talk about it in more detail, but I've got a lot of things to talk about, so ask me after if you're interested. This was originally implemented in JGit, the Java implementation of Git that runs our infrastructure, and Vicent Martí at GitHub was kind enough to port it to C, and it's now part of Git 2.0.
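For server operators, turning this on is roughly a repack away (a sketch; the exact option names are worth checking against your Git version, since the feature landed around Git 2.0):

```shell
set -e
rm -rf /tmp/bitmap-demo
git init -q /tmp/bitmap-demo && cd /tmp/bitmap-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Repack everything into a single pack and write the reachability
# bitmap index (.bitmap) alongside it; subsequent fetch/clone requests
# can then skip the slow "Counting objects" walk:
git repack -a -d --write-bitmap-index
ls .git/objects/pack/
```

The same effect can be made permanent with `git config repack.writeBitmaps true`, so every future `git repack -a -d` or `git gc` refreshes the bitmap.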
Another trick we use to really just get load off of the servers, both CPU and network load, is to use a bundle file. A bundle is a little pre-built pack file; you can create one by running git bundle create. It includes a pack file and the refs that that pack contains, and you can just pass this file around and run clone on it: just pass it the filename, and it'll clone from the bundle.
B
So
we
do
is
we
actually
redirect
all
of
the
clone
bundle
URLs
for
each
repository
to
a
special
URL
on
a
static
hosting
service?
In
our
case,
a
content,
delivery
network,
that's
used
by
the
same
one
used
by
YouTube,
but
this
could
be
any
static
hosting
service.
The
client
downloads
that
file
then
does
a
little
incremental
fetch
to
pick
up
everything
that's
been
added
to
the
repository
since
that
bundle
was
created.
This
is
really
great
for
us
because
it's
like
essentially
zero
server.
B
We
have
to
do
a
little
bit
of
redirecting,
but
compared
to
actually
shipping
these
gigabytes
two
gigabytes
of
data.
It's
really
much
better.
You
can,
if
you
remember
from
one
of
my
earlier
slides,
we're
actually
serving
like
three
quarters
of
Android
traffic
is
coming
off
in
the
CDN.
Instead
up
off
of
our
core
data
centers,
which
is
really
nice
from
a
server
administrator
perspective.
Also,
it's
better
for
users.
B
These
bundle
files
are
just
static
files,
they're
resumable,
you
can
start
downloading
a
two
gigabyte
file
and
if
you
get
interrupted
halfway
through,
you
can
resume
right
where
you
left
off,
which
is
which
is
pretty
nice.
I
know
that
you
know
has
some
really
exciting
ideas
about
how
to
take
these.
This
resumable
bundle
idea
and
apply
it
to
general
get
fetch,
but
that's
like
another
thing.
That
is
an
idea,
but
it's
a
little
way
off
I.
That's about all we can do to just sort of shunt work off of the core servers. We'd also like to spread load among a bunch of different servers running in the data center, using some sort of shared file system, where you just have different Git processes talking to the same underlying file system storage. That would be really nice; it would allow us to spread the CPU load. You'd still have sort of a disk read bottleneck, because you still have a single shared disk.
B
You
could
do
garbage
collection
in
separate
workers,
so
you
don't
have
to
worry
about
garbage,
collect
collection,
backing
up
and
user
facing
requests.
You
can
do
this
sort
of
right
now
with
NFS
I
say
it
works
because
there
there
are
some
performance
problems
when
you
have
high
throughput
repositories
running
on
NFS
I
know
that
github
has
like
something
nicer
than
NFS
for
for
doing
a
shared
file
system,
kind
of
approach,
I'm,
not
sure
what
the
like
open
source
state
of
the
art
is
here.
B
You
can
do
replication
between
multiple
servers,
so
you
have
one
master
that
handles
all
of
the
right
traffic
going
into
the
master
and,
as
that's
receiving
writes
it
pushes
out
those
to
a
bunch
of
slave
servers
and
those
slave
servers
can
serve
read
traffic.
This
is
nice.
You
can
share
a
lot
of
the
read
work.
You
still
have
this
bottleneck
for,
writes
and
you
can
have
problems
with
replication
lag
if
the
pushes
are
very
large
than
replicating
those
out
to
slaves
can
take
a
little
while
so.
There's a lot of really interesting stuff in here, and I could talk about this all day, but the stuff I want to focus on is these little yellow bits. We have a single shared file system and database, using Google Bigtable and the Google File System, and we have a bunch of Git front-ends that talk to that shared file system.
B
Each
of
these
front-ends
can
serve
any
number
of
repositories
in
any
repository
can
leave
in
lots
of
front-ends
the
way
that
we
sort
of
manage
this
at
a
high
level.
Is
we
have
a
get
aware,
load
balancer
that
redirects
requests
for
certain
repositories
to
certain
front-ends,
depending
on
load
depending
on
the
size
of
them,
and
we
built
this
distributed
file
system
layer
on
top
of
jacott
the
java
implementation
of
git?
That
is
able
to
page
in
files
from
a
slow
file
system.
B
This
GFS
is
like
way
slower
for
opens
and
reads
than
you
would
expect
from
a
normal
posture
file
system.
So
we
need
to
aggressively
prefetch
stuff
and
cache
it
as
necessary
in
the
git
front
ends.
We
have
a
completely
separate
pool
of
garbage
collection
workers,
which
is
nice
when
you
are
garbage
collecting,
100
gigabyte
repositories
and
another
cool
feature
we
have
is
before
we
accept
the
write.
We
actually
do
some
replication
to
a
remote
data
center.
B
All
the
stuff
outside
of
this
box
lives
in
a
single
data
center,
but
we
actually
have
six
of
these
worldwide
and
anytime.
You
do
a
push
to
us
before
we
even
say
yes,
that
push
was
accepted.
We've
actually
replicated
that
out
to
three
other
data
centers
around
the
world.
So
this
is
like
this
gives
us
good
performance
when
you're
is
sitting
in
Europe
and
you
don't
have
to
talk
to
a
server
across
the
Atlantic.
B
If
you
want
to
download
all
of
Android,
we
also
have
some
data
centers
in
Asia
that
are
useful
for
a
lot
of
our
Asian
partners
for
Android,
and
we
also
have
really
good,
really
really
good
availability.
Since
there
are
six
these
data
centers.
If
one
goes
down,
we
like
almost
don't
notice
and
that
happens
kind
of
more
often
than
you
might
think.
B
Fortunately,
we
don't
usually
get
more
than
one
going
out
at
the
same
time,
so
you
might
be
wondering
how
you
can
do
this
at
home.
I
would
really
love
to
stand
up
here
and
say
you
can
just
like
download
this
package
and
push
out
a
bunch
of
docker
images
and
have
this
running
the
reality.
Is
they
like?
Some
of
this
is
a
one
source,
the
jacott
DFS
stuff.
B
If
you
are
interested
in
a
sort
of
dynamic
caching
strategies,
that's
some
interesting
code
to
look
at,
but,
like
there's
a
lot
more
stuff,
we
need
to
do
to
get
this
open
sourced
a
lot
of
the
sort
of
global
database
glue
and
the
replication
glue.
We
should
open-source
but
haven't,
had
an
opportunity
yet
there's
some
like
secret
sauce
that
were,
unfortunately
not
in
a
position
to
share
with
you,
like
the
big
table
implementation.
But
there
are
open
source
equivalents
of
this.
We would probably build a reference implementation on HBase and HDFS, rather than Bigtable and GFS. That's about all I have to say about servers, but like I said, this is a topic that's very interesting to me, so if you have more questions about it, I'll be happy to talk after the break. The last way I want to talk about being big being hard is for humans. We're all humans in this room, as far as I'm aware, and we find it difficult when there are hundreds of repositories to manage. Even if you have a tool that understands this perfectly, how do you as a human know what you need to modify? This is a problem that you have even if you merge everything into a single repository: just working in a large code base is hard, and I don't have any silver bullets for dealing with that.
Android has this additional problem that they have an internal fork and an external fork. For example, if they get a contribution to the open source project, somebody needs to manually cherry-pick that back onto the internal pre-release branch, which may have diverged in the meantime, and this is just some pain that Android has semi-automated, but not really automated, tools to deal with. And access controls: man, access controls, just between you and me, I think they're kind of a mess, but in public I won't really say that.
B
So
how
can
we?
How
can
we
make
this
better?
How
do
we
ensure
that
what
goes
into
the
repository
history
is
good
and
when
I
say
quality
history
I'm
talking
about
like
this
true
source
of
truth
repository,
we
all
know
that
git
is
a
distributed
version
control
system,
but
generally
speaking,
especially
in
a
large
organization,
you
have
like
one
true
source
of
truth.
Like
Lena
says
kernel
repository
or
the
git
repository
maintained
by
Junio,
so
our
solution
at
Google
is
we
built
this
tool
called
Gerrit
code
review,
which
some
of
you
may
have
seen.
You may love it, you may hate it. If you haven't seen it before, here's a little side-by-side review; I made some comments on some of my colleague Stefan's code, and you can see the progression of this change as it changed over time. It's a neat little interface for doing side-by-side code review. It's got a lot of features; I mean, it's a bit of a Swiss Army knife. It does access controls, the stuff that we use to implement all of our contractual obligations.
It doesn't matter to me; you do what makes code review easy for your organization. But I will say, and I'm not going to have time to talk about this, that one project I've been working on in the past few months is this: I think code review should really be interoperable. You shouldn't have to choose to do all of your reviews in Gerrit or all of your reviews in GitHub, so I've been doing some work on a git-notes-based format for sharing code review metadata inside of a git repository.
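The underlying mechanism, git notes, attaches metadata to a commit without rewriting it. A minimal sketch (the `refs/notes/review` ref name and the note text here are illustrative, not the actual format from the talk):

```shell
set -e
rm -rf /tmp/notes-demo
git init -q /tmp/notes-demo && cd /tmp/notes-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "change under review"

# Attach review metadata to the commit in a dedicated notes namespace;
# the commit's own SHA-1 is untouched:
git -c user.name=demo -c user.email=demo@example.com \
    notes --ref=review add -m "Reviewed-by: demo <demo@example.com>" HEAD

# Read the metadata back:
git notes --ref=review show HEAD
```

Because notes live on an ordinary ref, they can be pushed and fetched between hosts like any other branch, which is what makes this shape plausible for interoperable review metadata.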
So that's about all I have. If you're a Git contributor, you may think I was a little light on technical details, or if you're a Git user, I may have been too heavy on technical details. Either way, I'm happy to elaborate on any of these things during the break. If you have questions, or if you have complaints about the tools Google has built, I'm a great person to throw your rotten vegetables at, and I really like hearing other people's horror stories.