From YouTube: Native Git support for large objects - Git Merge 2019
Description
Presented by Terry Parker, Engineering, Google
"Large binary objects pose a special challenge for Git. This talk will explain how Git’s new partial clone feature and a new proposal to use content distribution networks can help."
About GitMerge
Git Merge is the pre-eminent Git-focused conference: a full-day offering technical content and user case studies, plus a day of workshops for Git users of all levels. Git Merge is dedicated to amplifying new voices in the Git community and to showcasing the most thought-provoking projects from contributors, maintainers and community managers around the world. Find out more at git-merge.com
Okay, so I'm Terry Parker. I work at Google as a software engineer, and I'm going to talk today about handling large files in Git using some new protocol features. I manage the Git core team that works on the open source project upstream, and I'm a tech lead manager of the server team that runs the large hosting service that Ivan and Minh talked about, with the variety of clients we support there.
So the agenda today is to define terms: what are we talking about with large files? Why do I consider native support for large files to be important? And then I'm going to talk about a couple of new features in Git that are emerging. The first one, partial clones, is an emerging thing just introduced in Git 2.17 or 2.18 (this is why I need my speaker notes), introduced in April of last year. And the second one is a feature that's a work in progress. It isn't out there anywhere yet; it's just being proposed on the Git open source project, which is using content distribution networks for cloning.
So what do we mean by "large file"? How are we going to define this? Well, it can be by extension: things named .bin, or generally non-text files, so you might say every .bin file is large whether its byte count is large or not. You may have specific VM image extensions, or Android APK files. Or you can define it by size, whether you want to set that threshold at 100 kilobytes or a megabyte. I think it's important to be flexible here, so clients can make the trade-offs they need and servers can make the trade-offs that they need.
So am I talking about Git LFS? I'm not talking about Git LFS; I don't consider LFS to be a native understanding of Git. LFS uses some pre-existing hooks called smudge and clean filters, which are basically pre and post hooks. So when you're about to push a large object, Git LFS substitutes a URL that points to that object in place of the object, and the problem with that is that you have to have everything pre-configured for it to work correctly. So there's a scenario where you've configured LFS for all of your .bin files.
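For reference, that pre-configuration is usually a .gitattributes rule that routes matching paths through the LFS smudge and clean filters; a minimal sketch:

    # .gitattributes: send every .bin file through Git LFS
    *.bin filter=lfs diff=lfs merge=lfs -text

    # this is the line that "git lfs track '*.bin'" writes for you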
Something that's really important to me is being able to serve lots of customers effectively, and if we have people doing large clones, they may be coming across, you know, slow or moderate-speed connections. If you're cloning ten gigabytes, you're not going to get that done very quickly, and you're tying up a thread on the server and using lots of bandwidth to do it.
So partial clones are, as I said, an emerging feature that was introduced in 2.17 or 2.18 in April, and we've been improving support over time, so it's best to use Git 2.20 if you want to test this out. The way to think about it is that it allows the client to say: hey, I want a clone of this repository, but I want to filter out certain objects, and I will come back to you if I need them later. So, on-demand downloading by the client.
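As a concrete command line (the repository URL here is just a placeholder), a size-filtered partial clone looks like this:

    # fetch history as usual, but defer any blob larger than 1 MB
    # until something actually needs its contents
    git clone --filter=blob:limit=1m https://example.com/repo.git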
So in this case we have 1M.bin, which is, you know, a megabyte-sized binary, and a help text describing the updates that were made to it each time. In this command line we're cloning with the filter "blob limit equals one megabyte," and if you take a look, we have a couple of gray ovals now, and those are the things that are not being downloaded by this command. Now you may notice that the 1M.bin'' is being downloaded, and that's because a clone is actually two operations.
The first operation is a fetch: it fetches all the content into the .git directory in your local client. And the second operation is checking out a branch, so by default it's going to check out whatever branch HEAD points to, which by default is master. So in this case the initial fetch into the .git directory did not fetch that 1M.bin'' file; it was the checkout that said, hey, this is something that I actually need to populate in the work tree, and that did it as a second transaction.
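To make that concrete, a plain clone behaves roughly like the following two-step sequence (a simplified sketch with a placeholder URL, ignoring details like remote HEAD detection):

    # phase 1: fetch everything into the .git directory
    git init repo && cd repo
    git remote add origin https://example.com/repo.git
    git fetch origin

    # phase 2: check out a branch, which populates the work tree
    git checkout -b master origin/master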
Here's the second command: git clone with no checkout. The no-checkout option stops that second phase, so it's just going to fetch things into the .git directory, and in this case we said "filter blob none," so we're downloading all the commits and all the trees, but none of the file contents. And here's a further command with a newer feature, one that was probably only made available in 2.20, which is "filter tree equals zero," and that's saying not to download any of the trees.
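Spelled out as command lines (again with a placeholder URL), the variants just described look something like this:

    # download all commits and trees, but no file contents,
    # and skip the checkout phase entirely
    git clone --no-checkout --filter=blob:none https://example.com/repo.git

    # go further and defer the trees as well (needs a recent Git, around 2.20)
    git clone --no-checkout --filter=tree:0 https://example.com/repo.git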
You know, if you're working on Office, you don't need the Windows stuff; you don't have to have all the Windows stuff. So this is a facility to allow the further development of these large monorepos and to allow our developers to be efficient.
So the second feature I want to talk about is, and this is just a work in progress, a proposal that has been made on the Git upstream list, and the response has been pretty receptive. We think this is going to happen. But let's talk about why using a content distribution network is important.
Content distribution networks are really good at handling high peak-volume loads. I actually wanted to use that viral surprise-kitty video here, but I'm a mere software engineer who doesn't understand copyright, and I didn't know whether I could actually use it with attribution, so we just get a cute puppy instead. But they are very good at scaling up and load balancing, and they also do a good job of moving the content close to where the user is requesting the data.
Every time you do a clone, the server is crafting a custom pack file with all the latest commits, right up until, you know, the second before you requested it. And if you get 99% of the way through a 10-gigabyte transfer and then you have some kind of failure, you have to start from scratch; it's not resumable at all. Content distribution networks have the nice property that they serve plain HTTP GET requests, so if you fail 9 gigabytes into a 10-gigabyte transfer, you can pick up from where you left off.
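To illustrate the resumability point with plain HTTP (a sketch against a hypothetical CDN URL, not the proposed Git protocol itself):

    # first attempt dies partway through
    curl -O https://cdn.example.com/repo/pack-1234.pack

    # rerun with -C - to resume from the bytes already on disk
    curl -C - -O https://cdn.example.com/repo/pack-1234.pack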
So, you know, my takeaway message is that we, the Git project, are trying to deal with these things. Initially Git was intended for source code; people are putting a lot of other different types of assets in there, and Git hasn't always adapted well, but the Git community is cooperating to make sure these things work. So you can look for these features in a release near you.
There are lots of companies that publish blog posts about this; GitHub does.