From YouTube: Gitaly: faster clones using Git bundles
B: No pressure. Cool, so this is a little demo of an idea to speed up Git clones. It only works for full clones, and the idea is to do less computation on the server and just stream data from disk. First, let me show what the normal situation looks like. I have a gitlab-ce here, and that is the thing that we're going to be cloning, and this is my clone command.
B: Yeah, when you do a full clone, normally that defaults to branches and tags. So first it gets a list of all the branches and tags, and those are commits and tags, and then it walks down the tree and finds everything that is reachable from there.
And then it does this compressing stuff. So I think I've made the point that this takes time.
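(For illustration, a minimal Go sketch of the enumeration work just described; shelling out to git rev-list is my stand-in for the walk, not necessarily the server's exact code path:)

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// List every object reachable from branches and tags, the same set
	// a default full clone asks for. This walk is the part that costs time.
	out, err := exec.Command("git", "rev-list", "--objects", "--branches", "--tags").Output()
	if err != nil {
		panic(err)
	}
	fmt.Printf("enumerated %d bytes of object listing\n", len(out))
}
```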
So what I can now do with the demo is, I can create, where was my history, I can create a bundle file, and I'm creating it inside that directory.
B: So this is git bundle create. The file I'm creating has a fixed file name, because my little hook assumes a fixed file name, and I want it to pack everything that's under branches or tags, which corresponds exactly to what goes into a normal clone. This also takes a while. It is doing roughly the same job, actually, because it has to enumerate and compress.
Okay, I should have just kept that file and not deleted it earlier.
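(A minimal sketch of that bundle-creation step in Go, assuming the demo's fixed file name clone.bundle; the helper name is hypothetical:)

```go
package main

import "os/exec"

// createCloneBundle precomputes a bundle of everything under branches
// and tags, which corresponds exactly to what a default clone fetches.
// Equivalent to running inside the repository:
//   git bundle create clone.bundle --branches --tags
func createCloneBundle(repoPath string) error {
	cmd := exec.Command("git", "-C", repoPath,
		"bundle", "create", "clone.bundle", "--branches", "--tags")
	return cmd.Run()
}

func main() {
	if err := createCloneBundle("."); err != nil {
		panic(err)
	}
}
```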
A: So one of the things we're discussing about this is the disk storage requirements, and obviously precomputing and storing this requires more storage space. Yeah, but in this case we're not actually storing a full copy; we're storing the stuff that you'd clone. So in the case of GitLab, where we see a lot of keep-around refs and other sorts of data and history in the repository that may have been removed, this bundle should actually be smaller than the repository. Although, yes, in all likelihood it's probably going to be close to the size of the repo.
B: Yes, exactly. I have no idea what the ratio is of how much non-reachable stuff we keep around, but I don't think it's very much, because I suppose most of the space is in blobs, and unless you have a lot of weird binaries, the blobs that are not normally reachable are like: you have a merge request, and you create a certain file, and then that got rebased or something. But you probably had the same file in the end. Well, that's very similar, so it delta-compresses against what you already have.
B: So I don't expect that stuff to add a lot of space, but I actually don't know. I do know that this base repo is not in a perfectly packed state. But let's look at that for a moment. So I can look at the size of the objects used by the repository itself and what the bundle weighs: the objects directory, and then the bundle.
B: That would only do it for the parents, and then, no matter what you do with this approach, you risk ending up sending data to the client that they don't need. Yeah, but it's likely not a lot of data they don't need, and the flip side is that everything is faster. So yeah, yeah.
So what is next? Yeah, I'm just going to run the magic clone again. So that was that rm -rf thing. So now I've created that bundle file, and let's see.
A: So, essentially, what we're doing, if we think about the numbers of clones, is a trade: on a highly cloned or regularly cloned repository, we do the calculation at a lower frequency on the server side. So we're reducing the compute so that you get a faster clone, and we're paying for that in the...
B: I mean, in Git's defense, during a normal clone there are already a lot of optimizations going on. The normal enumerating-objects, counting-objects stuff reuses compressed data a lot already, but it's not the same as what this is doing. This is really pretending that somebody already cloned, at the level of the bundle file, and you may get a fetch on top of that, and if you do a Git fetch of a week's worth of data, that's always going to be faster.
B: Okay, yeah. So one day I noticed we had a config setting in the manual page of git-config. Let's see if I can pull it up: uploadpack.packObjectsHook, this thing.
So this is a config setting where, instead of running git pack-objects, which is something that runs deep inside the server-side part of the workload of a normal clone, you can run a different program in its place.
B: That program gets the original command as its arguments, and it gets the same standard input that pack-objects would have received, and whatever it writes to standard output and standard error goes back. So you're just replacing that thing, and I'm wrapping, so I'm wrapping this git pack-objects thing. And the input of git pack-objects is just a list of the objects that the client wants, optionally followed by a list of the objects the client already has.
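(A Go sketch of reading that hook input, assuming the simplified shape described here: want IDs, then optionally the literal line --not followed by have IDs; real input can carry more than this:)

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// upload-pack feeds pack-objects the object IDs the client wants,
	// then optionally "--not" followed by the IDs the client already has.
	var wants, haves []string
	seenNot := false
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "--not":
			seenNot = true
		case seenNot:
			haves = append(haves, line)
		default:
			wants = append(wants, line)
		}
	}
	fmt.Fprintf(os.Stderr, "wants=%d haves=%d\n", len(wants), len(haves))
}
```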
B: So during a normal clone, the list of objects the client already has is empty. And what I'm doing is, I'm using the bundle file, which contains a list of the refs and tags in it, and I'm pretending, so I'm saying to pack-objects: the client already has these objects. And then the real pack-objects only sends the difference.
B: Yeah, it uses the presence of this magic word --not, and some part of the input structure, to decide if it's a clone or not. If it's not a clone, it falls back to the normal pack-objects; otherwise it opens the bundle file as a regular file in Go, and then all we do is wrap that in a bufio.NewReader.
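(A Go sketch of that dispatch, under the assumptions stated above: a fixed clone.bundle file, and fallback by exec-ing the original pack-objects command line that the hook received as arguments. Illustrative only, not the actual hook source:)

```go
package main

import (
	"bufio"
	"os"
	"os/exec"
)

func main() {
	// The hook receives the original pack-objects command line as its
	// own arguments, so falling back means running exactly that.
	if len(os.Args) < 2 {
		os.Exit(1)
	}
	f, err := os.Open("clone.bundle")
	if err != nil {
		// No precomputed bundle (or, in the real hook, not a full
		// clone): behave transparently.
		runOriginal()
		return
	}
	defer f.Close()
	r := bufio.NewReader(f)
	_ = r // ...read the ref list at the top, then lie to pack-objects...
}

func runOriginal() {
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```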
B: So if, instead of opening this file, this was an HTTP GET URL, then you'd have the GET response body, which is also an io.Reader in Go, so you can just plug anything in. And these are all the assumptions we make: we don't have to seek in this file. That is a nice property of the bundle file, because a bundle file starts with a description of its contents at the top, and we use that description.
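(A Go sketch of using that property: read the v2 bundle header sequentially with bufio, collect the object IDs from the ref list at the top, and stop at the blank line where the pack data begins. Error handling is simplified, and prerequisite lines, which a full bundle doesn't have, are skipped:)

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// readBundleHeader returns the object IDs listed in the bundle header.
// Afterwards r is positioned at the start of the embedded pack file,
// so no seeking is ever needed.
func readBundleHeader(r *bufio.Reader) ([]string, error) {
	if _, err := r.ReadString('\n'); err != nil { // "# v2 git bundle"
		return nil, err
	}
	var oids []string
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return nil, err
		}
		line = strings.TrimSuffix(line, "\n")
		if line == "" { // blank line: the pack data starts here
			return oids, nil
		}
		if strings.HasPrefix(line, "-") { // prerequisite; not in a full bundle
			continue
		}
		oids = append(oids, strings.Fields(line)[0]) // "<oid> <refname>"
	}
}

func main() {
	f, err := os.Open("clone.bundle")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	r := bufio.NewReader(f)
	oids, err := readBundleHeader(r)
	if err != nil {
		panic(err)
	}
	fmt.Fprintf(os.Stderr, "bundle advertises %d refs\n", len(oids))
	// Everything left in r is the raw pack stream.
	_, _ = io.Copy(io.Discard, r)
}
```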
B: No, because this is a hook that runs inside git upload-pack, and git upload-pack runs on the... yes, okay, we could glue... no, no, I don't think so. This is not, because there are two parts: there's the bundle, and there's the little extra incremental piece on top, and we compute the incremental piece with information from standard input into this hook. That information comes from git upload-pack eventually, and git upload-pack runs on the Gitaly server.
B: First is this, which is commit IDs and ref names, and that's exactly what I need to lie to pack-objects and say the client already has these objects. And I can just read this list of objects from the top of the file, and then after that it becomes binary pack file data. So this is in exactly the right shape to pull off this trick without having to seek or do anything.
B: So these are the two ingredients you need: you need these object IDs that describe the pack file, and you need the pack file itself. Now, what I think is going on with delta islands is that if you do a repack with delta islands and you do it right, then the first X bytes of your resulting pack file are actually the same thing as what we have here.
B: But you need to know what X is: how many bytes into the pack file do you need to read to get this sub pack file? Plus, you need to know what the object IDs are that describe this sub pack file. If you know those two things (actually, if you know the object IDs, you can... no, you need to know those two things), then basically you could generate the bundle file on the fly by reading from a pack file that you need anyway.
B: Yes, but the thing I am not sure about yet is whether, the way delta islands is implemented now, we have enough information to find this hidden embedded pack file. Because if you're in the middle of the Git code that is generating the pack file during a repack, you know where that part ends and the rest continues, but to piece that back together after the fact is not entirely obvious. Okay, so...
B: It's also cheaper than storing it on block storage, probably, and yeah, you have a bunch of nice properties. Like, depending on your object storage provider, you may have the option that old files expire. So you can say: we create these things, but repos that don't get accessed a lot get their precomputed bundle file expired eventually, instead of keeping it stored on Gitaly. That would reduce the storage cost tremendously, yeah.
B: If we want to start doing this behind a feature flag, then there are two things. That hook executable that I made needs to be cleaned up and included into Gitaly and into the Omnibus config, and it does nothing if there is no file called clone.bundle, so we can just install it globally and leave it there and it's transparent. And then the only thing that we need on top of that is a way, a special RPC, that can create those clone.bundle files on demand, or a way...
B: Yeah, I mean, I think it happens in normal use, because if the client had actually done a fetch at the time that the bundle file was created, it would have those objects, like if it had cloned then, yeah. If it later does a fetch, then it has some dangling objects. So it's the same. It's the same situation.
B: And you're also only accelerating a full clone. If somebody does a partial fetch, this thing gets out of the way and does nothing, so then you fall back to the normal situation. If somebody does a partial clone... I think in CI we do a lot of partial clones, sorry, shallow clones: we have limited history depth. That also completely bypasses this. So you're investing this big space cost to accelerate exactly one type of request.
B: I think so. A shallow clone is not necessarily cheap. I mean, what's cheap about a shallow clone is the amount of data you send, but the problem, as I understand it, is the way pack-objects works: it tries to avoid sending data that the client doesn't need, and the existing pack data is arranged in such a way that it would include objects that do not belong in the shallow clone result.
B: I don't really know how the scales end up, between what you're saving and... because the other downside is that if you feed the wrong information into the real pack-objects process, it actually takes longer. In the first iteration of this, I just thought: okay, I'm just going to take a chunk of a pack file, and I can use the index to just get the object IDs. So I just had a random selection of objects, and I said, during a clone, I would say to pack-objects:
B: I want this, and by the way, I have this random, really long list of a couple hundred thousand objects. And then pack-objects became very slow, so that was bad. What was nice about the bundle scenario is that there are 900,000 objects in that bundle file, but that list of refs at the top is nowhere near nine hundred thousand objects. And I don't know the details, but my suspicion is that if the list of haves is too long, it may slow down pack-objects.
B: Thanks. I'm very enthusiastic about this myself, and I've been sitting on this for a while, but I was too busy doing other stuff. You asked me the question yesterday, James, or, I said something, and then you asked the question, and I said: okay, I better put my money where my mouth is and prove whether I can do this or not, or whether we can do this. And we can.