From YouTube: GSoC 2022: Git Cache Maintenance Projects Idea
Description
Git Cache Maintenance Projects Idea
Brainstorming Together About Ideas and Alternatives
Objective
Meet for 60 minutes with those interested in the Git Caching project idea to discuss ideas and alternatives and to identify areas where there may be questions. Encourage discussion of different alternatives and ideas that might lead us to a better implementation.
A
So the idea is that the Jenkins git plugin has many caches that it maintains on the controller, and those caches, by their nature, sometimes become sub-optimal, because git operations are not focused on maintaining long-term optimization; they're focused on short-term performance. And so this idea is: hey, let's find ways to automate the process of maintaining those caches and keeping them healthy.
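For context, modern command-line Git ships a built-in maintenance framework that this kind of automation could delegate to rather than reimplement. A minimal sketch, assuming Git 2.30 or newer is available on the controller; the repository here is a throwaway stand-in for one of the plugin's caches:

```shell
# A throwaway repository stands in for one of the plugin's caches.
tmp=$(mktemp -d)
git init --quiet "$tmp/cache-repo"
cd "$tmp/cache-repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"

# Run one of Git's built-in maintenance tasks against it. Other task
# names include commit-graph, prefetch, loose-objects,
# incremental-repack, and pack-refs.
git maintenance run --task=gc --quiet
```

Because `git maintenance run` takes one task at a time, each task can be scheduled on its own cadence.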
A
So the idea was, I was thinking and scribbling about something. Now you'll notice, my lovely... this is such a beautiful user interface picture; I know you all wish you did user interface pictures like this. The idea is on the Manage Jenkins page. So let's bring up a real Jenkins and look at it, so that we can see how it really looks. Okay, so on the Manage Jenkins page here today, there are these things, like Label Implications, and like Configuration Slicing, and like Configuration as Code. Each of them is its own subpage, if you will. And I was thinking, okay, this git cache maintenance maybe belongs in some sort of a subpage of Manage Jenkins like this. So that was the first idea. Now, to the rest of you: does that make sense to you, or is there something you would recommend instead, as in, no, it'd be better if we did it this other way?
B
So, Mark, one of the questions... yeah, I think definitely we should have a separate page, because I was going through the git maintenance documentation yesterday, and I saw that there is a lot of behavior that is customizable, right, and we would want the user to be able to have that in a separate page, instead of doing it in, let's say, Configure System or the global tool configuration.
B
But my biggest concern with having that, which I saw in the document as well, in your ideas, was that having a page where it would be a global setting, right; it would be like a system-wide configuration, where all of the repositories would have the same configuration for maintenance.
A
That was at least my assumption. So I think what you're highlighting is that there may be cases where I need to do repository-specific configuration. For example, I know the Linux kernel needs some different cache maintenance operations configured than every other repository in my system, because that Linux kernel repository is enormous. Is that sort of what you're alluding to, Rishabh?
B
My two concerns there. One is: how is my git executable chosen when I'm running this command? I mean, I was looking at `git maintenance start`, and whatever repository you're running that in, it's going to choose the git executable on the basis of that repository. And in, let's say, a system where we have multiple executables, then how is that going to happen?
A
And the memory footprint shrinks, or rather, the memory footprint is not inside the controller's JVM, okay. So, now back to your question: how is the git executable chosen? I think that... wouldn't you think that would need to be some sort of a global setting, saying I want to use command-line git, or I want to use JGit?
A
Yeah, okay. So, when I think about the global tool configuration, what it presents to me is possible git implementations, but it doesn't really choose one, right? It presents: I've got one I named git-windows, I've got another one I named git-2.11.1, and any one of them I can choose, but none of them is selected. If I recall, and maybe I'm wrong, "Default" is selected as the default.
C
I was wondering: is it possible that we can have something that uses both JGit and the CLI?
A
Oh, that's a good question. Okay! So let's put it that way: are there cases where it would be useful or helpful?
A
The hypothetical would be something like what Rishabh's project did two years ago, which was: what if JGit is significantly faster...
A
Faster at some operation; or what if CLI git is significantly faster? So Rishabh found by benchmarking that, with large repositories, CLI git is significantly faster for fetch operations. Rishabh, did I say that correctly? Yes, yes. So it's a good question: should we consider the potential that we might need to do some performance-based selection? Ooh: this repository, we know, is this size, and we've got in our toolbox both JGit and CLI git, versions such-and-such, and we've run benchmarks previously that tell us, with that repository size or some other characteristic...
B
Yeah, Mark, the second concern that I have with global configurations is that, of the tasks that are going to be performed with git maintenance, some of those tasks are directly correlated to the size of a repository.
B
So there is a possibility that I don't want to run gc, let's say, for a huge repository at the interval that I've set in the global configurations, because I know that that repository will take a lot of time; the gc operation would take a lot of time. So do we want to give an overridable way somewhere?
B
I think that would be possible, right? I mean, having a global configuration and then a way to override that configuration per repository.
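Git's own configuration already layers exactly this way, which could serve as the model: a `maintenance.*` key set in a repository's `.git/config` overrides the same key at the global scope. A sketch, where the repository name is just an illustration:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/huge-repo"
cd "$tmp/huge-repo"

# Per-repository overrides of the maintenance defaults: opt this one
# repository out of gc entirely, or just slow its cadence down.
git config maintenance.gc.enabled false
git config maintenance.gc.schedule weekly
git config --get-regexp '^maintenance\.'
```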
A
...of the maintenance tasks, right. Because I think you raise an excellent point: garbage collection on the Linux kernel repository takes a very long time, is very CPU-intensive and very memory-intensive. With command-line git it will use every core on your system and, if I remember correctly, it's willing to use almost as much memory as you give it.
A
So it's a very good question: what might we consider? One might be override rules, maybe, where we say... or override settings based on repository size.
C
So, do we absolutely restrict the user from running gc even, like, once a week or twice a week, or do we just, like, strongly warn them that it could be eating a lot of memory?
A
So for me... it's a good question. I would generally have preferred, in the past anyway, to allow the user to choose to do it and, where necessary, offer them a warning, or even better, offer them hints if things are going badly that would tell them why things are going badly. So should we prevent users from doing certain tasks?
B
Yes. So I was reading about commit graphs as a task, and I got to know that there is a setting which is not enabled by default, which is called `fetch.writeCommitGraph`. So what it does is... well, how do commit graphs work right now?
B
It's that whenever your gc runs, it's going to update your commit graph; and after that, whenever there's going to be a fetch in your repository... so for the commit graphs, the amount of time that it's going to take to update the commit graph depends on the number of commits that are going to happen to your repository.
B
So I mean, we need to look at the individual tasks that we're enabling by default and see how they're going to, you know, affect the existing user behavior, or whether they're going to affect the existing behavior at all.
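The setting described here is `fetch.writeCommitGraph`, which is off by default in command-line Git. A sketch of enabling it and of writing the commit-graph file on demand, assuming a reasonably recent command-line Git:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"

# When enabled, git updates the commit-graph file after every fetch
# instead of waiting for the next gc.
git config fetch.writeCommitGraph true

# The same file can also be written explicitly:
git commit-graph write --reachable
ls .git/objects/info/commit-graph
```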
A
Well, and now, to take that theme: how could we make the information about that task available to the user? What if we gave them an entry on the UI, something like this... let's see, "update commit graph" down here, and one of the data points we show them is the trend graph that shows how long that ran on their repositories. And hopefully they look at the graph and say: oh wow, here's this repository where the... no?
A
Yeah, but okay. So, write a commit graph... okay, here's commit-graph, right.
B
Let me send you a link. Can you open this?
B
So there is, I believe... I'm not a hundred percent sure, but there is a way they changed the commit graphs so that, you know, they take the deltas and not the whole thing. They don't update the whole commit graph on the basis of, you know, every fetch that they're doing, if you have this setting enabled; but if you don't, then they're going to write it every time.
A
Well, and see, I don't know how costly it is, but I think it's worth us just doing exploration. They chose to disable it by default, so it's certainly a cost that I'm not paying at all right now, right? When I do a fetch, none of the git repositories that I handle are doing this, and yet it says it would. I do `git log --graph` all the time, and so this says, wow.
B
I mean, my point was just that we need to look at each of the tasks and the settings that they're providing, and then think about what strategy we could implement: which ones to enable by default, which ones not.
C
I think this might be due to something that was implemented in one of the later versions of git.
C
It significantly reduced the number of commits that it needed to read through. I think it used something like Kahn's algorithm and computed the number of in-degrees while it was traversing the graph; but after the generation count was implemented, it didn't need to, and it got a lot more efficient.
A
Yes, okay, that's very wise. Because, well, and to your point, commit-graph may not even be available on some of the command-line git versions that we run, and may not help even if it were available, right? Because if I'm doing a command-line git operation and that command-line git implementation doesn't know anything about the commit graph, it certainly can't use it. Interesting. Good, okay, very good.
B
Yeah, and earlier it used to do a commit-graph update while it was doing the gc tasks. So their rationale there was that, compared to the gc task, the commit-graph won't take, you know, much of the operational time, so they clubbed it together, and that is what they used to do.
A
Okay, and that makes sense to me, at least. It's like: yeah, garbage collection is very expensive, right? It's doing recombining, and then it does this large compression operation, and compressing files is almost always very expensive. So yeah, that makes sense: you could easily hide a small operation like commit-graph inside all the time you're spending doing garbage collection. Good, okay.
A
Yeah, so... well, so for me, it would be okay if, on the task selection... I'm going to propose an idea, and let's test it as an idea, and then we certainly can throw it out. My initial thought was to test the task selection priority. Here's my proposal: okay, so I think prefetch has the most...
A
So for me, I think this one should be priority one, the first choice: make sure that works and we get good results. Now, if we're doing prefetch, then the next question is: okay, now we're potentially every hour, or every so often, bringing in things that come in as loose objects. They come in without necessarily being well placed inside the repository.
D
The incremental repack... basically, I feel it works like a, you know, B-tree, okay, where all the objects are placed in a sorted manner in the midx file, and each object entry refers to its separate pack file. So it would be easier to search through the objects if we have incremental repack as a second option, is what I feel.
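The sorted index being described is Git's multi-pack-index (midx). A sketch of writing one directly, assuming command-line Git 2.21 or newer; the repository is a throwaway illustration:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"
git repack -d -q                 # ensure at least one *.pack exists

# One sorted index covering every pack, so an object lookup no longer
# has to consult each pack's own .idx file in turn.
git multi-pack-index write
ls .git/objects/pack/multi-pack-index
```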
A
So my interpretation of the way this is described is that it's doing the equivalent of a `git fetch --all`, but placing the refs in a different location, so that the repository state, for instance the master branch pointer, is actually not updated. It says this is done to avoid disrupting the remote-tracking branches. My interpretation of that is prefetch.
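That interpretation can be checked against a local stand-in remote. A sketch, assuming command-line Git 2.30 or newer; the repository names are illustrative:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/origin"
cd "$tmp/origin"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "upstream work"
git clone --quiet "$tmp/origin" "$tmp/clone"
cd "$tmp/clone"

# Prefetch fetches new objects from every configured remote but parks
# the fetched refs under refs/prefetch/, so remote-tracking branches
# (and anything like a master branch pointer) stay untouched.
git maintenance run --task=prefetch --quiet
git for-each-ref refs/prefetch/
```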
D
This actually gave me an overview of exactly how this incremental repack works using the multi-pack index; both the commands, that is, the expire and the repack commands, have been explained in this. Okay, so...
A
Let's talk about the Linux kernel. Multiple pack files can cost time, but we may not be able to repack into a single pack file, because it just takes too long, right, or consumes too much space. And so what this is offering us is the multi-pack index, and we get that by doing the incremental repack. Is that correct, Shikash?
B
Okay. Oh, and sorry, Mark, one question that I have is that there is also loose objects among the maintenance tasks, right? So, I guess this is more of a confusion for me: if we're doing a prefetch, then are we introducing more loose objects into the local directory, or are we introducing more pack files?
A
And let's go look at it, just to see. So how about in... let's see.
A
Okay, so here is something, and now, what's in its pack directory? Yeah, okay, here's a terrifying example. This is a hundred or 150 megabyte repository; I use it to test all sorts of awful things. And so what you see here is an embarrassing number of pack files, right? In an ideal world there really should be two, an idx and a pack, and that's it; and this has many, many more than that, and it's got all sorts of loose objects. Now, if I do a git pull...
A
It added four more files, so I think that indicates it did add new packs, not just new loose objects. Did that address your question? Yes, yes. Now we should be able to see that by doing this; we should see that... yeah, notice here is something which changed February 12, and then there are four more things from March 22.
A
And now we have... oh, and look, there it is: loose.pack. Okay, so, back to their comment: they said, hey, we're going to do loose objects and it's going to create the new pack file; but it did not apparently delete all the other things, it left them around. So there's a pack file for it to use, but the loose objects still seem to be there. Interesting, okay, cool.
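The same before-and-after view is available from `git count-objects`, and it also shows why those loose copies linger until a later run. A sketch, assuming command-line Git 2.30 or newer:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"
git count-objects -v   # 'count' = loose objects, 'packs' = pack files

# The first run batches the loose objects into a new pack file; the
# loose copies are only pruned by a later run's cleanup step, matching
# the behavior observed above.
git maintenance run --task=loose-objects --quiet
git maintenance run --task=loose-objects --quiet
git count-objects -v   # loose objects should now be packed away
```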
A
I see what your point is, okay. But now let's test that. So they say they run incremental-repack and loose-objects daily, but they run prefetch hourly by default. So should we be considering that they run prefetch 24 times more frequently than they run incremental-repack, and honor the same idea?
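For reference, those cadences are per-task configuration, so a repository could keep prefetch hourly while slowing the repacking down. A sketch using the `maintenance.<task>.schedule` keys on a throwaway repository:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"

# Mirror the defaults being discussed: prefetch hourly, the heavier
# repacking tasks only daily.
git config maintenance.prefetch.schedule hourly
git config maintenance.loose-objects.schedule daily
git config maintenance.incremental-repack.schedule daily
git config --get-regexp '\.schedule$'
```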
D
I have a doubt here: would this incremental repack even consider loose objects as part of it? I...
A
Okay: it deletes unreferenced pack files and then combines pack files. So I would think incremental repack does not handle loose objects; that would lobby for Rishabh's argument that we should do loose-objects and incremental-repack as sort of two steps close to each other, one right after the other. Is that what you were asking?
A
Okay, all right, next. So this one... oh, okay! Now, then, this is... so we saw this one: loose-objects created entries in the pack directory, loose-dash-something. What you're thinking is that this may actually create them as real packs; is that correct, what you're saying, Aryan? Yes? Okay! So let's try that.
A
Oh good, okay, all right. So for me, pack refs: now, my repositories typically don't have an enormous number of references. That hundred-megabyte one that you were seeing probably has several thousand, might be as many as ten thousand. Most git plugin... Jenkins plugin repositories have far fewer than that, right? They have on the order of hundreds, maybe. Interesting. Okay, so back to the question: when... no, wait a sec, they don't even list pack-refs here as a task.
A
Yeah, the challenge for me would be: because of what it's doing, how would we do that benchmark? It's... okay! So here we go: a repository with too many refs should pack all its refs with `--all` once, and then run pack-refs. So I assume it'd be `git pack-refs --all`, and then every so often run this.
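A sketch of that two-step recommendation on a throwaway repository:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"
git branch topic-1
git branch topic-2

# Each branch starts as its own small file under .git/refs/heads/;
# --all folds every ref, not just the already-packed ones, into the
# single .git/packed-refs file.
git pack-refs --all
cat .git/packed-refs
```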
D
I think this would be useful if you have a lot of branches; if your git repository has too many branches.
B
So the reason why I'm stressing that is: when we did the benchmarks on git operations, what we found out was that the time it takes for a git fetch to happen is a function of the size of the objects that you have in your repository, rather than the number of commits or the number of branches or the number of tags. That is what we found at that time.
A
I don't know how we would get that, but what if we computed the number of references, and the number of references was beyond some certain threshold, like they say here, right, a repository with too many refs? So if we did some measurement periodically and said: this repository has this many refs, more than some arbitrary number, a hundred thousand; and if it has more than that, we will at least once do a `pack-refs --all`, and then automatically schedule it to do a `git pack-refs` once a week. Something like that.
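A minimal sketch of that periodic check; the 100000 cutoff is just the arbitrary number from the discussion:

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"

# Count every ref in the repository; past the threshold, pack them
# once and leave periodic pack-refs runs to the maintenance schedule.
ref_count=$(git for-each-ref | wc -l)
if [ "$ref_count" -gt 100000 ]; then
    git pack-refs --all
fi
echo "refs: $ref_count"
```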
A
Okay, well, this has been a most effective session. Thank you very much to everyone who's been here. I had wanted to limit us to an hour. What I'd propose is... would you like to do this kind of session again? Are you willing to have these kinds of discussions, and would it fit for you if we did it later this week or early next week? Would that be okay? Are you interested in that, or is this not nearly interesting enough to you, and you'd rather just focus on other things? What's your feedback?
D
So, it's better if we have... it's good if we have, like, these kinds of sessions.
A
Okay, then, what I'd propose is: let's plan for an hour a week, if that's okay. And it would actually be a little better for me if we were willing to do it on my day when I already am doing office hours. So would you be willing to do it on Fridays, rather than on Wednesday, like we're doing this one?
A
And we'll try to meet weekly to discuss. So that means... let me double-check my calendar, just a minute, to be sure I've got the right day. So that means we would next meet on Friday, the 1st of April.
A
Then let's plan for that, and then we'll try the same thing the following week, and let's make some progress. Thanks very much. I'll upload the recording of this probably 24 hours from now; I'm a little behind schedule on recordings right now. Thanks, everybody, for your time. Thank you so much.