From YouTube: 2022 06 08 Git Cache Maintenance
Description
Git cache maintenance project for Google Summer of Code 2022
A
We'll discuss the action items first. The first pending action item was to update the project details. I believe Hrushikesh has raised a PR on jenkins.io.
A
A
And to do this, I myself had to dig deep and understand the pattern. I remember that during my time I had to use the descriptors. We have to use them, but I never explored them to the depth I would need to explain this pattern, so I created...
B
D
The UI is based on the implementation, so I paused the implementation for now and created a design document explaining two ways of implementing it: using cron syntax, or using the global build discarder. That is what I've done this entire week.
C
So then it feels like Rishabh and I should review the design document. Do you want to give us a summary of the concepts that you found different between the two, and do you have a particular one that you recommend, Hrushikesh?
D
Oh yeah. So basically, this was to explain the difference between both strategies. The aim of the global build discarder strategy is to schedule maintenance tasks without having cron syntax in the UI. It is done automatically and intelligently by Jenkins internally. In the cron syntax strategy, administrators have to pass a cron syntax for each maintenance task. So, the global build discarder.
D
Here I have written up how exactly it works. I was going through the documentation.
D
So basically, if you check, there is a class called BackgroundGlobalBuildDiscarder. It executes every hour. There is a method called getRecurrencePeriod, which is currently set by default to one hour. So every hour this runs and calls the execute method.
D
This execute method calls processJob. It checks all the jobs on the Jenkins controller, and then it checks whether each job is applicable for discarding its previous builds or not.
D
Rishabh, can you go to the design document? Yeah. So in the second step, it calls the execute function hourly and runs the build discarder on all the jobs present on the Jenkins controller. It is based on the strategy present in the global build discarder. So basically, there is a function which we need to override, the isApplicable function, which the user needs to set.
D
If the user sets this, then whichever functionality he writes in the isApplicable function, that configuration is used by Jenkins internally to decide whether it should run the build discarder on that job or not. This is the basic functionality of the global build discarder.
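Sketch (not from the meeting): the flow described above can be illustrated in plain Java without Jenkins on the classpath. Every name below (MaintenanceStrategy, PeriodicMaintenanceWork, runOnce) is hypothetical and only mirrors the shape of the pattern: periodic work that visits each item and asks a strategy's isApplicable before acting. It is not the Jenkins API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a strategy that decides, per item, whether work applies.
interface MaintenanceStrategy {
    boolean isApplicable(String itemName);
    void apply(String itemName);
}

// Hypothetical stand-in for periodic background work.
class PeriodicMaintenanceWork {
    private final List<MaintenanceStrategy> strategies;

    PeriodicMaintenanceWork(List<MaintenanceStrategy> strategies) {
        this.strategies = strategies;
    }

    // Recurrence period in milliseconds: one hour, as described in the meeting.
    long getRecurrencePeriod() {
        return 60L * 60L * 1000L;
    }

    // One pass: visit every item and apply each strategy that accepts it.
    // Returns the names of the items that were processed.
    List<String> runOnce(List<String> items) {
        List<String> processed = new ArrayList<>();
        for (String item : items) {
            for (MaintenanceStrategy strategy : strategies) {
                if (strategy.isApplicable(item)) {
                    strategy.apply(item);
                    processed.add(item);
                }
            }
        }
        return processed;
    }
}

class PeriodicMaintenanceDemo {
    public static void main(String[] args) {
        // Assumed rule for illustration: only items under "caches/" are maintained.
        MaintenanceStrategy cachesOnly = new MaintenanceStrategy() {
            @Override public boolean isApplicable(String itemName) {
                return itemName.startsWith("caches/");
            }
            @Override public void apply(String itemName) {
                System.out.println("maintaining " + itemName);
            }
        };
        PeriodicMaintenanceWork work = new PeriodicMaintenanceWork(List.of(cachesOnly));
        work.runOnce(List.of("caches/git-abc", "workspace/freestyle-job"));
    }
}
```

In this sketch the scheduler itself never decides what to maintain; that decision lives entirely in isApplicable, which is the property the speaker wants to reuse for caches.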
D
I was thinking we can use the same functionality for Git maintenance tasks as well, but I had a few questions regarding this, because the global build discarder is only used for jobs. It's only used to iterate over jobs. So is there any way we can use it for caches as well?
C
But I think, I'd assume that you were thinking here, because the global build discarder happens every hour: would we then, in the UI, have the user choose something?
C
D
D
I didn't want it to overload the system. So I was thinking, is there any way we can find the CPU utilization, how much CPU has been used or how much RAM has been consumed, so that based on that data we can schedule the maintenance tasks hourly or every three hours, and even let the user have an option of scheduling it weekly?
D
Something like that. That was what I was thinking. I'm not sure, are there any other intelligent ways of scheduling maintenance tasks?
A
I was just saying that I think we should divide this into two steps, as Mark... The first thing that Mark asked is that, as we decided initially in the project, we are going to expose a way for the user to be able to set a schedule for these tasks, right? So when you talk about CPU utilization, and then the system having the intelligence to be able to schedule the jobs on the basis of the current status of the system...
A
We don't have to make it intelligent to begin with, in the first iteration of this feature, right? We can start with the first rule which, I believe, please correct me if I'm wrong, was to take the user's input on the frequency with which they want to schedule these jobs, and then use that input to run the tasks. If we could think about it, it's great, but it's okay if we don't think about CPU utilization at the first step. Would you agree with...
C
A
C
Yeah. I'm not sure that CPU load is the crucial measure there, and even if it is, it may be difficult to get that in a platform-independent way. I think, though, we can get how long the subprocess ran before it completed, and the duration of a subprocess run gives us a first-level approximation of how much demand it placed on that computer.
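Sketch (not from the meeting): measuring how long a subprocess ran needs nothing platform-specific in Java. The class and method names below (TimedProcess, Result, run) are hypothetical; only ProcessBuilder, System.nanoTime and Duration come from the standard library. The child's output is discarded to keep the example short.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.List;

class TimedProcess {
    // Outcome of one run: the exit code plus the wall-clock duration.
    static final class Result {
        final int exitCode;
        final Duration elapsed;

        Result(int exitCode, Duration elapsed) {
            this.exitCode = exitCode;
            this.elapsed = elapsed;
        }
    }

    // Runs the command in a child process, waits for it, and reports how long it took.
    static Result run(List<String> command) throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // drop the merged output
                .start();
        int exitCode = process.waitFor();
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        return new Result(exitCode, elapsed);
    }
}
```

Wall-clock time also includes time the process spent waiting on disk or on other processes, so it is only the rough approximation of demand that the speaker describes.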
C
C
D
Yeah, so I have a doubt. What kind of UI are we looking into? Are we looking into taking cron syntax from the administrator to schedule maintenance tasks, or do you want the maintenance tasks to be run automatically without any input from the administrator?
C
C
C
So if we went with the global build discarder concept, it checks every hour whether there is work to be done, and then, if the job configuration said only do that every 24th hour or every 48th hour, that might be enough, and then we don't need to process cron syntax. Now, Jenkins users may say, "Well, but I had to learn cron syntax everywhere else," and they'd be right.
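Sketch (not from the meeting): the "check every hour, act every 24th or 48th hour" idea reduces to remembering the last run and comparing it with a configured interval. IntervalSchedule and its methods are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

class IntervalSchedule {
    private final Duration interval;
    private Instant lastRun; // null until the task has run once

    IntervalSchedule(Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    // Called from the hourly check: true when the task has never run,
    // or when at least one full interval has passed since the last run.
    boolean isDue(Instant now) {
        return lastRun == null || !now.isBefore(lastRun.plus(interval));
    }

    // Records that the task ran at the given time.
    void markRun(Instant now) {
        lastRun = now;
    }
}
```

With a 24-hour interval, an hourly checker would call isDue once per hour and do the work only on the call where it returns true, then call markRun.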
C
D
C
Yeah, I agree with your observation. At least for me, I have a hard time imagining a repository that is busy enough that refreshing its cache every hour would be important, and it is even more difficult to envision that refreshing its cache every few minutes would be worthwhile.
D
Or we can give an option where we run it not hourly but on a daily basis, or something configurable by the administrator, because the aim of the global build discarder approach is to not have cron syntax. It has to be done by Jenkins internally. The other implementation provides cron syntax for the administrator, where they can plug in the cron syntax and run the maintenance tasks. So that's the whole difference between both the...
A
I just wanted to say, on this point of how we decide what the frequency of scheduling these jobs is going to be, I believe we should test this idea when...
A
This feature, we should run it on Mark's machine, which has a lot of projects. I believe we should take inputs from that machine on whether the frequencies that we're assuming are optimal for the system. I believe it will be a good practical test for us to know how it's actually going to work on a user's machine, and whether our frequency is not in the optimal range that we wanted.
A
C
A
A
C
A
C
D
C
A
So Hrushikesh, as I...
A
But for the user we want to provide a way for them to configure these tasks, right? And if that is the aim, then more customizability, especially when that feature affects the performance of the system, more granularity, could mean that the admin would have more options to find the frequency that suits them, in case the default frequency we think of is not the one that should be used in their system.
A
C
D
Nothing else. Actually, I've gone through the entire cron syntax implementation, and I'm pretty confident about implementing it. There's no favoritism of that sort. I'm still a bit confused about the global build discarder approach, like what the various conditions are based on which we are going to run the maintenance tasks.
A
D
We can create our own asynchronous thread by extending AsyncPeriodicWork. If you extend that, you can create your own background process and then run the maintenance task in that thread.
D
Here, in the cron syntax implementation, there would be one thread running every minute, which would check every minute whether the cron syntax is valid for that minute or not. If it is valid, then the corresponding maintenance task is run on all the repositories.
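Sketch (not from the meeting): the per-minute check can be pictured as "does this cron expression match the current minute?". The SimpleCron class below is a hypothetical, deliberately reduced matcher: it understands only `*`, `*/n`, single numbers and comma lists in the five classic fields, and it requires all five fields to match. Jenkins's own cron syntax supports more than this (ranges and the `H` token, for example), so treat this as an illustration of the check, not of the Jenkins parser.

```java
import java.time.LocalDateTime;

class SimpleCron {
    private final String[] fields; // minute, hour, day of month, month, day of week

    SimpleCron(String expression) {
        fields = expression.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + expression);
        }
    }

    // True when every field accepts the corresponding part of the given time.
    boolean matches(LocalDateTime time) {
        int dayOfWeek = time.getDayOfWeek().getValue() % 7; // Sunday = 0, Monday = 1
        return fieldMatches(fields[0], time.getMinute(), 0)
            && fieldMatches(fields[1], time.getHour(), 0)
            && fieldMatches(fields[2], time.getDayOfMonth(), 1)
            && fieldMatches(fields[3], time.getMonthValue(), 1)
            && fieldMatches(fields[4], dayOfWeek, 0);
    }

    // min is the smallest legal value of the field, used as the origin for steps.
    private static boolean fieldMatches(String field, int value, int min) {
        for (String part : field.split(",")) {
            if (part.equals("*")) {
                return true;
            }
            if (part.startsWith("*/")) {
                int step = Integer.parseInt(part.substring(2));
                if (step <= 0) {
                    throw new IllegalArgumentException("step must be positive: " + part);
                }
                if ((value - min) % step == 0) {
                    return true;
                }
            } else if (Integer.parseInt(part) == value) {
                return true;
            }
        }
        return false;
    }
}
```

A background thread that wakes once a minute would call matches with the current time for each configured task and run the task when it returns true.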
A
B
D
Over all the jobs, and then, yeah, exactly, the rest of the implementation is the same. The only main difference would be: are we taking the cron syntax from the administrator and scheduling it, or do we intelligently schedule it behind the scenes in Jenkins?
B
C
D
We can even safeguard the cron syntax. As I stated in one meeting, assume an administrator runs a gc every minute, so his syntax corresponds to every minute or every 30 minutes. We can safeguard by putting some rules behind the scenes where he can start running maintenance tasks only...
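Sketch (not from the meeting): one possible safeguard is to reject any schedule whose firings are closer together than some minimum gap. The ScheduleGuard class, its method names, the one-week scan window and the fixed start date are all assumptions made for illustration; the schedule is passed in as a plain predicate so the sketch does not depend on any particular cron parser.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.function.Predicate;

class ScheduleGuard {
    // Scans minute by minute and returns the shortest gap between two firings,
    // or null when the schedule fires fewer than twice in the scanned window.
    static Duration shortestGap(Predicate<LocalDateTime> fires,
                                LocalDateTime from, int minutesToScan) {
        LocalDateTime previous = null;
        Duration shortest = null;
        for (int i = 0; i < minutesToScan; i++) {
            LocalDateTime t = from.plusMinutes(i);
            if (!fires.test(t)) {
                continue;
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, t);
                if (shortest == null || gap.compareTo(shortest) < 0) {
                    shortest = gap;
                }
            }
            previous = t;
        }
        return shortest;
    }

    // Accepts the schedule only if no two firings within one week
    // are closer together than minimumGap.
    static boolean isAcceptable(Predicate<LocalDateTime> fires, Duration minimumGap) {
        LocalDateTime start = LocalDateTime.of(2022, 1, 1, 0, 0); // arbitrary fixed origin
        Duration gap = shortestGap(fires, start, 7 * 24 * 60);
        return gap == null || gap.compareTo(minimumGap) >= 0;
    }
}
```

Running the same check where the schedule is stored, and not only in the UI form, would cover schedules that reach the system by other routes.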
D
C
A
I was just saying that we're tilting more towards preferring the second approach, right? The parameterized cron syntax approach.
C
A
A
A
B
A
C
I can imagine someone's decided to compile the Linux kernel for their Raspberry Pi, and as part of that they're doing a garbage collection operation on the two-gigabyte Linux kernel repository on their Raspberry Pi controller. It may take many hours, but they get the benefit that when it's done, it's done. Tell me more about cases where you're worried that they may have many copies of that and therefore might somehow not be able to complete the other work.
A
C
B
C
D
A
...prepare for right now, but the question I'm trying to ask is only because, let's say, on my system, if git gc is going to run, my limited knowledge is that it's going to use whatever resources I have on my system to run that process. It's not going to be a single-threaded process.
C
A
D
I don't know if it works like that, but I was thinking that when I schedule a maintenance task using the git client plugin, it calls the underlying command-line git present on your system, which runs a separate process to run the maintenance task, and once that maintenance task has run, you get the result back in the git client plugin. So that was what I was thinking.
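Sketch (not from the meeting): the "separate process" idea can be illustrated with ProcessBuilder. This does not show how the git client plugin is implemented; it assumes a command-line git (2.30 or newer) on the PATH, where `git -C <dir> maintenance run --task=<task>` is available. GitMaintenanceCommand and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class GitMaintenanceCommand {
    // Builds the command line for one maintenance task in one repository.
    static List<String> build(Path repository, String task) {
        return List.of("git", "-C", repository.toString(),
                "maintenance", "run", "--task=" + task);
    }

    // Runs the command in a separate process and returns its exit code.
    // The child's output goes to this process's stdout and stderr.
    static int run(Path repository, String task) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(build(repository, task))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

Because the work happens in a child process, the caller only sees the exit code and the time it took; how many threads git uses internally is decided by git and its configuration.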
C
A
A
D
Here I was worried about this: if I run a git gc command and it consumes a lot of the computer's resources, I think that would be a problem, right? It would consume like 90 percent of CPU, making the computer a bit slow.
D
I was not sure how to proceed with that, or is that fine for now? I'm not sure what to do with that.
D
Or we can read when exactly the system is idle and then give a recommendation to the administrator based on that, so that they can schedule the maintenance tasks.
D
Although I had another doubt regarding the Git caches. When I create a freestyle job in the Jenkins UI, it creates a separate workspace directory which contains the entire repository, whereas if I use a multibranch pipeline, it only creates a caches folder. So here we are only worried about the caches, right, not the freestyle repositories present on the Jenkins controller?
C
Correct, because it is strongly advised to not have any jobs that execute on the Jenkins controller, and so having us perform any maintenance on jobs that the user makes the mistake of running on the controller would, I think, be a bad pattern. We only want to deal with caches that are maintained by Jenkins core itself, not with freestyle jobs that the user constructed.
C
C
D
Also, yesterday I was just messing around, trying to make an implementation. There I tried to save the data I got, like the cron syntax taken from the user, and stored it as an XML file. Can this XML file be changed by other users, like if that computer doesn't belong to that administrator?
C
D
D
C
So UI-based validation is good and is certainly necessary, but probably not sufficient. At the low levels of the API, we'll want to be sure that we're checking the data, the schedule that's proposed, for sanity there as well.
C
C
D
So now we are more in favor of the parameterized cron syntax approach, so I can start exploring more about it. That is what I was thinking. That was this week's agenda, to fix the architecture so that we can proceed on how we would implement it.