From YouTube: 2020 06 19 GSoC Git Plugin Performance Project
Description
Google Summer of Code office hours of the Jenkins Git plugin performance improvement project. Topics include performance measurement results for the redundant fetch removal, repository size estimation heuristics, and approval for today's release of git client plugin 3.3.0 and git plugin 4.3.0.
A: So today I tried profiling the git plugin checkout step with a much larger repository. It's a public repository of a framework called CDAP; it's used for data analytics, and the repository I have is almost 1 GB in size. It has a lot of commits, around fifty thousand, and 1,000 branches, so a very big repository.
A: So how was I profiling and analyzing the results? I sorted the threads in increasing order of their start time. What I know is that with the fix I would see just one fetch call, and without the fix I would see two fetch calls, the second one starting some time after the first.
A: Whatever tests I did related to profiling, they went over the network; they did not go through my local git server. Okay, so without the fix, the first git fetch call, I'm not sure why it is taking less time, that might be because of the network, but the second git fetch call is taking just ten seconds more.
A: So when I was saying that there is a difference of two minutes, what I think I did wrong was that with the fix I did not wipe out the workspace before the second build; it was a consecutive build. Maybe because I did not wipe out the workspace, the time taken by the first git fetch was considerably less than in the build I was comparing without the fix, so I may need to test again.
B: You would still have saved time by removing the redundant fetch. Even if we only save ten seconds out of twenty minutes, that's still a win, and the crucial thing that I've seen at some large installations is this: they have their local Bitbucket server, and they're overloading that Bitbucket server with calls, because all of their agents are calling in to this Bitbucket server. By removing one of the calls, you have cut in half the load that we're applying to that Bitbucket server. So yes, it may be.
B
It
may
be
a
smaller
number
in
terms
of
of
the
actual
impact
on
a
specific
job,
but
by
cutting
in
half
the
number
of
times
we
make
a
request
to
that
central
server.
We
may
significantly
improve
performance
for
some
of
these
people
who
are
who
are
very
attached
to
their
large
repositories.
I,
remember
a
previous
employer
where
I
had
a
20
gigabyte
repository
and
every
single
clone
was
was
just
terribly
expensive.
So
so
this
is.
This
is
a
great
excuse
to
save
some
time.
So
don't
be
shy!
Good
thing,
you
learned
that's
great
I.
B: It's up to you; you get to choose. And I think you've already discovered fascinating things here: that the second request was much, much faster. That matches results I had seen when I did benchmarking some years ago from bug reports in Jenkins, where users said, hey, the second call is prohibitively expensive, and my attempts to duplicate it as prohibitively expensive all failed. I saw it had a cost, it wasn't free, but I didn't see the, you know...
A: And one more thing I'll test is whether the number of commits makes a difference, rather than the size only; right now I was just looking at the size of the repository. With the results I showed yesterday, that was the Samba repository: it has way more commits, almost 50% more than the current repository I have. So maybe; I'm not sure, but this is something I'll test.
B: I've heard that called sensitivity analysis: deciding which parameter is the most impactful on some change. It's certainly very helpful for determining which heuristic you should use for repository size. If the number of commits is the important thing, then asking for the size on disk is not nearly as relevant as other queries.
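The one-at-a-time comparison being described here can be sketched as a toy sensitivity analysis: hold every parameter at a baseline, swing one at a time across its plausible range, and see which swing moves the outcome most. The cost model and numbers below are invented purely for illustration, not real fetch measurements.

```python
def fetch_cost(size_gb, commits, branches):
    # Hypothetical cost model of a fetch (made-up coefficients).
    return 2.0 * size_gb + 0.0001 * commits + 0.05 * branches

# Baseline roughly matching the repository discussed above.
baseline = dict(size_gb=1.0, commits=50_000, branches=1_000)

def sensitivity(param, low, high):
    """Swing of the outcome when only `param` moves from low to high."""
    lo_args = {**baseline, param: low}
    hi_args = {**baseline, param: high}
    return abs(fetch_cost(**hi_args) - fetch_cost(**lo_args))

for param, (low, high) in {
    "size_gb": (0.5, 2.0),
    "commits": (10_000, 100_000),
    "branches": (40, 1_000),
}.items():
    print(param, round(sensitivity(param, low, high), 2))
```

Applied to the real question, the parameters would be repository size, commit count, and branch count, and the outcome would be the measured fetch duration.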
A: So far, while I have profiled this operation, the redundant git fetch, the only thing in my mind was the size of the files. But as I was showing the results, the realization I had was that I should also look at the number of commits and the number of branches while I'm doing so. So this is the next thing: sensitivity analysis.
B: Now, in terms of your profiling: would it be lower cost to do that outside of JFR, with simple timestamp instrumentation? Is that reliable enough for you, or are you finding JFR so helpful that you just assume you'll be inside JFR? I don't know your experience there; has JFR been helpful for you in that regard? Have you liked Java Flight Recorder?
A: Personally, Mark, I tried using System.nanoTime to mark the time difference, but it was giving me very unreliable results when I was repeatedly testing the builds and collecting the results. With JFR, the one thing I've seen is that the results are consistent for the same repository and the same experiments: when I'm launching consecutive builds, the time duration of the thread for one git fetch call is nearly the same.
A: I could probably do both. If I put System.nanoTime in and log the difference, I can see that in the build log, and I can also use profiling, so I could do both if I have to check whether there is a difference without using JFR. Are you saying that JFR is adding overhead to the performance?
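The timestamp-instrumentation option can be sketched briefly; the snippet uses Python's monotonic counter as a stand-in for Java's System.nanoTime, and averages several runs because, as described above, a single wall-clock sample of a network-bound call is noisy. The workload is a placeholder, not a real git fetch.

```python
import time

def timed(fn, runs=5):
    """Time fn with monotonic timestamps (the analogue of Java's
    System.nanoTime) and average over several runs to smooth out
    run-to-run noise."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return sum(samples) / len(samples)  # mean duration in nanoseconds

# Placeholder workload standing in for a `git fetch` call.
avg_ns = timed(lambda: sum(range(100_000)))
print(f"average duration: {avg_ns / 1e6:.3f} ms")
```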
B: Actually, I wasn't worried about JFR's overhead so much as I was asking: is it the simplest way for you to do what you're doing? If it's the simplest way, then that gives it value immediately, because you're trying to explore and understand. So whatever works best for you, do that.
A: The next thing: we discussed on Wednesday that we are going to interactively test the fix we have made for the redundant git fetch, so I've started doing that. The first scenario I wanted to take was all the use cases which relate somehow to the structure of the repository, the ones which would change the structure of the resulting repository after we choose that behavior, which could be related to the size, refspecs, or commit history. So I have tested this fix interactively with the advanced clone behaviors: I have chosen shallow clone with a depth. How do I test it? I go to the workspace and I look at the log, the history, and the HEAD, where it is attached, for both of them, and of course the size of the repository. With these parameters I am seeing the same results for both; this is how I'm basically testing each case. The second test scenario I took was to check out a specific branch. I wanted to see if I end up on the same branch, that is, whether the HEAD is attached to the same branch when I check out in the workspace with the fix and without the fix, and it was the same. So I'm going to take more cases; I have no particular preference among the behaviors I'm choosing right now, but these were the cases.
B: I would drop sparse checkout, because it is entirely a workspace operation. It doesn't change the quantity of history we retrieve; all it changes is the checkout operation, and your focus is on the fetch operation. So you don't need to spend time on sparse checkout. If we break it, we broke it for another reason.
A: The fetch results we've discussed, so I'll change the parameters of my profiling a little bit and then see what kind of results we have. For the heuristics we were discussing to calculate repository size, the first thing was the one we talked about before, so I was exploring a little bit how the repository looks.
A: I actually compared two repositories with this operation, git ls-remote. On the right side, git ls-remote is run for the git client plugin repository, and on the left side it's run for CDAP, the repository I just showed; it has way more branches and commits. I think as I scroll down it's clear; I'm going to show you the difference. So this is CDAP on the left side, and you can see the list goes on and on.
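The comparison being described, tallying the references that git ls-remote prints, can be sketched offline. The sample below imitates the command's SHA-tab-refname output format; the SHAs and branch names are made up.

```python
# Sample imitating `git ls-remote` output (fake SHAs and ref names).
sample = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature-x\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1.0\n"
)

def count_refs(ls_remote_output):
    """Tally branch and tag refs as a rough size hint for a repository."""
    heads = tags = 0
    for line in ls_remote_output.splitlines():
        _, _, ref = line.partition("\t")
        if ref.startswith("refs/heads/"):
            heads += 1
        elif ref.startswith("refs/tags/"):
            tags += 1
    return heads, tags

print(count_refs(sample))  # → (2, 1)
```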
A: The point is that we will have many more references for a larger repository, so it is safe to say that, yes, if we use git ls-remote, we could approximately assume that this is a large-sized repository or a small-sized repository. But here is my concern. What I did was pull up some of the largest repositories I could find on GitHub; these might not be the largest ones, but they are famous ones, and here is what I saw.
A
This
is
a
1gb
file,
size
repository.
It
has
approximately
1,000
branches.
Now,
let's
go
to
the
next
depository.
This
is
the
vs
code,
Microsoft
repository.
It
also
has
almost
a
1gb
size
file
size
but
way
less
branches,
50%
less.
Then,
let's
go
to
kubernetes.
It's
1
GB
repository,
but
just
41
branches
and
and
with
I
also
tried
with
that
and
Sybil.
It's
also
around
it's
900
or
800,
but
44
branches,
ruby,
ruby
was
was
very
less
I
think
it
was
around
600
700,
but
22
ranch's.
So
so
what
I
can
I
could
see
here?
A: That git ls-remote, just looking at the branches, might not be the best way to estimate the size of a repository. So maybe we need to make a combination of 2 or 3 heuristics which could estimate the size of the repository. I was also searching the internet to find a good way to find the size of a repository without cloning it, and it turns out, according to my search, that it's not that simple. If we have a clone, it's pretty easy.
A: With a clone, git provides us functionality; there are ways we could know the size of the repository for sure. But without the clone, I think one sure way was what Fran or Justin suggested: GitHub and GitLab have exposed APIs. I have tested those, and we can get the size from a simple REST API request,
but with that, I think the concern is that we would have users with Bitbucket, or maybe individual git servers, or Gitea, or any of many more services which provide SCM, so that might also not be the complete solution.
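For reference, the GitHub REST API does report a `size` field for a repository, in kilobytes, on the `GET /repos/{owner}/{repo}` endpoint. A sketch parsing a trimmed, invented response offline rather than making a live request; the 500 MB threshold is an arbitrary illustration, not a plugin setting.

```python
import json

# Trimmed, invented sample of a GitHub repository API response;
# the real endpoint is GET /repos/{owner}/{repo}, and `size` is
# reported in kilobytes.
sample_response = json.dumps({"full_name": "example/repo", "size": 987_654})

def classify_from_api(body, threshold_kb=500_000):
    """Return 'large' or 'small' from the reported on-disk size."""
    size_kb = json.loads(body)["size"]
    return "large" if size_kb >= threshold_kb else "small"

print(classify_from_api(sample_response))  # → large
```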
A: Actually, the more I research this thing, the more it seems we might not have one single way to completely figure out the size; we might need to have multiple actions for multiple scenarios and then create a class like that. And one more thing I was searching about:
A: Could we do something like this? Once we have built the repository for the first time using the git plugin, I think we cache the workspace somewhere. I am not sure; I'm actually not very well aware of the management of git between the master and the agents. I think we have our workspaces on the agents, and so we have the workspace.
A: So what I'm trying to say here is that we will not be able to improve the performance of the first build, but for the consecutive builds, once we have some information, we will be able to use the size or other information from that workspace, and apply the performance enhancement in that way.
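The reuse-between-builds idea could look like a small cache written into the workspace after the first build. Everything here, the file name and the fields, is hypothetical, a sketch of the concept rather than anything the git plugin actually does.

```python
import json
from pathlib import Path

CACHE_FILE = Path("repo-size-cache.json")  # hypothetical location

def save_estimate(size_bytes):
    """After the first build, record the observed repository size."""
    CACHE_FILE.write_text(json.dumps({"size_bytes": size_bytes}))

def load_estimate():
    """Before later builds, reuse the cached size if one exists."""
    if not CACHE_FILE.exists():
        return None  # first build: no information yet
    return json.loads(CACHE_FILE.read_text())["size_bytes"]

save_estimate(1_000_000_000)
print(load_estimate())  # → 1000000000
```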
A: Also, just one last thing; I think it's probably a stupid suggestion, but can we ask the user for the size of the repository while configuring the git plugin? I think that would be the easiest thing to do, but I'm not sure. One thing I have experienced personally is that if I have a GitHub repository, I think there's no direct way to know its size; I don't see it on the UI of my repository. So I'm not sure if that's a responsibility we would like to place on the user. I haven't seen it in the git plugin's current behavior; it's just something I was thinking about. So yeah. Yes.
B: So I think what you described is exactly the nature of heuristics, right? Fallible rules. In fact, the word fallible is strongly emphasized in that phrase, "a fallible rule": we're trying to find something that we know is imperfect, and we know it cannot be perfect, because the information we need is not always available to us locally. So I think you're on the right track; keep working on those topics. And yes, I'm glad that you found and confirmed that ls-remote is probably the weakest of the heuristics.
B: Now, I like the idea of asking the provider for the number of commits, or for whatever the provider gives as size hints. It seems like the number of commits is probably one of the size hints, branches is another size hint, and on GitHub this notion of releases, which is really the number of tags, is probably another size hint, and each of those things could be part of that.
A: Okay, maybe I could provide an additional behavior, like I was talking about before. If someone chooses to improve the performance of the git plugin, then they can fill in all of those details. But then I think the biggest disadvantage there is that the adoption of this whole feature would be very slow if you're providing an option like that.
B: And it doesn't stop you, as part of this project, from contributing pull requests to the GitHub plugin, to the Bitbucket plugin, or to the GitHub Branch Source plugin, whichever layer it is that makes sense. You are welcome to go into any plugin necessary to hit the goal, but it does add an extra layer of complexity as soon as you start touching more and more plugins.
B: I don't know; I would be surprised if they gathered that. But I know that the GitHub plugin does have an API that it uses, and that API is quite rich and capable. I don't know whether the size information is already in the API, but I know some people we could ask about the details of that API, if that would help. Liam Newman is one; I'm sure that if we just mention him on the git plugin gitter chat, he'd be happy to come answer questions.
B: Okay, can you do the exploration of the sensitivity piece as part of the testing activity that you were doing? You could get double duty: while you're doing these tests interactively, checking the redundant fetch behaviors, watch the numbers to see, hey, what impact did it have that I chose this repository rather than that repository?
B: And I'm just borrowing this meeting because we've got Fran here; Justin, you're welcome to chime in as well. Fran and I are co-maintainers of the plugin, so, Fran, my proposal is to release git plugin 4.3.0 and git client plugin 3.3.0 today with the contents of the current master branches. That won't give us the symbols capability that Carl Schultz has been working on, because I found a compatibility problem there; it surprised me, and I just don't want to risk it.
B: All right, so, Rishabh, for your project, what this will mean is that the distance from the 4.3.0 release to your changes will be much less than if we had had you working on something based on 4.2.2. You've been working on the master branch, so it shouldn't change your experience dramatically, but it was one case where I want to be sure that when we bring your changes into a release, they are the dominant portion of that release, rather than all the other noise.