From YouTube: 2020 07 01 GSoC Git Plugin Performance Project
Description
Google Summer of Code project for the Git Plugin Performance improvement in Jenkins
B
Well, if the results that I'm seeing continue to hold throughout today, I think it is safe for us to merge your change for 904. We still want a global switch for it, the ability to turn it off as an escape hatch, but I've been running that code since Saturday in all sorts of stress environments in my test setup, and I found one failure that was due to my tests.
B
A bunch of tests. I have a thousand-plus jobs that I run inside a Jenkins server that check various conditions, and some of those tests were, in this case, badly written. They were asserting that the message was "shallow fetch", and the message is now "shallow clone", because we skipped the shallow fetch that was the redundant fetch. Sorry — the floor is yours, Rishabh. I think we're ready to merge. I'm going to give it maybe another six or eight hours of testing, and then I'm likely to merge.
C
Okay, so one of the first things I want to discuss is how much I want to explain JMH and how we are using it. I was thinking of first talking about how we have integrated the JMH module inside the git client plugin. Then I can show the repository and the JMH benchmark folder inside it, and I want to ask: should I explore one of the benchmarks we have? Although we just have 15 minutes — do you want to? We don't want to go there, right?
B
Fifteen minutes is not nearly enough to show source code.

C
Yes, okay, so no code. So I just show that this is the module where we code our benchmarks. Then the second step is: how do we run it on the infrastructure? I can show the step we have added in the Jenkinsfile, and I think for this we don't need to show the Blue Ocean pipeline; that's not needed. And after that — since I have once again integrated the JMH report inside the Jenkins instance —
C
I can show how that is working. This is a pipeline project which checks out the — so, as I said before, I have a standalone project for JMH, and it has the git client plugin as a dependency in the pom. It doesn't run the same way we usually run our benchmarks: it doesn't run from Maven, it runs as a java -jar.
C
So the pipeline is simple: it checks out the standalone project, then it builds it with mvn clean install, and then the testing stage of it contains running the benchmarks. Then I have added the jmh-report plugin; you just have to add a stage where you point it to the JSON file which is going to be generated by the benchmarks.
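The two pipeline stages described above boil down to a couple of shell steps. This is only a sketch under stated assumptions: the shaded `benchmarks.jar` name is the common JMH convention, `-rf json -rff <file>` are JMH's standard result-format flags, and `jmh-report.json` is a placeholder filename — none of these are confirmed from the project itself.

```shell
# The shell steps the pipeline's build and test stages wrap. Defined as a
# function, not invoked here, because running them needs a Maven build
# environment with the standalone JMH project checked out.
run_benchmarks() {
  # Build the standalone project (it pulls in the git client plugin via its pom).
  mvn clean install
  # Run the shaded JMH jar, writing results as JSON so the jmh-report
  # plugin can later be pointed at this file from a pipeline stage.
  java -jar target/benchmarks.jar -rf json -rff jmh-report.json
}
```

The jmh-report stage then only needs the path to that JSON file.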
C
So once you have that, there's this other tab here called "jmh report", and once we open it — so I just ran one benchmark because I wanted quick results — this is how it's going to show it visually. It's basically the same website we use to visualize the benchmarks. There are some options as well: switch the scale if we really want to see small values, or increase it. And if you have multiple benchmarks, it's going to show multiple benchmarks comparatively as well.
C
I was thinking, too, when I demo this tomorrow, I'm going to put multiple benchmarks here so that it looks a little better. So these are, I could say, the three stages of how we are working with benchmarks in the git client plugin. Then, the second point: I was thinking of removing some of the slides, because currently my presentation is about 20-21 slides, and it might increase as I work on it tonight as well. I would want to reduce it, because I just have 40 minutes for it.
C
Then I was thinking of skipping the parameters — or, the parameters I wanted to discuss more. I was thinking to first come to the results, the graphs, and the inferences, and maybe, if people are interested, they would ask questions about the parameters. Or should I explain them before going to the results? What do you think, guys?
C
So how I was thinking of showing the result: I can include the parameters, or I could first show the result here — okay, this is the result we've got with git fetch, benchmarking git fetch — and then I could explain: okay, this machine was running on this version of git. So do we want to go into the details, where I explain the version of git, the platform, and whether I am running it on Java 8 or Java 11? That's something —
C
I'm not sure. Maybe I can just write it here and not speak about it; that's also something I can do. So, the results. What you can see here is a clear difference in the behavior of JGit's performance. The first graph is the performance benchmark on macOS, my local machine, and the second benchmark is on a CentOS 7 machine.
C
So in the first result, what we see here is that there's this intersection; the repository size before that intersection is around less than 5 MB, and what we see is something we've already discussed: that JGit is performing better than git. The Y axis is the average execution time, and all of this is in milliseconds per operation. So what we see is that for a smaller-sized repository, JGit is performing better, and after a point — we could call this the decision variable for our improvement.
C
It's the size at which we decide we will switch the implementations. JGit starts to exponentially degrade in performance, as we've seen. This is a visual graph; quantitatively, for a 300 MB repository there is roughly a 1.5-minute difference between JGit's and git's performance. So in the first graph you can clearly see how the nature changes as the repository size increases. In the second graph the nature is the same, though the intersection point comes at a later stage.
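The "decision variable" above can be sketched as a simple threshold check. This is purely an illustration: the `choose_impl` function and the 5 MB cutoff are hypothetical stand-ins for whatever crossover point the final benchmarks settle on, not anything in the plugin today.

```shell
# Hypothetical sketch: pick a git implementation from an estimated repository
# size, using the benchmark crossover point (~5 MB here) as the threshold.
choose_impl() {
  size_kb=$1
  threshold_kb=$((5 * 1024))    # placeholder crossover taken from the graphs
  if [ "$size_kb" -lt "$threshold_kb" ]; then
    echo "jgit"                 # JGit wins below the crossover
  else
    echo "cli-git"              # command-line git wins above it
  fi
}

choose_impl 1024      # a 1 MB repository
choose_impl 307200    # a 300 MB repository
```

The hard part, as discussed later in the meeting, is not this comparison but estimating `size_kb` in the first place.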
C
So, the reasons: we discussed both of these results in one of our previous meetings, and the reason we could find was that on my local machine there could be disturbances which would not be present on the CentOS 7 machine. That is a master node in a cluster, and nobody was doing anything on it at the time.
C
I think it was a freshly installed cluster, so it was not doing anything at the time. So the differences in the result could be accounted for by that reason — my local machine was also profiling at the time and doing a lot of other things as well. So this is how I was thinking — I was thinking this might be the most effective way to show the results, because we can show that change in the nature of performance when it comes to JGit's relationship with the size of the repository.
C
So instead of bar graphs, this seems like a better way to visualize the change. And I think after that, with git ls-remote — should I talk about it? I can talk about it, right: with git ls-remote, what we have is no difference between the implementations. So, okay, I think I have discussed that I'm going to change the graphs — I'll prepare the graphs and place them that way. So it's basically the same thing, but with a clearer visualization. After this, one more issue
C
I have is that, with fixing the redundant fetch — when we were estimating the impact on performance, I ran several other benchmarks to confirm from those benchmarks that there is less than a second's difference between the redundant fetch and the initial fetch. Yeah, the addition of the redundant fetch is adding less than a second: from a repository sized 5 MB to a repository sized 800 MB, I haven't seen more than a second's difference, which is so-so. Yeah, so I was thinking —
C
this is kind of cheating, maybe, but with my profiling results — I think here I can show: okay, when we fix the redundant fetch issue, the second fetch — in the red box you can see the second fetch takes around 1.6 seconds. So in profiling we could see that, after applying the fix, we would remove this much time from the total execution of the checkout step. But then it comes to benchmarks —
B
I think, for me, saying that the data does not show a consistent, dramatic improvement — that's a true statement. However, removing redundant operations: we have reports from the field, from users, that the redundant operation was expensive for them. We're trusting that, and trusting that by removing a redundant operation we're probably not going to harm performance.
B
I
wouldn't
worry
about
trying
to
justify
it
with
data
at
this
point,
because
I
already
tried
that
route
and
the
users
kept
coming
back
to
me
saying
mark
I,
don't
care
that
it
is
very
cheap
for
you
to
do
an
incremental
fetch
the
second
time
it's
not
cheap
for
me
and
I
can't
refute
what
the
users
say
right,
I
I,
their
bare
experience
is
real
and
they
ran
their
numbers
and
they
said.
Look
that
second
fetch
is
costing
me
this
much.
B
Okay, yeah. You're presenting to a bunch of software people; the word "redundant" is one of those evils, and software people say: redundancy is bad, get rid of it. So you don't even have to justify getting rid of it. You just say: it's redundant, we proved it's redundant, and we removed it. Okay, yeah.
D
And you can just say: there are a lot of people — I don't know how many people in the field, but there are folks in the field — who were saying that this was causing them significant increases in time, and so we removed it. We validated that it wasn't necessary, that the fix was satisfactory, and that it should be no harm to remove it for everyone else.
C
Sounds great. Okay, after that: this is also one of the benchmarks where I try to perform the same operation but with a remote repository instead of using a local git repository. This is a git fetch operation, and I think it's kind of an obvious fact that if we include the network while we're fetching, it's going to increase the time of that operation. So should I show this result?
C
It just shows the performance of the individual implementations without the network, for the Jenkins repo, which is around 360 MB, and for another repository, which is around 470 MB. The graphs show that there's an increase in the performance overhead when we add the network to the equation of benchmarking the operations, and that's the case for both of them — for git and for JGit as well. So I also wanted to ask: should I show this?
B
Don't
I
I,
don't
know
that
you're
gonna
get
any
benefit
to
the
audience
telling
them
the
networks
are
slower
than
local
access
to
districts.
I
think
I
think
they
if
they
don't
understand
that
they'll
understand
it
soon
enough
and
so
I'm
I'm
surprised
that
the
differences
between
those
between
the
with
and
without
network
are
not
larger
than
they
are
so
so
that,
but
that
I'm
not
interested
in
exploring
it
probably
says
you're
you're
somehow
nearer
University
in
India,
and
they
have
really
great
connections
to
local
local
caches
and
that's
wonderful,
I'm
glad.
C
Ok
and
the
results
might
be
a
little
for
the
for
the
network,
one
they
might
be
a
little.
They
might
not
be
accurate,
because
the
error
rate
is
too
huge
whenever
I
use
Network
to
benchmark
it.
Just
here
the
error
increases.
So
that's
also
something
after
that.
So
the
next
thing
I
could
talk
about.
The
last
thing
could
be:
what
do
we
want
to
do
in
the
Phase
two,
and
so
the
first
thing
is,
of
course,
are
the
estimated
you
are
thinking
about
the
heuristics
which
we
are
going
to
use.
B
I think you're saying that we have data that says git repository size matters, but we cannot always determine the size of a repository; therefore we believe the next step is to apply some heuristics to decide how big the repository is — and we don't always have the repository locally. That would be enough for me as the description of why we are doing anything about repository size, because the graphs in the earlier part of the presentation proved repository size matters.
B
Your second line item there reminds me that there are known challenges, or potential challenges, hiding in multibranch Pipelines and organization folders that may be related to locking. My repository with a hundred and fifty or a hundred and seventy-five branches sometimes spends a lot of time waiting for a lock on the cache on the master. So that may in fact turn into a very interesting angle for performance that isn't JGit- or git-specific, but is very much still impacting a user. Okay.
B
I don't know — no, no, just saying that you're broadening it is already enough. Okay. "There is more to investigate" is already statement enough for me; there's no way you should describe which specific things, since we don't know. That's just me making wild, speculative guesses, I mean.
C
Actually, one thing I forgot to mention is that the scale here is logarithmic; it's not linear. I switched it because, with linear, the behavior was not as obvious. So I think I should mention that here before explaining, because the quantities you see on the y-axis are not actually linear.
B
Yes. And is there one more thing we need to do: ask for availability of the jmh plugin on ci.jenkins.io, so we can visualize these things? Because I don't think it's there right now, and I think it would help us, particularly given that there are resources on ci.jenkins.io that I suspect are not available to you elsewhere — it's got a System/390 mainframe from IBM.
B
And I think I can show you a Jenkinsfile that already uses them. So I have an example that you can use as a reference when you're ready to say: I want to run on PowerPC and on ARM64 Graviton and on System/390. When you reach that point where you're ready, I've got a simple Jenkinsfile that already shows it. Okay.
C
Okay, I'll do that. Okay, so one more thing I explored during these days was what heuristics we could possibly be using for the repository size estimation, and one of the ways we are trying to determine the size is using the REST APIs provided by the git hosting providers. So what I want to ask is: first of all, in the git plugin I've seen a lot of browsers. Those are the providers which are implementing the git SCM, right?
B
That
is
sort
of
what
the
browser's
are
is
those
are
simple
transformations
from
a
repository
URL
to
a
diff
URL,
so
a
repository
posit
for
a
URI,
because
it's
not
always
a
URL,
a
repository
URI
that
may
be
get
at
github,
comm,
:,
marquis,
weight,
slash,
get
client
plugin
that
get
needs
to
be
mapped
to
something
and
the
mapping
to
is
HTTP
something
something
with
parameter
replacement.
So
those
browser
things
are
just
ways
to
view
changes.
So
they're
not
they're,
not
much
more
than
that,
in
fact,
they're
not
getting
more
than
that.
So.
C
When
so,
when
I'm
writing
the
rules
so
to
get
the
sizes
of
a
repository
hosted
in
gate,
github
gate
lab
how
many
providers
do
I
considered
violent
doing
that?
How?
How
do
we
know?
Where
is
the
repository
host?
Well,
you
would
have
to
figure
that
out,
but
even
if
we
figure
that
out,
there
are
I,
don't
know
possibly
more
than
five
providers
who
would
be
using
this
functionality,
so
we
would
need
all
of
them
or
or
how
should
we
think
about
that?
B
You
would
provide
a
basic
implementation
in
the
gate
plug-in
and
it
would
probably
only
use
command
line,
get
and
have
all
sorts
of
flaws,
because
it's
only
using
command
line
yet,
but
then
you
might
go
to
the
git
lab
plug-in
or
to
the
github
plug-in
and
implement
that
extension
point
and
use.
The
REST
API
is
from
the
gitlab
plugin
to
make
calls
to
get
lab
and
answer
the
question
better
than.
If
then,
it
can
be
done
from
from
the
get
plugin.
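As a toy illustration of the routing problem the extension point solves: before any provider-specific REST call can be made, something has to guess which provider a repository URI belongs to. The sketch below is hypothetical — the function name and the host patterns are assumptions for illustration, not plugin code.

```shell
# Hypothetical provider detection from a repository URI, to decide which
# provider's REST API (GitHub, GitLab, ...) could answer the size question.
provider_for() {
  case "$1" in
    *github.com*) echo "github" ;;
    *gitlab.com*) echo "gitlab" ;;
    *)            echo "unknown" ;;  # fall back to the basic git-plugin implementation
  esac
}

provider_for "git@github.com:MarkEWaite/git-client-plugin.git"
provider_for "https://gitlab.com/example/project.git"
```

In the plugin architecture sketched by B, this dispatch would instead happen through the extension point: each provider plugin declares which URIs it can answer for.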
C
Okay, and I guess I'm going to explore how to create an extension point and how we could do that. The second question I had was: how would we be ranking the heuristics? Are we going to use, first: if we find a local cache, that's the best way to get the size; second, the REST APIs; and then the third — the last one — is git ls-remote?
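The ranking described above amounts to an ordered fallback chain: try the most reliable source first and stop at the first answer. Here is a minimal sketch with stubbed heuristics — all three inner functions are hypothetical placeholders, standing in for a real cache lookup, a provider REST call, and a git ls-remote estimate.

```shell
# Stubbed heuristics; each prints an estimated size in KB on success,
# or returns non-zero when it cannot answer.
size_from_local_cache()  { return 1; }      # pretend no local cache exists
size_from_provider_api() { echo 310000; }   # pretend the REST API answered
size_from_ls_remote()    { echo 999999; }   # last resort, never reached here

# Try heuristics from most to least reliable, using the first that succeeds.
estimate_size_kb() {
  size_from_local_cache  && return
  size_from_provider_api && return
  size_from_ls_remote
}

estimate_size_kb
```

B's weighting suggestion below generalizes this: instead of a fixed order, each heuristic would also report a numeric reliability, and the caller would prefer the highest-rated answer available.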
B
That seems reasonable to me. I've seen in other places things like a popularity weighting, where the heuristic provides its own assessment of its weighting, and you ask the implementer: please provide your assessment of the reliability of this heuristic as a numeric value. For instance, you say the reliability of the local cache is absolutely one — it's flawless; there's not much better than that.
C
Okay, and the last question, I guess, with the heuristics, is with git ls-remote. How do I reach a point where I can decide: okay, this is a size where I want command-line git — when there's a large repository, let's say 300 MB? So how do I correlate the number of references to this size? Should I just —
C
Maybe I should take a lot of repositories and perform git ls-remote on them. Maybe, you know, create — not a benchmark, maybe a JUnit test — where I just go through a lot of repositories, collect a lot of data, and then average through it to get some experimental results. Because otherwise, I'm not sure if there is a direct correlation between the number of references and the size of the repository.
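The sampling experiment can be prototyped with nothing more than awk once the (ref count, size) pairs are collected. The numbers below are made-up placeholders, not measurements; the point is only the shape of the calculation.

```shell
# Made-up sample pairs, "<ref_count> <size_kb>" per line, as if collected by
# running `git ls-remote <url> | wc -l` plus a size query on each repository.
samples='120 4800
350 310000
90 2100'

# Average kilobytes per ref: a first, crude look at whether ref count
# predicts repository size at all.
echo "$samples" | awk '{ sum += $2 / $1; n++ } END { printf "%.1f\n", sum / n }'
```

With placeholder ratios this spread out (40 vs. roughly 886 KB per ref), the dispersion itself would be the finding: if real samples look like this, ls-remote is a weak heuristic, which is exactly the question the sampling is meant to settle.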
B
I like the idea of sampling; I think that's a very wise thing to say: I'm going to go sample repositories to test any one of those heuristics. You definitely should not describe those things in tomorrow's presentation, but as we continue our discussions, yeah, we should evaluate it. It may be that you ultimately decide that ls-remote is such a poor heuristic that you just discard it, and that's perfectly okay — to say: look, there is no correlation that we could rely on at all.
D
I mean, I guess the thing I usually try to do with presentations is think about how many slides I have versus how much time I'm going to need to spend on each slide. Be careful about adding too many slides, because you will invariably speak a little bit differently when you're presenting in front of people. Okay, okay.