From YouTube: 2020 05 27 GSoC Git Plugin Performance Project
Description
Google Summer of Code 2020 git plugin performance improvement project discussions from May 27, 2020. Topics included discussions on alternatives for benchmarking, locations to perform the benchmarks, and how to best approach profiling and Java Flight Recorder data collection.
A
We'd like to record this. Recording has started. Hey Justin. Hey, how's it going, Nathan? I know, Rochelle. Okay, so Rishabh, we're past our start time. If you want to share your screen, we'll use it to work through the agenda. I see you've assembled an agenda for us. Let's talk through it. Okay.
B
So one of the first things I wanted to discuss was having a consolidated plan. Since June 1 is coming up, I think we should convert our action items into JIRA tickets, if that is the way to consolidate them. I have a process in mind that I would like to discuss. For JMH, the benchmarking strategy: I discussed this in the Platform SIG meeting, and I want to know if this is what we would like to do for benchmarking.
B
The first step is selecting a git operation, the second is testing it with JMH, and the third is testing it in the Jenkins CI. I think I have the first two steps for git fetch: I already have something in place, a work-in-progress PR for that. The third step I am figuring out with Mark. We were having a discussion on git submodules, and I would like to discuss that after this.
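The JMH strategy above depends on keeping one-time setup out of the measured operation. JMH does this with annotations such as @Setup, @Warmup and @Benchmark, and it needs the JMH library on the classpath. As a dependency-free illustration of the same structure, the sketch below hand-rolls warm-up and measurement iterations in plain Java. It is not JMH and lacks JMH's protection against dead-code elimination and other JIT effects. The names MiniBench, measure and meanNanos are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical, dependency-free stand-in for a JMH benchmark; not JMH itself. */
class MiniBench {

    /**
     * Runs setup once (untimed), then warm-up iterations (discarded),
     * then measured iterations, returning the elapsed nanoseconds of each measured run.
     */
    static <S> List<Long> measure(Supplier<S> setup, Consumer<S> operation,
                                  int warmupIterations, int measuredIterations) {
        S state = setup.get();                 // e.g. prepare a local repository; not timed
        for (int i = 0; i < warmupIterations; i++) {
            operation.accept(state);           // let the JIT warm up; timings discarded
        }
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            operation.accept(state);           // the operation under test, e.g. a fetch
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    /** Arithmetic mean of the samples in nanoseconds (NaN if there are none). */
    static double meanNanos(List<Long> samples) {
        return samples.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
    }
}
```

In a real benchmark the operation would be a git fetch driven through the git client plugin, and the setup would prepare the repository it fetches from.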
B
I want to run git fetch on CI, on the infrastructure; that is something I'm figuring out right now. After I have that, I think I should have a PR with a benchmark written for git fetch, and it should run on our infrastructure in different environments, so that we have good, solid data to work with. Then, one more possible coding task we could have is the existing double fetch performance issue. I have a PR for that.
B
I think we need to discuss how to move forward on that. Do we need more automated tests, or any manual testing? That's something we could discuss. And I already have a micro-benchmark test for git fetch. I'm not sure how we move on from there. Should we work on different operations in parallel, or should we first choose an operation, work on its implementation, see whether it's working or not, and then move on? That's something we should also clear up before moving on to implementing the opt-in performance feature.
B
I was thinking that once we have the data, we'll know that for a given scenario JGit works better than git. Here's how I was thinking of implementing it: I should have a PR, at least for git fetch. How do I want to do that? Maybe I could do it in the GitSCM checkout step, and, you know, there is a PR for that. And the last one is:
B
We were also thinking (I think it's an active discussion) about replacing git init plus fetch in the GitSCM checkout with git clone, and for that we need a lot of changes. That's something I haven't researched much, but it's possible. It's one of the optional deliverables I had in my proposal, so I was also thinking of moving it from an optional to a mandatory deliverable.
B
I think it would not stretch our timelines too much; that's something we can definitely discuss, because I need some coding tasks to work on from June. Right now it's more of a loose, research-based approach, and I think it should be consolidated a bit, maybe on JIRA or in some other way, maybe in this document, in any shape. So that's the first thing I was thinking of discussing. Any questions you have?
A
For me at least, I'd like to go from the very top. I like number one a lot. I think having something that runs JMH on ci.jenkins.io will teach you, and the rest of us, a bunch of things about running it, because whereas you've got macOS right now, it will immediately put us into a Windows and a Linux environment, even if we do nothing else and all you do is execute there.
A
There will be things we learn from that experience that are crucial to the next step, so I'm very much yes for number one. That, for me, is really good. And number two is such high value that it's absolutely number two, because it's such a glaring performance problem on large repositories that it's instantaneous savings if we can prove it works.
A
I think you said it exactly right. The crucial question there is: what steps are needed in order to accept PR 845 into the plugin and release it to users? It is so valuable and such a help for large repositories. Stopping the second fetch doesn't do serious harm for small repositories, but it's a major win for large ones.
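To make the double fetch concrete: as discussed here, the checkout fetches once while setting up the repository and then fetches again. The sketch below is a simplified model of the kind of decision PR 845 is about, not the PR's actual code. A hypothetical FetchPlanner treats the second fetch as redundant only when the initial fetch has just run with exactly the same set of refspecs. That rule is a conservative assumption chosen for illustration.

```java
import java.util.HashSet;
import java.util.List;

/** Hypothetical model of the "double fetch" decision; not the git plugin's code. */
class FetchPlanner {

    /**
     * Returns true when a second fetch is still needed.
     *
     * @param initialFetchRan  whether the repository was just initialized and fetched
     * @param initialRefspecs  refspecs used by that initial fetch
     * @param checkoutRefspecs refspecs the checkout step would fetch next
     */
    static boolean secondFetchNeeded(boolean initialFetchRan,
                                     List<String> initialRefspecs,
                                     List<String> checkoutRefspecs) {
        if (!initialFetchRan) {
            return true; // existing workspace: the checkout fetch is the only fetch
        }
        // Conservative: skip only when both fetches would request identical refspecs.
        return !new HashSet<>(initialRefspecs).equals(new HashSet<>(checkoutRefspecs));
    }
}
```

A real implementation would also need to consider options such as shallow clones or tags that change what a fetch retrieves; this model ignores them.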
A
So my personal bias is: let's discuss them in the PR, because it keeps the conversation directly in the code. I don't know whether [unclear] is interested at the moment, and if we start the discussion in the JIRA ticket, it may distract him, whereas discussion in the PR keeps us right in the code.
A
So it's a good question. I could imagine the double fetch work will have places where you'll be blocked, waiting for me, or for Fran, or for Justin or somebody else to review something for you. In order to continue making progress, you may need to start another thing, right? You may need to start a JMH measurement of git ls-remote, or a JMH measurement of... let's see, what are some other sample operations?
B
Okay, so I think this is done. The second thing was Java Flight Recorder, from the discussion we had in the Platform SIG. I wanted to discuss whether we want to profile using Java Flight Recorder, and how we would do that. I have tried Java Flight Recorder; it's integrated in the JVM. I have a student license for IntelliJ, so I think I can access the commercial version. I tried JFR, and the thread dumps it gave were not very intuitive.
B
I used a different profiling tool after that. I don't remember its name, but it gave me the percentage of CPU a particular operation or thread is using. With JFR, it mostly gives flame graphs. I'm not sure if you're familiar with flame graphs. They're basically a visual representation of how much time each call path is taking, with the stack frames stacked on top of each other. I'll share a link for that.
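Flame graphs are usually built from sampled stack traces: identical call paths are counted, and the width of each box is proportional to the number of samples containing that frame. A common intermediate form is the "folded stacks" text consumed by Brendan Gregg's flamegraph.pl. Each line holds one call path, with frames joined by semicolons from root to leaf, followed by a space and the sample count. The sketch below produces that format from in-memory stacks. The class name and example frames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: turns sampled stacks into "folded stacks" lines. */
class FoldedStacks {

    /**
     * @param samples each sample is one stack, ordered root-first
     *                (for example ["main", "fetch", "readPack"])
     * @return lines like "main;fetch;readPack 2", sorted for stable output
     */
    static List<String> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            counts.merge(String.join(";", stack), 1, Integer::sum); // count identical paths
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((path, n) -> lines.add(path + " " + n));
        return lines;
    }
}
```

Writing these lines to a file and passing it to flamegraph.pl would render the SVG. Profilers that export JFR data can produce the same kind of input.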
B
So then, when I was researching JFR, I saw that we have JMC, Java Mission Control, which basically takes the JFR dump file and can be used to visualize the results in a better form. Sorry, my question here is that whatever profiling I did personally was just a checkout using the GitSCM checkout feature.
B
So my question is this. I haven't used the git plugin in the ways users do; I have used it for certain use cases only. So what should be profiled? Should you, Mark, or any of the other mentors do it, or should we give it to other users as well? We could ask them to profile their Jenkins instances with JFR, and I should apparently also learn to do that. I think that's absolutely necessary.
A
Aha, how about this as a proposal? One technique we could use: on your local machine, using Java 11, test-drive running a Jenkins. Just run the war file directly with Java 11 on your machine and experiment with Java Flight Recorder as bundled in Java 11. If that works well for you, it will give you experience. It will let you see: this is how I use Flight Recorder.
A
Then, for instance, once you've seen some small sample, you might say: okay, now I'm going to try to clone a mega, horrible, terrible repository. You pick the Linux kernel, or jenkins.io, which is 40 megabytes in size. You pick a large repository and watch what JFR tells you about the case where you say: I know this is absolutely going to be catastrophic, and everything will be spent in the command-line git operation.
A
Then switch and do it with JGit, where all of a sudden you're inside the JVM. It's no longer CLI git hiding things; you're inside the JVM, and you'll probably see hot spots inside JGit itself: oh look, this is hot, that is hot. Again, it's good experience to iteratively decode: what does JFR tell us, and how can we use it?
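A recording can be started from the command line, for example java -XX:StartFlightRecording=duration=60s,filename=jenkins.jfr -jar jenkins.war, and the file then opened in Java Mission Control. Java 11 also exposes Flight Recorder through the jdk.jfr API. The sketch below records CPU execution samples while a workload runs, dumps them to a file, and counts top-of-stack methods to surface hot spots, roughly what JMC's method profiling view summarizes. The class name JfrProfile, the 10 ms sampling period and the workload are assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Sketch: profile a workload with the JDK 11+ Flight Recorder API. */
class JfrProfile {

    /** Records execution samples while the workload runs and dumps them to output. */
    static Path record(Runnable workload, Path output) throws Exception {
        Files.createDirectories(output.toAbsolutePath().getParent());
        try (Recording recording = new Recording()) {
            // Only CPU execution samples are enabled; the 10 ms period is an assumption.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.start();
            workload.run();              // e.g. a clone of a large repository via JGit
            recording.stop();
            recording.dump(output);      // the .jfr file can also be opened in JMC
        }
        return output;
    }

    /** Counts how often each method appears as the top frame of a sampled stack. */
    static Map<String, Integer> hotMethods(Path jfrFile) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(jfrFile)) {
            RecordedStackTrace stack = event.getStackTrace();
            if (stack == null || stack.getFrames().isEmpty()) {
                continue;                // events without stacks carry no hot-spot data
            }
            RecordedFrame top = stack.getFrames().get(0);
            if (top.getMethod() == null) {
                continue;
            }
            String name = top.getMethod().getType().getName() + "." + top.getMethod().getName();
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }
}
```

Sorting the map by count gives a crude list of hot methods. JMC's views are far richer, but this shows what the raw samples contain.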
A
No, no, Rishabh, I am confident we can enlist the help of others if your initial experiments do not quickly show you: hey, this worked, and this works. There are lots of people with profiling experience whom we can enlist. Don't sabotage your progress just because something is getting in your way and you just can't figure it out. We're happy to go find people who have experience in this.
A
In terms of the next one, the profiling data from willing users: I am happy to put you on what I would call my nearly production-scale Jenkins instance running on Java 11. I just have to grant you an SSH connection into my environment, and then you'd be welcome to use it. It's got 30 agents, it's running on a machine with 32 GB of RAM, it's on Java 11, and it's got a thousand-plus jobs. That would come after the initial learning period, after first doing something on small things.
B
Okay, so that'll be a great next step. After that, with running benchmarks on ci.jenkins.io agents, we were talking about using submodules. The question I had: what I thought you were saying is that a submodule is basically like a pointer; it points to the repository we're adding as a submodule. I understand git submodule add will do that, but once I use git submodule, I initialize it and I update it.
B
After that, it must bring all the files from that repository into my repository, right? So how would that solve it? If I have a 300 MB repository, it would still make the git client plugin a very large repository, which we would not want, right? So what should I do for this experiment, for JMH?
A
So maybe let's pull back just a little, to the objective. You need repositories of interesting sizes that you can use for tests, and you need them locally, right? That's ultimately what we need: repositories of interesting sizes, available locally, and we'd like to express inside the tests themselves how to get those large repositories locally.
A
So, in that case, I think you might choose, in your JMH setup code (the code that does the prep work before your JMH benchmark starts), to just clone from the remote repository into a local cache directory and then reference that local cache directory. No submodule, and nothing added to the plugin except the source code that describes what you're cloning. What you have is an expensive operation initially, which says: clone this 40 megabyte repository, or this 300 megabyte repository.
A
Do that once, and then all operations in the measured portion use the local copy on the file system. Then you don't have to mess with submodules. That's the technique the git client plugin uses: it clones its own repository into a temporary directory locally, so it has a reference copy of something with interesting history in it.
B
I was only concerned because we initially thought we would not interact with anything external: my thought was to isolate the JMH environment from the internet or any possible external connection. But I guess, ideally, the method JMH provides for doing the work before running the benchmark is isolated from the benchmark's measurement, so I think it should not be a problem to do what you say, and I think that's the easiest thing for me to do as well.
A
There's the Jenkins Maven tooling, which I think has a concept where it uses a directory named work inside the plugin development directory; mvn hpi:run, for instance, uses a work directory. I guess, conceptually, you could do something like that: if the work directory exists, use it and use its contents directly, if you'd rather not do the setup.
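The clone-once idea and the "reuse the work directory if it exists" idea can be combined in the benchmark's setup code. Here is a minimal sketch, assuming a hypothetical RepoCache class and a caller-supplied cloning function (in practice that function might run git clone through the git client plugin or JGit). The expensive clone happens only when the cached copy is missing, and the measured benchmark methods then point at the returned local path. The cache directory layout is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

/** Hypothetical setup helper: clone a sample repository once, then reuse the local copy. */
class RepoCache {

    /**
     * Returns a local copy of remoteUrl under cacheRoot/name, invoking cloner
     * only if that directory does not exist yet.
     */
    static Path localCopy(Path cacheRoot, String name, String remoteUrl,
                          BiConsumer<String, Path> cloner) {
        Path target = cacheRoot.resolve(name);
        if (Files.isDirectory(target)) {
            return target;                   // reuse: no network access needed
        }
        try {
            Files.createDirectories(cacheRoot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cloner.accept(remoteUrl, target);    // expensive, done once, outside the measurement
        return target;
    }
}
```

One design caveat: if a clone is interrupted, a partial directory would be reused on the next run. A real setup might clone into a temporary name and rename it into place only on success.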
B
But the work directory you're describing works when I run the Jenkins instance using hpi:run, right? So the user would have to have cloned the sample repositories I want to test with, because then I could reference them in my tests. That's something I would have to specify first.
B
I did not try it, because I knew it was going to fail: I don't have the repositories locally to fetch from, so I did not try it. Okay, I will try it once I code this first, and then I'm going to raise a PR. I'll also run a benchmark stage in the Jenkinsfile, and then let's see what happens.
B
Okay, one more question I had was about the sample repositories I took. What I had in mind was that I chose repositories, since we were talking about the size and the structure of the repository, where we could have variability in the number of branches and the number of commits. Is that what we should go with? Because what I took here is, initially, the first row, the first repository...
A
At least for me, that was exactly what I was hoping for. Each of your rows is almost an order of magnitude, 10x, larger than the one before it. So what you've done is very quickly broaden the search space to interesting things. If we were ever to get to repo five, that's probably Linux, right? The next one down is a 1.4 gigabyte repository with this enormous number of branches, and I would guess probably, yeah, 100x.
B
Okay, okay, Justin. One more thing I had in mind: do we have user data on the average size of the repositories people use the git plugin with? We probably would never have that. Okay.
A
I don't. That would be a telemetry thing. I can certainly tell you horror stories, and I'm sure every other mentor in the group here could tell you horror stories, of repositories that were bloated and much bigger than they should have been, but actual data, no. I mean, we could sample. If we sampled, we'd have a good sample, right? We've got a thousand-plus repositories in jenkinsci, and if you want a sample, that will give you one view of what the repositories look like.
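If the repositories in the jenkinsci organization were sampled as suggested, their sizes could be summarized with a median and upper percentiles rather than an average. A few bloated repositories would otherwise dominate a mean: for sizes 1, 2, 3, 4 and 100, the mean is 22 while the median is 3. The minimal sketch below uses the nearest-rank percentile. The class name is hypothetical, and how the sizes would be collected is not specified in the discussion.

```java
import java.util.Arrays;

/** Hypothetical summary of sampled repository sizes (for example in megabytes). */
class RepoSizeSummary {

    /** Nearest-rank percentile: the smallest value with at least p% of samples at or below it. */
    static double percentile(double[] sizes, double p) {
        if (sizes.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = sizes.clone();      // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```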
A
There's another thing: yeah, that would be a big help. As we get this onto ci.jenkins.io, having something that visualizes the results would help me a bunch if I ever need to look at things. So I think that's very worthwhile. So sure, do this along with the PR for the benchmark, at least as far as I'm concerned.
B
Okay, one more interesting thing I noticed: in the Platform SIG meeting, Oleg said that we want to target users with big repositories. As you said, large code producers usually have big repositories, and those are the use cases where we should look at improving performance, because that is where improvements will pay off.
B
So, actually, Mark, I saw a presentation you gave on large git repositories, I think at Jenkins World. I only looked at the slides; I didn't see the video, since I couldn't find it. I saw that you gave users some suggestions there, like using a reference repository, and you separated what the git plugin does on the master from what it does on agents, and what we could do.
B
The first was a reference repository, then narrow refspecs, a shallow clone limiting the depth, and Git LFS. So I had this idea in mind: when a user is configuring the git plugin, could the plugin provide some kind of suggestion at that time? For example, we already validate things: if a person enters the wrong repository URL, we validate it on the fly, right?
B
So could we do something like this, for example? Say we know that for repository sizes greater than some threshold, we have seen a 70% increase in performance if you narrow your refspec. I assume some people don't expect that at all. By default, we fetch every branch if they don't add an initial refspec.
B
So could we suggest to them at that point that if they choose to add a narrow refspec, they would save maybe this percentage, or get a boost in performance? Could we possibly do that, keeping in mind that we should not take a large amount of time doing it? Yes, I think that's the biggest constraint, because we don't want a lot of validation going on at that point.
A
It is possible, and I think the example you chose is a very interesting, specific example that could be quite helpful, because we could envision the plugin performing a git ls-remote to list the branches on the remote and the SHA-1 of each of those branches. It's a relatively low-cost operation today, I think.
A
Today we check whether it's a valid repository at all, but if we made that a little heavier weight and asked for git ls-remote, that would list all the branches, and we might say, as a heuristic: if you have more than 10 branches, we should suggest a refspec, because anybody with more than ten branches will probably benefit from an intentionally narrow refspec. So I think your idea is a good one. I'll send you a link to the videos.
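The heuristic just described can be sketched as follows. git ls-remote --heads <url> prints one line per branch: a SHA-1 and the ref name separated by a tab. The hypothetical RefspecAdvisor below counts the refs/heads/ entries in that output. Above the 10-branch threshold from the discussion, it suggests a single-branch refspec instead of the all-branches default +refs/heads/*:refs/remotes/origin/*. Running git ls-remote itself is left to the caller, and the suggestion wording is an assumption.

```java
import java.util.Optional;

/** Hypothetical advisor for the "suggest a narrow refspec" idea; not plugin code. */
class RefspecAdvisor {

    static final int BRANCH_THRESHOLD = 10; // heuristic from the discussion

    /** Counts branch refs in git ls-remote output ("<sha1>\t<ref>" per line). */
    static int countBranches(String lsRemoteOutput) {
        int branches = 0;
        for (String line : lsRemoteOutput.split("\n")) {
            String[] parts = line.split("\t");
            if (parts.length == 2 && parts[1].trim().startsWith("refs/heads/")) {
                branches++; // tags and other refs are ignored
            }
        }
        return branches;
    }

    /** Suggests a single-branch refspec when the remote has many branches. */
    static Optional<String> suggestion(String lsRemoteOutput, String branch) {
        int branches = countBranches(lsRemoteOutput);
        if (branches <= BRANCH_THRESHOLD) {
            return Optional.empty();
        }
        return Optional.of(branches + " branches found; consider the narrow refspec +refs/heads/"
                + branch + ":refs/remotes/origin/" + branch);
    }
}
```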
A
I love the idea. Well, Justin's got a good point: configuring a job for the first time is actually not as frequent as reconfiguring a job, for which you may already have a cloned copy. So at times you may be configuring the plugin and we actually have a convenient local copy, where we could check the size of the thing and say: hey, sometime in the past it was this big.
A
We can assume it's probably never going to get smaller. It will not shrink; git repositories don't shrink. Therefore we can make heuristics and offer suggestions based on whether we found a local copy. If the local copy is not available, no suggestions. But there are a number of places where we might look for local copies and say: hey, we found a copy of something that looks like this. It's this size; here's our recommendation for your performance.
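Here is a sketch of the "check the local copy" heuristic, assuming the plugin already knows where a previous clone lives. It walks the existing repository's .git directory, sums the file sizes, and offers a recommendation only when a copy is found and exceeds a threshold. The class name, the 100 MB default threshold and the message are hypothetical. As noted above, no local copy means no suggestion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical size check on an existing local clone. */
class LocalCopyAdvisor {

    static final long THRESHOLD_BYTES = 100L * 1024 * 1024; // assumed default threshold

    /** Total size in bytes of all regular files under dir. */
    static long sizeOf(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a recommendation only if a local copy exists and is larger than the threshold. */
    static Optional<String> recommend(Path workspace, long thresholdBytes) {
        Path gitDir = workspace.resolve(".git");
        if (!Files.isDirectory(gitDir)) {
            return Optional.empty();             // no local copy: no suggestion
        }
        long size = sizeOf(gitDir);
        if (size <= thresholdBytes) {
            return Optional.empty();
        }
        return Optional.of("Local copy is " + size / (1024 * 1024)
                + " MB; a narrow refspec or a reference repository may help.");
    }

    /** Same check using the default threshold. */
    static Optional<String> recommend(Path workspace) {
        return recommend(workspace, THRESHOLD_BYTES);
    }
}
```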
C
Okay, I mean, some people don't even... we use [unclear] quite a bit, so we don't even touch the configuration screen; that just happens for us. So having an option to show it in the build log, or maybe on the admin screen or something like that, would be cool, or as an option. Potentially some people might not like that option.
B
Yeah, I think the first two things we have decided are a PR for the JMH benchmarks running on Jenkins CI, and the double fetch performance issue, where we'll open our discussion in the PR. I think we would start there. Should we start tracking this on JIRA? Is that something we need to do?
C
I was just going to say: yeah, last year we did it in JIRA. I guess if [unclear] is on the next call, maybe he can give you his experience from his perspective, but it works pretty well for communicating things. And I think, as Mark was suggesting, doing it will help you and help the project in general.