Jenkins Platform SIG, 27 May 2020

From YouTube: 2020 05 27 GSoC Git Plugin Performance Project

Description

Google Summer of Code 2020 git plugin performance improvement project discussions from May 27, 2020. Topics included discussions on alternatives for benchmarking, locations to perform the benchmarks, and how to best approach profiling and Java Flight Recorder data collection.

A

We'd like to record this. Recording has started. Hey Justin, how's it going? [Further greetings unclear.] Okay, so Rishabh, we're past our start time. If you want to share your screen, we can use it to work through the agenda. I see you've assembled an agenda for us. Let's talk it through. Okay.

B

I can't share my screen right now. You would have to enable it.

A

All right, once again, my same mistake as always. Hang on just a minute. What we have to do here is change the participant settings. There's something under security: allow participants to share screen. Okay, you should be able to share your screen now.

B

So one of the first things I wanted to discuss was having a consolidated plan. Since coding starts on June 1, I think we should convert our action items into JIRA tickets, if that is the way to consolidate them. I have a process in mind and I would like to discuss it. For JMH, the benchmarking strategy: I discussed this in the Platform SIG meeting, and I want to know if this is what we would like to do for benchmarking.

B

The first step is selecting a git operation, the second is testing it with JMH, and the third is testing it on the Jenkins CI. I think I have the first two steps for git fetch. I already have something in place: a work-in-progress PR for that. The third step I am actually still figuring out with Mark. We were having a discussion on git submodules, and I would like to discuss that after this.

B

After doing this comes the implementation part. For the implementation part I had some coding tasks in my roadmap. The first is a PR on the JMH module running on the Jenkins CI. Right now, I don't have a way to access the local repositories.

B

I want to fetch on the CI infrastructure, so that is something I'm figuring out right now. After I have that, I think I should have a PR with a benchmark written for git fetch, and it should run on our infrastructure in different environments so that we have good, solid data to work with. Then one more possible coding task is the existing double fetch performance issue. I have a PR for that.

B

I think we need to discuss how to move forward on that. Do we need more automated tests, or any other testing? That's something we could discuss. Then, I already have a microbenchmark test for git fetch. I'm not sure how we move on from git fetch. Should we work on different operations in parallel, or should we first choose an operation, work on its implementation, see whether it's working, and then move on? That's something we should also clear up before moving on. And then there is implementing the opt-in performance feature.

B

I was thinking about what happens once we have the data, once we know that for a given scenario JGit works better than git. How would I implement it? I should have a PR, at least for git fetch. Maybe I could do it in the GitSCM checkout step, and, you know, there is a PR for that. And the last one is this.

B

We were also thinking, and I think it's an active discussion, about replacing the git init plus fetch step in the GitSCM checkout with git clone. For that we need a lot of changes, and that's something I haven't researched a lot, but it's one of the optional deliverables I had in my proposal. So I was also thinking of moving it from an optional to a mandatory deliverable.

B

I think it would not stretch our timelines too much. That's something we can definitely discuss, because I need some coding tasks to work on from June. I think right now it's more of a loose, research-based approach, and I think it should be consolidated a bit, maybe in JIRA or maybe in this document, in any form. So that's the first thing I wanted to discuss. Any questions you have?

A

For me at least, I'd like to go from the very top. I like number one a lot. I think having something that runs JMH on ci.jenkins.io will teach you and the rest of us a bunch of things about running that, because whereas you've got macOS right now, it will immediately put us into a Windows and a Linux environment, even if we do nothing else and all you do is execute on Windows and Linux.

A

There will be things we learn from that experience that are crucial to the next step, so I'm very much a yes for number one. That, for me, is really good. And number two is so high value that, absolutely, number two. It is one of those where it's such a glaring performance problem on large repositories that it's instantaneous savings if we can prove it works.

B

I have that PR, so I think we should work on how to test it, or on whether that solution is good or not. So you're saying... I'm sorry.

A

No, I think you said it exactly right. The crucial question is what steps are needed in order to accept PR 845 into the plugin and release it to users, because it is so valuable and such a help for large repositories. It doesn't do serious harm to stop fetching twice for small repositories, but it's a major win for large ones.

B

Maybe what we should do is create a ticket for this and then discuss the possible test scenarios. Where should we discuss this? What would be a good platform?

A

Yes, there actually is a JIRA ticket for that exact problem.

A

[Name unclear] had submitted the ticket, so we could either have the discussion in that JIRA ticket or have it in the pull request. Either is fine with me. I'm open. Justin and Fran, do you have a preference?

C

No, I think both of those sound good. Yeah, okay.

A

So then, my personal bias is to discuss it in the PR, because it keeps the conversation directly in the code. I don't know whether [name unclear] is interested at the moment, and if we start the discussion in the JIRA ticket, it may distract him, whereas discussion in the PR keeps us right in the code.

B

Okay, so we could start the discussion there. Another question I had was: do we want to benchmark multiple operations, or should we first do the first two things and then move forward with other operations? How do we do that?

A

It's a good question. I could imagine the double fetch work will have places where you will be blocked, waiting for me or Fran or Justin or somebody else to review something for you. In order to continue making progress, you may need to start another thing, right? You may need to start a JMH measurement of git ls-remote, or a JMH measurement of... let's see, what are some other sample operations?

A

ls-remote is the noteworthy one because it's a check-for-changes operation, but checkout, for instance, might be another.

A

That's where we say: okay, is there a substantial difference between JGit checkout and CLI git checkout, and if so, how should we handle that?

C

Okay, okay.

A

And checkout has the benefit that it is a hundred percent local. ls-remote is intentionally designed to be a hundred percent remote.

B

Okay, so I think this is done. The second thing was Java Flight Recorder, from the discussion we had with Oleg in the Platform SIG. I wanted to discuss whether we want to profile using Java Flight Recorder and how we would do that. I have tried Java Flight Recorder. It's integrated in the JVM. I have a student license with IntelliJ, so I think I can access the commercial version. So I tried JFR, though it was not very intuitive with the thread dumps it gave.

B

It was not very intuitive. I used a different profiling tool after that. I don't remember its name, but it gave me the percentage of CPU usage that a particular operation or a particular thread is taking. With JFR, it is mostly flame graphs. I'm not sure if flame graphs are something you're aware of. It's basically a visual representation of how much space a thread is taking, and the threads are stacked. I'll share a link for that.

B

Then, when I was researching JFR, what I saw is that we have JMC, Java Mission Control, which basically takes the dump file from JFR and can be used to visualize the results in a better form. So my question here is this: whatever profiling I did personally was just checking out using the GitSCM checkout feature.

B

So my question is this: I haven't used the git plugin in the ways users do. At least, I have used it for certain use cases only. So what should we profile? Should you, Mark, or any of the other mentors do it, or should we give it to other users as well? We could ask them to profile their Jenkins instance with JFR, and I could also do that in parallel. Should I learn them? I think it's absolutely...

B

It's a necessity for me to learn most of the major use cases of the git plugin, because I haven't given too much time to that. So I'm a bit confused on that front. What should I do? How should we go about profiling?

A

How about this as a proposal? One technique we could use is to have you, on your local machine, test drive Jenkins using Java 11. Just run the war file directly with Java 11 on your machine and experiment with Java Flight Recorder as bundled in Java 11. If that works well for you, that will give you experience. It will let you see: this is how I use Flight Recorder.

A

This is how I get useful information out of it. Just do anything to get you familiar with it, because I am not familiar with it. I don't know about Justin and Fran, but for me it would be a completely new exploration.

B

Okay, I could do that. Yes, that's a good step.

A

Just run the Jenkins war. No Docker images, nothing more than just launching it using Java 11 on your Mac, and see if you can instrument Flight Recorder inside of it. Once you're comfortable with that, then it's: okay, what data does this tell me? Which are the hot spots?

A

Then, for instance, once you've seen some small sample, you might say: okay, now I'm going to try to clone a mega, horrible, terrible repository. You pick the Linux kernel, or jenkins.io, that's 40 megabytes in size. You pick a large repository and watch what JFR tells you about that case, where you say: I know this is absolutely going to be catastrophic, everything will be spent in the command line git operation.

A

Then switch and do it with JGit, where now all of a sudden you're inside the JVM. It's no longer CLI git hiding things. You're inside the JVM, and you'll probably now see hot spots inside JGit itself: oh look, this is hot, that is hot. Again, it's good experience to just iteratively decode what JFR tells us and how we can use it.
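
A minimal sketch of the local experiment proposed above, assuming a JDK 11 `java` binary on the PATH and a downloaded `jenkins.war`. `-XX:StartFlightRecording` with `duration` and `filename` is a standard JDK 11 option; the class name `JfrLaunchSketch`, the file names, and the recording duration are illustrative placeholders, not anything fixed in the meeting. The helper only builds the command line and does not start Jenkins.

```java
import java.util.List;

// Hypothetical helper: builds a command line that runs jenkins.war on the
// current JVM with a time-boxed Java Flight Recorder recording enabled.
final class JfrLaunchSketch {

    private JfrLaunchSketch() {
    }

    /**
     * @param javaBinary path or name of the java executable (assumed JDK 11+)
     * @param warPath    path to the Jenkins war file (placeholder name)
     * @param jfrFile    file the recording is written to when it ends
     * @param seconds    recording duration in seconds, must be positive
     */
    static List<String> command(String javaBinary, String warPath, String jfrFile, int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        // Standard JDK flag: start a recording at JVM startup and dump it to a file.
        String jfrFlag = "-XX:StartFlightRecording=duration=" + seconds + "s,filename=" + jfrFile;
        return List.of(javaBinary, jfrFlag, "-jar", warPath);
    }
}
```

Passing the returned list to `new ProcessBuilder(cmd).inheritIO().start()` would launch Jenkins with the recording active; the resulting `.jfr` file can then be opened in Java Mission Control.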

C

Yeah, I think that sounds good. I think you'll still see some waits when you...

A

Justin, when you say waits, you mean pauses, halts? Sorry.

C

Yeah. If I remember right, profiling tools usually show when you've got pauses and things like that. I think they will kind of tell you which threads are waiting and so on. It's been a while, so I could be wrong.

A

No, no. Rishabh, I am confident we can enlist the help of others if your initial experiments do not quickly show you that this worked and that worked. There are lots of people who do have experience with profiling that we can enlist. Don't sabotage your progress just because something is getting in your way and you just can't figure it out. We're happy to go find people who have experience in this.

C

Yeah, I would just say, if you run into questions or problems, just take a note and then we can get those out to the community.

B

Sure, that's a good way to [unclear]. Okay, so I'll do that profiling. Okay. Yes.

A

I'm going to add one more comment there, just in case you have to refer back to it. It is: clone a largish repository with CLI git and review the traces, then clone with JGit and review the traces.

A

In terms of the next one, the profiling data from willing users: I am happy to put you on what I would call my nearly production-scale Jenkins instance running on Java 11. I just have to grant you an SSH connection into my environment and then you'd be welcome to use that. It's got 30 agents, it's running on a machine with 32 GB of RAM, it's on Java 11, and it's got a thousand-plus jobs. So that's for after the initial learning period, after the initial "let's do something on small things" phase.

A

If you really want something industrially large, I've got it.

B

I think that will be a great thing to do. First, I should understand how to use it and learn to analyze the thread dumps, and then maybe I would want to see what happens in production or near production. Yeah.

B

Okay, so that will be a great next step. After that, on running benchmarks on the Jenkins CI agents: we were talking about using submodules. What I understood you to be saying is that a submodule is basically like a pointer. It points to the repository we're adding as a submodule. So I understand git submodule add will do that. But then I use git submodule init to initialize it, and I update it.

B

After that, it must bring all the files from that repository into my repository, right? So how would that solve the problem? If I have a 300 MB repository, it would still make, say, the git client plugin a very large repository, which is something we would not want. What is it that I should do for this experiment, for JMH?

A

So maybe let's pull back just a little to the objective. You need repositories of interesting sizes that you can use for tests, and you need those locally, right? That's ultimately what we need. We need repositories of interesting sizes, we need them locally, and we'd like to express inside the tests themselves how to get those large repositories locally.

B

Yes.

A

In that case, I think you might choose, in your JMH setup code, the code that's doing the prep work before your JMH benchmarks start, to just clone from the remote repository to a local cache directory and then reference that local cache directory, without any submodule, without adding anything to the plugin except the source code that describes what you're cloning. So what you have is an expensive operation initially, which says: clone this 40 megabyte repository or this 300 megabyte repository.

A

Do that once, and then all operations in the measured portion use the local copy on the file system. Then you don't have to mess with submodules. That's sort of a technique that the git client plugin uses. It uses itself and clones a copy into a temporary directory locally, so it has a reference copy of something with interesting history in it.
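
The setup technique described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `LocalRepoCache`, the cache directory layout, and the URL sanitising rule are hypothetical, and `git` is assumed to be on the PATH. In a JMH benchmark, `ensureCloned` would be called from a method annotated with `@Setup(Level.Trial)`, which JMH runs once before the measured iterations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: clone a remote repository into a local cache once,
// so that measured benchmark code only ever touches the local copy.
final class LocalRepoCache {

    private LocalRepoCache() {
    }

    /** Maps a remote URL to a stable directory under the cache root. */
    static Path cacheDirFor(Path cacheRoot, String remoteUrl) {
        // Replace every character that is unsafe in a directory name.
        String safeName = remoteUrl.replaceAll("[^A-Za-z0-9._-]", "_");
        return cacheRoot.resolve(safeName);
    }

    /** The git command line used for the one-time clone. */
    static List<String> cloneCommand(String remoteUrl, Path target) {
        return List.of("git", "clone", remoteUrl, target.toString());
    }

    /** Clones on first use; later calls reuse the existing directory. */
    static Path ensureCloned(Path cacheRoot, String remoteUrl)
            throws IOException, InterruptedException {
        Path target = cacheDirFor(cacheRoot, remoteUrl);
        if (Files.isDirectory(target)) {
            return target; // already cached, no network access needed
        }
        Files.createDirectories(cacheRoot);
        Process clone = new ProcessBuilder(cloneCommand(remoteUrl, target))
                .inheritIO()
                .start();
        int exitCode = clone.waitFor();
        if (exitCode != 0) {
            throw new IOException("git clone failed with exit code " + exitCode);
        }
        return target;
    }
}
```

Because the clone happens in setup, the network cost is paid once per cache directory and stays out of the measured fetch or checkout operation.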

B

I've actually seen that with the tests we have. Yeah.

A

I don't know that it's a good technique, but it is the technique it uses.

B

I was only concerned about one thing. My initial thought was to isolate the JMH environment from the internet or any possible external connection. But I guess, ideally, the method JMH provides for us to run before the benchmark should be isolated from the benchmark's measurement, so I think it should not be a problem to do what you say, and I think that's the easiest thing for me to do as well.

A

Well, the Jenkins Maven components, I think, have a concept where they use a directory named work inside the plugin development directory. So mvn hpi:run, for instance, uses a work directory. I guess conceptually you could do something like that where, if the work directory exists, you use it and use its contents directly, if you'd rather not do the setup.

A

For me, it's easier to code the setup into your test and say "clone this repository", but it would also be fine if you said: no, you need to already have this if you're going to run these tests.

B

Okay. But the work directory you're describing works when I run the Jenkins instance using mvn hpi:run, right? So would the user clone the sample repositories I want to test with? Because then I could reference them in my tests. That's something I would have to specify first.

A

That was what I was thinking, but as you describe it, it sounds like that just won't work. I think it's much better to put the definition of how to do those clone operations inside the JMH setup code.

A

Now, if that disturbs the JMH tests, then that may not be acceptable, but I assume you'll figure that out. You'll decide whether that is disturbing the test or not.

B

I think the first thing I would do is this: I have the previous implementation right now, so I would take its results and then compare them with the new implementation. Okay, I'll do that first, and I'll shortly raise a PR, so we have at least a benchmark test on infra.

A

Now, you had found that in the pipeline shared library there was already something that invokes JMH. What's your experience been with that? Did it work out okay for you, or haven't you tried it yet?

B

I actually did not try, because I knew it was going to fail, since I don't have the repositories to fetch from. I will try it once I code this first, and then I'm going to raise a PR. Also, I'd add a run-benchmarks stage to the Jenkinsfile, and then let's see what happens.

B

Okay, one more question I had was about the sample repositories. What I had in mind was that I took four repositories, since we were talking about the size and structure of the repository, so that we could have variability in the number of branches and the number of commits. Is that what we should go with? Look at what I took here, starting with the first repository.

B

It has one branch and one commit, and then it goes up to a very large number. So is this the kind of variance we're looking for, so that we can sufficiently test our hypothesis that performance is affected by repository size and structure?

A

Yeah, at least for me, that was exactly what I was hoping for. Each of your rows is almost an order of magnitude, 10x, larger than the one before it. So what you've done is very quickly broaden the search space to interesting things. The next row down, if we were ever to get to repo five, is probably Linux, right? The next one down is a 1.4 gigabyte repository with this enormous number of branches and, I would guess, probably a hundred times the number of commits in your repo four.

A

The volume is frightening in that repository. So yeah, we should not do repo five, but we have a place we could go if we needed to.

C

To start nothing.

B

Okay, Justin. One more thing I had in mind: do we have user data on the average size of the repositories people use the git plugin with? Or would we never have that? Okay.

A

I don't... that would be a telemetry thing. I can certainly tell you horror stories, and I'm sure every other mentor in the group here could tell you horror stories, of repositories that were bloated and much bigger than they should have been. But actual data, no. I mean, we could sample. If we sampled, we've got a good sample, right? We've got a thousand-plus repositories in Jenkins CI, and if you want a sample, that will give you one sample, which would say the repositories

A

are this size on average. The reality is those don't look anything like most corporate repositories. Those are small, isolated components, and many corporate repositories are monsters, where everybody uses a single repository. So we have data sources if you'd like them.

B

That's okay, it was just a question. After that, I haven't tested the JMH visualizer plugin yet [unclear]. That's something I have to do.

A

There's another thing. Yeah, that would be a big help. As we get this onto ci.jenkins.io, having something that would visualize the results would help me a bunch if I ever need to look at things. So I think that's very worthwhile. So sure, do this along with the PR for the benchmark, right? At least for me,

A

I would like that. I consider it optional, but it would be a real help. And again, I'm happy to put the JMH visualizer plugin into my environment first, if that will help you.

B

I think I can do that with the benchmark, yeah.

B

Okay, so one more interesting thing I noticed: in the Platform SIG meeting, Oleg said that we want to target users with big repositories. As you said, corporate users usually have big repositories, and those are the use cases where we should look to improve performance, because that's where we will see an increase if we are able to improve the performance.

B

So, Mark, I saw a presentation you gave on git in the large, or something like that, I think at Jenkins World. I just looked at the presentation slides. I didn't see the video; I couldn't find it, actually. I saw that you gave some suggestions there for users, like using a reference repository, and you separated what the git plugin does on the master and on the agents, and what we could do.

B

The first was using a reference repository, then narrow refspecs, shallow clone constraining the depth, and git LFS. So I had this idea: when a user is configuring the git plugin, could we provide some kind of suggestion at that time? For example, we do validate if a person enters a wrong repo URL. If it's wrong, we validate it on the fly, right?

B

So could we do something like this: for example, say we know that for a repository size greater than some value, we have seen a 70% increase in performance if you narrow your refspec. I assume that some people don't add a refspec at all, so by default we fetch everything, every branch, if they don't honor the initial refspec or don't add any refspec.

B

So could we suggest to them at that point that if they choose to add a narrow refspec, they would save maybe this percentage, that it would be a boost in performance? Could we possibly do that while keeping in mind that we should not take a large amount of time doing it? I think that's the biggest constraint, because we don't want a lot of validation going on at that point.

B

So the question is: is that something possible? Is it something ever considered in the Jenkins environment for any plugin? Do we do that?

A

It certainly is possible, and I think the example you chose is a very interesting specific example that could be quite helpful, because what we could envision the plugin doing is performing a git ls-remote to list the branches and the SHA-1 of each of those branches on the remote. It's a relatively low-cost operation. Today, I think,

A

the way we check is we check to see if it's a valid repository at all. But if we made that a little heavier weight and asked for git ls-remote, that would list all the branches, and we might, as a heuristic, say: if you have more than 10 branches, we should suggest a narrow refspec, because anybody with more than ten branches probably will benefit from an intentionally narrow refspec. So I think your idea is a good one. I'll send you a link to the videos.

A

We've got the videos on YouTube, so I'll send you a link to those videos so that you can watch them too, to think about other ways that we might apply the technique you're suggesting.
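
The branch-count heuristic described above could look roughly like this. The sketch assumes the caller has already captured the output lines of `git ls-remote`, where each line is a SHA-1, a tab, and a ref name. `RefspecAdvisor` and its method names are hypothetical, and the threshold of 10 is the figure mentioned in the discussion, not a measured value.

```java
import java.util.List;

// Hypothetical helper: decide whether to suggest a narrow refspec based on
// how many branches a remote advertises in its ls-remote output.
final class RefspecAdvisor {

    /** Heuristic from the discussion: more than 10 branches triggers a hint. */
    static final int BRANCH_THRESHOLD = 10;

    private RefspecAdvisor() {
    }

    /** Counts refs/heads/* entries; tags, HEAD, and malformed lines are ignored. */
    static long countBranches(List<String> lsRemoteLines) {
        return lsRemoteLines.stream()
                .map(line -> line.split("\t", 2)) // "<sha>\t<ref>"
                .filter(parts -> parts.length == 2 && parts[1].startsWith("refs/heads/"))
                .count();
    }

    /** True when the remote has enough branches that a narrow refspec may help. */
    static boolean shouldSuggestNarrowRefspec(List<String> lsRemoteLines) {
        return countBranches(lsRemoteLines) > BRANCH_THRESHOLD;
    }
}
```

A form validation method could call `shouldSuggestNarrowRefspec` and, when it returns true, show a hint to replace a fetch-everything refspec such as `+refs/heads/*:refs/remotes/origin/*` with one that names only the branch being built.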

C

Another thing you could do, since it does involve checking out the repository, you could...

C

This could be something that you optionally include in the logs, or something like that, because it would already have a lot of that data, including size data and maybe object data. But I like Mark's idea, and maybe there's a button you can push, since it is a heavier operation. Maybe you add a button to the configuration for performance suggestions, and then it would invoke the thing, depending on how much time it took. Yeah.

A

I love the idea. Justin's got a good point: the first time you configure a job is actually not as frequent as reconfiguring a job, for which you may already have a cloned copy. So Justin's got a good point that at some times you may be configuring the plugin when we actually have a convenient local copy, where we could check the size of the thing and say: hey, sometime in the past, it was this big.

A

We can assume that it's probably never going to get smaller. Git repositories don't shrink. Therefore we can offer suggestions based on heuristics if we found a local copy. If the local copy is not available, no suggestions. But there are a number of places where we might look for local copies and say: hey, we found a copy of something that looks like this, it's this size, here's our recommendation for your performance.

C

Okay. I mean, some people don't even... so we use Job DSL quite a bit, and so we don't even touch the configuration screen. That just happens for us. So having an option to potentially have it in the build log, or maybe on the admin screen, or something like that, would be cool too, as an option. Potentially some people might not like that option

C

in, you know, various scenarios, but in general I love the idea.

B

So I think I should research that more and probably add it to the design document. That's something we would want to look at.

B

Okay, so I guess this is it for today. This is what I had to discuss.

B

Yes, it's... yeah.

A

All right, so the official start of coding is four or five days away, right? So you're comfortable, you're ready to go?

B

Yeah, I think I am. The first two things I think we have decided on are a PR on the JMH module running on Jenkins CI, and the double fetch performance issue. I'll open a discussion there, and I think we would start there. And should we start doing this in JIRA? Is that something we need to do, or is...

B

I'm not sure if that's done in GSoC or not.

A

You are welcome to use JIRA tickets if that will help you. Absolutely, I have no objections to that. It allows you to have a public place to show things and say: hey, look, here's this ticket I'm working on. That's great.

B

[Unclear.] We weren't sure, yeah.

C

I was just going to say, yeah, last year we did it in JIRA. I guess if [name unclear] is on the next one, maybe he can give you his experience from his perspective. It worked pretty well in terms of communicating things. I think it's also like what Mark was referencing: it will help you and help the general project.

C

It's a way to engage with the community, potentially, but I don't know that we had a lot of outside community involvement, other than eventually people we reached out to. So that's my two cents on it.

A

Now, Rishabh, we've got one more meeting before the start of coding, right? We're scheduled for, I believe, Friday. Right. So we'll plan to meet at that time.

B

Yes.

A

That's right in the middle of the closing of the hackfest, so I will drop out of the closing of the hackfest, we'll do this meeting, and then I'll rejoin the closing of the hackfest.

B

Maybe we can reschedule, if it's a problem for you. I don't have a problem with another time.

A

I don't... actually, I think it is the correct focus to say hackfest does not take priority over Google Summer of Code. Hackfest is wonderful, but we as a group can certainly meet together during that session. I don't think we need to change the time.

A

All right, guys. Thanks, Rishabh. See ya.