From YouTube: 2020 06 24 GSoC Git Plugin Performance Project
Description
Jenkins project office hours for the Git Plugin Performance Improvement project.
A
Okay, so let's start with the agenda directly. The first thing I wanted to discuss today was the interactive testing I did for the redundant fetches, and the modification for which I had a discussion with Mark. So the first thing is interactive testing. I shared the plan with you guys yesterday: I took some scenarios I thought would affect the results of removing the second fetch. The first one was with advanced clone behaviors, because of course we are cloning when we do this.
A
This is directly related to the fetch, so the first test was to see if enabling or disabling fetch tags would directly result in any kind of difference in the information we have for the repository. Apart from looking at the result, I also looked at the code, at how the second fetch is handling these same behaviors, because if there was a difference in behavior, then there would be a chance that the results would be affected by removing the second fetch.
A
That is what we saw with one of the issues Mark caught, which was that some of the references were not being handled by the first clone API, but were actually handled by the second fetch we perform. So with fetch tags, the second fetch is doing the same thing as the first fetch, so there is no difference: if we enable them, the first fetch will bring all the tags; if we disable them, there will also be no difference.
A
The second one I was interested in was the shallow clone, first with no depth set. I wanted to see if by default the second fetch is providing a depth while the first is not, so that there might be a difference in the commit history. When I compared the code, I could see that the clone API basically shares the same implementation for doing a shallow clone, so if I don't provide any depth, they both use a default depth.
A
One level, for doing shallow clones — so there is no difference; the behavior is the same. If I remove the second fetch, the first fetch will take care of performing a clone with depth equal to 1. Then the third test was shallow clone with depth 2; I think it's the same thing — it won't make any difference in the results.
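A minimal local sketch of the shallow-clone behavior described above (throwaway paths under /tmp; both the initial clone and a later redundant fetch use the same depth, so the history stays at one commit either way):

```shell
# Build a throwaway upstream repository with two commits.
rm -rf /tmp/upstream /tmp/shallow
git init -q /tmp/upstream
git -C /tmp/upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m one
git -C /tmp/upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m two

# Shallow clone with depth 1: only the newest commit is present.
git clone -q --depth 1 "file:///tmp/upstream" /tmp/shallow
git -C /tmp/shallow rev-list --count HEAD

# A redundant fetch at the same depth does not deepen the history.
git -C /tmp/shallow fetch -q --depth 1 origin
git -C /tmp/shallow rev-list --count HEAD
```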
Then timeout was one I just wanted to try, to see if there would be a difference with a timeout specified. Basically I would specify this: if cloning a repository takes more than, say, five minutes, it's going to abruptly cancel the build. That's what happens, is that right, Mark? Yes — so that is timeout; by default it's ten minutes for any remote operation.
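The timeout semantics described — abort the remote operation once it exceeds a limit — can be illustrated with a plain `timeout` wrapper. This is only an analogy for how the plugin enforces its default, not how the plugin actually invokes git; paths are throwaway:

```shell
# Throwaway local repository so the fetch has something to talk to.
rm -rf /tmp/to-upstream /tmp/to-clone
git init -q /tmp/to-upstream
git -C /tmp/to-upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m init
git clone -q "file:///tmp/to-upstream" /tmp/to-clone

# Kill the fetch if it runs longer than 600 seconds (the plugin's
# default of 10 minutes); here it finishes almost instantly.
timeout 600 git -C /tmp/to-clone fetch -q origin && echo "fetch ok"
```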
A
Yes. The second scenario was with wipe workspace and force clone. This I just tried because it's cleaning the repository and forcing a re-clone, so I wanted to see if that would somehow change anything. I enabled wipe workspace and compared the results without the fix and with the fix, and I could not see any difference in the repository information we had. Then it was checkout for a specific branch, and I realized —
A
So this is the third scenario, and I realized that checkout is basically something we do after the step which involves these double fetches. It's in a function called retrieveChanges; checkout is a stage which comes at a later part of the clone. Then there's this interesting behavior I found, which is not user visible; it's called GitSCMSourceDefaults. As far as I could understand, it's basically done —
A
I was reading about it; let me recall. It's done to revert to the default behavior, that is, enabling an honor-refspec, and the second thing I don't remember exactly. I was not sure if this would make any difference; I tried it and it did not. But since it involved honoring a refspec, I just wanted to check this behavior, although I could not understand how it is being called — I did not go too deep into how this behavior was working.
B
I would not expect it to make any difference, but it's an interesting question, because pre-build merge requires at least two branches inside the workspace, right? There's got to be a source branch and a destination branch, and you've got to have that one way or the other, whether it's from a refspec in the SCM definition or from an honor-refspec used to declare both branches. So I think it is unaffected by this.
A
So after doing these interactive tests, there was one interesting problem which Mark pointed out, and that problem was with the refspecs, while we are fetching them. We have multiple kinds of references, and refspecs are basically mappings for the references between the remote repository and the local repository. The refspecs which the first fetch handles are related to the references of branches, that is, refs/heads — any named branch, or a star (*) which brings all the branches.
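The branch mapping being described can be seen on any fresh clone; the default refspec maps every branch head on the remote into a remote-tracking reference locally (throwaway repository for illustration):

```shell
# Throwaway upstream with one commit.
rm -rf /tmp/refspec-upstream /tmp/refspec-clone
git init -q /tmp/refspec-upstream
git -C /tmp/refspec-upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# Cloning installs the default wildcard refspec: every refs/heads/* on
# the remote is mapped to refs/remotes/origin/* in the local repository.
git clone -q "file:///tmp/refspec-upstream" /tmp/refspec-clone
git -C /tmp/refspec-clone config --get remote.origin.fetch
# → +refs/heads/*:refs/remotes/origin/*
```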
A
So I had a discussion with Mark on how we could safely retain the fix and modify the code so that we do not break any existing use case while still not having to call the second fetch. Although, with the current modification I have tried, there are cases where I would have to call the second fetch to not break a use case. So right now, would you guys like me to go through the fix?
A
The modification I have tried, and then the interactive testing I tried on top of that modification, to check the cases Mark pointed out for which the code was breaking the use cases — I tried those cases and now it's working with them. So should I explain and go through that code, or would you guys review that from the PR, so that we don't use this time on discussion of that code? So I'm —
B
I'm open to either; I will have to review the code either way. So let's look to Omkar and to Justin and to Fran: what's your preference? Do you want to skip detailed code review in this session and go on to other topics? Because you've got several other topics we need to address, right, Rishabh? This is not the only topic for the session today.
A
Justin, Fran, would you like to discuss it, or should we move forward? I think we can keep going — yeah, okay. So the second agenda item is related to the performance benchmarks and benchmarking in general. I have found benchmarking to be surprising and irritating at the same time right now. So what has happened is, first of all, one of the important things I have to discuss is that I was profiling —
A
— the Jenkins instance, with whatever changes I did, with the fix and without the fix, with Java Flight Recorder. Using that profiler, what I was experiencing with consecutive builds was some kind of issue. I could not find out what the issue was, but there were huge time differences in the git fetch calls between some repositories, which was what I showed in the platform meeting, and which was wrong.
A
It also took a lot of time for me to change the repositories and then run the instance with the fixes again, and I wanted to do it with a lot of repositories to actually see how we are doing with the redundant fetch — what kind of performance overhead would be reduced if we are avoiding that fetch. So I shifted to using JMH benchmarks, because I would just have to write a benchmark.
A
I will have enough parameters that I can directly feed in multiple repositories, and then I don't have to do anything — I just have to wait for the results. Also, theoretically, benchmarking is a good way to get at the root cause: if you have to do a root-cause analysis, it's one of the best things to do, is what I thought. So I've written two benchmarks related to the redundant fetch, and I'd like to show you the results and the benchmark.
A
So the first benchmark — I've raised a PR for the redundant-fetch benchmark, and meanwhile I've also written another benchmark; I'm going to show you that one first. With this one we have written two benchmarks. The first is going to use the initial clone: it clones the repository, and we see what kind of time it takes to clone the repository for the first time. Then, in the second benchmark —
A
The first benchmark acts as a baseline experiment that we can compare against when we actually add the second operation, that is, the second fetch call: how much time are we gaining because of that? So the second benchmark is basically again the same initial clone, and then again a fetch operation on top. One thing I realized by writing the benchmarks was that I was not doing any kind of validation to check whether these operations are actually doing what they should do.
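The baseline-versus-second-fetch experiment has roughly this shape. The real benchmarks are JMH ones in Java; this shell sketch with throwaway paths only mirrors the idea of timing a bare clone against a clone plus redundant fetch:

```shell
# Throwaway upstream repository.
rm -rf /tmp/bench-upstream /tmp/bench-a /tmp/bench-b
git init -q /tmp/bench-upstream
git -C /tmp/bench-upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# Baseline: initial clone only.
time git clone -q "file:///tmp/bench-upstream" /tmp/bench-a

# Variant: initial clone followed by the redundant fetch.
time sh -c 'git clone -q "file:///tmp/bench-upstream" /tmp/bench-b &&
            git -C /tmp/bench-b fetch -q origin "+refs/heads/*:refs/remotes/origin/*"'
```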
A
What motivated me to do that was that some of my benchmarks were giving me results which I could not understand completely. I was like: either I don't know how to write a benchmark, or I am just not able to understand this. So the initial validation I've put here is based on the first thing we do in the benchmarks, which is to clone the repository from an upstream source to a local place for the benchmark instance.
A
The observation from that is whether those operations are doing what they should. So this is the first validation I have put here, but I am thinking of adding more validations to actually see whether the operations I'm trying to benchmark are working or not, and whether the times we get are valid. So with this benchmark, the results I have — I'll explain what you see here. These are the two benchmarks; the first benchmark is here, with the initial clone.
A
This is with the double fetch calls — that is the second benchmark — and this is the first benchmark. What the color grading basically means is that we have git and JGit, two implementations; I'm actually testing that as well, to see how both implementations behave. The two bars here represent the two repositories I took: the first is the Jenkins repository and the second is the Ruby repository. Why I took those two repositories — let me quickly show you the reason.
A
For the Jenkins repository, the number of commits is 13,000 and branches 31; with the Ruby repository I have 61 thousand commits — it's basically double. One more thing I'll discuss after this is that I'm not able to find constant-size repositories, or anything near that, when I'm looking at real repositories. With the Jenkins one, this was the closest thing I could find: this is 366 MB and this is 471 MB — there's a good 100 MB difference — but this was —
A
I thought maybe I could get something out of this, because the commit count is doubling; with the branches it's not — actually, let's not consider the branches there. First, let's see if there's actually an effect from the commits. So the results here are for those two repositories, the Jenkins repository and the Ruby repository. The first bar is the Jenkins repository with the git implementation; the second bar is the Ruby repository with the git implementation.
A
Then the next, lighter blue bar you see, and the green bar, are both of the repositories with JGit. This is for the first benchmark, and then the same thing for the second benchmark. So if we look at the results, technically, in terms of real-life performance, theoretically from these benchmarks —
A
If
I
could
infer,
there
is
no
difference,
no
tangible
difference
between
between
a
single
fetch
and
adding
a
second
fetch
on
that
same
repository
and
as
you
can
see,
the
first
benchmark
11
seconds
per
operation
with
the
second
with
Jenkins,
it's
again
11
seconds,
there
was
some
difference.
It
was
some
microsecond
difference,
a
millisecond.
Sorry,
not
much
so
weird.
So
I
took
the
time
unit
for
seconds
this
time,
because
I
actually
wanted
to
see
real-life
differences.
B
So you're confident that it was really using JGit? Those implementations' results are so similar to each other, it seems like they could either both be using CLI git, or both JGit. That's fascinating — I can't explain what you're seeing, but that's really interesting.
A
I usually log — I usually print which implementation I'm using when I'm in the benchmarks. That is how I'm sure of how it's being calculated. And since this is looking a little odd, I actually have the place where I ran these benchmarks, and I can show you. Yes.
A
So this is the run that happened; the visualization you're seeing — these are the results in this form. Here you can see that with the first benchmark, which is just the initial clone, git is giving us 11 seconds, and with git again on the second benchmark, which has two fetches, it's giving 11.181 seconds. So the difference is very minute — which is actually not a good thing, no.
B
But okay, that supports the observation I had earlier. Initially people told me this fetch is enormously expensive. You have obviously found at least one case where the redundant fetch is not enormously expensive. That doesn't mean it's always free, but at least it means you found one case where it is free — surprisingly low cost. Interesting, fascinating. I wonder — when you reference the repository on a local disk, do you reference it by absolute path?
B
Because
you
may
want
to
read
the
CLI
get
documentation,
I,
don't
think
they'd
do
the
same
optimizations
for
jacott,
but
CLI
get
may
do
some
things
where
they
say.
I
know
this
is
local
and
remember
that
the
person
who
started
writing
this
was
Linus
and
therefore
he
thought
very
seriously
about
file
systems.
He
says
if
I
know
it's
local
I'll
just
do
hard
links
or
I'll
do
symbolic
links
or
I'll.
Do
you
know
there
are
all
sorts
of
things
that
he
could
do
knowing?
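The local-clone optimization being described is visible from the command line: cloning from a plain path (rather than a `file://` URL) allows git to hard-link objects instead of copying them, and `--no-hardlinks` turns that off. A hypothetical sketch with throwaway paths:

```shell
# Throwaway source repository.
rm -rf /tmp/local-src /tmp/local-dst /tmp/local-dst2
git init -q /tmp/local-src
git -C /tmp/local-src -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# Path form: git may hard-link files under .git/objects instead of copying.
git clone -q /tmp/local-src /tmp/local-dst

# --no-hardlinks (or a file:// URL) forces a real copy / normal transport.
git clone -q --no-hardlinks /tmp/local-src /tmp/local-dst2
```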
A
A real use case — a real-life use case. Because with profiling, what I saw — I saw results, not huge results, but there was at least a 10-second difference: the second fetch was costing around 10 seconds, or maybe 8 or 12 seconds, at least that much. So this was a little bit surprising. Well —
A
One more observation, which I think we've discussed already, is that with a larger-size repository, JGit is performing way worse than what CLI git is doing, right? I kind of have a question: why do we offer JGit as an option when we're seeing that? I actually don't know why we use JGit — I have never asked you that. Why are we using it when we see that for any normal-sized repository JGit is going to perform worse than CLI git? Yeah.
B
So
the
the
original,
the
original
dream,
many
many
years
ago
before
I
became
a
plug-in
maintainer
was
that
Jake
it
would
be
every
bit
as
good
as
command
line
yet
and
we
would
get
better
results
by
being
by
using
a
full
native
implementation.
The
reality
about
a
year
into
using
that
implementation
was,
we
learned
very
painfully
that
Jake
it
was
not
a
complete
implementation
of
CLI
get
and,
and
since,
since
that
time,
the
evidence
has
proven
it
will
probably
never
be
a
full
implementation
of
CLI
get
the
people
who
maintain
Jake.
B
It
are
very
committed
to
it
and
they
do
great
work
for
the
things
they
need
from
it.
But
but
of
course
they
work
on
the
things
they
need,
and
so
so
that
the
the
one
one
use
case
where
Jake
it
is
very
very
helpful,
is
if
you
have
a
platform
where
you
can
get
Java,
but
you
don't
have
a
command
line,
get
port
Jake.
It
will
will
still
work
for
you,
so
so
in
that
case,
it's
interesting
for
large
repositories.
It
looks
like
we
have
clear
evidence.
It's
never
interesting
for
you.
B
The
other
danger
with
large
repositories
is
its
using
its
using
java
virtual
machine
memory
to
do
the
clone
and
and
therefore
you
have
to
worry
about
memory
leaks
inside
or
an
inadequate
garbage
collection
etc
inside
the
Jake
it
implementation,
whereas
with
CL
I,
get
it's
always
a
sub
process.
The
operating
system
will
garbage
collected
for
you,
so
so
yes,
your
observation
is,
is
very
wise,
but
why
use
Jake
it
for
anything
larger
than
larger
than
about
ten
megabytes.
A
Okay,
and
so
the
next
benchmark
I
I
have
actually
raised,
appeared
for
that.
So
this
with
this
benchmark,
what
I'm
doing
this
and
I
think
it
shows
benchmark
as
well,
so
the
dis
benchmark?
We
have
multiple
depositories,
it's
it's
from
the
Jenkins.
These
are
Jenkins
repositories,
small
plugins,
I,
just
incrementally
increase
the
size
and
number
of
commands
number
of
branches
to
see.
A
We
set
those
parameters
for
repositories.
We
create
ourselves
while
we're
benchmarking,
but
but
to
have
a
clear
sensitivity:
analysis
where
we
directly
want
to
find
out
how
this
parameter,
like
the
number
of
commits,
would
affect
the
execution
time
for
gate
which,
without
freezing
without
taking
the
size
of
the
repository
constant
I,
am
not
sure
how
we'll
be
able
to
confidently
say
that.
A
It's
actually
not
doing
the
same
thing.
Here's
a
differ.
The
difference
is
that,
with
the
earlier
benchmark,
I
was
actually
cloning.
The
repository
for
the
first
time
within
the
benchmark,
so
I
was
benchmarking.
The
execution
time
for
that
operation
as
well.
Here
the
that
operation
is
is
taking
place
in
the
setup
before
the
benchmark.
It's
it's
it's
happening
before
the
benchmark,
so
ideally
it
should
not
affect
that
time.
So
clearly,
I
should
what
I
should
get
is
the
execution
time
when
I
am
the
results?
A
— the execution time for the incremental fetch is what I should get from this benchmark. So I'll just show the benchmark. This is it: it's an incremental fetch; the git client I'm using references a git repository it should already have, fetched from the local git repository I have. In the results here, the colors you see are basically multiple repositories with git and then with JGit. It's just one benchmark, so we don't have a confusing result.
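The setup-versus-measurement split works like this in a crude shell sketch. In the real JMH benchmark the initial clone lives in the setup phase and only the incremental fetch is timed; the paths below are throwaway:

```shell
# Throwaway upstream repository.
rm -rf /tmp/inc-upstream /tmp/inc-clone
git init -q /tmp/inc-upstream
git -C /tmp/inc-upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m base

# Setup (not measured): the initial clone happens once, up front.
git clone -q "file:///tmp/inc-upstream" /tmp/inc-clone

# New upstream commit, so the fetch below has an increment to pick up.
git -C /tmp/inc-upstream -c user.email=a@b -c user.name=a commit -q --allow-empty -m change

# Measured region: only the incremental fetch.
time git -C /tmp/inc-clone fetch -q origin
```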
A
It's
not
that
much
confusing,
so
so
we'd
get
as
we
increasing
repository
size.
One
positive
result
I
can
see
is
that
the
execution
time
is
increasing,
for
the
cost
of
having
an
incremental
fetch
is
increasing,
though
the
increases
in
microseconds
milliseconds,
but
it's
an
increase
and
I'm
sure
as
I
increase.
The
size
I
take
it
to
may
be
much
larger
repositories
mean
have
a
change,
but
what
I
have
to
do
is
after
this.
One
of
the
most
important
thing
is
to
map
this,
the
theoretical
or
derivation,
with
practical
observation
and
to
do
a
practical
observation.
A
To make a practical observation, what I've seen is that I can use the JFR profiling tool to see, for those repositories, what kind of performance overhead I am reducing while avoiding the second fetch. With this, we can see that, okay, now there is a change, there is a difference, there is an increase when we increase the size of the repository.
A
The
number
of
commits
also
increase
the
number
of
branches
increase,
but
I
can
never
say
for
sure
what
is
contributing
the
most
for
the
keep
stretch
right
now,
because
since
the
size
of
the
repository
is
increasing,
III
there's
no
way
I
can
say
that.
Okay,
the
commits
is
why
this
is
happening.
For
that
to
happen.
I
need
maybe
to
500
MB
repositories
with
one
bit
having.
A
There
should
be
a
clear
difference
in
the
number
of
commits,
possibly
something
like
20,000
commits
in
one
and
second
might
have
30,000
or
40,000,
so
that
I
can
see
okay
for
these
constant
size
repositories,
if
the
number
of
commits
are
increasing.
This
is
how
the
execution
time
is
increasing
or
decreasing,
or
it
is
having
no
effect
so,
but
that
is
a
yeah.
B
I
thought
I
thought
our
intent
here
was
trying
to
understand
which
things
should
we
include
in
the
sizing,
heuristic
and
isn't.
Isn't
your
observation
here
saying
we
should
include
both
repository
size
on
dist
and
number
of
commits
because
they
seem
to
both
show
as
they
increase.
We,
the
execution
time
increases.
So
do
we
already
have
enough
information
here
to
say
yeah
number
of
number
of
commits
in
the
repository
and
size,
the
repository
and
the
disk
are
both
relevant
to
to
performance.
So
we
include
them
in
the
heuristic.
A
Yes,
mark
you're
right,
they
are
limiting
for
doing
that
is
to
to
find
what
performance?
How
is
it
how
much
affected
from
what
predictors,
but
what
I'm
saying
is
that
we
are
not
able
to
test
them
independently,
not
as
independent
variables.
Here
they
are
depend.
I'm,
not
sure,
is
if
the
file,
what
is
contributing
more
to
the
performance
changes
in
the
get
fetch?
Is
it
the
file
size?
Is
it
the
size,
the
pack,
the
size
of
the
pack,
dot
pack
object
or
is
it?
Is
it
the
number
of
cores?
A
My hypothesis, which I wanted to test, was: if we have a repository with a large history — maybe not a considerable size, but a large history — would that affect the second fetch more? Because what I assumed about the second fetch was that the first fetch would clone all the objects, the packed objects, and the second fetch does not have to do that. What it should do — this is what I think; I haven't checked —
A
I
haven't
looked
in
to
confirm
this,
but
it
showed
the
ways
to
iterate
through
the
list
of
the
commit
history
or
basically
it
has
to
get
the
increments
in
references
or
any
changes
in
the
repository.
The
second
fetch
we
would
want
to
do
that
and
to
do
that
it
would
go
through
the
history
and
so
my
my
hypothesis,
what
that
was
that
the
the
redundant
fetch
would
actually
have
a
considerable
performance
overhead.
If
we
have
repositories
where
the
history
and
the
branches
they
they're
they're
larger,
then
there
are
cons.
A
I
would
say
a
considerable
number
is
there
for
those
repository,
so
that
is
something
I
wanted
to
test
and
I'm,
not
I'm,
still
not
sure.
With
these
we're
sure
that
with
increasing
the
size
and
all
of
those,
the
number
of
commits
we're
seeing
that
the
the
performance
overhead
of
the
second
leg
is
going
to
increase,
we're
sure
about
that,
because
we
can
see
that
with
those
other
benchmarks,
not
for
the
second
one.
A
First
one
too
much,
but
this
in
in
whether
this
microscope
the
time
unit,
we
can
see
clear
difference
but
I'm,
not
sure
independent
variables,
how
they're
contributing
to
the
performance,
and
so
again
we
can
see
that
jagged
is
actually
performing
better
for
us
for
small
size
repositories
that
to
think
we
have
the
observation.
We
have
that
for
a
small
size,
repository
Jake
is
going
to
perform
better
than
cake.
A
We
were
seeing
that
with
these,
this
benchmark
as
well
that
it's
performing,
but
though
it's
the
difference
is
not
much
in
real-time
I
think
we
see
the
differences
with
much
larger
repositories
Jake.
It
is
not
fun
good,
I'm,
not
sure
how
much
this
would
affect
tea
performance
for
a
user
noticeable
changes,
but
theoretically
it's
Jake.
It
is
performing
better
than
the
first
one
size
repositories.
A
So
yes,
so
with
benchmarking
strategy
I
have
so
if,
if
our
aim
is
say,
if
our
aim
is
just
to
see
that
so
we
need
to
make
an
estimator
and
to
make
make
an
estimator
to
estimate
the
size
of
the
repository.
What
kind
of
parameters
we
need
to
see
so
the
obvious
one
is
the
size
of
the
compare.
The
two
objects,
the
second
it's
safe
to
assume
it's
number
of
commits
number
of
branches,
but
how
much
how
much
independently
they
affect
the
performance
is
something
I
haven't,
not
able
to
figure
out
right
now.
B
So
I'm
I
think
you've,
you've,
you've
answered
the
question.
Should
we
include
size
and
number
of
commits
in
in
the
in
the
assessment?
Absolutely
and
we've
got
you've
got
data
here
that
says
yes,
Jake
it
for
small
size
repositories
is
marginally
faster.
So
so
there's
there's
another
incentive
to
say:
okay,
we
should
now
probably
look
at
code
and
say
or
put
you
into
code
and
say
all
right.
How
do
we
use
this
now
to
implement
the
heuristic
or
to
implement
the
estimator,
the
size,
estimator
and
and
start
seeing?
A
Okay,
that
is
what
I
thought
as
well,
that
we
could.
We
have
clear
evidences
that
some
of
the
parameters
there
how
they
are
affecting,
so
we
can
start
working
with
the
estimator
and
and
I
think
the
next
agenda
good
thing
I
had.
Then
there
was
analysis
on
fine.
So
we
have
discussed
this
performance.
Predictors
forget
wretch.
A
So I experimented with VS Code — Microsoft's VS Code. I cloned it, and I also tried the API provided by GitHub, to check what size it was returning to me. The size according to the GitHub API was around 300 MB, but when I cloned it, it was around nine hundred — that's a huge difference. So I checked around, and I found out that on GitHub's servers they have bare repositories, so they gave that size as the result.
B
I have no idea what that number represents — okay, all right, so that number, I don't know what it represents. I usually look at the size in the `du -s` output for the .git directory, because what that tells you is the size on disk of what is, fundamentally, almost the bare repository as represented on the other side.
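The `du`-based check Mark describes, sketched on a throwaway repository (the size on disk of the `.git` directory approximates the bare repository as it would exist on the server side):

```shell
# Throwaway repository with one commit.
rm -rf /tmp/size-demo
git init -q /tmp/size-demo
git -C /tmp/size-demo -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# Size on disk of the .git directory, in kilobytes.
du -sk /tmp/size-demo/.git | cut -f1
```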
A
So
I
think
I
have
to
confirm
that
I
haven't
IIIi.
Think
I
did
check
the
object,
dot
pack
object,
which
is
downloaded
by
go
by
clicks
alone,
so
I
could
see
similar
sizes
from
some
of
these.
From
this
thing
and
from
from
that
object,
but
I
think
I
check
that
mark
first
to
see
if
that
is
working
and
so
and
with
estimators
with
the
estimated
class.
So
right
now
the
object
one
option
mod
gave
was
the
grade.
Option
is
2.
B
Since the execution of most of the logic is happening on the master for you, you can ask questions of the cache on the master pretty directly. Things like: when the GitSCM object is created, you can assume that's on the master, and that SCM object can then look at the local cache and interrogate it. So I don't think you have to — I think it'll be pretty straightforward.
B
Actually,
if
you
just
use
that
I,
don't
even
think
you'll
have
to
do
a
cash
lock
in
all
seriousness,
because
I
think
all
you're
trying
to
do
is
look
at
the
file
system,
so
you
get
to
get
the
a
directory
of
the
cache
and
then
knowing
the
directory
name.
You
go
use.
File
system
calls
to
ask
for
the
size
of
the
contents
of
that
directory,
and
that
gives
you
a
relatively
quick
approximation
of
the
size
of
that
repository
deposit.
Yes,.
A
Like git ls-remote — I was interested to see how that would work; maybe I'll have some interesting observation to show. Going ahead, lastly, I want to expand the benchmarking study for those two operations. That's the first thing we could show. The second thing would be the redundant fetch work — how we have done it. So I was thinking, because of the demo —
A
I
I
would
have
to
show
what
I
would
have
to
show
something
visually
so
I
as
a
feature
or
something
in
the
user
interface
and
motion
things
I
have
it's
usually
code
or
weird
results,
so
I'm
actually
not
sure
what
what
are
your
guys
expectations?
How
is
the
are
you,
the
guys
who
will
be
my
will
be
the
panel
and
evaluation
in
the
evaluations,
or
is
it
the
cool
committee
of
Jenkins?
How
so
we're.
B
I think, if you show graphs — I don't think you have to show the Jenkins UI as much as graphs and highlights of: hey, here's what we've learned as part of this exercise; look at this, look at this; here's an improvement here, here's an improvement there. People will actually be more impressed with graphs and charts of performance comparisons than they ever would be with being shown a Jenkins UI, because we knew this was a performance project.
C
In my experience last year, we had some other projects that were similar to this as well, and it's not a big deal. If it's a plugin-based thing where you're actually building and you pull again, yeah, you might get into demos of how that works and use your experience and stuff like that. But yeah, like Mark said, I think I'd definitely focus on the meat of this project, and people will like it.
B
You know, in a perfect world they will see nothing different — it'll just be faster, right? So if you show: I'm going to show you nothing, except it's faster — that should already delight people. It's like, wow, that's great, because usually it's: it's faster, and I had to break the following things in order to make it faster.
A
Okay. So what I'm thinking is: the first thing is the benchmarking strategy with git fetch — what I did and how I improved the benchmark, and the understanding of Jenkins and everything. One thing which is missing right now, which I haven't shown you guys, is integrating the JMH visualizer plugin or the Jenkins page. I have to do that, because I think it's going to be a great improvement: we will be able to see visually how the results are shaping up.
A
So that's something I'm going to do, and I'm going to do it for git ls-remote as well — so, the benchmarking strategy for these two operations. Then, with the redundant fetch: would you guys be interested in seeing, from the fix, the testing scenarios and the cases we considered while fixing this — the use cases we had to consider, whether we would break them, how to do this safely, the whole thing? Or is that something we don't have to discuss?
B
For me, I'd keep the testing in your back pocket in case somebody asks: hey, how did you check this? I suspect the audience — I'm gravely concerned about not breaking compatibility; that's a big deal for me. But the larger audience will probably just assume that: of course no one's going to break compatibility. So they'll be more interested in your results, with numbers and performance figures, and your observations on: hey, here are the characteristics we saw.
A
Okay. So with the benchmarks, as we've seen, the theoretical results are not showing much of a difference. So what I want to say is, with the redundant fetch, the results I would like to show are the profiling results, as much as I can, so that I have a large sample and the results are not something we do not expect. Well, I think —
B
It's okay to show that — show the surprises as well and say: welcome to the real world; sometimes we get surprised by how software behaves. I feel no shame in declaring that we were — I was, you were — completely surprised to see this result compared to this other result, and that more investigation is needed. That's perfectly okay.
A
Okay. So the benchmark results and the profiling results, both of them, for the redundant fetch issue. And then I think the third thing would be the estimator class, if I'm able to create it with some heuristics we've thought about. I need to first consolidate the approaches we can take, and whether it's even possible the way we want to do it, because right now I'm not too sure, since with the APIs I was actually seeing —
A
— something which I discussed: the difference in the sizes because of the bare repositories and the .git objects. I have to confirm that against the cache. So if the cache doesn't work, then what do we do? Because with the cache, I think it's simple to estimate the size; but if we don't have that, then it's the real work, where we would have to understand how we could estimate the size. I was hoping that the number of commits and branches would have a great —
C
One other thing to maybe try, if you wanted to rule out the disk things — like the Linus thing that Mark was talking about — is you could set up a Bitbucket or GitLab server on your local network, put these repos on there, and then that would get you to a point where maybe it's not going to optimize for being on the file system. Oh, that's a possibility.
B
That's an optional possibility. Well, Rishabh, I have an environment that we could use to simulate exactly what Justin described: I have a local git server on my network that happens to be full of all sorts of interestingly sized repositories. So Justin's idea is good. However, even before that, I would take one more step. I think you've learned something in this exercise — you've gained a crucial piece of knowledge that I don't think you highlighted nearly enough in your summary.
B
We
did
not
know
that
I
had
not
I
had
an
assumption,
but
I
had
no
data
to
support
that
and
what
you
have
is
you
have
hard
data
which
says,
as
these
attributes
of
the
repository
increase
but
carry
the
characteristic
performance
of
git
is
like
this
and
jagged
is
like
this
and
that
curve?
If
that
curve,
is
your
opening
slide?
Even
for
me,
that
would
be
great
because
it
says,
oh,
oh,
everybody
should
be
aware
of
this
characteristic
of
the
jagged
implementation.
B
You've
done,
you've
done
concrete
measurements
and
they
measurements
showing
over
and
over
again
this
exact
same
story
that,
with
large
repositories,
jagged
is
a
poor
choice,
and
so
people
should
be
aware
of
that.
You've
already
contributed
to
the
body
of
knowledge.
Just
with
that
that
that
initial
graph.
B
I propose that you show us your initial framework of the presentation on Friday, if you would be willing, so that we have a chance to give you feedback. For instance, it asks for a blog post, and I thought: you know what, the performance results you've seen would be a great blog post. Let's say, look, just for the information of Jenkins users, without any code change: you should be aware that if you choose JGit and your repository size is larger than such-and-such, you are sacrificing performance intentionally.
B
To
highlight
we
use
jagat
as
the
implementation
on
CI
that
Jenkins
that
I,
oh
and
that's
fine
for
small
repositories,
but
remember
that
the
documentation
repository
and
the
Jenkins
core
repository
are
both
well
beyond
the
threshold
size
that
you've
identified.
So
I
already
have
an
improvement
to
make
in
CI
a--
jenkins
that
io
to
get
it
to
get
some
performance
back.
A
Okay,
so,
okay,
so
I
think
this
is
that
I
think
this
is
what
I
wanted
to
discuss
with
the
block
which
I
wanted
to
ask.
Do
I
have
to
do
that
on
a
Jenkins
Rodya
or
can
I
do
it?
Oh,
it's
it's
mandated.
We
would
have
been
Chang
I
thought.
I
was
setting
up
a
gator
page
blog
and
I
was
thinking
that
I
could
do
it
there.
You.
C
Plus one for demoing your demo — that was a good way for us to give feedback before. And one thing that we did before, which is up to you: I think we had done it in Google Slides — it doesn't matter what technology you use — but if you want to share that with us, we can do markup and comments, if you want feedback on things. Totally up to you.