Jenkins Google Summer of Code Office Hours, 28 Mar 2022

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: GSoC 2022: Git Cache Maintenance Projects Idea

Description

Git Cache Maintenance Projects Idea
Brainstorming Together About Ideas and Alternatives

Objective
Meet for 60 minutes with those interested in the Git Caching project idea to discuss ideas and alternatives and to identify areas where there may be questions. Encourage discussion of different alternatives and ideas that might lead us to a better implementation.

A

Welcome this is the 22nd of march 2022 and we're having a google summer of code brainstorming session on the get cash maintenance project idea, and we should really call it that to make sure it's correct.

A

So the idea is that the jenkins git plugin has many caches, that it maintains, as on the controller and those caches by their nature, sometimes become sub-optimal because git operations are not focused on maintaining long-term, optimization they're forced focused on short-term performance, and so this idea is hey. Let's, let's find ways to automate the process of maintaining those caches and keeping them healthy.

A

So so the idea was, I was thinking and scribbling about something with now you'll notice, my lovely! This is such a beautiful user interface picture. I know you all wish you did user interface pictures like this. The the idea is on the manage jenkins page. So let's bring up a real jenkins and look at it so that we can see how real okay so on the manage jenkins page here today, there are these. These things, like the label implications and like configuration slicing and like configuration, is code. Each of them is its own subpage.

A

If you will- and I was thinking okay, this get cash maintenance, maybe belongs in some sort of a subpage of managed jenkins like this, so that was the first now to the rest of you. Does that make sense to you or is there something you would recommend? Instead, no it'd be better. If we did it this other way,.

B

um So mark one of the questions yeah, I I think definitely we should have a separate page, because I was going through get maintenance um documentation yesterday, and I saw that there is a lot of behavior that that is customizable right and we would want um the user to be able to have that in a separate page. Instead of doing it in the let's say, configure system or the global team configuration.

B

But my, but my biggest concern with having um which I saw in the document as well in your ideas, was that uh having a page word uh would be a global settings. Right would be the like a system-wide configuration where all of the repositories would have the same um uh configuration for maintenance.

A

That that was at least my assumption, so I think what you're highlighting is there may be cases where I need to do specific repository configuration. For example. I know the linux kernel needs some different cache maintenance operations configured than every other repository in my system, because that linux kernel repository is enormous. Is that sort of what you're alluding to rishabh.

B

uh My two two concerns there. One is that um how is my get executable chosen uh when I'm running this command? I mean I was looking at get maintenance start and you do that where, on whatever repository you're doing that it's going to choose to get executable on the basis of that um repository and um in let's say in a system where we have multiple executables, then how? How is that going to happen?

B

Considering the fact that different git versions are going to limit- or you know, give us the ability to perform various tasks? Usually you know involved in slumding.

A

Good good point, okay, so so let me let me for those whom rashad is showing his incredible value at having done that project two years ago. So if j git is selected, let me let me highlight this one. So if j git is selected, then the controller process address space.

A

Or controller memory pro memory footprint will increase while jit is performing the operation.

A

Inside the controller jvm.

A

As one example right, whereas if command line git is selected.

A

Then a separate process is run.

A

And the memory footprint shrinks or the memory footprint is not inside the uh controllers: jvm, okay, so, and now now back to your question, how is the get executable chosen? I think that, wouldn't you think that would need to be some sort of a global setting say I want to use git or when you see all I get or I want to use j git.

A

Tell me more of your thinking. Rashab.

B

Yeah, I I agree when we're talking about a global configuration, we need to make sure that we are consistent with what we choose in there. I believe the global tools configuration page when we are trying to choose the get version and the type of get implementation that you want to use.

A

Yeah, okay, so so, and when I think about global tool configuration what it presents to me is possible, get implementations, but it doesn't really choose one right: it presents. I've got one. I named get windows, I've got another one. I named git dash, 2.11.1 and, and those are any one of them I can choose, but none of them is selected. If I recall is, maybe I'm wrong is default selected as the default.

A

I don't remember good question all right. So so it's it's a it's a valid thing to say.

A

I think. If we're setting on the page, then we would expect.

A

All tasks on the page to use that image or to use that that version that get tool.

B

A

A

All right so before we go further with that, any questions from others around that topic of of how is the get executable chosen.

C

I was wondering that uh is it possible if we can have uh have something that that uses both the j-gate and the cli.

A

Oh, oh, that's a good question. Okay! So, let's put that uh are there cases where it would be useful or helpful.

A

To use both j git and cli get and, and that might let me let me give a hypothetical.

A

The hypothetical would be something like what reshab's project did two years ago, which was uh what, if j get, is significantly faster.

A

Faster at some operation, um what, if cli get, is significantly faster, so reshab found by benchmarking that, with large repositories, cli git is significantly faster for fetch operations. Rishab did, I say say that correctly. Yes, yes, so it's a good question. Should we should we consider the potential that we might need to do some performance-based selection ooh? This repository, we know, is this size and we've got in our toolbox j, both jet and cli, get versions such and such and we've had run benchmarks previously that tell us with that repository size or this some characteristic.

A

We should choose this implementation. I I think it's a a valid thing now. For me, performance, optimization is usually a late stage thing.

A

After the implementation is working and delivered so, for instance, we didn't do reshop's project until the get plug-in had existed for over 10 years. I think 2007 to 2000 yeah. It was an over 10 year old, plug-in before we actually applied reshobs optimization.

A

So so I'm not terribly worried about this optimization, but I think it's a valid question to ask.

A

Did that address your question? Yes, sorry, it wasn't true. That was that was that hushakesh or was that.

A

Thank you. Thank you great all right, so.

A

All right, okay, so well so we've we've talked about choosing the get executable and possibly choosing to mix implementations. Any other questions that we want to raise around those sort of topics.

B

Yeah mike, the second concern that I uh that I have with global configurations is that uh uh the tasks that are going to be performed with gate maintenance, some of those tasks are direct, are correlated to the size of a repository.

B

So there is a possibility that I don't want to run gc. Let's say for a huge repository with the said interval that I've set in the global configurations, because I know that that repository will take a lot of time. Gc operation would take a lot of time, so do we want to uh do we want to give an overrideable way somewhere?

B

uh I I think that would be uh possible right. I mean having a global configuration and then a way to override that configuration for repository.

A

I think that's and I think that's a very good question: how how what what mechanisms can we give the user.

A

To to provide finer, grained control.

A

Of the maintenance tasks right, because I think I think you you raise an excellent point. Garbage collection on the linux kernel repository takes a very long time is very cpu intensive and very memory intensive. It will with command line. Git use every core on your system and, if I remember correctly, it's willing to use almost as much memory as you give it.

A

The the linux kernel. People are not at all shy about using memory. They think memory is something that should be used.

A

So so it's it's a very good question. What what might we consider so one might be um override rules, maybe where we say um or override settings based on based on repository size.

A

um What else I mean we could we could call a call a a shell script, a user-provided shell script, to decide if this thing should be run or not.

A

We could allow call a user-provided groovy script.

A

Since this is system level stuff, it could be doing system level groovy there's a little bit of danger hiding there, but we could any other ideas on mechanisms to provide fine-grained control of the tasks.

A

How about this we could just say repository a repository based exclusion list.

A

So here's if the repository you are if the repository url is get colon, slash, kernel.org, I'm making things up now. Sorry slash linux.

A

Don't gc or maybe gc only gc monthly.

A

Something like that.

A

C

uh So, do we absolutely restrict the user from g saying even uh like once a week or twice a week, or do we just like strongly warn them that it could be? It could be eating a lot of memory.

A

So so for me, I would I would generally it's a good question, um preferred in the past anyway, to allow the user to choose to do it and, where necessary, offer them a warning or even better offer them hints. If things are going badly, that would tell them why things are going badly. So should we we prevent users from doing certain tasks.

A

And- and my thought was no but but I'm open to difference there right, the get maintenance. Man page definitely says hey. We we intentionally do not run gc as part of maintenance, but we allow you to decide that you will run gc.

A

That may lead us to this next question, of which tasks should we enable by default, and how would we decide.

B

Yes, I um that is a so I was reading about commit graphs as a task, and I I got to know that there is a there's, a setting which is not enabled by default, which is called right command, craft fetch.right command cloud. So what it does is, if so, how commit graphs would work right now?

B

Is that your gc, whenever your gc runs it's going to update your commit graph and after that, whenever there's going to be a fetch in your repository, so the commit graphs, um uh the amount of time that it's going to take for it to update the commit graph depends on the the number of commits that that are going to happen to your repository.

B

So if you have an active repository and we're doing the interval of updating the gcn, then you're performing a get fetch, you actually could potentially slow down the operation time of a git fetch.

B

If you don't have this um command. You know if this setting enabled which is right commit graph, and I believe this is not enabled by default according to the the man page for the summit graph.

B

So I mean we need to look at the individual um tasks that we we're enabling by default and see how they're going to you know affect the existing user behavior or, if they're, going to affect the existing behavior.

A

Well and now to take that theme, how could we, how could we make the information about that task available to the user? What if we gave them a an entry on the ui? Something like this? Let's see, update commit graph down here and one of the one of the data points we show them is the trend graph. That shows how long that ran on their repositories and- and hopefully they look at the graph and say: oh wow, here's this repository, where the no?

A

No, that that's, maybe not good enough, is it because your point rishab was that if I don't update the commit draft, I may get slower performance from get fetch.

B

Yeah I mean, if I know, if I don't enable this setting and if I have a large active repository, then there is a potential of slowing the get fetch operation itself.

B

That is what the man page says so.

D

In this case, because prefetch would you know, do the fetch operation beforehand.

A

Well, I thought that I thought that richard's concern that the the way that it was describing the commit graph- let's uh get commit graph. I thought it was that it when the fetch is performed, then it does. The update of the commit graph is that, did I understand that correct correctly, rishab.

B

Yes, so if you search for write, commit graph, fetch dot write commit graph. If you search for that, there's a setting which is not um fetch.

B

Now it could be right, camel case.

A

uh Yeah but okay so write a commit graph. Okay, here's commit graph right.

A

Interesting. Okay, so maybe I'm on the wrong page, rashaab.

A

So here's this core.commit graph is that is that what you were referencing.

B

uh No, this is this is what I think enables comment graph as a setting global setting.

A

A

Okay, so what we're looking for, then, is use of the word right.

B

uh Let me either send you a link. Can you open this.

A

Oh yes, absolutely.

A

Oh, oh, it's described in git config, not in okay. Oh very good! Thank you all right!

A

Okay, so it says set true to write a commit graph after every fetch that downloads a pac file.

A

If the split option is used, it will write a small one and occasionally they will merge and rights may take longer interesting.

B

So there is, I believe, I'm not hundred percent uh sure, but there is a way they change the current graphs so that you know they take the deltas and not the whole. They don't update the whole uh commit graph on the basis of you know, every fetch that they're doing, if you have this setting enabled, but if you don't then they're going to write it every time.

B

Which is a costly operation right.

A

Well and see, I don't know how costly it is, but I think I think it's worth us worth us just doing exploration. They chose to disable it by default, so it's certainly a cost that I'm not paying at all right now right, I'm when I do a fetch. I none of the the git repositories that I handle are doing this, and yet it says it would. I do get log minus minus graph all the time, and so this says wow.

A

I probably should turn on fetch write commit graph so that my log minus minus graph calls are faster.

B

I mean I my my point was just that we need to look at each of the tasks and the settings that they're providing and then think uh what strategy could we implement, which ones to enable by default, which ones do not.

A

Right right, very good, okay, the idea being hey should we should should commit graph be enabled by default. Now, let's, let's get maintenance, okay, so get maintenance.

A

This one so commit graph is enabled by default right and it's scheduled to run hourly. If you register the maintenance, it will run every hour.

C

I think this might be due to uh something that was implemented in one of the later versions of kit.

C

I read this one article where it mentioned that in version 2.24 of care, it introduced a new thing in the comment graph and it was called a generation number and what it did was like.

C

It significantly reduced the number of comments uh that it needed to uh read through, and uh I think it it used like some uh khan's algorithm and computed the number of in degrees while it was traversing the uh the the graph but but like after after the generation found uh after the generation count was implemented, it didn't need to and it got a lot more efficient.

C

So before that version, I think it was uh the comment. Graph was very inefficient, so if we were to implement it for something like centos, so I think for those cases it might be inefficient. Oh.

A

That's a good insight that there may okay, so what you're saying is there may be versions of command line, get where these settings should be quite different?

A

Yes, ah okay, that's very wise because, well and to your point, commit graph may not even be available on some of the command line, git versions that we run and and may not help if it were available right because if I'm using doing a command line, git operation and the command line, get implementation doesn't know anything about commit graph. It certainly can't use it interesting, good, okay, very good.

B

Yeah and earlier it used to do a commit craft update, while it was doing the gc tasks so their their rationale. There was that g compared to the gc task site commit graph, won't take. uh You know much of the operational tank, so they they club it together and, and that is what they used to do.

A

Okay and that that makes sense to me at least it's like yeah garbage collection is very expensive right. It's doing yes, it's doing recombining and then it does. This large compression operation and and compressing files is, is almost always very expensive, so so yeah that makes sense. You could easily hide a small operation like commit graph inside all the time. You're spending doing garbage collection, good, okay,.

A

All right well so then go ahead.

B

No mike, I just said yeah, so we probably need to see that I mean we just need to decide what kind of strategy we're going to implement on the basis of.

A

Yeah, so so, well, so for me, it would be okay. If, on the task selection, I'm going to propose an idea and let's, let's test it as an idea, and then we certainly can throw it out. My initial thought was test. The the task selection priority. Here's my proposal- okay, so I think prefetch has the most.

A

Let's put in my words mark thinks. Prefetch has the most opportunity to improve things.

A

A

It is it avoids.

A

How would you say it it loads? Oh, it does network traffic, which is very, very slow compared to disc traffic network traffic um reduction right. So one of the best things we can do is do less network traffic.

A

This thing, when I'm doing a fetch in order from a repository that has already done prefetch, it avoids a whole bunch of network traffic because it's already been done. It's already been pulled in.

A

So for me, I think this one is should be priority. One first choice make sure that works and we get good results. Now, if we're doing prefetch, then the next question is: okay. Now we're potentially every hour bringing in or every some time bringing in things that are come in as loose objects. They come in without necessarily being well well placed inside the repository.

A

So should we then consider other things as second as later priority.

A

And now this is where I don't know, which of the next ones should be, should be preferred any any insights to offer anyone there.

D

I I feel the incremental repat should be placed after the prefetch.

A

And and tell us more.

D

uh uh The incremental uh repack, basically uh I I feel it works like a you, know, uh b3, okay, where uh all the objects are placed in a sorted manner, okay in in the uh in the mdx file, okay and each uh object is referred to as separate pack file. So it would be easier to search through the comments uh if you, if we have a incremental uh repack as a second option, is what I feel.

A

Good, okay and that, I think, that's a that's a testable idea and- and that feels reasonable to me.

B

So before that, I just have one question: prefetch: what is it exactly? Is it just getting the references, the updated references into a separate directory, or is it actually downloading the objects that are not yet present in the um local repository.

A

My understanding is it's getting the objects.

A

So so my interpretation of the way this is described is it's doing the equivalent of a get fetch minus minus all, but placing the refs in a different location so that the repository doesn't doesn't so the repository state of the mod of the, for instance, the master branch pointer, is actually not updated. So it says this is done to avoid disrupting the remote tracking branches. My interpretation of that is prefetch.

A

Does the fetch and hides the result of the fetch locally in a way that git can find them, but does not update the remote tracking branches.

E

Okay: okay, okay,.

A

And- and for me this was a oh, that's smart, because I would have just done a fetch, but the problem with doing a fetch is somebody else may be, depending on that cash staying in its having its remote tracking branches stay in their current state.

A

You know the the the the thing that's maintaining that's that owns the cache thinks it has control over when, when remote tracking branches are updated, did did that answer your question richard.

B

Yes, it did, I I mean so. The subsequent question is that if objects are going to be downloaded and they are loose, they're, not uh they're, not in a pack, fine right. So then would we not want to run something which are going to put them in a fact frame.

A

Well, and I think that git fat fetch will, on later versions of git, actually place objects into additional pack files. So so there's, if I remember right, there's this thing called the multi-pack index that will allow that allows get to use multiple pack files.

A

But if I understand correctly, that's a recent, a relatively recent like within the last two or three years feature of git it does. Anyone else have experience with multi-pack index that that they can enlighten the rest of us.

A

See if I can find it multi-pack? Oh here we go look: okay,.

A

Okay, so incremental repack, it uses multi-pack index to repack the objects first by calling expire to delete, unreferenced pac files and then by calling repack to combine several pac files into a single bigger one.

A

Yeah, so so this feels like for me, okay back to our question, it was should we put. I think this is lobbying that russia had it right. That incremental repack is a really good choice, as as very very close to prefetch in terms of its values.

D

Mark I I'll share you a link. Okay, one man. Can you open it one minute sure.

A

So, are you sharing it through the gitter chat or through oh good? You did.

E

It okay: here we go.

A

Perfect here it is.

A

Okay, so 2.20 introduced a single file that consolidates all of the index files.

D

This actually gave me an overview like actually it gave me an overview of exactly how this incremental repack works using uh the multi-pack index, so the both the commands, that is the expire and uh the repack command, has been explained in this uh okay, so.

A

So this this article, so let's be sure we include a link to this okay, see the stack overflow article.

A

For incremental repack details very good, excellent, okay.

A

Sorry, I had to cough excuse the muting, okay, very good, so so what this is telling us is it's not especially healthy to have many pac files, and what this has done is if we do yeah. Okay, here we go. This is talking about.

A

Let's talk about the linux kernel, multi-pack files can cost cost time, but we may not be able to repack into a single pack pac file because it just takes too long right or consumes too much space, and so what this is offering us is the multi-pack index, and we get that by doing the incremental repack. Is that correct for shikash.

D

Yes, ma: okay,.

A

Good all right, okay, switch, which feels like that gives us a strong reason to say yes pre-french and yes to incremental repack. Those should should both be on just as they are in the and now I assume we've got a challenge there of. Maybe I should make a note here.

A

Need to assess the operations based on and their results, based on the different versions of command line, get.

A

Right because multi-index multi-pack index looks like it requires at least get 2.20.

A

Go ahead, rushy.

B

Okay, oh and sorry mark I one question that I have is that um there is also uh loose objects in the maintenance uh task right. So uh I I guess this is more of a confusion for me. If we're doing a pre-fetch, then are we introducing more loose objects into the uh the local directory or we're introducing more pack files?

B

A

B

A

And I thought that, with current versions of git it does a we could easily. We could test it really quickly if you're, okay, if I run a test, let's just go, do a quick look to see. So I happen to have a a repository.

A

That is rather large.

A

And let's go look at it just to see so um how about in? Let's see.

A

Yeah there we go, the this directory has a copy of it.

A

And let's see what a mess it is.

A

Okay, so here is something and now what's in its pack direct yeah, okay, here's a here's, a terrifying example of this is a hundred or 150 megabyte repository. I use it to test all sorts of awful things, and so, but what you see here is an embarrassing number of pac files right. There really should be in an ideal world too, an idx and a pack, and that's it, and this has many many more than that and it's got all sorts of loose objects. Now, if I do a git pole.

A

Let's see how about let's count, the number of those there are.

A

So we have 62 files in that direct directory right now,.

A

Let's see if it has to bring anything okay, so it's bringing in some new content.

A

It added four more files, so I think that indicates it did add new packs. Not just new loose objects did. Did that address your question? Yes, yes, now we should, and we should be able to see that by doing this, we should see that yeah notice here is here is something which changed february 12 and then there are four more things march, 22.

A

And yeah, so here's a good indicator notice the size of this monster. This file is 77 megabytes. Yes, yes, it's embarrassing. Nobody should put 70 megabytes in a git repository, that's sick and wrong, but but that's what this one has done. It's got.

A

One of the this pack file is enormous and there are other pack files that are pretty hot. Pretty large. This one looks like it's 25 mag, you know so so it's a big. This is a big repository.

A

And I I I suspect, if I run git gc, it will run for 15 minutes or more so so did that address the yes we're confident we want prefetch. We want incremental repack.

A

Your question, reshop, I think, was: do we also want to make loose objects a standard, a standard part of it like like get maintenance, does right because get maintenance has chosen to do loose objects daily, less frequently than prefetch, but but it much more frequently than gc.

B

Yes, and if you do, if you're choosing to perform loose objects, then we would do it before the incremental impact right, because we want to have more pac files first and then repackage them. Whatever the impact is doing.

A

Good good point yeah. Let's so, let's see it says it place, cleans up loose objects and places them into pack files yeah, so well so, okay, so I'm gonna try something here in that that repository. So it's got some stuff in objects.

A

Get loose objects! Oh! No! I don't think that's the kind yeah! How about maintenance? Minus minus task equals.

A

Oh come on, there's got to be a way to do it.

A

So there it is.

E

Loose objects, loose objects, job miners.

C

Get maintenance run dust.

A

C

A

Thank you right, of course. Clearly I don't have enough experience with this. Do I like that?

A

Okay get maintenance run.

A

Maybe it's task: oh yep, okay, that was it and now what did it do to our they're still there, except did we get a new entry in the.

A

Let's see ls minus altr uh dot get such objects. That's pack. We had 66 before.

A

And now we have oh and look there. It is loose.pack okay, so so it and and back to their comment. They said: hey we're going to do loose objects and it's going to create the new pack file, but it did not apparently delete all the other things it left them around. So there's a pac file for use, but the loose is still seems to still be there interesting, okay cool.

A

Now I have no idea. I assume git must be able to use the loose thing. Okay, good! So rishab back to your question. Are we answering the question that you had about? How do we approach it?.

B

Yes, I think we will, we would run, we would want to run loose objects first right like refresh loose objects and then incremental feedback, so that so you know.

A

I see what your point is: okay and but now, let's, let's test that, so they say they run incremental repack and loose objects daily, but they run prefetch hourly by default. So should we be considering their 24 times more frequently running prefetch than they are running incremental, repack and honor? The same idea.

D

I have a doubt here: would uh this incremental repack would would it even consider lose objects as part of it? I.

A

D

It did not right didn't.

A

It say that it only does multipack index.

A

Okay, it deletes unreferenced pac files and then combines pac files, so I would think incremental repack does not do loose objects that would lobby for rashob's argument that we should do loose objects and incremental repack as sort of two steps close to each other one right after the other. Is that what you were asking.

D

What was it uh like? My question is like would increment uh the multi-pack index file? Would it consider the lose objects pac file, which has been generated in the top? Oh, I see what you're.

A

Saying I that's a good question, I don't know let's well, let's, let's try it and see. We've got a task here. So this time we're going, we just did loose objects. So now, let's do incremental repack.

A

Now it added three more files.

A

This multi-pack index and two more pack and another pack.

A

Now I don't remember seeing a multi-pack index in the list at all. Let's see and the count of files it was. We went up by three. So we we got a pack.

A

So multipack index did not exist before I did.

A

Get maintenance run incremental, repack interesting, so I've been running with a sub-optimal setup because I wasn't using get maintenance at all. Oh, this is really great. Thank you, you're, all wonderful to be teaching me more about git. Thank you.

A

Okay, so so I think what that's saying is man page review.

A

Indicates we that loose objects.

A

Should be run as well.

A

And there it's daily, and this one is also daily. This one is currently hourly right in the get maintenance default.

A

C

There's also another command directly below the incremental repack called back reps and that collects the loose reference files.

A

ah Okay, all right next, so this one, oh okay! Now now, then this is so. We saw this one loose objects created entries in the in the pack directory lou stash something what you're thinking is. This may actually create them, as real packs is that is that correct? What you're saying ariane? Yes, okay! So let's try that.

A

Pack dash rests.

A

Okay, that was very fast.

A

And it still seems to have left the the the loose objects in there. If I, if I'm seeing that correctly, there are definitely loose objects in that directory. Okay, so what was pac refs doing it says, collects the loose.

D

Reference, that's only for branches.

A

D

Tags by default, it's not for all the objects. That's what's specified when you open, when you open that link there.

A

B

Optimization is not for the object side.

A

Right, it's that the word the words reference files here are very important, I think, is what you're saying right is that this is doing it used to write right. Okay, it used to store one file per ref in a directory, and if we look, I think I can see that.

A

Yes, now, if I go, let's go up, we've already taken it out. Let's try a different.

A

What's a good bug report, how about.

E

A

No okay, so I'm not seeing oh, oh! If I look in tags, there are a bunch of tags.

A

Now, if I go back to the master directory, I might expect that those were somehow less because it's somehow done a small database of those stored in some other location is that is that what you're telling me that pack refs really is creating instead of one file per ref, um it's.

D

Now stored in a distorted directory it's stored in pack drift, there should be something called paragraph there right so.

A

If I look there- and here is this thing- that is some sort of a better representation than a single file per tag or a single file per tag, plus single file for branch.

A

Oh good, okay, all right so so for me pack refs now now my repositories typically don't have an enormous number of of references. um The that hundred megabyte one that you're you were seeing is probably has several thousand might be as many as ten thousand uh most get plug jenkins plug-in repositories have far fewer than that right. They have on the order of hundreds, maybe interesting. Okay, so back to the question: when now wait a sec, they don't even list pack refs here as a task.

B

Yes, it's not in the incremental strategy.

A

Oh okay, now so you think there's a reason for that. Interesting.

A

It's not in the incremental strategy. Okay, so so does that lobby that we should probably only use pack refs in very special cases.

A

I mean it is saying that it speeds up operations that iterate across many references, and I don't know how many of those we actually have.

E

B

So I was just saying that the operations that we would want to optimize would be a get fetch or a big goal. I mean the operations where there is a significant network in bandwidth usage right, so I believe the priority of the tasks should be should be tuned to optimizing. Those operations.

A

Right and this one packrefs is packrefs is not as far as I can tell a network, a network related one: it's not going to help network performance, it's not going to reduce network traffic or spread it spread it out. So, for me, assumed not part of our default set. Is that a safe way to say it.

B

Yes, and could we could we do a benchmark on, I mean we already have the benchmarking framework within the gift line depository.

B

So this is a test that maybe could be included in the proposal, but it's an experiment that could be done. It should be interesting to see if.

A

Yeah, the challenge for me would be: how would because of what it's doing? How would we? How would we do that benchmark it's so it's okay! So here we go a repository with too many refs should pack all its refs with minus minus all once and then run pack refs, so I assume so it'd be get pack refs, minus minus all and then every so often run this.

D

I think this would be useful if you have a lot of branches. A lot of, uh if you have too many branches on uh or get repository, has too many branches.

B

How how I mean, if you have a lot of branches, what does it affect? Does it affect the gate of the kit fetches operation time? How is it affecting.

D

Because we have to look through a lot of like through if we go through the ref folder, we'll have a lot of branches right so searching through that, I think, would take a lot of time if we do a background everything we be put into one place.

B

So uh the reason why I'm stressing on that is we so when we did the benchmarks on gate operations, what we found out was that the time it takes for a git fetch to happen is is is a function of the amount, the size of objects that you have in your repository, rather than the number of comments or the number of branches or none of the tax. That is what we found at that time.

B

Okay, if you think there is a way for us to demonstrate that the number of branches are going to affect the network intensive gate operations, then I think we should definitely consider it.

A

Yeah, so so- and I think I think that's a that's a valid point- that pac refs may be a later optimization that we consider or we might what. If what, if we said, hey one of the measures we take of repositories is the number of references.

A

I don't know how we would get that, but if we, if we computed the number of references and if the number of references was beyond some certain threshold, like they say here right a repository with too many riffs. So if we did some measurement periodically and said this repository, has this many refs, some arbitrary number, a hundred thousand and if it has more than that, we will at least once do a pack pack, refs minus minus all and then automatically schedule it to do a get pack refs once a week. Something like that.

A

I mean that might be. I don't know what that threshold would be, I'm not sure how we would def obtain that threshold and identify it, but it could be a way we handle this. This comment what the documentation says.

B

Yes, I agree interesting.

A

Okay, well this this has been a most effective session. Thank you very much to everyone who's been here. I had wanted to limit us to an hour. What I'd propose is, if you're, would you like to do this kind of session again? Are you willing to have these kind of discussions and would it fit for you if we did it um later this week or early next week? Would that be okay? Are you interested in that, or is this not nearly interesting to you, you'd, rather just focus on other things? What what's your feedback.

D

It's good to have these sessions mark because you know I'm learning a lot about what exactly is required and how to proceed also uh with the implementation.

D

So I it's better if we have uh it's good, if we have like the these kinds of sessions.

A

Okay, good well, and so others. I I like that that that's that's great for me. Do others have the same feeling.

C

Yeah I agree. This session has been like really helpful, in understanding and and and knowing about how we can proceed forward and how we can look at things.

A

Okay, then, what I'd propose is, let's plan for an hour a week. If that's okay and it would be, it would actually be a little better for me if we were willing to do it. On my on my day when I already am doing office hours, so would you be willing to do it fridays rather than doing on on wednesday, like we're doing this one?

A

So we would just do it right after google summer of code office hours, or is that not a convenient time for you, so we would basically make gsoc office hours 90 minutes for you instead of 30.

D

I I for me, it's fine mark now, I'm I'm ready for it.

C

Oh, it's it's good! Thank you.

A

How about you and chris you would with that, would would friday work for you or it's. Both of them are right in the middle of your working day, and I apologize for that. It's india, time and rocky mountain time in the us are different enough. It's always going to be complicated.

D

Should be okay,.

A

Yeah, it should.

B

Be okay for me.

A

Okay, all right, so then let me take I'll. Take the action item to schedule.

A

Schedule recurring sessions of immediately after the asia gsoc office hours.

A

And uh we'll try to meet weekly meet to discuss, so that means, let me double check my calendar just a minute. To be sure, I've got the right, so that means we would next meet on friday. The 1st of april.

A

Is that okay, or do we need to meet sooner than that.

A

C

A

So friday april 1 it would be at 3 30 a.m. Utc.

A

Which is about 30 minutes prior to this time, because we are right now? No, let's see is that I know it is. It is exactly at this time. So it's right now! No, no! I take it back. It's 30 minutes after this time.

A

So what time is it locally for you in india right now, 8 30 in the morning? Okay, so it's 8 30 a.m now, so the meeting then would be would go from nine o'clock a.m. India, time to 10 o'clock.

A

Does that work? Okay, for you.

B

It works fine for me.

A

Yeah, it's fine, okay, great!

A

Then then, let's plan for that and if we and then we'll we'll try the same thing the following week and and let's make some progress thanks very much I'll upload, the recording of this probably 24 hours from now. It's I'm a little behind schedule on recordings right now, thanks everybody for your time. Thank you. So much.

C

Thank you. Thank you.

E