Jenkins Google Summer of Code Office Hours, 8 Jun 2022

From YouTube: 2022 06 08 Git Cache Maintenance

Description

Git cache maintenance project for Google Summer of Code 2022

A

We'll discuss the action items first. The first pending action item that we had was to update the project details. I believe Hrushikesh has raised a PR on jenkins.io.

A

But if you think it looks good, then we could probably move forward and merge it.

C

Yes, I'll do a review. If you've already reviewed it, I think my review will be quite rapid, just to be sure it formats correctly, and then I'll merge it. I'll try to do that before I go to sleep tonight.

A

So, apart from that, there were some doubts that Hrushikesh had, and these were action items on me to help him with. But unfortunately I could.

A

And to do this, I myself had to go deep into the code and understand the pattern. I realized that during my time I had to use the descriptors, but I never explored them in the depth that I should have to be able to explain this pattern. So I created.

A

With whatever I could find in my explorations, and I hope that serves as a starting point for him to understand the pattern.

D

I haven't worked on the UI again, because the UI is based on the implementation, so I've concentrated on the implementation for now. I've created a design document explaining two ways of implementing it: using cron syntax, or using the global build discarder. That is what I've done this entire week.

C

So then it feels like Rishabh and I should review the design document. Do you want to give us a summary of the concepts that you found different between the two, and do you have a particular one that you recommend, Hrushikesh?

D

Oh, I have shared the link in the Gitter channel, or I can share it here.

D

So basically, this was to explain the difference between the two strategies. The aim of the global build discarder is to schedule maintenance tasks without having a cron syntax in the UI. It is done intelligently by Jenkins internally. In the cron syntax strategy, administrators have to pass a cron syntax for each maintenance task. So, the global build discarder.

D

Here I have written up how exactly it works. I was going through the documentation.

D

So basically, if you check, there is a class called BackgroundGlobalBuildDiscarder. It executes every hour. There is a method called getRecurrencePeriod, which is currently set by default to an hour. So every hour this thing runs and calls the execute method.

D

This execute method calls processJob. It checks all the jobs on the Jenkins controller, and then checks whether each job is applicable for discarding its previous builds or not.

D

Rishabh, can you go to the design document? Yeah. So in the second step, it calls the execute function hourly and runs the build discarder on all the jobs present on the Jenkins controller, based on the strategy present in the global build discarder. So basically, there is a function which we need to override, the isApplicable function, which the user needs to set.

D

If the user sets this, whichever functionality they write in the isApplicable function is used by Jenkins internally to decide whether it should run the build discarder on that job or not. This is the basic functionality of the global build discarder.
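
The flow just described (an hourly periodic task that walks every item and asks a strategy whether it applies) can be modeled in plain Java. This is only a sketch under stated assumptions: `PeriodicSweep` and its members are hypothetical names that mirror the described getRecurrencePeriod, execute, and isApplicable steps, and nothing here calls the Jenkins API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical stand-in for a periodic background task; not a Jenkins class. */
class PeriodicSweep<T> {
    private final List<T> items;              // e.g. jobs, or git cache directories
    private final Predicate<T> isApplicable;  // strategy deciding which items to process
    private final List<T> processed = new ArrayList<>();

    PeriodicSweep(List<T> items, Predicate<T> isApplicable) {
        this.items = items;
        this.isApplicable = isApplicable;
    }

    /** How often a scheduler would be expected to call execute(): once an hour. */
    long getRecurrencePeriodMillis() {
        return TimeUnit.HOURS.toMillis(1);
    }

    /** One sweep: visit every item and process only those the strategy accepts. */
    void execute() {
        for (T item : items) {
            if (isApplicable.test(item)) {
                process(item);
            }
        }
    }

    /** Placeholder for the real work (discarding builds, running maintenance). */
    private void process(T item) {
        processed.add(item);
    }

    List<T> processed() {
        return processed;
    }
}
```

For example, `new PeriodicSweep<String>(names, n -> n.startsWith("job")).execute()` would process only the entries whose names start with `job`. Swapping the item type from jobs to cache directories is the adaptation discussed in the following turns.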

D

I was thinking we can use the same functionality for the git maintenance tasks as well, but I had a few questions regarding this, because the global build discarder is only used to iterate over jobs. Is there any way we can use it for caches as well?

C

I don't know of a way to use that directly, but I would think that its implementation could be used as a model. We may have to do a new implementation that says: okay, we're going to use the global build discarder as our pattern and copy or duplicate relevant portions of its code into the git plugin to iterate over git caches.

C

But I'd assumed that you were thinking here that, because the global build discarder happens every hour, we would then have the user choose in the UI how frequently they want it, whether it's hourly, every two hours, every 24 hours, or every 48 hours, and then skip it on those hours when it's not selected. What was your vision for people who don't want to run cache maintenance every hour?

D

So what I was thinking, with it running every hour, is first of all that I didn't want it to overload the system. So I was wondering whether there is any way we can find the CPU utilization, how much CPU has been used or how much RAM has been consumed, so that based on that data we can schedule the maintenance tasks hourly or every three hours, and let the user even have an option of scheduling it weekly. Something like that. That was what I was thinking. I'm not sure: are there any other intelligent ways of scheduling maintenance tasks?

A

I was just saying that I think we should divide this into two steps, as Mark suggested. The first thing that Mark asked about is what we decided in the initial project plan: we are going to expose a way for the user to set a schedule for these tasks. So when you talk about CPU utilization and the system having the intelligence to schedule the jobs on the basis of the current status of the system, I believe that is something we can implement once we know how to implement a scheduled job based on the user's input. So I think we should take this step by step.

We don't have to make it intelligent in the first iteration of this feature. We can start with the first goal, which, please correct me if I'm wrong, was to take the user's input on the frequency with which they want to schedule these jobs and then use that input to run the tasks. If we could think about CPU utilization, that's great, but it's okay if we don't think about it at the first step. Would you agree?

C

I do. I think we'll need the CPU utilization or the overload prevention no matter which scheduling technique we use, so I think considering them as separate steps is a good idea.

A

Okay. My point was just about steps and requirements: we know that first we need to do that, and in parallel, as a feature, we could figure out how to understand how these tasks affect the system.

C

Yeah. Now, in terms of progression, I'm not sure that CPU load is the crucial measure there, and even if it is, it may be difficult to get that in a platform-independent way. I think, though, we can get how long the subprocess ran before it completed, and the duration of the run of a subprocess gives us a first-level approximation of how much demand it placed on that computer.

C

We know when it started, and we'll be monitoring its exit code, so we'll know when it finished. The difference between those two is the duration of that process in terms of its wall-clock execution, however many CPUs or cores it had available to use.
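
The wall-clock measurement described here (note the start time, wait for the exit code, take the difference) can be sketched with the standard `ProcessBuilder` API. This is a minimal illustration, not the plugin's code: `TimedProcess` and `Result` are hypothetical names, and the command to run is whatever the caller passes in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.List;

/** Hypothetical helper: runs one subprocess and reports exit code and wall-clock time. */
class TimedProcess {
    static final class Result {
        final int exitCode;
        final Duration wallClock;

        Result(int exitCode, Duration wallClock) {
            this.exitCode = exitCode;
            this.wallClock = wallClock;
        }
    }

    static Result run(List<String> command) {
        long start = System.nanoTime();                  // when the process started
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)           // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exitCode = process.waitFor();            // blocks until the process exits
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            return new Result(exitCode, elapsed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A caller could log `result.wallClock` per cache directory and compare it against the configured interval. The duration says nothing about how many cores the subprocess used, which matches the "first-level approximation" caveat above.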

C

But I think Rishabh is right that it's probably much more important that we do the first step and gain confidence there before we worry about optimizing to not overload.

C

So, Hrushikesh, did that answer your question?

D

Yeah. So I have a doubt. Finally, what kind of UI are we looking at? Are we taking cron syntax from the administrator to schedule maintenance tasks, or do you want the maintenance tasks to run automatically without any input from the administrator?

C

I was assuming we wanted the administrator to have some control of the frequency. I don't know that we want a cron syntax. Or maybe I should say it differently: a cron syntax may be more precise than we're actually ready to use.

C

For instance, I think it would be disastrous, or at minimum very unwise, if they scheduled it to run every minute.

C

So if we went with the global build discarder concept, it checks every hour whether there is work to be done. Then, if the job configuration said to only do that every 24th hour or every 48th hour, that might be enough, and then we don't need to process cron syntax. Now, Jenkins users may say, "Well, but I had to learn cron syntax everywhere else," and they'd be right.
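
The "check every hour, act only every Nth hour" idea can be sketched as a small gate that the hourly tick consults. This is an illustration under assumptions: `IntervalGate` is a hypothetical name, and the interval would come from whatever configuration the administrator sets.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical gate: called on every hourly tick, lets work through every N hours. */
class IntervalGate {
    private final Duration interval;
    private Instant lastRun;   // null until the first run

    IntervalGate(int everyNHours) {
        if (everyNHours < 1) {
            throw new IllegalArgumentException("interval must be at least 1 hour");
        }
        this.interval = Duration.ofHours(everyNHours);
    }

    /** Returns true when maintenance should run at this tick, and records the run. */
    boolean shouldRun(Instant now) {
        if (lastRun == null || !now.isBefore(lastRun.plus(interval))) {
            lastRun = now;
            return true;
        }
        return false;
    }
}
```

With `new IntervalGate(24)`, hourly ticks over two days would let exactly two runs through, one at the first tick and one 24 hours later. The trade-off against cron syntax is that the administrator picks an interval but not a time of day.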

C

It's just that I have a hard time imagining us running cache maintenance more than once an hour, and maybe I'm misunderstanding there. What's your thought on it? Do you think that there will be interesting use cases that require it to run more than once an hour?

D

It depends on the repository. If the repository isn't that big, or if it isn't frequently updated, I don't think it would make sense to run it every hour.

C

Yeah, I agree with your observation. At least for me, I have a hard time imagining a repository that is busy enough that refreshing its cache every hour would be important, and it's even more difficult to envision that refreshing its cache every few minutes would be worthwhile.

D

Or we can give an option where we run it not hourly but on a daily basis, or something configurable by the administrator. The aim of the global build discarder is to not have cron syntax; it has to be done by Jenkins internally. The other implementation provides cron syntax to the administrator, who can plug in the cron syntax and run the maintenance tasks. So that's the whole difference between the two approaches.

C

Go ahead, go ahead.

A

I just wanted to say, on this point of how we decide what the scheduling frequency for these jobs is going to be, I believe we should test this idea. When we have this feature, we should run it on Mark's machine, which has a lot of projects. I believe we should take inputs from that machine on whether the frequencies that we're assuming are optimal for the system. I believe it will be a good practical test for us to know how it's actually going to work on a user's machine, and whether our frequency is not in the optimal range that we wanted.

So it could be a test that we run on your machine, and then we see how the system metrics perform at the frequencies that we've decided could be our defaults. Do we need to reduce them? How can we reduce them?

C

I'm certainly willing to be a test case. I'd be honored to be a test case, not just willing. That would be a real privilege. So if my little installation is of use, I'm happy to do it and, yes, it does have several, in some cases rather embarrassingly large, repositories that it's caching.

A

Yes, so I believe it would be a good exercise to understand the relationship between the maintenance tasks and how they're affected by the size of the repository or the other parameters which are defined in the repositories. You have a large number of sizable repositories: how are these tasks going to interact with the system, and how many resources are they going to take? I was just listening to the question that we're trying to answer, which is what the frequency should be, and I thought: without actually trying these tasks on a machine at that scale, how can we answer that question?

C

And I think that sounds very good to me. Now, I'm not sure the answer to that question resolves which path Hrushikesh should take, whether global build discarder or cron syntax, because I can see arguments for either.

A

Correct me if I'm wrong: the cron syntax essentially allows us a way to decide the frequency on a more granular level than exposing a way for them to just say every one hour or every five hours, right?

C

Correct. For instance, with cron syntax I can say things like "on the second and fourth Thursdays of the month", those kinds of things. So yes, it is much more sophisticated than the simple hourly scheme that the global build discarder uses.

C

And cron syntax is very, very rich in Jenkins. It has keywords like @daily, where it says run it sometime during the day, or @hourly or @weekly, so it has a level of sophistication that is certainly very powerful.

A

So, Hrushikesh, as I understand it, the primary goal that we have for this project is to do the heavy lifting of running these tasks behind the scenes. But for the user, we want to provide a way for them to configure these tasks, right? And if that is the aim, then more customizability, especially when that feature affects the performance of the system, could mean that the admin would have more options if, let's say, the hourly frequency, or whatever we think the default frequency is, is not the one that should be used in their system.

A

So what I'm trying to say is that the cron syntax would allow the user more freedom to decide for themselves what is the best way to run these tasks, instead of us giving standard slots or something like that.

C

Back to an earlier question: do you have something that causes you to lean towards one choice or the other?

D

Nothing as such. I've gone through the entire cron syntax implementation, and I'm pretty confident about implementing it. There's no favoritism of that sort. I'm still a bit confused about the global build discarder, like what the various conditions are on which we would run the maintenance tasks.

A

So, with the second strategy, Hrushikesh, we can use the cron syntax. In the Parameterized Scheduler plugin, in the same way that we've seen in the global build discarder strategy, there are hooks exposed to run processes.

D

Yeah. We can create our own asynchronous thread by extending AsyncPeriodicWork. If you extend that, you can create your own background process and then run the maintenance task in that thread.

A

So what you're saying is that, in the second implementation, the difference would be somewhere in this contract: the recurrence period would be decided by.

D

Here what would happen is that, in the cron syntax implementation, there would be one thread running every minute, which would check each minute whether the cron schedule is due. If it is, the corresponding maintenance task is run on all the repositories.
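
The per-minute check described here can be illustrated with a deliberately simplified five-field matcher. This is a sketch under loud assumptions: `SimpleCron` is a hypothetical class, it supports only `*`, plain numbers, and `*/n`, it assumes well-formed input, and it is not Jenkins's own cron parser (which also understands `H`, ranges, lists, and the `@` keywords).

```java
import java.time.LocalDateTime;

/** Hypothetical, simplified matcher for "minute hour day-of-month month day-of-week". */
class SimpleCron {
    private final String[] fields;

    SimpleCron(String spec) {
        fields = spec.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + spec);
        }
    }

    /** True when every field accepts the corresponding part of the given time. */
    boolean matches(LocalDateTime time) {
        int dayOfWeek = time.getDayOfWeek().getValue() % 7;   // cron style: 0 = Sunday
        return fieldMatches(fields[0], time.getMinute())
                && fieldMatches(fields[1], time.getHour())
                && fieldMatches(fields[2], time.getDayOfMonth())
                && fieldMatches(fields[3], time.getMonthValue())
                && fieldMatches(fields[4], dayOfWeek);
    }

    private static boolean fieldMatches(String field, int value) {
        if (field.equals("*")) {
            return true;                                      // wildcard accepts anything
        }
        if (field.startsWith("*/")) {
            // Step counted from zero; only exact for the minute and hour fields.
            return value % Integer.parseInt(field.substring(2)) == 0;
        }
        return Integer.parseInt(field) == value;              // single fixed value
    }
}
```

A background thread that wakes once a minute would call `matches(LocalDateTime.now())` for each configured task and start the task when it returns true. For instance, `new SimpleCron("0 2 * * *")` matches 02:00 on any day and nothing else.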

A

So, in terms of implementation, the core difference between these two options is the usage of cron syntax.

D

I didn't get you. Can you repeat?

A

I just wanted to ask: am I audible? I just wanted to ask if the only difference between these two implementations is the ability to use cron syntax. Apart from that, we could extend the same contract to iterate over all the jobs?

D

Yeah, exactly. The rest of the implementation is the same. The only main difference would be whether we take the cron syntax from the administrator and schedule with it, or schedule intelligently behind the scenes in Jenkins.

A

I believe this could be a step-by-step approach for us.

C

That sounds very reasonable to me. I think cron-based syntax feels like the right choice, and you've done a very good job of exploring it.

D

We can even safeguard the cron syntax. As I stated in one meeting, assume an administrator runs a gc every minute, with syntax corresponding to every minute or every 30 minutes. We can safeguard against that by putting some rules behind the scenes so that maintenance tasks can only be scheduled from an hourly basis upwards, such as every hour or every hour and a half, so that the administrator doesn't overload the system.

C

Alternately, you could put in a limiter that says: if there is currently a maintenance thread running, I will refuse to start a new thread.
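
One way to picture this limiter is a guard built on `AtomicBoolean` that refuses a new run while one is in progress. It is only a sketch: `SingleRunGuard` is a hypothetical name, and the design actually chosen for the plugin may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical limiter: at most one maintenance run at a time, extra requests refused. */
class SingleRunGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Runs the task unless another run is in progress; returns false when refused. */
    boolean tryRun(Runnable task) {
        if (!running.compareAndSet(false, true)) {
            return false;              // a run is already active, refuse this one
        }
        try {
            task.run();
            return true;
        } finally {
            running.set(false);        // always release, even if the task throws
        }
    }
}
```

A scheduler tick would call `guard.tryRun(maintenance)` and simply skip the tick when it returns false. The refused request is dropped rather than remembered, which is the difference from the queueing alternative raised in the next turn.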

D

Or, as we discussed, we can add it to a queue and then dequeue it.

C

Yeah, even better, you're right. Just queue it and then somehow.

B

Use the same thread.

C

Right, exactly. Good point, very good.

A

Using the cron syntax approach.

C

Sorry, I missed part of that sentence. You said something about the cron syntax approach. Could you say that again?

A

I was just saying that we're tilting more towards preferring the second approach, right, the parameterized cron syntax approach?

C

I think so, yeah, and, as Hrushikesh noted, with some safeguards: only have a single thread that's processing these, so that we are forcibly rate limited. We can never have more than one running at a time, that kind of thing.

A

So I have a question related to using a single thread for the execution of these jobs. If we are not using multiple threads, let's say I have 60 repositories and I have these five tasks to perform, let's say within a day. When I start running them, iterating through the repositories, if I'm using a single thread we would be running a chance of increasing the execution time for these tasks. I'm not sure it gets to a point where it is infeasible, because these are background processes.

So the question would be, first of all: do we worry about how much time these tasks are taking, and do we put an upper limit on that time? As an example, if gc has started to run on a particular repository on a single thread and it's taking, I don't know, five hours, do we have some kind of upper limit where we say, okay, we can't proceed, considering it might take five days for the whole batch of tasks to run on these 60 repositories?

C

Yeah, I'm not immediately visualizing why we would want to put an upper bound. I can imagine someone has decided to compile the Linux kernel for their Raspberry Pi, and as part of that they're doing a garbage collection operation on the two gigabyte Linux kernel repository on their Raspberry Pi controller. It may take many hours, but they get the benefit that when it's done, it's done. Tell me more about cases where you're worried that they may have many copies of that and therefore might somehow not be able to complete the other work.

A

My concern, I think, is born out of this assumption that I have. Let's say there are two consecutive scheduled run times for the whole batch of maintenance tasks. How do we guarantee that all of the tasks that we have decided are going to run for all of the repositories within the system finish before the next scheduled time for them to run, since we're doing this on a single thread?

C

And I thought that the answer there was that, because there's a stack or a queue, it won't allow the next task to begin until its predecessor has completed, and for me that was okay.

That means that if they've scheduled them to occur too rapidly, they will queue. When the first maintenance task completes, the second will begin if it was scheduled to begin earlier, and the same for the third, fourth, and fifth. Now, I don't know that we want an unbounded queue, because in that case it's just queuing to do work that, well, yeah.

If the degenerate case that you're describing were to happen, where the processor and file system combination is so slow, or the repositories are so large, that the work simply cannot be completed in time, and the controller were continuously falling further and further behind in processing its queue of maintenance tasks, there's no point in making that queue very deep. It will just work on them when it can.
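
A single worker thread with a shallow, bounded queue, as argued for here, can be sketched with the standard `ThreadPoolExecutor`. This is an illustration only: `MaintenanceQueue` and `maxPending` are hypothetical names, and rejecting a task when the queue is full is one possible policy, not a decision recorded in this meeting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical queue: one worker thread, FIFO order, at most maxPending waiting tasks. */
class MaintenanceQueue {
    private final ThreadPoolExecutor worker;

    MaintenanceQueue(int maxPending) {
        worker = new ThreadPoolExecutor(
                1, 1,                                     // exactly one worker thread
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxPending),     // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());    // reject when the backlog is full
    }

    /** Queues the task; returns false when the backlog is already full. */
    boolean submit(Runnable task) {
        try {
            worker.execute(task);
            return true;
        } catch (RejectedExecutionException full) {
            return false;
        }
    }

    /** Stops accepting work and waits for queued tasks; returns false on timeout. */
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With `new MaintenanceQueue(2)`, one task can be running and two more can wait; a further submission while all three are outstanding returns false. Because there is only one worker, tasks never overlap, so a schedule that fires too often is rate limited by construction.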

C

Did I speak to your question, Rishabh, or no?

A

Yes, but as you've said, it could be an extreme case, so I don't know if it's something that we should prepare for right now. The question I'm trying to ask is only because, let's say, currently on my system, if git gc is going to run, my limited knowledge is that it's going to use whatever resources I have on my system to run that process. It's not going to be a single-threaded process.

C

That's my understanding as well. Command-line git gc is specifically written to use multiple cores if the computer has multiple cores, so it will run portions of the garbage collection in parallel.

A

Yes, so when we say that we're going to limit our jobs to a single-threaded process, what are we forgoing? We are giving up execution time, right? We're increasing the execution time for these individual tasks.

D

I don't know if it works like that, but I was thinking that when I schedule a maintenance task using the git client plugin, it calls the underlying command-line git present on your system, which runs a separate process for the maintenance task, and once that maintenance task has run, the result comes back to the git client plugin. That was what I was thinking.

C

Me too, and in that fork, command-line git decides to use multiple threads, or multiple cores, and it does that independently of Jenkins, unless we were to somehow configure it to do less than that.

A

That makes sense; that answers my question. I was not thinking of it in that way.

C

Okay. We're still only forking a single command-line git process, but then that process chooses to use multiple cores as it sees fit.

A

Correct, that's true. I was thinking that we would somehow be allocating a single thread to the git command-line operations, but that is not happening.

D

Here I was worried about this: if I run a git gc command and it consumes a lot of the computer's resources, I think that would be a problem, right? It could consume something like 90 percent of the CPU, making the computer a bit slow. I was not sure how we proceed with that, or is that fine for now? I'm not sure what we do about that.

C

My opinion was that's fine, because what we're doing is delegating to command-line git, and if that is a problem for the user, we would invite them to change the scheduling. That argues for the cron syntax: change the scheduling so that it only happens during periods when the system is reasonably idle or less busy.

C

Now, on a system like ci.jenkins.io, those are not regularly predictable times, but there is some pattern to them.

D

Or we can read how often, and when exactly, the system is idle, and then give a recommendation to the administrator based on that, so that they can schedule the maintenance tasks.

C

Good point, yes. There may be some historical data like that available, since Jenkins itself does present load statistics for at least the last two days.

D

I had another doubt regarding the git caches. When I create a freestyle job in the Jenkins UI, it creates a separate workspace directory which contains the entire repository, whereas if I use a multibranch pipeline, it only creates a caches folder. So here we are only concerned with the caches, right, not the freestyle repositories present on the Jenkins controller?

C

Correct, because it is strongly advised to not have any jobs that execute on the Jenkins controller, and so having us perform any maintenance on jobs that the user makes the mistake of running on the controller would, I think, be a bad pattern. We only want to deal with caches that are maintained by Jenkins core itself, not with freestyle jobs that the user constructed.

C

Did that address your question, Hrushikesh? Yes? So I agree wholeheartedly with you that we should only do multibranch. We should only do caches on the controller, not job workspaces.

D

Also, yesterday I was experimenting, trying to make an implementation. There I tried to save the data I've got, like the cron syntax taken from the user, and stored it as an XML file. Can this XML file be changed by other users, for example if that computer doesn't belong to that administrator? Can other people change that XML file? I'm asking for security reasons.

C

Certainly, Jenkins configuration files can be modified by anyone who has permission to modify them. The next time Jenkins starts, those configuration files will be read.

D

So if any malicious user tried to change the cron syntax in that configuration file, would that affect the Jenkins software?

C

Yes, that's correct.

C

So UI-based validation is good, and is certainly necessary, but probably not sufficient. At the low levels of the API, we'll want to be sure that we're checking the data, the schedule that's proposed, for sanity there as well.

C

The other reason for that is configuration as code, which allows those kinds of configurations as well. The administrator writing the configuration as code definition might have made a mistake that causes it to be scheduled every minute, something like that.

C

So yeah, if there are safeguards at the UI, what usually happens is that the safeguards are also implemented in the API, and the UI just presents a pretty error message for the same safeguard.
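
The idea of one safeguard shared by the UI and the API can be sketched as a single check with two entry points. Everything here is an assumption made for illustration: `ScheduleValidator` is a hypothetical class, and its rule (the minute field must be a single fixed value or `H`, so the schedule cannot fire more than once an hour) is a simplification rather than the rule the plugin uses.

```java
import java.util.Optional;

/** Hypothetical shared safeguard for maintenance schedules. */
class ScheduleValidator {
    /** Returns an error message for the UI to display, or empty when acceptable. */
    static Optional<String> check(String spec) {
        String[] fields = spec == null ? new String[0] : spec.trim().split("\\s+");
        if (fields.length != 5) {
            return Optional.of("Expected 5 cron fields");
        }
        String minute = fields[0];
        // Simplified rule: only "H" or one number 0-59 is allowed in the minute field.
        boolean fixedMinute = minute.equals("H") || minute.matches("[0-5]?\\d");
        if (!fixedMinute) {
            return Optional.of("Schedule would run more than once per hour");
        }
        return Optional.empty();
    }

    /** API-level entry point: rejects a bad schedule however it arrived. */
    static String requireValid(String spec) {
        check(spec).ifPresent(message -> {
            throw new IllegalArgumentException(message);
        });
        return spec.trim();
    }
}
```

A form validation method would surface the message from `check`, while the code that stores or loads the schedule would call `requireValid`, so a schedule written through configuration as code or edited on disk is held to the same rule. Under this sketch, `H 2 * * *` is accepted and `* * * * *` is rejected.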

C

Any other topics we need to discuss today?

D

So now we are leaning more towards the parameterized cron syntax approach, so I can start exploring more about it. That is what I was thinking for this weekend's agenda: to fix the architecture so that we can proceed with how we would implement it.

C

That sounds very good to me. Hrushikesh, thanks for doing that exploration, and thank you for having researched the global build discarder versus the scheduled task interfaces. Well done.

C

Now, timeline-wise, I believe we're about to start the official coding phase, aren't we?

D

Yeah, the coding phase is going to start on June 13th.

C

14. Okay, and are you feeling like you've got enough that you're ready to start the coding?

C

Great. I apologize, it's approaching 11 p.m. my time and I'm not nearly as awake as the two of you tend to be at this hour of the night.

D

I don't have any other questions. I think we can wind up the session if you want, Mark.

C

All right, then I'll go ahead and stop.