From YouTube: GitLab Pipeline Caching Showcase
Description
Engineering Productivity showcase highlighting how caches are used in the GitLab pipeline
A: Great, yeah. So the first question was: why is it important? I will start with that, just to show the improvements we made with the changes to our caching strategy. I think we made the changes about a year and a half ago now. With this new caching strategy, we reduced job durations by one to 20 minutes in some cases, and pipeline durations by between six and twenty minutes. So that was quite a lot.
A: Those were our goals at the time. We also saved approximately three thousand dollars of CI machine costs per month, and I don't recall any cache-related problems after that; at least we haven't had a lot, that's for sure. So now, about the strategy itself.
A: The goal with this new strategy was to make it very simple: not to have a lot of cases for when we should purge or update the caches, or what the cache keys should be. As you all know, one of the hardest problems is choosing cache keys, and knowing when to purge caches.
A: The important points of this strategy are that we follow the best practices defined in the GitLab docs. The first point is that every job should be able to pass without any cache, or even with an outdated cache.
A: That means caches are only there to speed up jobs; they are not there to make jobs pass. Otherwise, that's a bug. The second important item, regarding performance, is that all jobs must only pull from the cache and never push. This is just to avoid unnecessary uploads of caches that are, in most cases, identical.
A
So
that's
just
a
waste
of
time
and
bandwidth.
Basically,
if
you
do
that,
and
that's
like
pull
pull
push
so
pulling
and
pushing
from
into
the
cache
is
the
default
behavior.
So
you
need
to
change
that
and
the
goal
of
cache
is
to
avoid
like
restarting
dependencies
every
time,
so
ruby,
gems
nodes
and
go
packages.
A: If you look back at the first point, it also means that jobs must still run the dependency-install commands, because the cache could be empty or outdated. That works well for these package managers, because they handle outdated dependencies.
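A pull-only job of the kind described above could be sketched roughly like this in `.gitlab-ci.yml`. The job name, cache paths, and commands here are illustrative assumptions, not the actual GitLab configuration:

```yaml
# Hypothetical sketch of a pull-only job.
# "policy: pull" overrides the default "pull-push", so the job never uploads.
rspec:
  cache:
    key: ruby-gems          # fixed key, shared by all pipelines
    paths:
      - vendor/ruby
    policy: pull
  script:
    # The install step still runs: the cache may be empty or outdated,
    # and Bundler fetches only what is missing.
    - bundle install --path vendor/ruby
    - bundle exec rspec
```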
A: About the cache keys, we really took the simple route of having fixed cache keys, because that allows all pipelines to use the same cache. We do that because, and it's related to the point above, of what happens when a gem, or a package in general, is updated.
A
If
you
only
update
one
dependency
and
yeah
that
allows
to
limit
the
number
of
caches,
so
the
number
of
combinations,
basically
so,
instead
of
having
one
cache
per
like
and
one
cache
per
branch
but
like-
and
we
don't
have
only
one
cache-
we
have
maybe
10
different
caches,
so
that
would
you
know,
multiply
10
by
number
of
branches,
so
that
would
be
a
waste
of
storage
and
yeah
about
the
updates
of
the
caches.
A
Since
we
don't
update
in
in
jobs
in
general,
we
only
perform
the
update
every
two
hours
in
in
our
regular
scheduled
pipeline.
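The scheduled refresh could look something like this sketch, where a single job in the scheduled pipeline is the only one allowed to upload. The job name, key, and paths are assumptions for illustration:

```yaml
# Hypothetical sketch: only the scheduled pipeline refreshes the cache;
# every other job keeps "policy: pull".
update-ruby-gems-cache:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  cache:
    key: ruby-gems
    paths:
      - vendor/ruby
    policy: push            # upload only, from this one job
  script:
    - bundle install --path vendor/ruby
```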
A: We are using the multiple-caches feature, which is quite recent; I think it was introduced this year. It allows us to have atomic caches, each specific to one thing: we have the Ruby cache, we have the Node cache, and then you can define a job cache that combines these two. That also reduces the number of jobs we need to update the caches.
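The multiple-caches feature mentioned above lets a job declare a list of cache entries instead of a single one. A minimal sketch, with assumed job names, keys, and paths:

```yaml
# Hypothetical sketch of the multiple-caches feature: atomic per-ecosystem
# caches, combined in one job (GitLab allows several cache entries per job).
rspec-system:
  cache:
    - key: ruby-gems        # the "Ruby cache"
      paths:
        - vendor/ruby
      policy: pull
    - key: node-modules     # the "Node cache"
      paths:
        - node_modules/
      policy: pull
  script:
    - bundle install --path vendor/ruby
    - yarn install --frozen-lockfile
    - bundle exec rspec
```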
A
So
that's
that
was
a
great
improvement
as
well,
and
I've
listed
just
two
specific
caches
definition.
A
So
there's
the
guitar
binaries
cache,
which
is
which
the
key
is
based
on
the
content
of
the
list
server
basement
file-
and
this
is
because
this
cache
stores
the
guitar
binaries
that
that
are
built
in
the
setup
test
and
job,
and
it's
just
simpler
to
rebuild
these
binaries
when
this
file
changes,
because
the
values
are
dependent
on
this
file
rather
than
you
know,
like
compare
the
version,
the
binaries
version
with
the
with
the
file
and
given
that
this
version
doesn't
change
very
often
it's
fine
and
the
second
specific
one
is
the
assets
cache
which
includes
compiled
frontend
assets,
and
it
also
includes
a
specific
assets.
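A file-based cache key of that kind can be expressed with `cache:key:files`, which derives the key from the commits that last changed the listed files. The file name `GITALY_SERVER_VERSION` and the cache path below are assumptions based on the GitLab repository layout, not quoted from the talk:

```yaml
# Hypothetical sketch: rebuild the cached binaries only when the
# version file changes. File name and path are assumptions.
setup-test-env:
  cache:
    key:
      files:
        - GITALY_SERVER_VERSION   # key changes when this file changes
    paths:
      - tmp/tests/gitaly          # assumed location of the built binaries
  script:
    - scripts/setup-test-env      # assumed build/install command
```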
B: I think there is an issue about caches for forks, because for now, suppose someone forks the project and creates their own merge request: that branch will use their own cache, and there's nothing for them unless they try to update the cache themselves, with labels or the merge request title.
A: It would be great to be able to allow forks to use the canonical cache, but there are probably some security issues with that. Still, I think that would be nice.
A: Because, yeah, our strategy doesn't really work for forks in that sense, except if forks define the scheduled pipeline that updates the caches themselves, I guess.
C: That was great, thanks, Rémy. Do we graph... this is probably an ignorant question, I apologize. Do we graph the times, like on one of the early slides with the improvements? Do we have that charted somewhere, so we can see the point at which the benefit started to come in?
A: Yeah, definitely, we do graph that. I'm sharing just a GitLab issue here, but the graphs that you see are from Sisense. I won't show you right now, because I'd need to log in and so on, but we graph all that; as you can see, we graph per pipeline type.
A
So,
for
example,
this
one
is
for
the
qa
pipeline
type.
So
these
are
the
pipelines
that
run
the
packaging
to
a
job.
This
one
is
from
the
front-end
pipeline
type
which,
which
deployed
the
review
apps,
and
this
one
is
from
the
code
pipeline
type.
So
mostly
like
back-end
pipeline,
we
could
say-
and
we
also
graph
per
job-
and
this
is
useful
to
detect
so
per
jobs.
Look
like
this,
for
example,
this
is
useful
to
detect
regressions,
usually,
and
it's
also
useful,
when
we
have
improvements
for
sure.
C: Yeah, I'm sure I'm not the first person to mention this, but it'd be awesome to have mini versions of these graphs embedded as part of the pipeline view. The data is there, but it's perhaps not close enough to be rendered inside the GitLab interface. It'd be so cool to be able to see that; the regression side is as interesting and fascinating as the improvement and performance side.
A: Yeah, totally agree, and if you recall, we discussed that a few weeks ago in the team meeting, because we had a regression in the setup-test-env job. I think it was this one, and I actually created a feature proposal to detect that at the merge request stage, rather than by looking at the graphs.
D: And I'll add that different teams in Verify are looking at Pipeline Intelligence, I think is what they're calling it: analytics to support those sorts of insights, so that customers can have more nudges and reminders around their CI minute usage when trends may be changing, good or bad, whether things are improving or getting worse. I'll have to dig up those issues, and, like Rémy said, he created an issue for that specific thing.
D: Cool, Rémy, that was great. Any final questions?