GitLab Gitaly group, 30 Jun 2020

From YouTube: Speed Run: Partial Clone, Sparse Checkout, and File Locking

Description

Learn more: https://docs.gitlab.com/ee/topics/git/partial_clone.html#filter-by-file-size

A

Hi, I'm James Ramsay, group product manager here at GitLab for the Create stage of the DevOps lifecycle, and I'd like to talk to you about three features relating to working with large binary files. The first is partial clone, the second is sparse checkout, and the third is file locking. I think the easiest way to explain this is with a quick demo project. I've got this small video game here, which is a knockoff of Flappy Bird, and I am terrible at it.

A

So you'll see here that we've got some sound effects. We've got some textures, our tiles. We've also got a font over the top as well. So let's look at how we'd work on a project where there's some source code and also binary assets in the same repository, and how we would do that in a way that isn't disruptive to the different kinds of people that would be using that repository.

A

So here's my sample project. All my binary files are in the resources directory and everything else is in the root. Ideally, you would have your binary files separated from your source code files, and you'd probably have finer-grained organization of the resources or binary assets associated with your project as well, so it's easy to control which ones you're interested in. So the first thing we need to do is clone this project and take a look at how to use partial clone.

A

So partial clone is really all about passing in this filter argument, which says: only give me the objects that are not blobs, and blobs are the file contents. What that means is Git will download just the metadata (the commit information, the trees, and a few other little bits and pieces), but it won't actually download file contents for the whole history. Usually, when you do a git clone, it will download the entire history of the entire repository, every version of every file, which is great because it means you can check out anything you want at any time without having an internet connection.

A

But if you've got an enormous repository, that's the exact opposite of what you want, so partial clone says: don't download that, I want you to download it only when I need it, on a just-in-time basis. So that's what the filter argument does. The second argument is to do a sparse checkout, which means we'll only check out the root directory.
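The clone described here can be tried without a real server. The following is a minimal sketch, assuming Git 2.25 or later; the stand-in "origin" repository, the file names `main.txt` and `flap.wav`, and the `resources/audio` directory are hypothetical, and the two `uploadpack` settings are only needed because the local repository plays the role of the server.

```shell
set -eu
work="$(mktemp -d)"

# Stand-in for the server: a tiny repository laid out like the demo project.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p resources/images resources/audio
echo "game logic" > main.txt
echo "pretend texture" > resources/images/tiles.png
echo "pretend sound" > resources/audio/flap.wav
git add .
git commit -q -m "Add game and assets"
# Let clients request filtered clones and fetch individual objects later.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Partial clone (no blobs up front) plus sparse checkout (root files only).
cd "$work"
git clone -q --filter=blob:none --sparse "file://$work/origin" flappy
ls flappy

# Blobs the clone has not downloaded yet are printed with a leading "?".
git -C flappy rev-list --objects --missing=print HEAD | grep '^?'
```

Against a hosted project, only the `git clone --filter=blob:none --sparse <url>` line would be needed; everything before it exists to make the sketch self-contained.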

A

So let's take a look at what that does in practice. Right, so we can see that we're cloning only eight objects, and then we seem to get another six objects. So what's going on here? As I mentioned, there's a bunch of metadata that Git is going to download: commits, trees, and other kinds of Git objects that are not the file contents and that are very small. Then it's going to do a sparse checkout, and let me show you what that is by showing you what is in the flappy directory.

A

So here it is, practically in front of you. The sparse checkout means that only the files in the root of the repository will be checked out by default. That includes the readme and any other files that are there, but you'll notice that the resources directory is not there. So what happened in the second phase of the partial clone is, when we said "I want to check out that directory," Git said "I need to download those blobs that I didn't have, so I'll download just the latest version of those files.

A

I won't download the history of those files, just the latest version." So this is great. I can work on my source code and improve the game logic, but if I want to work on those assets, particularly images, I'm kind of stuck. I haven't downloaded them; they're not in my repository as far as I can see. So let's do a sparse checkout to add those images.

A

I don't want the audio or the fonts, I just want the images, and here's how you do that using the new sparse-checkout command. We can check out paths or files, and here's what happens when I check out just a file.

A

It's created a new resources directory and an images directory, and we can see the tiles file. Also, we can see in our command line that Git went away and contacted the server and said, "I need this tiles PNG file, which I don't have. Please, can you give it to me?" Sparse checkout downloaded that file and could then check it out. If we compare that to what's on the server, we can see that we've got just the one file we need. We've excluded all those other files.

A

If you're working on a giant video game with an enormous directory of 2,000 different textures, that could be really useful, because you only want to download a subset of those files. You don't want to download all of them. But we can also download all of them if we want: we can just widen the scope of our sparse checkout to include everything in the images directory. You'll notice that there's another four objects that get downloaded, and we can see those four files turn up in our repository.
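Widening the sparse checkout to a whole directory can be sketched the same way. This is an illustration under the same assumptions as before (Git 2.26 or later for `git sparse-checkout add`, a hypothetical local stand-in for the server, and made-up file names other than the tiles file); it is not the exact command sequence used in the demo.

```shell
set -eu
work="$(mktemp -d)"

# Hypothetical stand-in server repository with images and audio assets.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p resources/images resources/audio
echo "game logic" > main.txt
echo "pretend texture" > resources/images/tiles.png
echo "pretend sound" > resources/audio/flap.wav
git add .
git commit -q -m "Add game and assets"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Start from a partial, sparse clone that only has the root files.
cd "$work"
git clone -q --filter=blob:none --sparse "file://$work/origin" flappy

# Widen the sparse checkout: Git fetches the missing image blobs on demand.
git -C flappy sparse-checkout add resources/images
git -C flappy sparse-checkout list
ls flappy/resources/images
```

The audio directory stays out of the working tree, and its blobs stay on the server, until it is added to the sparse checkout as well.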

A

So that's partial clone and sparse checkout working together to help you download just the files you need, for just the file paths and directories that you're interested in, so that you can work faster when you've got large files and deep history that would otherwise slow you down. The other part that's really important when working with binary files is file locking. If I'm working on a binary file and I change it, this PNG, for example, and someone else changes it at the same time,

A

I can't use the standard merge tools in Git to combine those two versions. There are lots of different versions of the PNG file format: there are different color depths, there's all sorts of custom metadata that might be in there. And we're not just talking about one type of binary file. We've got to consider all kinds of binary files, and there are lots that really have no analog to a merge that they support. They're complicated binary file formats, and they would require highly specialized custom tooling to be built that doesn't otherwise exist.

A

So it's not practical, and the general solution is to take an exclusive lock on the file, which says only one person can edit a file at a time. The way that works is by making these files read-only by default instead of writable by default.

A

So we assume no one can write to them until you explicitly ask to write. Here's how we set that up using Git. We can use Git LFS, even though we're not going to store the files in LFS. Git LFS has a file locking feature that GitLab supports, and you can use it without storing files in LFS. It works with partial clone and sparse checkout, and here's how: we'll install LFS in our project and set that up locally.

A

The next thing we need to do is tell Git LFS which files it should be worried about in terms of file locking. We do that using the .gitattributes file.

A

This file is a list of file patterns and then attributes, and the attribute that matters here is lockable. That tells LFS to treat the file as read-only by default until I've explicitly requested write access to it. I've configured that for PNG files in the repository that's on the server, so it came down when I cloned it, and you can see that here. So now that we've got LFS installed and we've configured .gitattributes,
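As a small sketch of that configuration: the pattern `*.png` with the `lockable` attribute matches what is described here, while the scratch repository and the path being queried are only illustrative. `git check-attr` reports which attributes apply to a path, and it works without Git LFS installed.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Mark every PNG in the repository as lockable.
echo "*.png lockable" > .gitattributes

# Ask Git whether the attribute applies to a PNG path and to a non-PNG path.
git check-attr lockable -- resources/images/tiles.png
git check-attr lockable -- main.txt
```

The first query should report the attribute as set and the second as unspecified, which is what Git LFS uses to decide which files to make read-only.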

A

we just need to update the permissions. So this is what they are now, and what we'll do is a git checkout, and we'll just pass dot, which says just update the checkout. Then, if we take a look, you'll see that those PNG files are now read-only. This is really important: because no one has a lock on those files by default, no one's allowed to write to them until they explicitly ask to write to them.

A

If I open this file with Preview, you'll see that it's locked. The operating system knows that the file does not have write permissions. So here's how we can ask for them. I want to edit that file, so I'll run git lfs lock on the tiles file under resources/images, which asks the server: does anyone else have a lock on this file? Is anyone else editing this? Nope, so success.

A

I now have the lock, which means no one else can have the lock, which means no one else can edit the file. If someone tries to edit it in the web interface, that won't work. So we go to resources, images, tiles.

A

We can see that it's locked. There's an unlock button if I want to release the lock, and if anyone else had the file open and was logged in, they wouldn't be able to edit it at all. They can't push to any branch: this file is locked on all branches. And if they've got LFS configured, that lock will propagate when they next sync the locks to their local computer.
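The locking commands can be wrapped up roughly as follows. This is only a sketch: the helper names `setup_lfs`, `take_lock`, and `release_lock` are hypothetical, and the functions are defined but not called here because they need Git LFS installed and a remote that supports LFS file locking. Using `git lfs install --local` to limit the setup to the current repository is an assumption about how "set that up locally" was done.

```shell
# Hypothetical helpers; each wraps a Git LFS command mentioned in the talk.

# One-time setup of Git LFS for the current repository only.
setup_lfs() {
  git lfs install --local
}

# Ask the server for the exclusive lock on a path, then list current locks.
take_lock() {
  git lfs lock "$1" && git lfs locks
}

# Release the lock once the change has been committed and pushed.
release_lock() {
  git lfs unlock "$1"
}
```

In a repository set up like the demo, the flow would be `take_lock` on the tiles file, edit, commit, push, and then `release_lock` on the same path.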

A

Now for the edits. I could scribble on this, really make this a high-quality texture, put some nice little details on it, and then, when I'm done, I'd commit it and push it. Then the important command to run is git lfs unlock. What that does... oh, I see, it knows that I've got uncommitted changes, so I'd better discard this.

A

Good one James.

A

And then I can unlock it.

A

That means someone else can now take the lock and edit that file with more useful edits than my scribble. So that's a quick introduction to partial clone, sparse checkout, and file locking, which are really important if you want to use Git with large files, particularly binary files, where you can't resolve merge conflicts. Let me know if you have any questions, and thanks for your time.