From YouTube: Gitaly Training - Architectural Overview
C
I don't think I can bring myself to do the Twitch thing.
B
Oh, looks like we have a good number of folks, and this is being recorded as well, so I think we can go ahead and get started. So, my name is John, I'm an EM on the Gitaly team, and yeah, we wanted to do a series of talks, I guess we'll call them training sessions. Originally the audience was support folks, because Gitaly is a pretty complex topic and it comes up a lot: issues happen with customers, or, you know, even on GitLab.com. And so we just wanted to provide a series of talks to kind of dive into different aspects of the Gitaly code base. And not just the code base, but how it works, and how it works with other parts of GitLab.
B
Just so folks can get more familiar. And so today we'll have Will Chandler; he's a senior backend engineer on the Gitaly team. He has a support background as well, actually. So he's going to give our first session, an overview of Gitaly architecture. So take it away, Will.
C
Let me just make sure we're recording... okay, we're recording, cool. All right, yeah. So this is just supposed to be a little bit of an overview of Gitaly, mostly from the admin perspective, and how the components fit together. All right, can you see my browser?
B
Cool. Sorry, Will, I forgot to say one thing: we're going to have Q&A at the end, so... oh, whoops, I just sent the wrong link. Okay, I'm going to send the agenda doc over in the Zoom chat, so feel free to put down any questions under the questions section, and then we'll have about 20 minutes to answer those questions. So, sorry about that. Go ahead, Will.
C
Of course. Okay, yeah. So basically my goal is to walk through what this diagram means in a little more detail and give you some more ideas of what Gitaly is doing under the hood.
C
In the past we had a sidecar called gitaly-ruby; that is gone as of 16.0. I don't know... was it Toon who did that work? Thanks, Toon. And then, as of 16.4, there is also the other binary we launched, called gitaly-git2go.
C
That used libgit2, but it has now been fully replaced with just git. So for any git operation you're doing, we are just forking a git process, basically standard git. I think we occasionally will use our own patches on our git version, but typically we just try to stay with the upstream version of git. And so we're just forking there and reading from standard out, typically, to get whatever the details are.
C
Okay, so there are kind of three main clients that Gitaly serves. So there's Rails itself, so like a Puma process: anytime you're actually looking at a project, that's Puma sending RPCs. If you're familiar with the performance bar, which you can bring up by hitting P and then B on your keyboard, you can just click here and see the RPCs Rails is sending, so FindLicense here, and you can see all the parameters we're sending. So typically, anything that Rails is sending is going to be small.
C
You
know,
like
hey,
show
me
what
files
just
by
this
name
or
what
are
the
references
that
exist
and
so
forth,
and
so
those
are
all
going
to
be
over
grbc
and
they
may
or
may
not
be
streaming
grpc
responses,
sometimes
they're
unary
or,
like
a
you,
have
single
response.
Sometimes
they
stream
back
depending
on
the
size,
but
that's
that's
puma
and
then
next
would
be
Workhorse.
C
So Workhorse is going to be a longer-lived connection, because, you know, it's pretty unusual to restart Workhorse. So any HTTP git operation, which on GitLab.com will be, you know, 95% CI, or in any environment where you're using GitLab CI, it's going to be HTTP, right? So usually you'll see most of your clone or fetch traffic come in over Workhorse. And so when we're doing any kind of clone or fetch, we use what's called the sideband. Originally we were sending the pack file details over gRPC, which involved getting the output from git, encoding it into your gRPC buffer, sending it over to Workhorse, and then decoding it and writing it out to the client, and that was inefficient.
C
So the Scalability team added what's called a sideband connection: we multiplex the gRPC connection to create a very simple pipe on there that will just send the pack information in the clear, so we don't have to encode it. So that saves a lot of CPU time. And that's why you'll see things like SSHUploadPack or PostUploadPack "with sideband"; that's what "with sideband" is referring to: this multiplexed connection that we have running.
C
Oh yeah, thank you: "sideband" is git's own mechanism, ours is the sidechannel. Thank you, Toon. Right, and so, like I said, any kind of git client operation, like a CI runner cloning or fetching, or someone just in their own terminal doing a fetch, will all use Workhorse. And then the final connection, or client, is Shell. So that'll be for incoming SSH requests; on GitLab.com we're using gitlab-sshd.
C
So that is, you know, a stable process, and it will have an existing connection out to Gitaly, and so we can kind of reuse that connection. On self-managed instances that's off by default, so when a new connection comes in, the standard OpenSSH daemon is going to fork a gitlab-shell process, which will then establish a new connection. So you'll see a lot more connection churn there.
C
Things will be much more ephemeral when it comes to that, but it's the same deal as Workhorse, just over SSH: it'll be any git client operation, so that is more likely to be humans in most environments. Some self-managed customers use SSH for their CI, so it's not always, it depends. So, there is... this is the closest thing we have to a diagram, which is this page here.
C
This doesn't mention Shell versus... well, this is in the Gitaly docs. There is also a larger GitLab.com architecture diagram somewhere in the docs; I don't know if someone knows where that is offhand, I don't have the link handy, but that one's pretty enormous. You can pick out all the stuff I'm talking about there too, if you know what you're looking for; there's just a lot more going on.
C
Okay, and then occasionally you also have Gitaly nodes talk to each other. For example, if I am forking a project that's not using an object pool, I might need to go from one Gitaly server to another, to fetch the details of that project and then clone it over to my own Gitaly server. And I'm not actually sure if this loopback connection to itself really happens these days; that used to be for gitaly-ruby, where we would connect back over to the host address that Rails provided. I don't think, offhand, that we still do that, so this might be a little out of date, this section here at the bottom.
C
Okay, all right. So those are the three clients that we have. What was I going to show you... oh yeah, one thing that's kind of interesting to see. So here is a standalone Gitaly node, which corresponds with the storage section in the Gitaly configuration; you can see here there's only one storage name listed. So if I wanted to fork... let's say I had a repository on Gitaly 1 here, and I wanted to clone it over to the Gitaly 2 shard, say, because it's a private project and we can't use object pools. Then what you'd have to do is create a connection, but Gitaly itself is not aware of any other Gitaly nodes. So, like, on GitLab.com...
C
Actually, if you look at the configs, you'll see that all the Gitaly storages are listed in the config, but that's only there for ease of configuration; in reality, each node is only using whatever storage it is actually configured for. So that's slightly confusing. But, basically: how does this Gitaly node know to talk to that one? Why isn't it in the config? The way it works is, in the Rails node... excuse me.
C
Rails will bundle the addresses it's aware of into the gRPC headers, so that it can basically tell the Gitaly client: here's the node you need to go to. Let me show that in action real quick. Right, we moved... I don't know why they got rid of the wrench. Oh, that's fine. Right: Repository... what do I have as the primary storage right now? Okay, so my cluster is where everything's going, so let's fork this.
C
So this logging setting here should be "debug"; debug enables extremely verbose logging on Gitaly, which you would not want to do in a production environment, all right, but for our purposes it's handy. So you can see it was writing out each header that it's receiving on a request. All right, so let's do...
C
Okay, here we go. Yeah, so you can see we get this gitaly-servers field, and then you get the base64-encoded string. And then, if you expand that, you can see that it's a JSON blob that says: hey, here's this Gitaly server, shard, which you can see here in the Rails config, and then here's the address, and here's its token, so you can authenticate yourself. So that is how Gitaly servers know how to communicate with each other; there's never any config on the Gitaly node itself, as of today at least, to talk between nodes.
C
This
a
lot
of
this
is
subject
to
change
like
we're
planning
to
completely
overhaul
how
gitly
cluster
works
and
that's
going
to
add
some
client-side
routing
which
hasn't
been
outmitted
yet
so
that
may
so
this
this
may
be
something
to
change,
but
right
now
this
is
how
it
works.
B
Could you explain real quick why, during a forking process, Gitaly would need to know about another Gitaly node?
C
Yeah, so it depends. If you have a public project or an internal project, Gitaly will create what's known as an object pool, which is kind of like a separate git repository, which will be prefixed with an @pools path on disk, and that will have all the objects in common between the forks. So that way we deduplicate most of the repository, usually almost all of it, and we can save a lot of space.
C
So if you have a project like gitlab, which has hundreds or thousands of forks, I don't remember off the top of my head, we don't have 500 copies of the repository; we have 95% of it in one copy, and then five percent scattered around across the other ones, where they differ. So that lets us be much more space-efficient.
C
...repository storage. So this page here in the admin area, under Repository, lets you set a relative weight for where your new repositories are created. So right now I have all of my repositories being created in my default Gitaly Cluster, and none of them go to the shard, and I can change that to whatever weighting I wanted.
C
If I had certain nodes that I wanted to fill up, and other ones not to be touched, then I would adjust that value. Anyway, in this case, because I'm creating all new repositories in default, and the existing copy of the repository was on the shard, the default Gitaly nodes, which are in a Gitaly Cluster, will have to reach out to the shard and say: hey, give me this repository. So it'll internally do a little fetch, get all the objects, and create its own copy of the repository.
C
So, by default, this is the directory that an Omnibus instance will put the repositories in. You'll see this .gitaly-metadata file, which historically has been used to identify whether or not we're using NFS. So, basically, the way it would work is, Rails would validate what this file looks like on its end, and then the Gitaly server would also say what UUID it has; if they match, then we say, okay, this must be NFS, and previously we'd use Rugged, which is the libgit2 bindings for Ruby.
C
Sure. Anyway, I can't remember off the top of my head if we've removed Rugged entirely or not. I think we've disabled it, right, John?
B
I think we... yeah, well... no, you can still use it if you really wanted to, through feature flags, but for all intents and purposes it's not being used by the application.
C
Great, okay; well, we won't talk about it too much, then. All right. And you have this... the +gitaly directory, excuse me, and this is kind of Gitaly's scratch directory.
C
Where,
when
we're
leading
a
repository,
for
example,
we
will
move
rename
it
as
a
repository
name
Dash
deleted
in
this
directory,
so
we
can
atomically
move
it
out
of
place
and
then
from
there
we
can
delete
it
and
we
create
reference
or
archives
or
archives
caches
for
the
the
Reps
that
we're
serving
that's
another
thing:
that's
in
there
it's
really
implementation
detail
and
then
there's
a
cluster
directory
will
have
the
actual
repositories
itself
and
if
I
had
any
pool
repositories,
there'd
be
a
pools
directory.
Here
too.
C
So you can see we have some bare repositories in here. Let's see: commit graphs, pack files, the reverse index, bitmap indexes, all the different housekeeping things that help keep git efficient. And this is a file that Gitaly creates to say: hey, when's the last time we did a full repack, so we can keep track of that as well. As I go here... I think... yeah, okay, let's go here.
C
Yeah, sorry. Okay, so just to explain a little bit more: this host here, gitaly-2 in my training setup, is within a Gitaly Cluster, and so Praefect will have it use this @cluster format for its paths. Whereas this one is the shard again, so standalone Gitaly, which is what almost all of our Gitaly nodes on GitLab.com are, and what you'd see in a typical small self-managed instance as well. And so here we use the @hashed format for the path.
C
This is the path that you'll see internally in Rails as well. So if we go to Projects, and look here, I can say: okay, this is the path that Rails has for the project, and this matches up with what we have on the shard. However, if you look at a project that's in the cluster, it'll show up as hashed, but there is no @hashed/6b/86... path on the file system here.
C
So we have to go into the Praefect database, which I mentioned, I'll get back to that in a second, to map between them. So, before I go any further, let me actually talk about what Gitaly Cluster is. Gitaly Cluster is a way of having multiple copies of a repository that are kept in sync with each other, while exposing only a single instance to Rails. So it's a way of getting redundancy, which is not something that we have with sharded Gitaly.
C
There's no way to replicate a repository to multiple different shards in the current setup. So, what that does: typically you'll have a load balancer, and that'll be the address that you give to Rails. Let me show you here: so, for example, this could be my load balancer. In actual fact I only have a single Praefect node; in production you'd typically have three, maybe more. One failure mode we've seen somewhat often with self-managed
customers is: if you have many Gitaly nodes behind Praefect, they can send so much traffic that it overwhelms the network connection, or the network bandwidth, that is available to the Praefect node, and that will cause silent slowdowns. It used to cause memory blowups in Praefect, but that's been fixed, in something like 15.5 or so. Anyway, so Praefect nodes: they're stateless.
C
So if we go back: here it is at @hashed/6b/86, etc. But if you look here, it actually maps to the @cluster repositories path, 4732 here, and you can see Gitaly 1 is the primary node for that, which makes Gitaly 2 the secondary. I only have two Gitaly nodes in this cluster; typically you'd have three, but two is fine if a customer only wants two, there's no need for quorum or anything like that. And this generation counter is saying how many changes have happened.
C
So this is how we keep track of which node is in what state. So, if we look at storage_repositories, you can see here they both have the same generation. So in this case, a read request for this repository could be routed to either Gitaly 1 or Gitaly 2, because they're both at the same generation.
C
When a secondary is behind, it has to replicate, basically do a fetch from the primary. If it's a very large repository, that fetch might take 30 seconds, in which case, if changes are coming in faster than that, the secondary might not catch up to the primary for hours. And so in that case we wouldn't be able to distribute reads: all reads have to go to the primary, along with any writes, and so you're losing most of the benefit, other than, you know...
C
...having redundancy in terms of, you know, the primary's hard drive failing. In this scenario you can't spread the load, so that's a failure mode that we see somewhat often. It's been improved; I think pretty much all RPCs at this point won't do that. For a while we had some that would always trigger a replication, and, you know, that just decreases the efficiency.
C
Okay. So this is kind of how Praefect nodes internally track, you know, how can I route a request. So, basically, the request comes in to the load balancer, the load balancer chooses any of the stateless Praefect nodes, and Praefect will check the database and say: okay, I can send this request for repository 6b86 to either Gitaly 1 or Gitaly 2, because the generation is 3. It'll route it, and it'll proxy the gRPC request down to one of the Gitaly nodes, and then Gitaly will do its thing.
C
It sends it back out to Praefect, which proxies it back out; the load balancer will send it back to the client, and so on. So there are a lot of steps there. So, how does Gitaly actually get git data? I'll show that real quick.
C
All right, that's actually a bit ahead... it's kind of hard to view, isn't it, sorry. Anyway, let me walk through what's happening here. So Rails is going to request to pull up a project, so it's going to send a few RPCs.
If you look here, you can see it sent SearchFilesByName; in this case it's looking for markdown files from the wiki repository, and you can see the regex that we're sending. And then, in addition, we're also going to the wiki... that's interesting, yeah, I'm surprised we're doing that; that's fine. And HasLocalBranches: so it's saying, hey, what branches do you have for this repository?
C
And so here you can see the file that's actually being forked. So it's not... it's embedded in the Gitaly binary: when it launches, Gitaly will extract its own copy of the git binary. So let's actually just expand that directory. Here you can see everything that's there: you've got gitaly-git2go, which, as I said, should be gone as of the next release, and then you have gitaly-gpg.
C
We use that to sign and validate signatures on commits. gitaly-hooks is used for, well, the pack-objects cache is kind of the primary thing, which I'll get into in a little bit.
C
gitaly-lfs-smudge is used for handling LFS objects, and that's kind of a long-lived process that we use to make that more efficient. And then gitaly-ssh is used to connect internally between Gitaly nodes. And then we also have the hooks and socket directories in there as well. Okay, and so, sorry, I just started printing, it's confusing, all right. So then we've got this git command here, saying, hey...
C
We're saying how to treat CR/LF line endings for whatever you're receiving; don't use replace refs, which has been a security issue; and these are the things that will be fsynced, so objects, metadata, and references are fsynced by git whenever we make a change there, and fsync is fsync. And this one, I believe, has to do with reducing contention on the packed-refs file.
C
Okay, and so then here's the actual command running, which is git cat-file. This is saying: give me a blob. We're running with -Z, which makes the input and output NUL-terminated, and then --batch-command, which says: we're going to send you a number of commands over time. So let's see if we have...
C
...any git cat-file processes right now, and we don't. So, what Gitaly does: for these Ruby or Rails operations that are often making lots of little requests, like, I want to look at this file, I want to look at that file, within the context of a given Rails request, and I think it works out to a single correlation ID...
C
...we will fork a cat-file process and then keep it around, so that for each file that we're trying to view we're not forking a new cat-file process. That lets us be a little more efficient with forking processes.
C
So when you look at a production server, you'll probably see a whole lot of cat-file processes hanging around. Those are mostly just there, waiting to see if any more requests come in for that given overall web request that's coming in. So that's why they're there. And then the other thing you'll see a lot of will probably be upload-pack, or...
C
Is there anything else I want to talk about... well, let me pause right here first. Any questions that are immediately relevant that anyone wants to ask?
C
...with the Omnibus, but this way we didn't have to rely on Omnibus updating what it bundled; we could control exactly what we were executing. And clients, or admins, can still override the git binary, so I don't believe we force the use of this. It's not something we recommend, but it is something that you may encounter in the wild if you're dealing with a self-managed customer.
B
Yeah, another reason is, sometimes, if there's a security fix or some high-priority fix, we will build a git binary with some of our own changes. Usually we ship the standard git binary that's released, but sometimes we'll ship with some of our own changes. So yeah, if self-managed customers use their own git binary, it might break, because it doesn't have the features that Gitaly needs.
C
So right now, for this repository, its primary is going to be Gitaly 1, so that is the Gitaly... no, that's not the one, let's go back. Is it this one? Yeah, this one. Okay.
C
No, actually, I was just reading it wrong. It did update, okay.
C
Maybe it already changed and I didn't even look; great, that's probably it. Anyway, so you can see now, for this repository only, we have updated the primary to be Gitaly 2. So the way failover works is: when a request comes in that the primary needs to handle, and the primary is not available, then we will update the primary to be one of the other in-sync nodes in the Gitaly Cluster.
C
If
there
are
none,
then
the
repository
becomes
will
become
unavailable
until
the
a
up-to-date
node
returns,
but
we
only
do
that
for
repositories
that
come
in
the
have
requests
come
in
so
you'll
note
that
this
other
repository
has
not
updated.
Even
though
gidly
one
is
offline.
Originally,
we
did
update
all
of
the
repositories
and
that
caused
a
lot
of
work
in
postgres,
and
it
was
just
it
was.
You
didn't
have
to
do
it
that
way
right
so
now
it
only
it
lazily
updates
the
primary
as
needed.
C
Yeah, and then I'll just quickly mention: you may see "deadline exceeded" errors in the logs. That means that some operation that Rails sent (it's almost always going to be Rails or Sidekiq) took longer for Gitaly to serve than the timeout configured in the admin area. So if we go back again and leave this page...
C
Preferences, Gitaly timeouts. So, whatever these values are set to; which timeout applies will depend on the operation that you're doing, but if it takes longer than this, particularly if it's a large repository, then you'll see a deadline exceeded. The other error you'll see a lot is "context canceled". That typically means that somewhere between Gitaly and the client, the connection went away. Maybe the client closed their git process, maybe a load balancer went down; it could have been anywhere along that chain.
C
If the connection is lost, you'll see that "context canceled" error message in Gitaly. So it basically just means that the client went away for some reason; we don't really know why. You should look upstream to try and figure out why that happened.
C
Okay, yeah, that covers everything I wanted to go over. Let me stop sharing for a quick second and I'll pull up the doc.
B
Thanks, Will. So yeah, I know that was pretty technical. So if you have any questions, or if you want Will to go over maybe one of the parts that he already went over again, please put those in the questions section and we'll go over them one by one.
C
Yeah, Karthik has answered that. So we're using a built-in git feature called alternates. Basically, you can say: here's a repository on disk, and then it'll have an alternates file that says, okay, also go to this other directory and use that to get additional objects from. So for all of your projects that are using a given pool repository, we add that alternate pointing at the pool to them, and then git itself just knows: okay, I need to go to this other directory, which is just a relative path, to fetch the objects. So there's a limitation, which is that, because it's a relative path, it has to be on the same node; we can't tell git to use an HTTP request or something to go do it. So that limits any fork network to a single Gitaly node, or the one Gitaly Cluster.
C
In theory, you still have the forks spread out, or replicated, amongst all the nodes. That was broken for a long time; I think Justin has gotten that basically fixed at this point, or close to it.
C
Are we adding extra overhead by translating the path through Praefect? So, I think the main answer there is that we wanted to have a permanent home for a given repository, but Rails was still asking us to move repositories around on disk, particularly in the context of Geo. So you'd have a new repository created in a temporary Geo location, and then once the fetch completed, it would get moved into place.
C
So, actually, the reason that we have the @hashed naming originally on Gitaly shards is the same logic. Originally, the path on disk was the repository name, including the project, so it'd be like /var/opt/gitlab/git-data/repositories/gitlab-org/gitlab would have been the path for GitLab.
C
But whenever someone renamed their project, or renamed their organization, we ended up doing a ton of work on disk to rename all these directories, and it was expensive and error-prone. So that's why we created the hashed format, and this @cluster thing is doing the same thing...
C
...just at another level of abstraction. This way, Gitaly Cluster knows that a repository is always at this path, even if Rails thinks that it's moved somewhere else. So that's why we added that.
E
It does add another layer of complication. The @cluster path is always the same, no matter which Gitaly shard you talk to, correct?
C
Back one... yeah, sorry, that was actually over here. So if you look at the path, it's actually the hash of the ID, but instead of taking the full hash, like we do with @hashed, we just put the actual number here. So this is the full hash of the number one, and this is also hash one, it just uses a one at the end. But they don't always correspond with the ID that Rails has; like, this one in Rails is project one, and in the cluster it's project... or, sorry.
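For reference, the @hashed path is derived from the SHA-256 of the project's decimal ID, which is why project 1 showed up under 6b/86 in the demo:

```python
import hashlib

def hashed_storage_path(project_id):
    """Disk path for a project under hashed storage: SHA-256 of the
    decimal project ID, with the first two byte-pairs as directories."""
    digest = hashlib.sha256(str(project_id).encode()).hexdigest()
    return "@hashed/{}/{}/{}.git".format(digest[:2], digest[2:4], digest)

print(hashed_storage_path(1))
# @hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git
```

Note this maps the Rails project ID to a disk path one way; as discussed next, there is no stored reverse mapping on the Gitaly node itself.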
C
So there's just no way, unless you're backing up your database, which I don't think we really advise in the docs, to re-establish that mapping. You can kind of fake it: you can go through the logs, so for any project that was creating or getting traffic, you could at least look at recent logs and say, okay, I can translate between those two, because they'll have both. But if it's an idle project, then you're probably not going to be able to easily do it.
C
Yeah, for a long time there used to be a gitlab section in the git config that would list the project name, so you could relate it back that way. I think we added that around version 10, but that's going away now, so you can't rely on that either.
C
Okay, Sean asks: do some customers run Gitaly independently of GitLab? Hugging Face, I think, is certainly the most prominent; there might be others doing it that I'm not aware of. This isn't something that we officially support, in the sense that the support team is not going to help them. Most logic in Gitaly is pretty agnostic there; there is some stuff that is GitLab-specific, you know, like authorizations, I think, are hard-coded to go to the internal allowed endpoint.
B
Yeah, and regarding the documentation for the RPCs: there is a good amount that's missing, and that's something that we're working to address.
C
This
is
all
just
automatically
generated
from
the
comments
on
our
protograph
definitions,
so
it
may
or
may
not
be
entirely
useful
understanding
how
it
works,
but
that
it
is
something
that's
out
there,
and
this
is.
This
is
generated
automatically
if
it's
part
of
the
CI
pipeline,
so
it'll
it'll
be
up
to
date.
F
They just want to use their own front end, so they are able to do this, but it is not something we would recommend at this point. However, where we want to be is to offer a clean RPC interface, with all the implementation details and internal logic, some of which is currently managed in Rails, moved into Gitaly. For example, how object pools are created and deleted is kind of a detail that doesn't belong in a client, whereas we want to keep all the business decisions, things that need access to the main Postgres database, like customer tier or customer settings on the repository or on their projects.
F
It may never be marketed outside; that is a product decision. But it does help us to be more independent, and to develop GitLab overall with less friction, if we have a clean interface between components.
D
As was briefly mentioned, from the product perspective this is also really important, because we've formed some very strategic partnerships through this sort of opportunity. We get a lot of contributions back from external people who are, you know, interested in Gitaly; Hugging Face is a great example, but there are others as well that we've seen over time, where we've gotten contributions from other teams sort of looking at it, and it gives us another perspective from a product point of view. And it does ensure that we are continuing to develop Gitaly in the most independent way, which is part of the reason why, as the direction page shows, we're trying to put the business logic into the Rails side of things, to make Gitaly somewhat agnostic to who is calling in. So yeah, it's been an overall win for us, but, as was said, we're not actively marketing or pushing it as an independent product.
C
All right, thanks everyone. So, next question, from Clement: interested in the cost of serving, what are the most costly operations? So, I know on GitLab.com, fast SSD storage has been the largest cost component of Gitaly by far. It really depends on the instance, though: GitLab.com is getting, you know, lots and lots and lots of repositories, whereas on, you know, a self-managed instance...
C
...you might not see as many; you might see a single monorepo, which will still be large, but, you know, not the scale of the 100 terabytes that you see on GitLab.com for the vast number of repositories there. So typically storage is a big expense, just because for a reliable and performant service you need to have fast storage. And then, depending on the size of the repository, you may also see significant expense in terms of compute.
C
So if you have, you know, a truly vast repository, let's say 20 gigs, it can easily use 10 or 20 gigs per operation sometimes, depending what it is. So if you're running multiple operations at once, you know, and you've only got 10 gigs of RAM on the box, you're gonna run out real fast, right? So you're gonna have to have a really large host in that case, just to serve a repository of that size. But more typically, storage is the larger cost.
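The capacity math being described can be sketched roughly like this. All the numbers here are hypothetical illustrations of the point above, not GitLab sizing guidance:

```python
# Rough capacity sketch: how many concurrent Git operations a host can
# safely run, given per-operation peak memory. Numbers are hypothetical.

def max_concurrent_ops(host_ram_gb, per_op_peak_gb, os_headroom_gb=2):
    """Return how many operations fit in RAM after reserving OS headroom."""
    usable = host_ram_gb - os_headroom_gb
    if usable < per_op_peak_gb:
        return 0
    return int(usable // per_op_peak_gb)

# A huge monorepo where one operation can peak at ~10 GB:
print(max_concurrent_ops(host_ram_gb=10, per_op_peak_gb=10))  # 0: the box is too small
print(max_concurrent_ops(host_ram_gb=64, per_op_peak_gb=10))  # 6 concurrent operations
```

In other words, a 10-gig box cannot safely run even one such operation, which is the "you're gonna run out real fast" case above.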
C
Yeah, Gitaly Cluster, you know, I would say that you'd expect specs that are the same as a standalone Gitaly node, because, you know, internally we're still presenting that as a single node to the world, right? So I don't think you'd want to skimp, because you're still running the same operations on a given node, just maybe fewer of them, and any of them could become the primary at any point, right?
C
So basically, when a fetch or a clone comes in, git will generate the pack file that's going to be sent to the client, and then, using the gitaly-hooks binary, we'll write that pack to disk. So if an identical request comes in, we don't need to go to git to recompute that, so we save all the CPU power that would have been used to create another pack file; we can just copy the bytes from disk and serve it out to the client.
C
So it's usually a win; usually you're more CPU-bound than storage-bound if you have a whole lot of clones coming in at once. But it depends on the instance and the traffic pattern, so it's something to look into in general, but maybe not always a win.
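The caching idea described above can be sketched like this. This is a toy model of the general technique (identical requests served from stored bytes), not Gitaly's actual pack-objects cache implementation, whose keying and storage details differ:

```python
import hashlib

class PackObjectsCache:
    """Toy pack-objects cache: identical fetch requests are served from
    stored bytes instead of re-running the expensive pack generation."""

    def __init__(self):
        self._store = {}   # request digest -> pack bytes (stands in for disk)
        self.misses = 0

    def _key(self, repo, want_refs):
        # Identical (repo, wanted refs) requests hash to the same key.
        h = hashlib.sha256()
        h.update(repo.encode())
        for ref in sorted(want_refs):
            h.update(ref.encode())
        return h.hexdigest()

    def get_pack(self, repo, want_refs, generate):
        key = self._key(repo, want_refs)
        if key not in self._store:
            self.misses += 1
            self._store[key] = generate()  # expensive: build the pack file
        return self._store[key]            # cheap: copy the bytes back out

cache = PackObjectsCache()
gen = lambda: b"PACK...bytes..."
cache.get_pack("group/project.git", ["refs/heads/main"], gen)
cache.get_pack("group/project.git", ["refs/heads/main"], gen)
print(cache.misses)  # 1: the second identical request skipped pack generation
```

The trade-off in the transcript falls out of this shape: the cache spends disk bytes and disk reads to avoid repeating CPU-heavy pack generation, so it pays off when the workload is clone-heavy and CPU-bound.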
C
Okay, I think that covers that. Terry asks: any epics or issues on the sync operations, specifically Gitaly Cluster replication? Yeah, so I mentioned this in passing earlier: we are moving Gitaly, in the long run, to a new architecture which will remove the need for Praefect. So that's going to be, on each Gitaly node, we're going to have a write-ahead log, basically like what Postgres uses, or other databases, where a change comes in,
C
we, you know, save its changes to disk, and then we apply them one after the other, you know, serialize them, and replicate the changes out to any other nodes that also happen to have this given repository. And then the client, so Gitaly will be routing clients without Praefect. That's pretty well covered in the blueprint for this, if I remember the detail.
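The write-ahead-log flow just described (save the change to a log, apply entries in order, replicate to other nodes holding the repository) can be sketched minimally. This illustrates the general technique only, not Gitaly's actual design; the class and field names are made up:

```python
class WALNode:
    """Minimal write-ahead-log node: changes are appended to a log first,
    then applied in order, then shipped to replicas of the same repo."""

    def __init__(self):
        self.log = []       # ordered log of changes (stands in for disk)
        self.state = {}     # applied state: ref name -> commit id
        self.replicas = []  # other nodes holding this repository

    def write(self, ref, commit):
        entry = (len(self.log), ref, commit)
        self.log.append(entry)        # 1. save the change to the log
        self._apply(entry)            # 2. apply entries one after another
        for r in self.replicas:
            r.replicate(entry)        # 3. replicate out to the other nodes

    def _apply(self, entry):
        _, ref, commit = entry
        self.state[ref] = commit

    def replicate(self, entry):
        self.log.append(entry)
        self._apply(entry)

primary, secondary = WALNode(), WALNode()
primary.replicas.append(secondary)
primary.write("refs/heads/main", "abc123")
print(secondary.state["refs/heads/main"])  # abc123: the change reached the replica
```

Because every node applies the same serialized log in the same order, replicas converge on the same state, which is what removes the need for an external coordinator like Praefect in this model.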
C
So there's gonna be a lot more complexity in how Gitaly talks to clients when this is done. You know, right now it's basically: Rails has an address, it goes to the address; if there's a problem, if it returns an error, that's it, there's no retry logic, that's that. So the client will have to do a lot more work whenever that comes around, but that's still a ways out.
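The extra client-side work being contrasted with today's one-address, no-retry model might look roughly like this. This is a hedged sketch of the general failover idea, not any planned GitLab client code; the node addresses and error types are invented:

```python
class AllNodesFailed(Exception):
    """Raised when every candidate node rejected the request."""

def call_with_failover(nodes, rpc, max_attempts=3):
    """Try an RPC against each candidate node in turn, instead of the
    current model where one error from one address ends the request."""
    errors = []
    for node in nodes[:max_attempts]:
        try:
            return rpc(node)
        except ConnectionError as e:
            errors.append((node, e))  # remember the failure, try the next node
    raise AllNodesFailed(errors)

def flaky_rpc(node):
    if node == "gitaly-1.internal":   # hypothetical dead node
        raise ConnectionError("connection refused")
    return f"ok from {node}"

print(call_with_failover(["gitaly-1.internal", "gitaly-2.internal"], flaky_rpc))
# ok from gitaly-2.internal
```

Today's behavior is the `max_attempts=1`, single-node case: the first error is final, which is why moving routing into the client adds real complexity.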
C
Okay, so that basically covers the question, sorry Terry. But yeah, so we're not putting a ton more effort into Praefect. You know, we're still polishing it up, but, like I mentioned, you know, like other RPCs that previously forced a replication job, we've got that down to like three, I think, at this point, or something, but we're...
G
No, yeah, I was just being nosy. I've seen that issue, or epic, that was linked, thank you. I didn't get a chance to read through it all the way, but we are working on Zoekt's replication strategy, so I was just curious. You know, at least it's good, if not great, to hear the problems that you're encountering, but at least it's something to think about when we're looking at how the replication is going to work on the Zoekt side.
C
Yeah, I'd be very interested to see what your team comes up with.
C
Okay, I think that's... we're at time. That's all the questions in the doc. I'm happy to hang around for another couple minutes if anyone has anything else; otherwise we can call it.
B
Cool, so thank you so much, Will, for that session.
B
You can take a look at that issue that's linked there for upcoming sessions. Yeah, by the way, I haven't sent out... I haven't set up the calendar event yet, so I'll do that shortly and send it out on Slack. But that is it. Thank you, everyone, for joining. Have a good day. Bye-bye.