GitLab Gitaly group, 7 Mar 2019

From YouTube: How Gitaly fits into GitLab: Episode 3 – Git push

Description

A 1-hour training video for contributors new to GitLab and Gitaly.

A closer look at the final stage of git push where the
git hooks run and the refs get updated. Interaction between the git
hooks and GitLab internal API. The Git object quarantine mechanism.
Preview of Git HTTP (to be discussed next time).

Recorded 2019-03-07

A

Yes, we're recording. Hello, welcome back to the third Gitaly training session. I'm Jacob, a staff backend engineer on the Gitaly team, and I'm going around looking at how Gitaly integrates with the rest of GitLab and showing you stuff.

A

So last time we were looking at git push over SSH and how that first hits gitlab-shell, and from there it pre-authenticates with GitLab, and then it establishes a Gitaly connection, and then more stuff happens. I want to get to that more stuff, because there are some surprising things going on during the final stages of a git push. Maybe I should start with a high-level overview. So what happens during a git push?

A

Let's see, so what happens? A client runs git push, and then the client gets a connection established.

A

That somehow starts off the remote git receive-pack command on the server.

A

The client Git and the server negotiate references (negotiate refs), the client Git sends pack file data, and the client Git sends ref updates. Then the server unpacks the pack file data. I think the server does some checks on it; no, I think those checks actually happen later, although unpacking is itself a kind of check. Then the server updates the refs, and the interesting thing is this step, because it implies hooks.

A

This step has hooks, which GitLab relies on, crucially. The hooks we use are pre-receive, update (we sort of use it, but not really), and post-receive.

A

And somewhere here in the middle Git does the actual ref updates, and it's the server-side Git, so git receive-pack. The reason this is interesting is that these hooks make API calls back into GitLab.

A

This one is maybe easier to understand. The post-receive one is less mind-boggling, because it triggers CI and creates notifications in GitLab that the actual push happened. So you see in the UI that a push happened because the post-receive hook ran, because we have to tell the Rails application at some point. When the receive-pack process is started on the server, that is not the time to send notifications into the system.

A

Saying: hey, user X pushed something. At that point you haven't even received the data, so you don't know what they pushed. So this is mostly about notifications. Another thing it does: you have this feature where, if you push a new branch, GitLab gives you a URL you can click to make a new merge request. That URL is printed by the post-receive hook. Okay, so that's the easy one. The tricky one is the pre-receive hook, because what Git does before it runs

A

These hooks is take the data it receives, the pack file data, and put it in quarantine. It gets written to an alternate object directory that is a temporary directory. So nobody knows about it unless you know the actual path to that directory, and you can only run git commands that look at that data if you know that quarantine directory. And that gets really interesting, because this pre-receive hook implements features like protected branches. How do we block a push to a protected branch?

A

Well, the pre-receive hook gets input from Git that says the user is updating branch X from commit Y to commit Z, and GitLab can then look at that and say: hey, branch X is a protected branch, and this user does not have permission to push to this branch, so we block the push. This hook can abort the whole push. But for GitLab to look at what's happening and apply these validations, it needs to look at the Git repository. So this is actually.

A

This thing runs on the Gitaly server and makes a call back into GitLab with an HTTP API call, and then GitLab makes RPC calls back into the Gitaly server to look at the Git data that is about to be committed and decide if it looks right. This is fairly complex because of the data Git has in quarantine. So you need to be careful that all the RPC calls that run on the GitLab side during the pre-receive hook know where the quarantine data lives.

B

So Jacob, when it's creating this quarantine area, this is like a random path in /tmp or something, is that what it's doing? Yeah.

A

It creates a temporary directory somewhere underneath the repository. I think what I want to do today is, among other things, show you what happens and where this thing is, so you can see that it gets created and when. Okay, cool. And I figured, let's try and sketch out a plan of what I want to show, instead of just rambling around randomly. But yeah, it's a temp dir.

A

So yeah, this is super tricky, what's happening here. Basically, this state of the push transaction is very tricky because you're using quarantine data. It will also be very interesting how we solve this in our HA implementation, because if you replay a push to several Gitaly servers, each of them will create its own random quarantine directory for the objects, and any lookups you do of the quarantined objects are super specific to the right Gitaly server.

A

It's not a problem today, because there's only one Gitaly server that the repo is on, but.

B

For HA, that's an issue, yeah.

A

Exactly. So that's roughly what I want to look at, and I double-checked before I started this call so I don't say something stupid. We don't have distributed tracing yet in gitlab-shell, sorry, in the hooks, so the API calls that get made here are invisible in the tracing. We're building out the tracing anyway; it's still a work in progress, so I can't show you this with tracing. Also, I don't know how to use the tracing, but otherwise I would.

B

Have learned it.

A

So I'm going to do this the low-tech way. I want to show the API calls; that's where I want to start. First, I have a repository here and I'm going to make a new branch. I have no idea what this is about. I think this is because it's a test repository with bad data in there. Oh, that's because I actually ruined that.

A

Well, let's fix that first. Let's get rid of the .gitattributes file. I now want to set something up where I can easily repeat pushing a commit, and I want it to be a new commit each time. So if I do something like overwrite the README (there is a README, yes) and I.

A

Make my commit message "hello" and do a git push, I expect this will get pushed to my GitLab. I am running GDK, so this is pushing via SSH to my local GitLab server, and yes, that works. So let's make that a function, so it's a little easier to repeat. Now I can just run p and it will push. Good. Now I want to show what that does, so let's tail some logs.

A

I want to look at the access logs of the Rails application, because I already know that that's where the API calls are being made. I'm going to truncate the log, because this is development and it is full of stuff I don't care about. First I need to go to the GitLab directory, log/development. Let's see how big that is. That is big.

A

Yeah, that is 13 megabytes of stuff I don't care about. So let's truncate that, and now let's tail this file and push again and see what happens.

A

Okay, a lot of stuff happens, maybe too much. I know from experience that Rails has these lines that have "Started" in them, so I'm just going to restrict myself to these "Started" lines.

A

So let's edit that tail command and grep that.
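
The tail-and-grep step above can be sketched in Python. This is only an illustration: the regular expression relies on the `Started <METHOD> "<path>"` request line that Rails writes to its log, and the sample paths and timestamps below are hypothetical, not copied from the recording.

```python
import re

# Matches the request line Rails logs at the start of each request, e.g.
#   Started POST "/some/path" for 127.0.0.1 at ...
STARTED_RE = re.compile(r'^Started (?P<method>[A-Z]+) "(?P<path>[^"]+)"')


def started_requests(log_lines):
    """Return (method, path) for every 'Started' line in a Rails log."""
    found = []
    for line in log_lines:
        match = STARTED_RE.match(line)
        if match:
            found.append((match.group("method"), match.group("path")))
    return found


# Hypothetical sample lines; the paths are assumptions for illustration.
sample = [
    'Started POST "/api/v4/internal/allowed" for 127.0.0.1 at 2019-03-07 10:00:00 +0100',
    "Processing by some controller",  # not a request line, so it is skipped
    'Started POST "/api/v4/internal/post_receive" for 127.0.0.1 at 2019-03-07 10:00:01 +0100',
]

for method, path in started_requests(sample):
    print(method, path)
```

Filtering on the request line keeps one entry per HTTP request, which is what makes the sequence of internal API calls during a push easy to read.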

A

Okay, well, there we have it. What is happening here is, I'm not sure why there are two calls here, but this is part of the gitlab-shell part where we interact with the SSH daemon and pre-authentication. So this is the setup that allows gitlab-shell to establish a Gitaly connection for this particular push, and then, during the push, we make this API call and this API call from the hooks.

A

So, okay. Now, what am I going to show you first? Maybe let's look at.

A

Yeah, let's look at what.

A

Well, let's first look up this repository that I'm pushing to and see where that is. That would be on my localhost, port 3000.

A

I want to find the directory it's in, and I think I just converted this machine to hashed storage. So that means I need to check in the admin panel.

A

No, it's not in this lists or places too deep, I guess I.

A

Think it's this one, gitlab-org/gitlab-test.

A

Okay, so it hasn't been migrated, so I expect this to be in.

A

I'm going to open a new window for that. So this is my GDK root, and I know that the repositories are in a subdirectory called repositories, and if I now go to this relative path, that should exist. Yes, and I can do find dot.

A

That's a bit too much stuff. Find objects.

A

Let's make this real easier.

A

Sects and see files.

A

So these are the objects that exist now, in the normal state, in this repository that just shrunk them, right? The older ones ended up in this pack file. I guess these are unreferenced, and that's why they're still loose. Doesn't matter.

A

But I was saying that during the push there are these extra objects in quarantine, and we want to see that now. The nice thing about how the hooks work is that they're Ruby and they're executable, so I can just edit them and wave to the camera. I know that in this setup they are in the gitlab-shell repo, which is a weird artifact of history. So here's the pre-receive hook. Let's see if I can wave.

A

And now I'm going to push again. Yes, there we are, I can wave; it says "hello there". So let's now see what the objects look like. I know that the hook runs in this directory, so I can just put this find command here.

A

Let's see, I have to be smart because I don't want to send it to standard output; I'm not sure if that is okay. So I think this would redirect it. Let's see. Yes, that works the first time around, so cool.

A

Now p doesn't do anything? [inaudible] Rerun the find. So here you see that all the objects are either in these directories, which start with two hexadecimal characters, which is Git's fan-out scheme to make sure that you don't end up exhausting the maximum number of directory entries, or they're in the packs. Here we have some loose objects and pack files and this incoming stuff. So that is the quarantine area I was talking about.
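
The fan-out scheme mentioned here can be illustrated with a small Python sketch: the first two hexadecimal characters of an object ID name the subdirectory, and the remaining characters name the file. The function name and the example directory are made up for illustration.

```python
import posixpath

HEX_DIGITS = set("0123456789abcdef")


def loose_object_path(objects_dir, object_id):
    """Path of a loose object: two hex characters of fan-out, then the rest."""
    if len(object_id) < 3 or not set(object_id) <= HEX_DIGITS:
        raise ValueError("expected a lowercase hexadecimal object ID")
    return posixpath.join(objects_dir, object_id[:2], object_id[2:])


# Made-up object ID, only to show the resulting layout.
print(loose_object_path("objects", "ab" + "c" * 38))
```

With two hex characters there are at most 256 fan-out directories, so no single directory has to hold every loose object.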

A

I don't want to serve.

A

Thanks, so how do we?

A

How does this work? How do we even know where to look for these things? Because we're not supposed to run find and say, well, I found some random extra directories here, let's assume that these objects belong to the repository. What's actually happening is that Git is telling us about.

A

It. Git is telling the hook about these objects, and we can see that if... okay, I should have checked this before recording. I hope there's nothing embarrassing in my environment, but otherwise we'll find out. What I'm going to do is a system call with env and grep.

A

Things that start with GIT, and I'm going to send that to standard error. Push again.

A

mmm No, this is not real conversing good, so this is information.

A

Git is passing to this process. The nice thing is that in this process, Git will automatically use the quarantine directory, and that is because of, well, I guess it's because of these things.

A

Let's look at what we have here. So there's a quarantine path, then it's repeated as the Git object directory, and then there's an alternate object directory defined, which is actually the normal object directory.

A

If you look closely, you see that this part is the path to where the repository is on disk, and then it goes to /objects. So if I were to run a Git command in this hook, it would run in a sort of topsy-turvy world where the main object directory is the quarantine directory, and then there are these extra things which are actually the normal objects.

A

So this is how the quarantine mechanism works.
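
As a rough sketch of the arrangement just described, the Python below lists the object directories implied by a hook's environment, primary first. `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORIES` are real Git environment variables; the paths in the example are hypothetical, and the colon separator assumes a Unix system.

```python
def object_directories(env):
    """Object directories implied by the environment, primary first."""
    dirs = []
    primary = env.get("GIT_OBJECT_DIRECTORY")
    if primary:
        dirs.append(primary)
    # Alternates are colon-separated on Unix; empty entries are ignored.
    alternates = env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES", "")
    dirs.extend(path for path in alternates.split(":") if path)
    return dirs


# Hypothetical environment as a pre-receive hook might see it.
hook_env = {
    "GIT_OBJECT_DIRECTORY": "/repos/example.git/objects/incoming-abc123",
    "GIT_ALTERNATE_OBJECT_DIRECTORIES": "/repos/example.git/objects",
}
print(object_directories(hook_env))
```

In this example the quarantine directory comes first and the repository's normal object directory is only an alternate, which is the "topsy-turvy" ordering described above.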

A

Are you still with me?

B

So, if we're trying to quarantine it, why would we point to the real object directory? What kind of stuff is gonna go in there in this pre-receive? The.

A

Problem is that these environment variables can completely break Git. Let me see if I can demonstrate that. If I do git show-ref HEAD in this temporary repository, that just works, but if I say GIT_OBJECT_DIRECTORY=/empty and then do git show-ref HEAD.

A

Then it is throwing up all over the place, because I just pointed its object directory to something that doesn't exist. Oh.

B

So it's because this is a partial set of objects and it's gonna refer to objects. Yeah. The.

A

Quarantine area is partial, because those are the objects that were just pushed, and usually a push is incremental.

A

So, like you said, this points to a partial list of only the new objects, and this is not a complete Git repository; it's not a valid Git repository. For that we also need to remember where the rest of the repo is, and that's why Git sets these two variables.

A

I wasn't actually aware that Git sets this one. I'm not sure what it's for, but it is sort of backing up my story that it's actually called quarantine path, and you can see it just repeats this. So maybe I can tell you another time, or we can look up another time, why that one is there. We only rely on these two. So we do something pretty complex, because these two things are important and we only know about them in the context of this hook.

A

So when this hook makes an API call to GitLab, it actually looks in the environment and sends these values. Now, let's see if I can show you that that is actually true.

A

So that would be API internal, and the thing we're going to is pre-receive.

A

Cash to search.

A

That's all? Shouldn't there be something more happening here? I.

A

Don't know works, oh.

A

Okay, now I remember. I was looking at these API calls and saying, well, I guess these two are the setup of the session, and that needs to happen. Wouldn't that make sense? Actually, that's not how it works.

A

This one is session setup.

A

This one is pre-receive, this one is also pre-receive, and this one is post-receive. Why the hell? Well, it makes no sense, and this is technical debt. That's the short answer.

A

What happened here is that gitlab-shell, so this part, the session setup part for SSH, this code lives in the gitlab-shell repo. For legacy reasons I don't want to go into, all the code that does this also lives in the gitlab-shell repo, and it shares implementation code, even though they are parts of completely different things, right? Because the session setup is part of starting the git receive-pack process, and this stuff is part of the hooks, which is all the way down here.

A

Why is this in the same repo, and why is this called gitlab-shell? There is no answer to the why, but it's how things are. (I'm breathing wrong into the microphone.) This leads to all sorts of confusion, because in particular we're reusing these endpoints: either we use them to establish a Gitaly connection after receiving an SSH session, or we use the same endpoints when we're in the middle of a Git hook that runs on the Gitaly server.

A

Are you still with me? Are you following this? So logically, these are two separate things, right? This should be calling a different API endpoint, like really it should be calling this, but for legacy reasons it's calling the same endpoint, and that's because a whole bunch of code that is in gitlab-shell was just calling the same API endpoint, and that API either returns data you need for the session setup or it does something completely different. Just this.

A

Let's see if I can find the code to back that up. That would be "allowed".

A

Oh boy, yes, here we are. Let me make it bigger. So this is the implementation of the allowed POST. First of all, the actor can either be identified by an SSH key (guess what, that's when we're up here) or it can be identified by something that points to a user ID (guess what, that is this case), because these hooks of course also run during an HTTP push, which has zero to do with SSH. But the code is mixed, because legacy.

A

So then we go into this access checker, and there's some real fun here, because where are the changes?

A

I want to see the changes.

A

Project protocol namespace.

A

Here we go params, so what the hell is params changes. Let's see.

A

This is my very ugly debugging hack. Let's see if it works. It should show up here now, if code reloading works, which I never know if it will.

A

There we go. So I was just telling you that there are these two calls, and luckily I.

A

So in this first one there's a changes param on this HTTP POST which has a bogus value, _any, and that is because it is just gitlab-shell trying to establish an SSH session, and it has no idea what the changes are, because the data hasn't been sent down the line yet. We still hit the same API endpoint, because, well, just because we do. And then the second time around we're actually in the pre-receive hook, and here we have concrete information about a change.

A

These are commit IDs, and it's saying that this branch, the test branch, wants to be updated from this to this.
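
A minimal sketch of how a handler could tell the two uses of the endpoint apart, based only on what is described above: a placeholder value `_any` for session setup, and real change data during the pre-receive hook. It assumes the real changes are sent as lines of "old ID, new ID, ref name" separated by spaces (the format Git gives a pre-receive hook); the function name and the sample IDs are hypothetical.

```python
def parse_changes(changes):
    """Return None for the session-setup placeholder, else (old, new, ref) tuples."""
    if changes == "_any":
        return None  # session setup: nothing has been pushed yet
    parsed = []
    for line in changes.splitlines():
        if not line.strip():
            continue
        old_id, new_id, ref = line.split(" ", 2)
        parsed.append((old_id, new_id, ref))
    return parsed


# Made-up commit IDs and branch name, for illustration only.
example = "1" * 40 + " " + "2" * 40 + " refs/heads/test-branch\n"
print(parse_changes("_any"))
print(parse_changes(example))
```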

A

Okay, so I found the part of the API that does the.

A

Pre-receive. This is the real pre-receive check. I'm not sure if it helps that I keep saying it makes no sense, though I think it's true, but is it making some sense now, or do you at least see what is happening? Yeah.

B

I see that handler does a lot for that uh yeah.

A

It has two different purposes, and for the pre-receive we only care about the second time we hit it, and one way to recognize that is that the changes field is populated and has real data in there. Okay, how did I get here? I wanted to show you the quarantine directory. So let's see where that is. That is here.

A

This thing is storing the quarantine directory. So let's see what we have here, I'm just going to do the same. Dirty logging trick.

A

Come on. So what is this env thing? When I do a new push, it's going to be slow because it is doing a live code reload, which is still better than me having to restart the whole thing.

A

These are errors from sshd, because this is sort of a non-standard way to set up sshd.

A

Okay. This is exactly what I wanted to see. Let's break that up a little bit. So this first one.

A

This is the session setup. One.

A

And this whole env thing is empty because, duh, it's only relevant for the other use of the dual-use endpoint. This thing is empty. And now I put the whitespace in the wrong spot; I need to put it here. Now, this is the actual Git pre-receive hook API call. Here we have this fun information, so we have alternate object directories.

A

Then we have, okay, actually we have them in duplicate. Now we have the changes. So let's look at this a little bit and compare this with the environment, because we're still printing the environment up here. What do we have?

A

So, two of these. Well, first of all, you can see there are absolute paths and relative paths. Why do we have both? Well, during the migration project, we were in a weird transition state where some of the code that ran.

A

The hooks, and later code that looks at the Git repo, might run on different machines where the repos are on different mount points, so absolute paths would break. We actually had this break on us. Git gives you absolute paths, right? You see here, this is an absolute path, but that is not a safe thing to pass around the network between servers, so we convert them to paths relative to the repo directory.

A

So this one is .git/objects, so the relative path is objects. So we can forget about this one. And here you have the same thing: it is .git/objects/incoming-something-something, so we care about that.
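
The conversion described here, from the absolute paths Git reports to paths relative to the repository directory, can be sketched as follows. The repository path and the quarantine directory name are hypothetical; `posixpath` is used so the example behaves the same on any platform.

```python
import posixpath


def relative_object_dirs(repo_path, absolute_dirs):
    """Express each absolute object directory relative to the repository path."""
    return [posixpath.relpath(path, start=repo_path) for path in absolute_dirs]


# Hypothetical mount point and quarantine directory name.
repo = "/mnt/storage-a/group/example.git"
dirs = [
    repo + "/objects",
    repo + "/objects/incoming-abc123",
]
print(relative_object_dirs(repo, dirs))
```

Because the result no longer contains the mount point, it stays meaningful on a machine where the same repository is mounted somewhere else.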

A

So these two things are only valid in the context of this API call, so they are local to that one HTTP request, and what we do with them is store them here in this class.

C

Also the alternate one, not only the quarantine, but also the, yeah.

A

Yeah. Strictly speaking, the value of the alternates is also always the same, but we don't make an assumption about that. Later on, we throw these two things together, because we don't really care which one is which; we just want to make sure we have everything, because we're not doing any writes.

A

Okay, so you're starting to see now, maybe, why I was saying this stuff is very subtle and good to know about, if you want to know what happens during a git push. The other thing I can maybe show, I'm scrolling back here: this is one random string. Of course, during my last push it was a different random string, right? It's a temp directory. It's different every time.

A

So because we need to use this, and because all Gitaly calls that happen in this Rails process during that API call need to know about this extra secret quarantine directory, we stick that information in this hook env thing. So, let's look for a.

A

See what's going on in there.

A

So this is using a RequestStore, and a RequestStore is a wrapper around thread-local storage. So, regardless of whether we use a multi-threaded or a single-threaded Rails application server, this will be local to the request. The other thing that's good to know about this is that there's a middleware in the Rails stack that resets this request store after each request. So it's thread-local, and because of the middleware, it is local to the request. I think Fran, you've seen this before.

A

I'm not sure if you've seen this before. Does it make sense? Yeah.

C

Yeah. I mean, that is in the Rack middleware that also clears the request store on each request. Yeah.

A

You're with us?

B

Tuple yeah makes sense. You basically have to reset it and isolate it. So it's not reused for other requests. Yeah.
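
To make the pattern concrete, here is a minimal Python analogue of a request-local store built on thread-local storage, plus a middleware-style wrapper that resets it after every request. It is not the Ruby code from the recording; every name in it is hypothetical.

```python
import threading

_store = threading.local()  # each thread sees its own attributes


def set_hook_env(values):
    """Remember the quarantine-related values for the current request."""
    _store.hook_env = dict(values)


def get_hook_env():
    """Values stored for the current request, or an empty dict."""
    return getattr(_store, "hook_env", {})


def clear_request_store():
    """Forget everything stored on this thread."""
    if hasattr(_store, "hook_env"):
        del _store.hook_env


def with_request_store(handler):
    """Middleware-style wrapper: always reset the store when a request ends."""
    def wrapped(request):
        try:
            return handler(request)
        finally:
            clear_request_store()
    return wrapped
```

Thread-local storage alone only isolates threads from each other. The reset in the wrapper is what stops one request's values from leaking into the next request served by the same thread.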

A

If this were Go, I guess we would be using context variables, you would set it on a context, except it's not Go, so we have some roughly equivalent Ruby thing. Okay, so these things get stored here at the start of the handler for this API call, and later they get used (I'm going to ignore this Rugged thing, because that's legacy) here in the Gitaly client, because when we create a Gitaly repository object, we need to send these object directories over to Gitaly. And I mentioned, I.

A

Think I hinted at this last time, that in (how do I keep track of all these windows) the Gitaly proto.

A

We have the Repository message, and here you have these fields that are about exactly this. So that's where they come from, and this is the one magical unicorn place where we need them and where they're critical, because otherwise you have an incomplete repo.

A

Okay, 38 minutes to get to this part. That's good!

A

Do you have more questions about this stuff?

A

Cool. Then let's see where we can take this next, because this is, I think, the weirdest, trickiest part of the... well, okay, I can show a little bit more, talk a little bit more, about what happens on the other side. So here we make.

A

These things relative paths, those go into the Gitaly Repository message, and then of course on the Gitaly side we take those relative paths and join them with the absolute path to the repo, because that's where we know the absolute path, and then we hand absolute paths back to Git just to be safe. I think it actually can handle relative paths, but we're not counting on that. And yeah, let me show the Gitaly part, because one way or another.

A

It's a weird thing. We don't need to look at this. Look at this: in Gitaly we have.

A

Okay, that's too much: where is this stuff.

A

There we go. In Gitaly we have a thing that resolves the repository path from the repo message. So the repo message was here. The repository path is resolved by looking up the storage name in Gitaly's config in memory, mapping that to an absolute path on disk, and then just joining this relative path to it, right? And then you have the repo path. Then we also look at these fields, and we just put them in a list of strings that is a representation of environment variables.
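
The resolution step described above might look roughly like the sketch below. It is written in Python for illustration and is not Gitaly's actual code: the storage mapping, function names, parameter names, and paths are all assumptions. Only the two environment variable names are real Git names.

```python
import posixpath

# Hypothetical in-memory config: storage name -> absolute path on disk.
STORAGES = {"default": "/var/opt/repositories"}


def repo_path(storage_name, relative_path):
    """Absolute repository path for a (storage name, relative path) pair."""
    return posixpath.join(STORAGES[storage_name], relative_path)


def git_env(storage_name, relative_path, object_dir, alternate_dirs):
    """Environment entries (KEY=value strings) for spawned Git processes."""
    root = repo_path(storage_name, relative_path)
    env = []
    if object_dir:
        env.append("GIT_OBJECT_DIRECTORY=" + posixpath.join(root, object_dir))
    if alternate_dirs:
        # Colon-separated, as on Unix.
        joined = ":".join(posixpath.join(root, d) for d in alternate_dirs)
        env.append("GIT_ALTERNATE_OBJECT_DIRECTORIES=" + joined)
    return env


print(git_env("default", "group/example.git", "objects/incoming-abc123", ["objects"]))
```

Turning the relative paths back into absolute ones only at this point works because this is the server that knows where the storage is mounted.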

A

So this thing then gets called in specific places to make sure that we set these extra environment variables on any Git processes we spawn. And also, if you go down to the gitaly-ruby layer, you'll see that when we create a Rugged instance, we also feed this into Rugged and say: Rugged, hey, here's a repo, and use these extra directories, because otherwise you can't find all your objects.

A

Let's see for a moment where this thing gets used.

A

So I guess we're smart Cadfael. Are we.

A

This looks like we just put it on every Git command, which is a relief, because then no one forgets to do it. So we do it here, and then for some reason I'm not sure about, we do it.

A

Manually here. I have no idea why these do it manually. So that's where this thing trickles through. We had the push, and we saw that during the push Git sets these variables for us, and then we interpret them during the hook, we send them along on the API call, we store them in thread-local storage on the Rails server, and then all outbound RPC calls from the Rails server pass this value back into Gitaly, to the same Gitaly server where this directory actually exists.

A

I don't know, maybe it's not as mind-boggling as I think it is, but I always struggle to explain this. Cool. So let me turn off Slack, because I have no idea what people are gonna say to me.

A

What else can we look at here with git push?

A

I think this sort of tells the story. We could take a look at how git push over HTTP works; that is one thing that's interesting. We could also look at web commits, but git push over HTTP is more similar to git push over SSH. So maybe we can just squeeze that in now.

C

One question: once the request gets to the... well, first the changes get to Gitaly, right? Yes, the changes param, and then Gitaly forwards those changes to.

A

Back to Rails, right. Go on, another one, please. So the changes, maybe I should show where the changes come from. That's a good one. That is here, so I could say warn refs, and that will then end up in the output.

A

Okay should say something more.

A

I'm using warn, because in Ruby it's like puts, but what it does is.

A

Write to standard error where puts writes to standard output. So it's like saying puts and just printing random strings, except they go to standard error, because I am not entirely sure... let's see what happens if we print to standard out.

A

This kid, like that.

A

Yeah, okay, I guess here we can also print to standard out. It doesn't matter. Where do the refs start? The refs start here, and because we're pushing just one change, it's just one line.

A

And this is part of the interface of the hook, because Git doesn't know what the hook is; the hook is just an executable. It finds this executable in, where is it.

A

The hooks directory, and it sees, oh, there's a hook, it's executable, I'm going to run it and feed this stuff on standard out, sorry, on standard in. If the hook exits with zero, then the push is allowed; if the hook exits with one, the push gets denied. Okay, maybe I should demonstrate that for a moment.
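
The hook interface described here (change lines on standard input, exit status 0 to allow, 1 to deny) can be sketched in Python. The real hooks in the recording are Ruby; this is only an analogue, and the "protected" branch rule, the ref names, and the IDs are hypothetical.

```python
import io


def pre_receive(stdin, allowed):
    """Return the hook's exit status: 0 accepts the push, 1 rejects it."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        # Git writes one "<old id> <new id> <ref name>" line per updated ref.
        old_id, new_id, ref = line.split(" ", 2)
        if not allowed(old_id, new_id, ref):
            return 1
    return 0


def not_protected(old_id, new_id, ref):
    """Hypothetical rule standing in for a protected-branch check."""
    return ref != "refs/heads/protected"


# Simulated standard input for a push that updates one branch (made-up IDs).
fake_stdin = io.StringIO("1" * 40 + " " + "2" * 40 + " refs/heads/test-branch\n")
print(pre_receive(fake_stdin, not_protected))
```

In an actual hook script the last line would be something like `sys.exit(pre_receive(sys.stdin, not_protected))`, so that Git sees the exit status.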

A

Sorry aborts, you know, look.

A

Go away so go away if we do that.

A

Then the remote says "go away", and Git kindly tells us that the pre-receive hook declined. So that's the interface that we have there. So what is it you wanted to know about the changes, about this stuff from.

C

Sorry, I missed the [inaudible]. So once we make the request, the push gets to Gitaly, and we forward the changes to Rails, I think to call the Git access class. Yes, this.

A

Call. This POST contains the changes in its body.

C

Okay, okay and.

A

I closed the file again, but I can find it loud now. Posts not allowed.

A

Oh, one more dot. There we go, grep magic. This was the handler, and the changes get... I think it's a JSON POST body, and some middleware unpacks that into a hash, the params hash. And then here we have params changes, and that gets fed into the access check class.

C

Okay, okay. And there is where we interact with our rules and our, yes.

A

Yeah, exactly. All the hooks and rules are gathered underneath that class, and I guess, Fran, you're also more familiar with this class because you worked on the hooks in the wikis. So this thing fans out into all the checks, and if you're very unlucky, it fans out into expensive checks that slow down your push, but that's a different story. Okay.

A

Cool, so that was the changes part. We have ten minutes left, so we can take a look at Workhorse. Okay, actually I already spoiled that; let me say something you don't know yet. Okay.

A

So what is Workhorse? I mentioned Workhorse before. It is a reverse proxy that sits in front of the Rails app. One way to explain it, if you don't know the concept: imagine we took nginx and stuffed it full of plugins, except it's not nginx, it's a custom Go app, and we wrote all the plugins in Go. I.

A

Don't know if that helps. But it's a reverse proxy with lots of custom features, and it's a weird architecture thing that happened because in the beginning we used to just have Rails, and these Ruby processes have their limitations, and this was a way for us to hack things into the request cycle that we'd rather not do in Rails. Over time it got bigger and bigger, and anything that is slow, like an upload, is better done in Workhorse, because that is a Go process.

A

It's just a goroutine. So having a goroutine that takes five minutes is no problem; having a Rails request that takes five minutes often is a problem, particularly with Unicorn, which is single-threaded. Then you're hogging a process with several hundred megabytes of memory for five minutes, and that would never be a good idea.

A

So that's what Workhorse is, and actually the original use case for Workhorse was to do Git over HTTP, because Git over HTTP can take however long it needs to take. If you have a very big repo and try to clone it, it's gonna take a long time, just because you need to copy a lot of data. Before we had Workhorse, we had Unicorn workers with a one-minute timeout. So if you wanted to clone a large repo, you would hit the one-minute timeout and be out of luck.

A

So Git HTTP was a completely inferior transport compared with Git SSH. That was the state of things a long time ago. With Workhorse, the first thing that we solved was getting Git HTTP to behave better, and offloading that from the Unicorn process to this thing, this Workhorse process. So what does that look like? I haven't rehearsed how to approach this.

A

Let's see, so I...

C

In Workhorse we have some rules with a sheath on the backend point, maybe? Yeah.

A

Let me first maybe try and explain what the transport looks like, irrespective of GitLab. So in the SSH case you establish one session and then you have bi-directional traffic across that session, because that's how SSH works. You can have bi-directional communication, you get your data, and then the session is done. HTTP doesn't work like that, at least HTTP 1.1 and 1.0 don't work like that. You have a fixed request-response cycle, so you can only say one...

A

You can say one thing if you're the client, and you have to wait to get one response, and then your request is done. So the Git maintainers came up with a way to shoehorn

A

the process used for the Git transport into HTTP requests. The major version, the one that is most common, is called the smart Git HTTP protocol. That's because there used to be a dumb protocol that we don't care about. We don't support the dumb protocol, so we don't have to talk about that. The smart protocol emulates the stuff that happens during an SSH push, only within a request-response cycle, and the way that works...

A

Let's take clone as an example because, from a Git point of view, from this transport point of view, push and pull don't behave that differently. So the user runs git clone. Git makes an HTTP GET.

A

What is it cloning? The user is cloning, say, gitlab.example.com/my-repo.git. It would have a group name. My...

B

A

Doesn't matter. So Git makes an HTTP GET request:

A

/info/refs?service=git-upload-pack, and that thing returns a list of all the references on the Git server, and...

A

Git on the server returns a list of all references, and what I mean by that is, it looks a little bit like this. It's formatted slightly differently, but it's a list of IDs and references. I guess we can actually run the... I'm not going to do that. I could run the actual command that does this, but it's a distraction. So that goes back to the client. The client looks at the objects it already has... well, no, sorry. The client first looks at this list and decides what it wants to have.
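As a rough illustration of this first step, the sketch below builds the `/info/refs?service=git-upload-pack` URL and parses a ref advertisement in Git's pkt-line framing (four hex digits of length, counting the four digits themselves, then the payload). The repository URL, object IDs and capability names in the sample are placeholders, and the parser ignores details a real client handles.

```python
from urllib.parse import urlencode

FLUSH = b"0000"  # a pkt-line with length 0 separates sections


def info_refs_url(repo_url: str, service: str = "git-upload-pack") -> str:
    """URL of the first request in a smart HTTP clone or fetch."""
    return f"{repo_url}/info/refs?{urlencode({'service': service})}"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length + payload."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_advertisement(body: bytes) -> dict:
    """Return {ref name: object ID} from a ref advertisement body."""
    refs = {}
    pos = 0
    while pos < len(body):
        length = int(body[pos:pos + 4], 16)
        if length == 0:  # flush packet
            pos += 4
            continue
        payload = body[pos + 4:pos + length]
        pos += length
        if payload.startswith(b"#"):  # the "# service=..." header line
            continue
        # The first ref line carries capabilities after a NUL byte; drop them.
        line = payload.split(b"\0", 1)[0].rstrip(b"\n")
        object_id, ref_name = line.split(b" ", 1)
        refs[ref_name.decode()] = object_id.decode()
    return refs


if __name__ == "__main__":
    # Placeholder URL and object IDs, for illustration only.
    print(info_refs_url("https://example.com/my-repo.git"))
    sample = (
        pkt_line(b"# service=git-upload-pack\n") + FLUSH
        + pkt_line(("a" * 40 + " HEAD\0multi_ack side-band-64k\n").encode())
        + pkt_line(("b" * 40 + " refs/heads/master\n").encode())
        + FLUSH
    )
    for name, object_id in parse_advertisement(sample).items():
        print(object_id, name)
```

The output of the loop is the "list of IDs and references" described above: one object ID per reference name.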

A

Git on the client picks what refs and commits it wants to have, and then the Git client makes an HTTP POST request:

A

/git-upload-pack, and

A

the POST body is a description of the wants and the haves, because the client may already have some objects. For a clone there are no haves, so it's a description of what it wants.

A

Yes, that's in the POST body, and then the POST response is a pack file with the requested objects and a ref updates description. I think this is repeated in the response; I'm actually not entirely sure. Maybe we can look at this next time. If you want, you can use mitmproxy or something like that to intercept this stuff and look at what goes on on the wire, but we don't have to. So that's the cycle.
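A minimal sketch of what such a POST body could look like, assuming the original (version 0) pack protocol and a single round of negotiation: `want` lines, a flush packet, optional `have` lines, then `done`. The function name and the object IDs are made up for illustration; a real client also sends capabilities and may negotiate over several rounds.

```python
FLUSH = b"0000"  # pkt-line flush packet


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length + payload."""
    return b"%04x" % (len(payload) + 4) + payload


def upload_pack_request(wants, haves=(), capabilities=()) -> bytes:
    """Build a simplified git-upload-pack request body (hypothetical helper)."""
    if not wants:
        raise ValueError("a fetch request needs at least one want")
    body = b""
    for index, object_id in enumerate(wants):
        line = f"want {object_id}"
        if index == 0 and capabilities:
            # Capabilities ride along on the first want line only.
            line += " " + " ".join(capabilities)
        body += pkt_line(line.encode() + b"\n")
    body += FLUSH
    for object_id in haves:
        body += pkt_line(f"have {object_id}\n".encode())
    body += pkt_line(b"done\n")
    return body


if __name__ == "__main__":
    # A clone: one want (placeholder object ID) and no haves.
    print(upload_pack_request(["a" * 40]))
    # A fetch where the client already has an older commit.
    print(upload_pack_request(["a" * 40], haves=["b" * 40]))
```

For the clone case the body is just the want line, a flush and `done`, which matches the remark above that a clone has no haves.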

A

So, instead of one SSH session, you have a GET followed by a POST. The fun fact is that in production you see way more of these GET requests than POST requests. So apparently, based on GitLab.com, a lot of clients are just checking to see if there are changes, because if you do git fetch and git fetch comes back and says nothing changed, you're already up to date, that means it did that GET request.

A

It got the list of everything that's there and it decided there's nothing new it wants to have. Yeah, so that is the mechanics of the transport, and this is the main difference with Git SSH, because the basic idea is still the same. So what we have in Workhorse is HTTP routes that intercept these specific requests and do something special with them.
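To give a feel for what "routes that intercept these specific requests" could mean, here is a small sketch of a route table. This is not Workhorse's actual code (Workhorse is written in Go); the patterns, handler names and the fall-through behaviour are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical route table: (HTTP method, path pattern, handler name).
GIT_ROUTES = [
    ("GET", re.compile(r"^/(?P<repo>.+\.git)/info/refs$"), "info_refs"),
    ("POST", re.compile(r"^/(?P<repo>.+\.git)/git-upload-pack$"), "upload_pack"),
    ("POST", re.compile(r"^/(?P<repo>.+\.git)/git-receive-pack$"), "receive_pack"),
]


def match_route(method: str, path: str) -> Optional[Tuple[str, str]]:
    """Return (handler name, repo path) for a Git HTTP request, else None."""
    for route_method, pattern, handler in GIT_ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return handler, match.group("repo")
    return None  # not a Git request: pass it on to the Rails app


if __name__ == "__main__":
    print(match_route("GET", "/my-group/my-repo.git/info/refs"))
    print(match_route("POST", "/my-group/my-repo.git/git-upload-pack"))
    print(match_route("GET", "/my-group/my-repo/issues"))
```

Anything that does not match one of the Git routes simply falls through, which is how a proxy like this can sit in front of the whole application while only treating a few requests specially.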

A

You still with me?

B

A

Actually, we're now three minutes from the end of our window, so I'm not sure how I should... Probably I can show it in the code. I guess we can do that, unless you have a question about this; I'm happy to answer.

C

No, no, I mean, it's clear. My main concern right now, though I suppose it is done in Workhorse, is the authentication in this case, when you perform the git clone, yeah.

A

Yeah, that is a not-so-clear piece of code. It's worth looking into. I think what I'll do is take this description and fill in what happens in the GitLab case. So Git makes this HTTP request, and that request gets intercepted by Workhorse.

A

Workhorse pre-authenticates the request with GitLab, and this is similar to what gitlab-shell does at the start of a session.

A

If okay, Rails responds with Gitaly session data: the Gitaly address (so the network address), access token, repo, the Gitaly proto repo object, etc. And then GitLab Workhorse

A

establishes a Gitaly session and copies... okay, let me just... new line. Workhorse copies... well, there's no request body to copy. It copies the Gitaly response into the HTTP GET response body.

C

So, okay, so here Rails only intervenes to validate that pre-authentication, yeah.

A

C

Is done by yeah yeah.

A

Because this is a fast request. It's just a database lookup and...

A

Yeah, it's only telling Workhorse enough so that Workhorse can start proxying all the data back and forth to Gitaly. In the POST case there is a little more work, but really the only difference is that

A

Workhorse copies the POST request body to Gitaly and it copies the Gitaly response back into the response body. So it's really just proxying, copying data back and forth.
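The flow described here (pre-authenticate with Rails, then stream data to and from Gitaly) can be sketched roughly as follows. All names, the status codes and the session fields are hypothetical, and the Rails and Gitaly calls are passed in as plain functions so the sketch runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass
class GitalySession:
    """What Rails hands back on a successful pre-authentication (assumed fields)."""
    address: str     # network address of the Gitaly server
    token: str       # access token for Gitaly
    repository: str  # which repository the request is about


def handle_git_request(
    request_body: Iterable[bytes],
    preauthorize: Callable[[], Optional[GitalySession]],
    call_gitaly: Callable[[GitalySession, Iterable[bytes]], Iterator[bytes]],
) -> Tuple[int, Iterator[bytes]]:
    """Ask Rails whether the request is allowed, then proxy to Gitaly."""
    session = preauthorize()  # fast: Rails only does a database lookup
    if session is None:
        return 403, iter(())  # Rails said no: nothing is proxied
    # GET /info/refs has an empty request body; for POST the body is
    # streamed to Gitaly. Either way the Gitaly output becomes the response.
    return 200, call_gitaly(session, request_body)


if __name__ == "__main__":
    def fake_preauthorize() -> Optional[GitalySession]:
        # Placeholder values standing in for the real Rails response.
        return GitalySession("tcp://gitaly.internal:8075", "secret", "my-group/my-repo.git")

    def fake_gitaly(session: GitalySession, body: Iterable[bytes]) -> Iterator[bytes]:
        for chunk in body:
            yield b"gitaly saw: " + chunk

    status, response = handle_git_request([b"0009done\n"], fake_preauthorize, fake_gitaly)
    print(status, list(response))
```

Because the long-running part is only the copying, the Rails process is free again as soon as the pre-authentication call returns.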

C

Okay, great thanks so.

A

That's the high-level overview, and if we want, we can look in more detail at what this looks like in the actual Workhorse code, and what this part looks like. Yeah.

B

I mean, I think that would make a great topic. Cool.

A

Well then, let me end the recording.