From YouTube: Scalability Team Demo - 2021-09-30
A
Yeah, it's something that has been on my mind for a while, and I had a bit of downtime, so I thought I'd just see if I could get this working. It looks like it works.
A
So this started a while ago, when Matt and Igor did an investigation into CPU spikes on one Gitaly server. They noticed that it was sometimes at 100% CPU, and that a lot of that was system calls in Gitaly and in Git. In that case it was all about the FindAllTags RPC: there's a repo with a lot of tags, something is fetching those tags in a loop, and it turns out that Gitaly implements this type of RPC, where it has to yield many objects, in a very inefficient way.
A
Is there a way to do this better? Actually, one or two months ago somebody from the Gitaly team asked me if I had any ideas for making FindAllTags more efficient, because it was something that kept coming up. So I've been walking around with this idea, and now I finally thought, let's put something together. It turns out that we can do better.
A
So I made a benchmark where I took gitlab-org/gitlab, a local clone in GDK, and in a Rails runner I built an array with all the tags, which is what the client side of FindAllTags looks like. The benchmark was to do that 100 times in a row and see how fast it goes. With the status quo, that was taking 59 seconds for 100 calls that each built an array of all the tags in gitlab-org/gitlab.
A
I can actually just show you what the benchmark is: it finds project 23, it takes the raw repository, so there's no caching, and then it calls repository.tags 100 times.
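The shape of that benchmark can be sketched roughly like this. In the actual Rails runner the workload is GitLab internals (something like `Project.find(23).repository.tags`, per the description above), so this sketch uses a stand-in workload; only the timing harness is real.

```ruby
require 'benchmark'

# Rough sketch of the benchmark's shape: time N back-to-back calls of a
# workload. In the real benchmark the workload builds an array of all
# tags via the repository; here a stand-in keeps the sketch runnable.
def time_workload(iterations: 100, &workload)
  Benchmark.realtime do                 # wall-clock seconds for the loop
    iterations.times { workload.call }
  end
end

# Stand-in for "build an array with all the tags":
elapsed = time_workload(iterations: 100) { Array.new(1_000) { |i| "v#{i}.0" } }
```

The point of looping 100 times in one process is that the result approximates steady-state cost per call, without API or JSON serialization overhead on top.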
A
I think it's a reasonably end-to-end benchmark, but it doesn't include the overhead you'd have if you made API calls, where you'd also have JSON serialization on top of that.
A
So that's nice. This is single-threaded in a loop, and it goes six times faster. I also took flame graphs: the benchmark is just 100 calls, and then I turned on another script that does the same thing but in an infinite loop, so you enter a steady state of single-threaded traffic, and then I run a 30-second profile to see what the server is doing. And in those profiles you see...
A
So this is one from the status quo, and it's probably interesting to look at it for a moment. In the status quo, Ruby is actually not doing a lot of work, but Gitaly is doing a lot of work, Git is doing a lot of work, and Praefect is barely visible. The thing I was focusing on was the Git and the Gitaly part, because that runs on the Gitaly server, and that is not horizontally scalable.
A
Praefect is, and Praefect is also not the biggest part of the pie here. So in my table I sum the number of samples for these two things. In the status quo, in 30 seconds there were 4,500 samples where either Git or Gitaly was doing stuff on the CPU. When I put this improved version, or this alternative implementation, into steady state and take a 30-second sample, then Git and Gitaly together are at...
A
...3,300 samples on the CPU. But if you take into consideration that it's doing more calls, that means it is 10 samples per call instead of 89 samples per call, so roughly 9 times less CPU work per call: from 89 down to 10 on the Gitaly server.
A
Well, the funny thing is that in the status quo you can actually see that Praefect is doing very little, and that is, I think, because the batching is working well: Praefect has to copy very few gRPC messages. If I look at the improved version and its profile, then Praefect has more work to do and Gitaly has more work to do.
B
Doing more work, yeah; it's working efficiently, not...
A
Not working hard, yeah. And it could even be made more efficient, because, well, you can't see it here, but I know that Git is using a four-kilobyte buffer on its end. In other scenarios I tried, Git currently doesn't let you make that buffer bigger, but if you could, you would also reduce these calls, so I think we can probably squeeze out more.
A
About how it's done, what the magic is, or, well, what's different... but go ahead.
C
The trend that I see here is that we are taking the gRPCs, the ones that are the heaviest and the most expensive from a gRPC point of view, perhaps, and turning them into binary streams of a sort, and obviously we can do this one by one by one. Is there an alternative?
C
Something that is like gRPC but a lot more efficient, where we can still get the benefits? Because obviously gRPC has huge benefits in terms of, you know, understanding the structure of the data coming to you, and repeatability, you know what I'm talking about, but obviously it's got a massive downside as well. It's a bit like saying we could turn everything into assembly and it would be much faster; there's a downside to that as well.
A
Byte streams, yeah. In this case we're taking RPCs that already are byte streams; they're just byte streams encapsulated in gRPC messages, and if you're transferring...
A
So there we're saying we're going from a byte stream encapsulated in gRPC to a byte stream in yamux, and we expect that to be much better. In this thing, by the time we're done, we're still using a byte stream encapsulated in gRPC, so we could still get the improvement from the other thing...
A
...on top of this, if we really wanted to. Except yamux is a Go library and there's only a Go implementation, so we can't really use the stuff we're building for the Git fetch efficiency project for this, unless we reimplement yamux. But of course, if we had something that was not gRPC and that was primarily oriented towards byte streams, then both of these things would work well, rather than just putting a byte stream encapsulated in gRPC messages.
A
There's a Git command called cat-file, and it has a batch mode where you give it a bunch of object names on standard input, one per line, and it dumps them out on standard output. That format has a header line that you can parse: you just read until the newline, break it on the spaces, and then you know what comes next after that header line.
A
So the format of the stream is more or less the output format of git cat-file --batch, and that format is easy to parse. It is one header line, like I said, followed by a newline, so you can read that with buffered IO, and then you have a length prefix, so you know that the next X bytes are the raw binary representation of the Git object.
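A minimal sketch of such a parser, assuming the `git cat-file --batch` output layout described above (a header line `<oid> <type> <size>`, then `<size>` raw bytes and a trailing newline). The method name is illustrative, not Gitaly's actual parser.

```ruby
require 'stringio'

# Parse a cat-file --batch style stream: header line, then
# length-prefixed raw object bytes and a trailing newline, repeated
# until end of stream.
def read_batch_objects(io)
  objects = []
  while (header = io.gets)              # buffered read up to the newline
    oid, type, size = header.chomp.split(' ')
    content = io.read(Integer(size))    # the next <size> bytes are the object
    io.read(1)                          # consume the trailing newline
    objects << { oid: oid, type: type, content: content }
  end
  objects
end

# Fake two-object stream in the same layout:
stream = StringIO.new("abc123 blob 5\nhello\n" \
                      "def456 commit 4\ntree\n")
objects = read_batch_objects(stream)
```

The length prefix is what makes this cheap: the reader never scans the object body, it just copies exactly that many bytes.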
A
It's basically a text format, very friendly to parse; you could parse it with awk. It's not a difficult format to parse, but that is actually one of the disadvantages I listed here: for this to work, the client side needs to have a parser for the Git object formats. Currently Gitaly has a parser for that, which we had to build there, and I hacked one together that works well enough, but I didn't write a test suite, so I'm sure it has bugs.
A
Yeah, and I think, and this is a bit of a crazy idea, because we made this design choice early on that we parse on the Gitaly server and we return structured objects, and we use Protobuf for the structure because, well, we have Protobuf, it's gRPC, that's what you do in gRPC. But all that parsing happens on a non-scalable resource, and it could happen just as well on the clients. The other problem is that parsing is just a CPU-intensive job, and what's happening now...
C
...design choice: you want to encapsulate it, you know, because that means you can replace the back end of Git with, you know, other inputs.
A
But to the question of why these commits are structured: I think it's just the natural design, the most obvious one. If you have an API, a JSON API, for getting commits out of GitLab, then you don't get one big blob in the native Git object format; you get a structured JSON object that is easy to get fields out of. Something structured is just easier to consume.
D
So it's the case that it was a reasonable design choice at the beginning; it's just gotten to the point that, from a scaling perspective, it's not the best choice anymore.
A
Yeah, with what I know now I would say it is not the most efficient choice, and it forces you down a path. Andrew asked about batching: we create these Go objects, one per commit, and then we batch them before we send them out. But that means we need to allocate memory for each of these objects; in order to parse these objects we need to keep allocating new buffers, and we can't reuse them until the whole batch goes out.
A
So with a batching interface it's harder to reuse memory, and you need to worry about where the boundary is. If you're just copying bytes, you don't care where the boundary is; the boundary can be at any byte. You send out the message, you go to the next one. So it adds all this complication to Gitaly, and where we are now, it added up to make things slow.
C
You would have a DOM and you would build up your objects, and you generate your stubs and everything like that; it was basically just gRPC before gRPC, right. And then you'd populate it, and then at some point the service would be scaling up and you would have a big problem with the performance of the server, and then at that point...
C
The nice thing you had in the RPC times was that you could replace it, so that instead of creating an array of objects, populating it, and then passing it to the serializer, you would just switch over to a SAX-style emitter, where you would basically...
C
It would stream those down, and then you'd get like a 100x speed-up, which is exactly what you're doing here. But the nice thing is that the client side would stay more or less the same, because you could change those two sides independently, and then the client could actually switch over to a SAX-style parser as well, and then you get the speed-up on both sides.
A
It's not exactly the same problem, because with a streaming parser the problem is that you don't want to build up a huge intermediate state when you can do the job in the stream.
A
That makes it go much faster, but the difference here is not parsing at all. Imagine you had a log of JSON lines and you wanted to send it somewhere else: if you parse each JSON line into an object in your programming language and dump it back to JSON before you write it into a socket, that's slower. It's faster if you never parse it in the first place.
C
Yeah, I mean, I still think there is a benefit overall to having an abstraction. You know, say you wanted to change the Git format, I don't know. It's a sound abstraction to have, and if you look at every other gRPC service, you know, we're not just sending back raw files.
C
We use it that way because it's a sound way of building things that allows you to change your server without always impacting your client, and that's a good thing architecturally. But I think in this case we get value from dropping that abstraction. I don't think it's automatic that you should drop it, but in this case, yes.
A
No,
like
I
said
it
is
the
it
is
a
very
it
is
the
natural
design
choice
to
do
what
we
did
but
yeah
for
the
sake
of
efficiency.
It's
it's
just
not
always
ideal.
B
What's causing it to be so much slower? Is it the deserializing, or the parsing on the Gitaly side, or the putting...
A
The unbuffered IO is the biggest problem. That means you have one Git process that is creating a list of all the tags, and what Gitaly does now is: it reads one item of the list, so that's a system call, and then it writes it into another process, and then that returns the tag and Gitaly reads it back, so that's two more system calls. Then it parses that, puts it in a gRPC message, and throws it on the batch.
A
And the difference is that now the Git process that generates the list talks directly, via a pipe, to the Git process that fetches the objects. It still writes one line at a time, but the results come out as four-kilobyte chunks, so it only does a write every four kilobytes, and then the process that generates the objects also...
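The effect of those four-kilobyte chunks can be sketched with a counter standing in for system calls (all names here are illustrative): one write per item versus accumulating items into roughly 4 KB writes.

```ruby
# A sink that counts write calls; each call models one write(2) syscall.
class CountingSink
  attr_reader :writes, :data

  def initialize
    @writes = 0
    @data = +""
  end

  def write(chunk)
    @writes += 1
    @data << chunk
  end
end

# One write per item: one modeled syscall per tag line.
def write_unbuffered(items, sink)
  items.each { |item| sink.write(item) }
end

# Accumulate into ~4 KB chunks before writing, like the pipe buffering
# described above: far fewer modeled syscalls for the same bytes.
def write_buffered(items, sink, bufsize: 4096)
  buffer = +""
  items.each do |item|
    buffer << item
    next if buffer.bytesize < bufsize
    sink.write(buffer)
    buffer = +""
  end
  sink.write(buffer) unless buffer.empty?
end
```

For a thousand tag lines, the unbuffered version does a thousand writes while the buffered one does a handful, which is the syscall reduction being described.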
C
Right, and people use FindAllTags, because we also have problems with FindAllTags in Redis, with the cache that we have and the amount of data we transfer in and out of that cache. What it is, is: somebody needs to display a page, and it's got a list of tags on it, so they do FindAllTags, and in their mind they're thinking there are going to be 10 tags on the repository, but there are 30,000 tags, and it's putting pressure on Redis, it's putting pressure on...
C
Obviously
on
italy,
it's
expensive
for
the
application
and
use
lots
of
memory
and
most
of
those
cases
like
a
user.
Can't
you
know
if
it's
a
obviously
there
are
merge,
requests
and
sidekick
jobs
and
stuff
that
might
need
all.
Thirty
thousand,
but
most
of
the
time
a
user
doesn't
need
to
see.
Thirty
thousand,
you
know
so
it's
almost
like
they,
they
shouldn't
be
yeah.
They
can't
see
it.
C
There
shouldn't
be
a
way
to
call
that
endpoint
from
the
web,
because
it
means
that
something's
gone
horribly
wrong
in
the
architecture
or
the
design
of
a
of
a
controller,
but
it
is
used
all
over
the
show
and
it
causes
us
a
lot.
B
I
think
I
think
that
the
the
reason
that
it's
used
like
this
is
because
there's
no
nobody
has
built
something
like
a
paginator
for
tax,
yet
yeah.
C
And it's Redis as well, you know; there are so many things, because we have that z-set implementation that checks existence of tags and everything. There are so many things that are really expensive, because people assume there are like 10 tags on a repository, and it's only going to get worse as time goes on, right? People never delete tags.
A
Yeah,
this
has
been
broken
forever,
so
yeah
that
it's
it
can
and
should
also
be
improved
on
the
on
the
client
side.
B
The
thing
that
I
was
reviewing
on
the
client
side
was
a
high
deadline,
exceeded
errors
for
final
tax,
basically.
A
Yeah, well, thanks anyway for listening to this. I'm not really sure what to do with it, because we have lots of other projects going on, and it's also quite a radical departure from how the rest of Gitaly works, so I'm not sure if it would be easy to sell this to the Gitaly team, or where it even fits into our other priorities. But I wanted to see if it worked, and I can happily say that, yes, it works.
D
It might be that this is one of those things we keep in the back of our minds for when we start to have problems in this area; we'd know this is something we can do to alleviate it. Because, with all the other things going on, I agree with you: I'm not quite sure how we would get this prioritized unless it was a very, very clear "we have to do this or that's going to explode".
A
Well,
one
practical
thing
I
can
say
is
that
there
matt
created
an
infradev
issue
about
these
final
tag
problems.
So
we
could
take
this
as
a
possible
solution
to
solve
to
resolve
that
infra
death
issue,
but
that
doesn't
tell
us
who's
going
to
do
it
and
when.
C
Branches and FindAllTags, is it branches or refs, whatever, but...
C
Whatever that's become since I last looked at it a long time ago. But basically, when you do the FindAllTags, from the caller's point of view the abstraction remains the same; it basically gives you an...
A
No, no, it's almost all hidden in Gitaly's client right now, and not in a completely natural way, but that's where the response stream comes in, and an array of Gitlab::Git::Tag objects comes out.
A
Yeah, and I don't know how a wider audience would feel about this concept of not emitting structured data, or emitting data in a custom format, as a byte stream like this. But anyway, like I said, I just wanted to share it, and I think we have enough other things going on before we'd have time for this.
D
Thanks for taking us through that; I was quite surprised by the difference in the results, so it was good to see.
A
One of the weird things, by the way, sorry, just one more thing: when I started building this, at first it was twice as fast, and I thought, wow, this is amazing. Then I started fiddling with the Ruby code and got it three times faster, and then I ended up at six times.
A
Yeah, it might make sense to move it over to gitlab-org/gitlab, because that's where the infradev issue lives.