From YouTube: Scalability demo 2021-12-16
A
Thanks, yeah. So I've got the first and only item on the agenda, which is es-query, and this is a tool that I developed this week, or last week I guess. It came out of some analysis I wanted to do during a recent incident, where I wanted to do some post-processing on data that is stored in Elasticsearch. So our logs are in Elasticsearch, and I wanted to do some local post-processing.
A
So, just to quickly share the repo: there's some documentation and instructions here in the readme, so you can kind of see some of the stuff that this tool is capable of. But I will also give a quick demo, and I did prepare something that hopefully will not include any sensitive information, because we are dealing with our production logs here.
A
So here's a sample query where we're querying the gcp-events index. This includes things like instance maintenance, so it's useful to see if a host was rebooted due to a maintenance event. You can give it a query using this -query parameter, where you kind of dump the JSON in there, but you can also provide the query on standard input. That is particularly useful if you use Kibana as a query builder: you've got an Elasticsearch query that you can get from the Inspect tab in Kibana, you paste that into a file, and then you'd basically do cat query.json, extract the query field out of that thing, and then pipe that into es-query.
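As a sketch, that Kibana-to-es-query pipeline could look like the following; the --index flag and the exact invocation are assumptions, since the demo doesn't show them verbatim:

```sh
# Pull the query Kibana shows in its Inspect tab out of the saved
# JSON and feed it to es-query on stdin. Flag names are assumed.
cat query.json | jq '.query' | es-query --index 'gcp-events-*'
```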
A
True, yeah. So that's kind of the basic idea. So with that, let me just go ahead and run this. And you can see I'm piping into jq here to extract some not-so-sensitive information out of these logs, because I think they did include some IP addresses and such, which I don't want to include on this recorded call.
A
But
this
kind
of
shows
what
you
can
do
with
this
tool,
and
you
know
you
can
do
whatever
kind
of
analysis
you
want.
So
I'm
just
looking
at
how
many
log
lines
were
present
during
this
time
range,
which
is
you
know,
a
fairly
inefficient
way
of
getting
that
count.
But
it's
very
efficient
on
human
time,
because
it's
very
easy
to
put
together
this
kind
of
query
this
kind
of
pipe
line.
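For example, the "inefficient but cheap on human time" count could be as simple as this, assuming es-query emits one JSON document per output line (an assumption about its output format):

```sh
# Count matching documents by fetching them all and counting lines;
# wasteful on the wire, but trivial to put together.
cat query.json | jq '.query' | es-query --index 'gcp-events-*' | wc -l
```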
A
But as you can see, it's actually doing quite a bit of stuff just to paginate through the results, and the Elasticsearch pagination API is not very curl-friendly. So that's kind of why you would want to use this tool in the first place over curl.
C
I kept thinking the whole time: what's the difference with curl? But you answered that at the end. Yeah, it's annoying to do with curl due to the pagination.
A
Each request basically says: please give me some new results and a new scroll ID. That's kind of how you paginate through: you need to keep getting the token from the last response and piping it through to the next one, which is pretty obnoxious to do with curl. And then, to be nice, I kind of delete those scroll contexts at the end. In this case we could also wait for one minute and they'll clean themselves up, so that is not strictly necessary, but it's nice to clean up after yourself. Sorry.
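For reference, here's roughly what that scroll loop looks like with plain curl; a sketch assuming a cluster on localhost and the one-minute keep-alive mentioned above, not the exact commands from the demo:

```sh
# Open a scroll context (1m keep-alive) with the query from Kibana.
resp=$(curl -s -XPOST 'localhost:9200/gcp-events-*/_search?scroll=1m' \
  -H 'Content-Type: application/json' -d @query.json)
scroll_id=$(echo "$resp" | jq -r '._scroll_id')

# Keep feeding the scroll ID from the last response into the next request.
while [ "$(echo "$resp" | jq '.hits.hits | length')" -gt 0 ]; do
  echo "$resp" | jq -c '.hits.hits[]'
  resp=$(curl -s -XPOST 'localhost:9200/_search/scroll' \
    -H 'Content-Type: application/json' \
    -d "{\"scroll\": \"1m\", \"scroll_id\": \"$scroll_id\"}")
  scroll_id=$(echo "$resp" | jq -r '._scroll_id')
done

# Be nice and delete the scroll context when done (it would also
# expire on its own after the keep-alive).
curl -s -XDELETE 'localhost:9200/_search/scroll' \
  -H 'Content-Type: application/json' \
  -d "{\"scroll_id\": \"$scroll_id\"}"
```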
A
So this is the feature flags, actually. In Rails we kind of log which feature flags were checked as part of that request and whether the result was true or false, and there's a lot of interesting stuff that you can do with that data. But one of the concerns I had was that this would massively increase our log volume in terms of the size of the log lines, and Elasticsearch doesn't store the size of the _source object anywhere.
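That's exactly the kind of question local post-processing answers; a minimal sketch, again assuming es-query prints one document per line:

```sh
# Estimate how big the logged documents actually are, since
# Elasticsearch won't tell you. The NDJSON output format of
# es-query is an assumption.
cat query.json | jq '.query' | es-query --index 'gcp-events-*' \
  | awk '{ total += length($0) + 1 } END { print NR " docs, " total " bytes" }'
```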
B
You know, you can use the JSON objects directly to, like, aggregate — you can run a function on all of those — but that's not very convenient sometimes, if you want to do something complicated. Every time you do it, it's going to have to refetch everything and recompute it, and especially if it doesn't really fit into that process-one-document-at-a-time model, it's also challenging. So, yeah.
C
Thanks, Steve. Maybe I can talk a little about the patch I sent to the git mailing list, because it ended up getting progressively more boring, which is good, and I'm still waiting to hear more, but it's looking good.
C
The problem I'm trying to solve is that the I/O sizes we use when transferring pack file data are not optimal. This is a typical thing: if you do I/O, then you can tune the I/O sizes — the size of the chunks of data you send across — and sometimes it's irrelevant, and sometimes it matters. And we send so much pack file data that it matters, apparently. So, yeah, we had to come up with something different.
C
We had to figure out how to solve the problem, and I got feedback from Patrick from the Gitaly team, and we came up with something where — actually this was my first idea — we use stdio. So then, let me use the file browser here.
C
I had to make some changes in upload-pack so that it uses stdio, and I had to create another function that uses stdio instead of regular Unix syscalls. And because the default stdio buffer size is not that big, just using stdio on its own is not enough to get larger writes.
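To make that concrete, here's a minimal sketch of the idea — enlarging a stream's stdio buffer so small writes coalesce into bigger chunks. This is an illustration, not the actual git patch, and the 128 KiB size is an arbitrary example:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	static char buf[128 * 1024];

	/* Must happen before the first write to the stream; without
	 * this, fwrite() flushes in small default-sized chunks. */
	if (setvbuf(stdout, buf, _IOFBF, sizeof(buf)) != 0)
		return EXIT_FAILURE;

	for (int i = 0; i < 1000000; i++)
		fwrite("some pack data", 1, 14, stdout);

	/* stdio coalesces these into ~128 KiB write(2) calls. */
	return fflush(stdout) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```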
C
I also had to create a configuration mechanism with an env var to have a larger buffer, and this is also iffy, because if you want to reconfigure the buffer, you want to do it as early as possible in the process life cycle. So that happens in a function in common-main, which is really the main function of all git subcommands.
C
I learned in a previous attempt that that is the right place to do it, but it also makes it harder to sell: if you're adding stuff to the main startup function, that affects everything. So, yeah, that was 74 lines.
But then I was going back and forth with Patrick, and I was trying to write a cover letter for the git mailing list, and I realized I'm asking for more than what we actually need, and there's something simpler we can do, which is—
C
So what if we can just configure that? And it's okay to compile in a different size, because we compile git when we build Gitaly, so we have our own git anyway. So we can use a different number here, and this is actually better than just tweaking the write sizes, because this also makes the reads bigger.
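The shape of that compile-time knob, as a hypothetical sketch — the macro name here is invented, not the one from the actual patch:

```c
#include <stdio.h>

/*
 * Hypothetical compile-time buffer-size constant. Override at
 * build time, e.g.: cc -DPACK_IO_BUFFER_SIZE=131072 ...
 * Using one constant for both reads and writes is what makes this
 * better than only tweaking the write sizes.
 */
#ifndef PACK_IO_BUFFER_SIZE
#define PACK_IO_BUFFER_SIZE (8 * 1024)
#endif

int main(void)
{
	printf("pack I/O buffer size: %d bytes\n", PACK_IO_BUFFER_SIZE);
	return 0;
}
```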
C
So it saves even more time. So I thought, okay, but yeah, then we have a new compile-time constant; do people like that? Well, let's just try. And then the funny thing on the git mailing list was that they said: well, why don't we just make this buffer bigger and forget about the constant? And that didn't even occur to me, because everybody who's not using the pack-objects cache is probably going to do eight-kilobyte writes, so they don't benefit from making this buffer bigger. But the git maintainer said, why don't we make the buffer bigger, and that's actually the simplest thing to do. It would have been nice if it was just this one line where I changed the number, but the line is funny, because the calculation of the number is weird.
C
So calloc gives you a zeroed piece of memory — before, we also got a zeroed struct — so the patch ended up very small.
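A paraphrase of that detail, with invented names rather than the literal patch: the state struct used to be zero-initialized on the stack, and once the buffer makes it too big for that, calloc keeps the "starts out zeroed" property:

```c
#include <stdlib.h>

/* Illustrative only; struct and field names are invented. */
struct output_state {
	char buffer[128 * 1024];  /* now too big to put on the stack */
	size_t used;
};

int main(void)
{
	/* Before: `struct output_state s = { 0 };` on the stack.
	 * After: heap-allocated; calloc returns zeroed memory, so
	 * the struct still starts out zeroed. */
	struct output_state *s = calloc(1, sizeof(*s));
	if (!s)
		return 1;
	free(s);
	return 0;
}
```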
The other nice thing is that I've been doing experiments to compare these things, where I start a flame graph and I do a clone on the VM where I know it's a cache hit, and then I just count the number of samples in the different programs. I can say: well, if git is using 200 samples without this tweak and 120 samples with this tweak, then that is less CPU. And then I add that up.
C
So I look at the gitaly-hooks, Gitaly and git samples. Let me just show what that looks like.
C
So what I was trying to say is that my approach has been: make a flame graph and add up the different parts that I care about, and then I can make comparisons of the number of samples.
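A sketch of that counting workflow; the talk doesn't name the exact tools, so perf and the FlameGraph scripts here are assumptions:

```sh
# Record stacks system-wide while the cache-hit clone runs.
perf record -a -g -- sleep 30

# Collapse stacks (stackcollapse-perf.pl is from the FlameGraph
# repo), then add up samples per program of interest. Folded lines
# look like "progname;frame1;frame2 count".
perf script | ./stackcollapse-perf.pl > stacks.folded
for prog in git gitaly gitaly-hooks; do
  printf '%s: ' "$prog"
  awk -v p="$prog" 'index($0, p ";") == 1 { n += $NF } END { print n+0 }' stacks.folded
done
```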
But what one of the git maintainers did was that he just simulated our cache by creating a shell script that gets a pack file, and he measured the throughput on that, and he got roughly the same kind of 30 percent speed increase. So I got a 30 percent drop in CPU frames, or stack frames.
C
That's also good. And he got roughly a 10-megabyte-per-second increase when he tried this. So that was a really different, but also very sensible and valid, way to approach the impact. It was nice to see that it had an impact, and it was really nice that this person took the time to set up that experiment, because he was interested.
C
Once a week the git maintainer posts an update to say what the status is of all these patches, so my patch will start being mentioned in "What's cooking in git", which is a weekly email. Then I can keep an eye on the "What's cooking in git" email, and at some point it will say it is in the next branch, and that is our internal goalpost: once a commit is in next, it is okay.
C
The Gitaly team will accept it as a custom git patch on their build. So once it's in next, I can accelerate it a little bit; otherwise you wait until next becomes the current git version, but that's once a month or something of that order. So it's maybe two — yeah, sorry, I didn't prepare this, so I was rambling a bit. Any questions or comments?
A
That's great. I mean, also thanks for sharing a little bit about the development process of git itself; that part was really interesting to me.
C
Yeah, it's quite unusual. Actually, I don't know if it's unusual compared to what we do, but they don't use anything like GitLab or GitHub: it's a pure email-based workflow, and everything goes through a single person. So you have maintainers who, I guess, in the Linux kernel community would be the lieutenant role — though actually not quite, I think. With Linux, once certain people have reviewed something, Linus Torvalds doesn't really look at it, or he just accepts big chunks of work that other people have reviewed. But here the benevolent dictator reads everything.