From YouTube: 2023-06-29 Scalability Demo
A: I didn't have anything planned, but I'll take the chance to talk a little bit about Tamland, because I happen to have been assigned as the coordinator for the capacity planning issues this week, and so I'm trying to figure out how to get Tamland to adjust to changes. So, for example, one of the alerts that I was looking at was this one for potential saturation of PgBouncer client connections, which forecasts that we will reach 80 percent somewhere in August or September.
A: But this doesn't seem to match what the current behavior is. It seems that we had a baseline that was around 20, and our lowest level is now like 25 percent, but it seems pretty stable at that point.
A: But I guess Tamland likes to be pessimistic about stuff and said: well, if we had that, then we project that we'll continue to have small bumps and then things will get out of control. But I don't see a reason to believe that, so I was talking with Bob, and he talked about this file.
A: Something I already have open, which is the forecast parameters file, where you can add change points and see how that affects things. I'm still trying to figure out if that makes Tamland more or less biased towards changes if you add a change point. I was trying to generate things locally, but they seem really slow, but yeah, I'm still trying to figure it out.
A: So let me show you what I have so far. So I changed the thing to have only the component that I want, yeah.
A: Yeah, I removed it for now. And then, how am I running it? Okay, so I tried the thing that you said, which was to run it with the downloaded data. So I have now this data folder, which is a couple of gigs, and then I can run it. So I was doing this, and the thing that I was looking at is that there is this flag in the documentation that you mentioned, the Tamland only-cache one.
A: I was playing around with it, and it doesn't seem to... I don't know if it's checking whether the data folder is there, because it seems that regardless of whether I put the flag or not, it is using the cache. But let's run it like this, which... yeah, now, with the cache downloaded, it is really fast. Maybe not.
A: So I think before I took that out, I think it worked. I mean, let me just... because it was working before. Obviously, when you demo something, it's bound to break. Okay, so it wasn't that. Did I make any other changes? Let me just leave this here.
A: And then I was, yeah, I was debugging this, but that doesn't matter. I don't know if I still have the SSH terminal up, even if I'm using the cache.
A: It didn't matter, but something's wrong with my setup, so I had to...
A: ...set this up again. And one reason is that apparently, and this was something that I think could affect a couple of people: when I migrated from my Intel MacBook to this M1 MacBook, I used Migration Assistant, and that copied a bunch of Homebrew installation stuff that was x86. I had to just remove all of that, and I think that broke a couple of things in my setup, including my SSH.
A: So the tunnel is not working now, but I'll have to figure that out offline. But I guess I'll just take the chance to ask Bob, because you mentioned that adding a trend change point could help. Can you speak more about what the logic is behind that?
B: Yeah. So, a change point: by default, Tamland (or rather Prophet) does this: it adds, I think it's 25 change points, in the first 80 percent of the data. So that means that this change that we see at the end of...
B: ...the end of June-ish, yep, won't have a change point yet, which is why it says: okay, it's trending upwards. Now, if you add a change point there, then it changes the trend line, and the mean will change towards... because that's where (there's a trick in Tamland so you can see them) the mean line changes. So every time you see this blue line change direction...
B: ...that's where a change point is. So what you're doing now, by adding a trend change point, is allowing it to change direction here.
B: So with that, what I'm expecting that change to do is that it'll make the growth less steep at the end, but it will make the confidence interval wider. So I don't think it's going to remove the alert, or maybe it will, because yeah, the confidence range is not that wide.
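
A minimal sketch of the change-point behavior described here, using Prophet directly (which Tamland builds on); the series, dates, and the manually added change point are invented for illustration, not taken from the real alert:

```python
import pandas as pd
from prophet import Prophet

# Fake daily saturation series: stable near 20%, with a small bump to 25%
# near the end of the history (the "end of June" situation above).
dates = pd.date_range("2023-01-01", "2023-06-28", freq="D")
values = [25.0 if d >= pd.Timestamp("2023-06-01") else 20.0 for d in dates]
df = pd.DataFrame({"ds": dates, "y": values})

# Prophet's defaults: 25 potential change points, all placed in the first
# 80% of the history, so a shift near the end gets no change point and is
# extrapolated as a continuing upward trend.
pessimistic = Prophet()  # n_changepoints=25, changepoint_range=0.8

# Adding an explicit change point at the start of the bump lets the trend
# line change direction there, flattening the forecast while typically
# widening the confidence interval.
adjusted = Prophet(changepoints=["2023-06-01"])

for model in (pessimistic, adjusted):
    model.fit(df)
    future = model.make_future_dataframe(periods=90)
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(1))
```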
A: Got it, okay, I'll keep playing with that. I know that wasn't too informative, but at least take this home: if you're going from an Intel MacBook to an M1, don't do the migration thing, it brings problems.
B: The in-between thing they have for x86 is pretty good, I thought. Anyway, I have another question: I just came out of a call with some people that work on the cost service; apparently that talks to Redis persistent?
B: It's been... no, no, it's okay. I was just asking if we knew about it. I'll find the issues and mention it, but there's Workhorse, which also does this. I think that's probably something that we need to keep in the same instance as Rails, because it's something with uploaders, I don't know.
A: Oh, so it's not searching the same data that Rails does; it's just a distinct, separate subset of data.
B: The idea is to have the AI Gateway that GitLab instances, including GitLab's gitlab.com SaaS, Dedicated, and self-managed, can talk to. Not a single one, but multiple ones, depending on the deployment. In the beginning, I suspect it's going to be GitLab SaaS and self-managed instances talking to a single AI Gateway, and this is going to be my suggestion: that they talk gRPC to that Gateway. The reason I would suggest gRPC is because we don't need to do versioning; that means we can keep the API pretty stable.
B: The idea is that the GitLab instances provide all the data that they can to the Gateway, and the AI Gateway decides what to do with it, how to generate whatever response it needs. So that would mean that we create a separate gRPC service for each of the things that we want to do, and here I've added an example for the code suggestion service.
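
A rough sketch of that per-feature service shape in Python; the proto-generated modules, message fields, and the model-provider helper below are hypothetical stand-ins, not the definitions from the actual design document:

```python
from concurrent import futures

import grpc

# Hypothetical modules, as if generated by grpcio-tools from a proto like:
#   service CodeSuggestions {
#     rpc Complete(CompletionRequest) returns (CompletionResponse);
#   }
import code_suggestions_pb2 as pb2
import code_suggestions_pb2_grpc as pb2_grpc


class CodeSuggestionsService(pb2_grpc.CodeSuggestionsServicer):
    """One servicer per feature; chat or summarize-issue would be siblings."""

    def Complete(self, request, context):
        # The gateway decides what to do with whatever context the GitLab
        # instance could provide, and talks to the model provider here.
        text = call_model_provider(request.prefix)  # hypothetical helper
        return pb2.CompletionResponse(suggestion=text)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb2_grpc.add_CodeSuggestionsServicer_to_server(
        CodeSuggestionsService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```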
B: Yeah, that's about it. For code suggestions, we also have a component in Visual Studio and the Web IDE; the idea would be that those things talk to their respective GitLab instances, which then forward the request to the AI Gateway. Right now this communication happens over just regular REST. I'm wondering if that could also, in a future iteration, become something that does protobuf, because if we do that, then we would be able to forward things that the GitLab instance it's talking to doesn't know about. Imagine a situation where we've got a developer...
B: ...who has the most recent version of the code extension, but they're working at a bank whose GitLab instance is two versions behind (their self-managed instance is behind), and they could still use newer features that we've introduced to the VS Code extension, if the GitLab instance that sits in between the VS Code extension and the AI Gateway just transparently passes on whatever it gets, which I think we could do with protobuf 3.
B: With that out of the way: I mentioned that I would recommend building a service for each feature that we build. So we have a code suggestion feature; we could have a chat feature; we could have a summarize-issue feature. The reason I'd do that...
B: ...is that it means we can iterate faster, and people can see improvements faster, regardless of what version of GitLab they're running. But I also want to call out: currently we've built some features in GitLab, inside the monolith. What these features do is just call out to OpenAI with a regular request to provide the AI things, and the idea is that they are using our keys for that. So this doesn't work for self-managed.
B: That's why we need them to go through the AI Gateway. But because these features are already built inside the monolith and directly tied to the provider, I'm suggesting we build a proxy in gRPC to account for them, though I am recommending to migrate them over to feature-specific RPCs when we can. That's about how far I got. Any thoughts?
A: So I think the current situation is a bit messy, right? Because there are some features built in the monolith that call out to OpenAI or to Google Vertex, but I think Code Suggestions is different, in that it speaks directly to Vertex and Google, right?
B: Yes, but it's also different because the clients that use Code Suggestions reach out immediately to the model gateway, which is going to become the AI Gateway, and it doesn't have any information on the GitLab instance. So we want to start routing requests from Visual Studio through gitlab.com, if you are using gitlab.com. If you are working on a project that is hosted on gitlab.com, you're going to proxy the request to gitlab.com, and gitlab.com is going to add any information that it has about your project or whatever before forwarding it to the AI Gateway.
F: I can maybe inject some product context. (Yes, please.) If you look at one of the hottest competitors in the market, from Sourcegraph: they're doing really, really well on performance with their code suggestions, because they're able to take the source graph and use it as a vector embedding to improve the results they get back. Because then their, you know, OpenAI model or whatever it is suddenly has a lot of context about what your repository is, the file structure, and all of those sorts of things.
F: So, in order to not be left out of that market, I think that's why we're putting a gateway in place, right? So that when we're on that round trip, we can think about gathering that information and maybe turning it into vector embeddings, to make that easy for, you know, stage teams or the AI-enabled team to do. And if we go straight to the model, we obviously can't inject that additional information into the prompts.
B: That's a discussion that we're going to have in the future as well: where are the embeddings going to live? Because we don't do any embeddings now. We've built some database tables for it in a separate database, but I would hope that we could do this inside the... I haven't written a lot about this...
B: ...yet, because I don't know a lot about it. But I was hoping that we could also do this in the AI Gateway, because then we've taken a database dependency out of the critical path for gitlab.com. Right now we have this experimental embeddings database that may or may not be hosted on Cloud SQL, and if it becomes unavailable, GitLab doesn't boot, which is, yeah, annoying.
B: But I don't know enough about this yet to really see if that will work properly.
B: The idea of going through the GitLab instance for code suggestions as well is to enrich what we send off to the model, even without using embeddings or anything, with context information. For example, we could add stuff like: we are building this, and this file is going to contribute to this issue that's being worked on, and yeah, information about the issue could then...
F: Yeah, and further down the line, in terms of opportunity cost: the unique value I guess we're bringing in this space is the data we have, and the insight we have, on the GitLab platform, right, whether .com or self-hosted. And if we're not injecting that in somehow, then pick any LLM and you should get similar results, which aren't, you know, a USP for us.
B: The main concern that I still need to address somehow in the document is the latency that's added by the GitLab instance, because the model gateway that we were using before is super lightweight, and all the latency is added from talking to the model. And we need to look into... like, the model gateway hop is very fast; that's not really what I'm worried about. But I suspect that the gitlab.com (or GitLab instance) hop is going to be more expensive, and it's not something we always control, if we don't control the instance.
B: The initial idea, which I discussed with Andrew, and which was not thinking enough about the self-managed installations and so on, was the design where the VS Code extension keeps talking directly to the model gateway, and the model gateway requests information from gitlab.com to enrich the prompts that it sends on...
B: It's kind of like the zeroth iteration. I've talked about this with Stan and with Andras from the... I don't know, there are two AI teams, and he's from the one that works on the model gateway.
B: So it's not just me, but it's not enough people yet. I'm going to work on this with Matthias and Andras.
F: Thinking about what you showed this morning, Igor, in having to switch some hard-coded stuff in the extension to get the demos working and things like that: have we considered how we're going to approach the versioning in a way that's not going to cause multi-version incompatibility at the extension/IDE layer?
B: For the extension, first: I've mentioned this in the doc a little bit. I think it would be super cool if the communication between the IDE and whatever GitLab instance is also gRPC, because that makes it easy to not do versioning. If the thing sends information that their GitLab instance understands, great, you'll get better results. If it doesn't, then that's too bad.
B: We'll work with what we do know. And the cool thing, I think, would be that, in theory, with protobuf v3 the GitLab instance doesn't need to know about everything that it receives. Like, if we keep the...
B: ...if we reuse the same protobuf specification across the three hops, we can have the editor extension call out to GitLab with everything it knows. GitLab can add stuff, but it doesn't need to open up what it reads from the IDE. So that means that if the messages are well-formed protobuf, the AI Gateway will just get them, even if the GitLab instance in between doesn't know about them.
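
That pass-through property is checkable with a short script. A minimal sketch, assuming a recent protobuf runtime (where message_factory.GetMessageClass exists); the message names and field layouts are invented for the demo:

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory


def make_class(filename, field_names):
    """Build a demo.Request message class with the given string fields."""
    pool = descriptor_pool.DescriptorPool()
    fdp = descriptor_pb2.FileDescriptorProto(
        name=filename, package="demo", syntax="proto3")
    msg = fdp.message_type.add(name="Request")
    for number, fname in enumerate(field_names, start=1):
        msg.field.add(
            name=fname, number=number,
            type=descriptor_pb2.FieldDescriptorProto.TYPE_STRING,
            label=descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL)
    pool.Add(fdp)
    return message_factory.GetMessageClass(
        pool.FindMessageTypeByName("demo.Request"))


# The editor speaks a newer schema than the GitLab instance in the middle.
NewRequest = make_class("new.proto", ["prefix", "open_tabs"])
OldRequest = make_class("old.proto", ["prefix"])

wire = NewRequest(prefix="def foo(", open_tabs="app.py").SerializeToString()

# The middle hop parses with the old schema: open_tabs is an unknown field,
# but proto3 keeps unknown fields and writes them back out on re-serialize.
middle = OldRequest.FromString(wire)
assert middle.SerializeToString() == wire
assert NewRequest.FromString(middle.SerializeToString()).open_tabs == "app.py"
```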
B: ...yeah, yeah, if we need a clean cut. But I think for this kind of work, where we're basically just gathering information and then massaging it into a way to present it to whatever, in this case, Google Vertex...
A: I don't know if I'm overestimating or underestimating the abilities, the capacities, of IDE extensions, but I will just double-check, because I know there's VS Code, but there's also JetBrains and others like that. Just double-check that all of them will allow you to add gRPC; I don't know if they just have... yeah.
B: The first iteration doesn't do that. I think we could get the same thing done with just JSON, but then we need to figure out a way, at the GitLab layer, to translate JSON that I don't understand into protobuf that I don't understand.
E: Yeah, I had another question. If we're going to go with gRPC, what does that mean for the implementation of the AI Gateway? The early discussions I saw around that were to kind of repurpose what we have for the model gateway and extend that code base. Is that still the plan? Would we then do a gRPC server in Python, and are the language runtime and the gRPC support a concern to be considered there?
B: But it's something to look into. And for the first iteration of Code Suggestions, I don't think it matters, because I think that's just going to stay REST, so we can make the GitLab instance just a proxy for what the IDE extension is already sending; that stays the same. So the code suggestion service, as it's currently running at codesuggestions.gitlab.com, keeps working for a while while we update the extension. So I don't think it changes anything in the short term.
G: Well, I think maybe I missed the... what do we gain by using gRPC instead of JSON?
B: Easier versioning; easier bi-directional streaming, if we wanted that for a lot of data.
G: I mean, okay, so you keep adding fields and the payloads get larger, and if you want to change the semantic meaning of any field, then you still have to go through a deprecation process, and you'll continue to have that bloat. Especially with early-adoption stuff, where, you know, we have to assume that we're going to include things that we're going to care about in version two, three, four and whatnot.
G: I guess I'm not seeing how that's a win that we get over having fields that can be present or absent in JSON.
B: Nothing, really. We need to add an API between two services that we run, and generally, like a lot of the time at GitLab, we've picked gRPC for that. So the first version of the document was using JSON, and then people said: well, why not gRPC? Okay, okay. I don't have strong feelings.
E: I think one of the potential benefits is that, by means of having a well-defined schema, it kind of solves the schema question. It doesn't solve it completely, because if you ever want to remove fields, you kind of have to define them as being optional, and...
B: I'm not going to argue about the protocol, to be honest. If somebody prefers JSON, I'm very happy to do it that way.
G: I was mainly thinking of it in terms of whether I was missing some benefit, and also the client support that we were talking about a few minutes ago.
A: Yeah, I had a question. Maybe this is not too scalability-related, or I guess it is, because we want to report metrics on it. One thing that I was looking at on Code Suggestions is that, now that we are using external models, one key thing that we have to handle all the time is the length of the input, which is not in characters, because for language models you have to grab your string of characters and divide it into tokens.
A: But the thing is that the tokenizer is different for each model, and we're using an external model, so we don't know what they're using. So what we're doing right now is just saying: let's allow the user to only send up to 2048 characters. And if we look at the logs, we're hitting that limit.
A: But then, when we go to the Google documentation, they have a limit that's bigger than that, and it is not in characters but in token length. And so the problem that we're facing is that, because this is a black box, we are not able to grab the string, tokenize it, and then send the tokens to the model; the model only accepts the string and then tokenizes it internally. So one strategy we could use here would be to just grab some other tokenizer...
A: For example, we've talked about using an OpenAI tokenizer, which is obviously not what Google is using, just to get an estimate, because tokenizers are similar, of how many tokens are contained in the input string, and then we can do things with that. So, for example, Stan was working on something like:
A: If we're under Google's token limit, then we can include some more context, like the import statements at the top of the file, and that should give you better suggestions, because now it has context about what libraries are being used and that kind of stuff. I also needed to report metrics to Prometheus. I'm currently reporting token... sorry, character length, but that's not that useful, because tokens are the real measure.
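
A sketch of that estimating-tokenizer idea, using OpenAI's tiktoken as a stand-in for Google's unknown tokenizer; the 8,192-token budget is an assumed figure for illustration, not the provider's documented limit:

```python
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")  # not Google's tokenizer!

def estimated_tokens(text: str) -> int:
    """Rough token count, suitable for a Prometheus metric."""
    return len(encoder.encode(text))

def truncate_to_budget(text: str, budget: int = 8192) -> str:
    """Keep only the latest tokens, mirroring how the provider reportedly
    drops the earliest ones when the input is too long."""
    tokens = encoder.encode(text)
    return encoder.decode(tokens[-budget:])

prompt = "import os\n\ndef main():\n    " * 1000
print(estimated_tokens(prompt))           # estimated tokens in the input
print(len(truncate_to_budget(prompt)))    # characters actually sent
```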
B: Where does the truncation happen now, at the AI model gateway or on the VS Code side?
A: At the Gateway. So now we are... wait, actually, I don't know. I know this is what they were discussing here.
A: They were saying: oh, it's 2048 UTF-8 characters, which, depending on encoding, translates to more or less bytes. Yeah, I don't remember at the moment. What I can say for sure is: I don't know if it's happening in the editor or in the Gateway, but we're sending at most 2048 characters to Google, and we have a much larger capacity that we're just not using. One thing that we have seen is that... well, we haven't hit the upper limit in Google.
A: We tried sending like a thousand, sorry, 10,000 characters, and Google didn't complain. It just apparently dismisses some of the earlier tokens that you send and only processes the (what was the number?) 8,000 latest tokens that you send. So I was also looking at... I don't know... here it is. I was also looking at, maybe this is not the best, but there's also this tokenizer.
A: It might be a better fit. I have very little to go by, except that Google publishes models that say that they are...
A: ...that way. But yeah, one of the things that I still can't wrap my mind around is that this is, in any case, very inefficient, because we're going to tokenize, but that's not going to influence in any way what the model does, because the model is just going to take the whole string. So it's kind of unnecessary, but that's the best way we have to measure things on our side, now that we're using a model that we don't control.
G: Okay, got it, got it. And with regard to the tokenization: I kind of didn't follow at what point in the data flow tokenization happens. Does the model consume the tokens, or does it actually do the tokenization itself?
A: So when we had our own model, what we did was take the input string, and then, before passing it to the model, there were three steps: pre-processing, the model, and post-processing. In the pre-processing we took the string and changed it into tokens, and that was what the model processed.
A: Yes, yeah. Definitely, there's never going to be a case where a character is divided into multiple tokens. Yeah, okay. And if you look at that page that I was showing, it does say that for the models that we're using, a token is roughly four characters. So a beginning heuristic would be to just divide your input string length by four, but I think we can do better, maybe.
G: So, this is again about the memory bloat that we're seeing on Dedicated. We at this point have two code paths that are associated with freeing this bulk memory when we terminate a single individual socket. Let me make this a...
G: When we kill the... so these are two code paths in the kernel for freeing socket buffers. This one is what we'd use to assassinate a particular socket by injecting a TCP reset into the stream. The context here is: we've got a long-lived TCP connection where the client has set its receive window to zero bytes, and therefore we're not allowed to send any data to it.
G: But the server is still producing data that it wants to send to the client, and so it accumulates an ever-growing backlog of data in this socket, which is why it accumulates a lot of memory usage. And when we kill that session, this is the specific code path that we get for releasing those socket buffers. So at the top you can see it's got the generic page-freeing code, and in the middle here we can...
G: ...we can see that when we use tcpkill, a small utility that just injects TCP resets into an existing socket, using the sendto syscall to do so, the way that ends up actively freeing the socket buffer pages during that syscall... we get to see that here.
G: It processes the backlog of received packets in that direction, detecting the reset and purging the write queue, which is related to, but not identical to, the transmit buffer. And so that's how we end up getting to free a bunch of this bulk memory.
G: This particular example was from a single socket that had over 100 megabytes of data accumulated in its backlog, which is way more than what the limit should have been. And then the more generic path is when we have a clean shutdown that uses a FIN exchange instead of a hard TCP reset.
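
As a sketch of how such sockets can be spotted from user space: a peer advertising a zero receive window shows up as an ever-growing Send-Q in Linux ss output. The 10 MB threshold here is arbitrary, not the limit discussed above:

```python
import subprocess

THRESHOLD_BYTES = 10 * 1024 * 1024  # arbitrary "way too backlogged" cutoff

def backlogged_sockets():
    # With a state filter, `ss` prints: Recv-Q Send-Q Local:Port Peer:Port
    out = subprocess.run(
        ["ss", "-tn", "state", "established"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and int(fields[1]) > THRESHOLD_BYTES:
            yield fields  # candidates for inspection (or for tcpkill)

for sock in backlogged_sockets():
    print(sock)
```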
G: I'd been hunting for a while to find the exact allocation path and the exact free paths, and I was pretty confident that there were going to be at least two relevant cases for freeing pages, and we got both of them yesterday, so that was good confirmation. We're going to do a kind of summary write-up on this soon, maybe today, maybe tomorrow, so there'll be some writing that gives more context around it.
G: Let's talk about that; I would love to get your opinion on this, Bob. So, as mitigations, we're going to do the sysctl adjustment to stop using compound pages, because that greatly reduces, but doesn't eliminate, the bloat rate.
G: Periodic restarts of nginx are probably going to be the simplest Band-Aid. And as a long-term strategy, we're thinking about asking our development team to give websockets an upper bound on how long they will patiently wait when the server sends pings to the client and the client isn't responding. Because the reason these sockets are staying held open for so long is that there's a TCP proxy layer, probably a firewall or other network security device...
G: ...that's injecting TCP keepalive packets into the TCP stream. But it can't do that within the TLS tunnel, which is how the web traffic gets carried.
G: So if we, at the websocket layer of the protocol stack, require clients to respond, the machine in the middle isn't going to be able to spoof that, and so that would be a reliable way to detect whether or not the client really is there. And if that's feasible, we could set that as a configurable timeout with a really generous upper bound, as long as it's not days, which is how long it takes to fill up memory. And this would help out.
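
A minimal sketch of that application-layer liveness check, using the Python websockets library rather than whatever the actual app stack is; the interval and timeout values are placeholders, not tuned recommendations:

```python
import asyncio

import websockets

async def handler(ws):
    # Echo loop; stands in for the real application traffic.
    async for message in ws:
        await ws.send(message)

async def main():
    # The server pings inside the (TLS-protected) websocket stream, so a
    # middlebox spoofing TCP keepalives cannot answer on the client's
    # behalf. A peer that never pongs is dropped after ping_timeout
    # seconds, far sooner than the days it takes to exhaust memory.
    async with websockets.serve(
        handler, "0.0.0.0", 8765, ping_interval=20, ping_timeout=60
    ):
        await asyncio.Future()  # run forever

asyncio.run(main())
```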
G: We were kind of brainstorming yesterday, and that was kind of where we were leaning. We don't think it's going to be feasible to deal with this at the nginx layer, because... well, anyway, I don't want to take up too much time.
G: We're pretty confident that dealing with it at the nginx layer is not going to be feasible, for multiple reasons, but doing it within the websocket stream seems like it would be a significant win. And it does look like it's the websocket TCP connections that are responsible for the large majority of this memory leak, possibly all of it.