GitLab 14.8 Release Kickoff, 17 Jan 2022

From YouTube: Enablement:Global Search - GitLab 14.8 Kickoff

A

Hi, I'm John McGuire, and I'm going to take us through some of the updates we have coming for Global Search in GitLab 14.8. Joining me today is John Mason, our senior engineer, who is going to help us go through a lot of these changes and plan out some of the pieces we have coming in 14.8.

A

So, the first thing we're going to be looking at: we've been looking into OpenSearch, which is the AWS rebranded and forked version of Elasticsearch. We want to create a testing framework to understand how well we are maintaining the cross-compatibility that we believe we currently have between GitLab and OpenSearch. We'll be looking into getting this testing framework set up, and we'll be tracking any issues that we find from it going forward as well.

A

The next thing we're looking at is optimizing the code search tokenizer to improve code search quality. I know that sometimes, when we're using code search, there are specific characters that may not return results the way I might expect. John, can you help me understand what changing the code tokenizer will do?

B

Yeah, totally. Elasticsearch will take the content and break it into individual terms. So if you have "the brown fox", it will split that out into individual tokens: "the", "brown", and "fox". Then by default it'll throw out words that don't really matter, like "the" (those are called stop words), and store "brown" and "fox". Where we're at right now is that there's an issue if you have hyphens or special characters.

B

So if you had "the-brown-fox", Elasticsearch right now still thinks it should treat dashes as word separators. So it's going to do the exact same thing: it removes the dashes, so it becomes "the brown fox", and then it removes "the". The problem with that is, if that's a variable in your code and you try to search for "the-brown-fox", it won't show up properly, because on the back end Elasticsearch is just storing "brown" and "fox".

B

So the new tokenizer that we're going to roll out will respect special characters. If you have a variable that is separated by dashes or underscores, or even punctuation, Elasticsearch will respect that and store it as one individual term. So then you'll have "the-brown-fox" stored, and you'll be able to search for things like that.
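
The difference described above can be sketched in a few lines of Python. This is only an illustration of the behavior as described in the conversation, not GitLab's or Elasticsearch's actual analyzer: the stop-word list and the function names `word_tokens`, `code_tokens`, and `matches` are assumptions made up for this example.

```python
import re

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def word_tokens(text):
    """Current behavior as described: treat every non-alphanumeric
    character (dashes included) as a separator, then drop stop words."""
    terms = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    return [t for t in terms if t not in STOP_WORDS]


def code_tokens(text):
    """New behavior as described: split on whitespace only, so dashes,
    underscores, and punctuation stay inside a single term."""
    return [t.lower() for t in text.split()]


def matches(query, indexed_terms):
    """An exact-term lookup: the query hits only if it was stored as a term."""
    return query.lower() in indexed_terms


line = "x = the-brown-fox"
old_index = word_tokens(line)   # dashes split the identifier apart
new_index = code_tokens(line)   # the identifier survives as one term

print(old_index, matches("the-brown-fox", old_index))
print(new_index, matches("the-brown-fox", new_index))
```

With the first tokenizer the identifier is never stored as a whole term, so an exact search for it cannot match; with the second it can. The same reasoning applies to dash-separated values such as UUIDs.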

A

Right, I can see how that's extremely helpful for code search, because we use all of these special characters in code. The fact that they get tokenized, and sometimes ignored, can actually break the parsing of a word, so it may not be treated as the same term anymore, the way it's set up.

It looks great. I actually think this is a large problem, going by a lot of the feedback I've seen from our customers, and I hope they're really looking forward to this improvement. I think it will make code search specifically work a lot better, from the initial cases that I can imagine.

B

Yeah, another case where it applies is searching for UUIDs, which are separated by dashes. This will fix that issue as well.

A

Okay, that makes a lot of sense. Well, I'm really looking forward to this.

A

This sounds really good, and that moves us into some performance updates that we're looking at in 14.8. We've moved a lot of the scopes into their own index in the past, and each time we've seen some improvement to search performance, specifically on GitLab.com, but we also hear this is helpful for larger customers who are using the Elasticsearch integration. The next scope that we're moving is commits. We have a few others that will be moving out, but at the end we'll have each individual scope in its own index, and code will be in its own index as well, which gives us a lot of advantages too. So this is something to look forward to in 14.8.

A

We have another performance improvement.

A

We have noticed that some group-level searches, where you've specified a group but not a specific project and you're searching across the entire group, perform pretty slowly. We have discovered that a lot of the time this has to do with groups that have lots and lots of projects in them.

A

This is actually the list of projects that are in gitlab.org. When it's doing a group-level search, it has to do some type of search across each individual project, which takes a lot longer to complete. We have found some ways to optimize this group-level search that will essentially make all group-level searches faster, but it'll be most noticeable on the groups that have lots of projects in them. Is there anything else on this, John, that we can add?

B

Yeah, so kudos to Dimitri, who is on that issue and who came up with this idea. This issue really comes up when there is a group that has a lot of projects. GitLab is one, and there are a few others. If it's a group that has a lot of projects and you do a group search, it's doing the Elasticsearch equivalent of "select star where project ID is in this gigantic list", and so, as you can see, that's a lot of stuff going over the wire.

B

So what Dimitri is proposing is, instead of listing each of these individually, to have something called a namespace ancestry ID, which can represent a collection of those. So instead of sending a massive list, you can just specify the namespace ancestry IDs, and those will represent the associated permissions, and they'll be an order of magnitude more efficient. So I'm really excited about this.
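
As a rough sketch of why this shrinks the request, the two filters can be compared side by side. `terms` and `prefix` are real Elasticsearch query types, but everything else here is an assumption for illustration: the field names `project_id` and `namespace_ancestry_ids`, the `"9970-"` ancestry format, the ID values, and the helper names are hypothetical and are not taken from GitLab's implementation.

```python
import json


def project_list_filter(project_ids):
    """The 'select star where project ID is in this gigantic list' shape:
    every authorized project ID is sent in the query body."""
    return {"bool": {"filter": [{"terms": {"project_id": project_ids}}]}}


def ancestry_filter(ancestry_prefix):
    """The proposed shape: one short prefix stands in for every project
    that lives under the group (hypothetical field and format)."""
    return {
        "bool": {
            "filter": [{"prefix": {"namespace_ancestry_ids": ancestry_prefix}}]
        }
    }


def in_group(doc_ancestry, ancestry_prefix):
    """What the prefix filter checks for each document, in plain Python."""
    return doc_ancestry.startswith(ancestry_prefix)


# A made-up group with 5,000 projects.
project_ids = list(range(1, 5001))

big = json.dumps(project_list_filter(project_ids))
small = json.dumps(ancestry_filter("9970-"))

print(len(big), "bytes with the project ID list")
print(len(small), "bytes with the ancestry prefix")
print(in_group("9970-42-", "9970-"), in_group("1234-42-", "9970-"))
```

The payload for the ID list grows with the number of projects in the group, while the prefix filter stays the same size no matter how many projects sit under it.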

A

Yeah, especially because we're searching on gitlab.org all the time and it can be slow, so me too. Search