From YouTube: Optimising Rails Database Queries: Episode 1
Description
In this video series we will take a look at optimising database queries in Rails applications. We'll be using GitLab as an example, but the techniques can be applied to other Rails applications as well.
The audio quality is unfortunately not the best, but I plan to sort this out before recording the second episode. Make sure to watch it in 1080p, otherwise the text will be too blurry.
The explain visualiser used in this episode can be found at https://explain.depesz.com/.
This is followed by a bunch of queries that seem to repeat themselves quite a few times. These queries are most likely the result of an N+1 query problem: for every snippet we probably fetch the author, but for whatever reason we are not preloading those records. Today we'll specifically take a look at this first query.
This is the rather complex-looking SELECT * FROM snippets query. For the sake of this video I have it already formatted here, so we don't have to go through that procedure, and I've already obtained a query execution plan. In PostgreSQL there's a command you can run for this: EXPLAIN. If we open up a terminal and, for example, connect to our development database, we can normally run a query, say SELECT COUNT(*) FROM users, and get the results. But if we want to know the expected plan, we can run it with EXPLAIN.
The result is PostgreSQL telling you what it expects to do, with emphasis on "expects", because EXPLAIN doesn't actually execute the query. If we want to actually execute it, we have to use EXPLAIN ANALYZE, and now we get some extra data, such as the planning time, the execution time, the cost of each step in the query, and so on.
We can extend that a little further with some additional options. For example, we can run EXPLAIN ANALYZE with the BUFFERS option, which will display the number of shared buffers used. Buffers are memory buffers used for caching results from disk. Every buffer is 8 kilobytes in size, 8,192 bytes. So in this case we can see that it says "Buffers: shared hit=3", meaning it used 3 buffers.
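As a rough sketch of the commands described above (the users table is the one from the video; actual plan output will differ per database):

```sql
-- Plan only: EXPLAIN does not execute the query.
EXPLAIN SELECT COUNT(*) FROM users;

-- Actually execute it, adding planning time, execution time,
-- and per-node costs to the output.
EXPLAIN ANALYZE SELECT COUNT(*) FROM users;

-- Also report buffer usage. Each shared buffer is an 8 KB page,
-- so a line like "Buffers: shared hit=3" means 3 * 8192 bytes
-- were served from the buffer cache.
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM users;
```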
Unfortunately, this output is not exactly readable when you gather it from the terminal: if we take a query from the left here, execute it with EXPLAIN (ANALYZE, BUFFERS) and then paste it, there's a lot going on. So typically what I do is run this, copy the output, and then put it into the tool at explain.depesz.com.
What this does is visualize the query plan and show you information such as how long it took to execute a specific step, how long it took to execute that step and all its sub-steps, how many rows were produced, and so forth. I'm not going to go too deep into the format of PostgreSQL's explain plans, because that could be discussed in its own video, but very quickly, to give a rough idea: each step here is called a node.
Nodes start with their names (Limit, Sort, etc.) and they're executed, sort of, from the inside out. So this first node is actually the last thing that will run, not the first one. Perhaps the best way to visualize this is as a function call that calls another function, which calls yet another function. In other words, Limit is a function that calls the Sort function, which in this case calls the Index Scan function, etc., and they return results as they unwind.
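To illustrate that inside-out order, a plan for a hypothetical query might have this nesting (node shape only; real plans also include costs, timings, and row counts):

```sql
EXPLAIN SELECT * FROM snippets ORDER BY created_at DESC LIMIT 20;
-- Limit                             -- outermost node, finishes last
--   ->  Sort                        -- called by Limit
--         ->  Seq Scan on snippets  -- innermost node, produces rows first
```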
In this case, and this is why I like this tool so much, we can immediately see that, for example, in node three we spend about 2.3 seconds, excluding all sub-nodes. If we look at the statistics of that node, we see that it uses, I believe, about two and a half million shared buffers, which equals about 20 gigabytes of memory.
That's a lot of memory for just a bunch of snippets, especially considering we are only displaying twenty of them here. If we scroll down, we see that here again we use quite a lot of buffers to get the data. Scrolling down a bit further: again a lot of buffers, but we also perform an index scan over 860,000 rows here, and here below, how many rows do we scan? About four and a half million, with a filter.
A filter is essentially the equivalent of Ruby's Array#select: it basically loads all the data into memory and then filters it. Here we have the number of rows removed, about 1.8 million. So in general this query is doing a lot of work, way too much work, just to get 20 snippets. Let's take a look at the actual query to see if we can quickly figure out why that is.
I'm going to close my terminal, because we're not going to use it anymore today, and look at the query instead. If I look at this query, scroll down a little bit and then go back up, what stands out to me is that certain sections appear to be repeated quite often. Let's see what this is: from line 11 to 16 we have this EXISTS (SELECT ... FROM project_authorizations) sub-query; this table stores all the projects that a particular user has access to.
So, in other words, there appears to be quite a bit of repetition. Going up, another thing that stands out is this first sub-query: here we get all the projects, limit them to the ones that you have access to, and take the ones whose project_features.snippet_access_level column is in this list of values, or where that value is 10 and you have access to the project.
A
This
is
a
little
weird
that
we're
doing
this
again
because
we
already
filtered
the
project
still
once
you
have
access
to
here,
and
so
this
and
exists
as
far
as
I
can
tell,
is
completely
redundant,
and
so,
if
you
remove
this
oops
I
have
to
go
there.
We
go.
We
are
left
with
and
feature
snippet
access
level
in
null
2030
or
10,
which
is
exactly
the
same
as
feature
snippet
access
level
in
nolde,
10,
20,
30,
and
there
you
go.
We
already
got
rid
of
that
now.
Now, I know from some prior testing that this saves about 2 seconds of the execution time, but we're still left with about 12 seconds or so, so it's not the primary source of concern.
This particular AND condition, in its current state, we could remove entirely, because as far as I know these are the only possible values that we actually store in this column. However, I suspect the code that generates this query will use different values here based on your permissions.
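The simplification described above looks roughly like this (column and table names as discussed in the video; the ellipses stand for the parts of GitLab's real query that aren't shown here):

```sql
-- Before: the EXISTS repeats an access check the outer query already makes.
AND (project_features.snippet_access_level IN (NULL, 20, 30)
     OR (project_features.snippet_access_level = 10
         AND EXISTS (SELECT ... FROM project_authorizations ...)))

-- After: dropping the redundant EXISTS lets the OR collapse into one list.
AND project_features.snippet_access_level IN (NULL, 10, 20, 30)
```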
Then we have the value 10, where again you have access to the project. In other words, we're getting all the projects with visibility level 10 or 20, public and internal (with 0, I think, being private), and then we say: take the ones where the snippet access level is the default, or 20 or 30, which I suppose is what most projects use; so that's the default plus public and internal. And then we say: or the snippets are private, but you have access to the project.
If I look at this, what I will probably do is move this into another union condition, or union member, or whatever you'd like to call it. The reason is that, from my experience, the performance of OR in PostgreSQL can vary quite a bit. Sometimes it performs well, but when you use, for example, a WHERE IN with a sub-query and then an OR, it tends to perform rather poorly, whereas a UNION typically performs much better. So let's do that: we'll change this condition into a UNION.
So we just do that: we add a UNION, and I will just copy-paste this and put it here, and then this OR condition becomes an AND condition. We change the indentation, and the parentheses here can go, so now we have this as two separate things. There's some repetition here, the visibility level, but since this is generated by our code that's not a big deal: we're not writing this query manually in our source code.
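The OR-to-UNION rewrite has this general shape (schematic: the ellipses stand for the joins and access checks of the real query):

```sql
-- Before: one query, two access paths combined with OR.
SELECT projects.id FROM projects ...
WHERE project_features.snippet_access_level IN (20, 30)
   OR (project_features.snippet_access_level = 10 AND EXISTS (...));

-- After: each branch is its own UNION member, which PostgreSQL
-- often plans much more efficiently than the OR.
SELECT projects.id FROM projects ...
WHERE project_features.snippet_access_level IN (20, 30)
UNION
SELECT projects.id FROM projects ...
WHERE project_features.snippet_access_level = 10 AND EXISTS (...);
```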
Now let's take a look at this WHERE project_id IN. What we're essentially doing in this FROM block is selecting all columns for all these projects, and then we just select the ID from the result. This is rather wasteful, so we can change it to SELECT projects.id. It simply means there's less data to send over and less data to filter out. I think in this case PostgreSQL might be smart enough to optimize this for us, but I personally prefer to make these things explicit.
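The narrowing described above, sketched (the ellipses stand for the filters in the real query):

```sql
-- Before: the inner query materializes every column just to take the id.
WHERE snippets.project_id IN (
    SELECT id FROM (SELECT projects.* FROM projects ...) p
)

-- After: only fetch the column we actually need.
WHERE snippets.project_id IN (SELECT projects.id FROM projects ...)
```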
The other reason is that WHERE IN typically performs about the same as WHERE EXISTS with a small number of values, but when there are a lot of values, from our experience, WHERE EXISTS typically performs better. The way we have to do that, though, is a little bit more annoying. So what do we have to do here? Do we join snippets anywhere? No, we don't, okay. So what we can do is change this to WHERE EXISTS.
Then we add AND projects.id = snippets.project_id over here. Basically this says: get all the projects, project features, etc. where the project's ID equals the snippets.project_id from the outer query. We do the same here, AND projects.id = snippets.project_id, and the exact same thing here. Then in the outer query we have the OR WHERE the project is not specified, which we use for obtaining personal snippets that are not associated with a project.
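Putting the IN-to-EXISTS change together, the sketch below shows how the sub-query becomes correlated with the outer snippets row (ellipses stand for the remaining joins and filters):

```sql
-- Before: build the whole list of visible project ids, then test membership.
WHERE snippets.project_id IN (SELECT projects.id FROM projects ...)

-- After: a correlated EXISTS; the sub-query references the outer row,
-- so PostgreSQL can stop at the first matching project per snippet.
WHERE EXISTS (
    SELECT 1
    FROM projects ...
    WHERE ...
      AND projects.id = snippets.project_id
)
-- Personal snippets have no project at all:
OR snippets.project_id IS NULL
```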
There are a few more things we could improve, but let's take a look at how this query runs in its current state. To do that, I'm going to open a database connection. Now, this might be a bit scary, because I'm using a production database. Let's check that the font size is correct, yep.
The reason I'm doing that is that, when you want data on how a query performs, you need something that's as close to production as possible. If you do this in, say, a development environment where you have pretty much no data, it's not going to be accurate. If you have a staging environment that's up to date, it's going to be more accurate.
Even then, the lack of traffic can influence the behavior, because traffic volume might result in certain buffers being populated that are otherwise not available. In other words, in a completely unused staging environment the behavior is probably going to be very different. So what we're going to do here is use EXPLAIN (ANALYZE, BUFFERS), paste in the query, and then we just run it, and boom, there we go. This query now takes 2.2 milliseconds to run. That's really fast, considering it took 12 seconds before. Let's make sure it's actually returning the right results.
I know from this page that it's supposed to return about 20 rows, I believe, so let's take a look. To do that we get rid of the EXPLAIN ANALYZE and just keep it a SELECT *, because the LIMIT here only limits it to 20, so that's not a big deal. Let's close that and see if there are actually 20 rows. That's a lot of data; let's just change it to a COUNT query instead, that's maybe a little easier to look at, so we can get rid of the LIMIT.
We also take out the ORDER BY, and let's see: we have 52 snippets. That's the total number of snippets; let's see if that matches what I have here, and yes, 52. So in less than 30 minutes or so we've gone from 14 seconds originally, well, 10 to 12 seconds, down to 2.5 milliseconds, and all we did was remove some redundant clauses and split the query up into an extra union member. And we can probably still do better: we can probably get rid of this AND EXISTS.
We can probably get rid of this one by using a common table expression. In PostgreSQL, the best way to explain it is that you basically run a query, save the results, and can then select from them. That way, if it's a heavy query, you ensure it's only executed once. That feature, unfortunately, is not available in MySQL, so if we want to use it in our code, we have to make sure we can deal with both databases; the query has to support both. We're not going to do that today.
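A common table expression in PostgreSQL looks like this (a minimal sketch; the sub-query contents are hypothetical):

```sql
-- WITH runs the inner query once and names its result set;
-- the outer query can then select from it like a table.
WITH visible_projects AS (
    SELECT projects.id FROM projects ...
)
SELECT *
FROM snippets
WHERE snippets.project_id IN (SELECT id FROM visible_projects);
```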
That's quite a bit of work. The plan is that in the next video we'll take a look at the code and see how it influences this query, in order to better understand whether the optimizations we made can actually be turned back into source code. For example, if certain conditions that we've moved have to be there, we might have to take a slightly different approach.