GitLab 13.2 Release Kickoff, 17 Jun 2020

From YouTube: GitLab 13.2 Kickoff - Enablement:Database

Description

13.2 Kickoff for the Database team

A

Hi, I'm Josh Lambert, a group product manager here at GitLab, and today I'll talk about what the database team is planning to work on in our 13.2 milestone. 13.2 will be launched to everyone on July 22nd, and this is what we're focused on delivering. The first, most important aspect is continuing to work on partitioning, in particular with the audit events table. This is one of the larger tables, as you can imagine, in GitLab.

A

It stores all of the auditing events that we have, and so it is the first one that we are looking to partition, and it should provide a great test case for us to make sure we understand how to do it. We can roll out some of the basic tools for partitioning tables, and we can also evaluate the performance with a relatively simple and straightforward table to start with, before moving on to more complex tables that have potentially different access patterns and would be more challenging to address. So, in particular, in 13.2,

A

We are working to actually partition the table by year. This is because right now we are using PG 11, and PG 11 doesn't do very well with high numbers of partitions, and so we are particularly trying to make sure that we don't have too many partitions, which could cause problems for overall database performance. So we are working on that, and we're also working to roll out some tooling and make sure that partitioning happens as automatically as possible. Some of these items might still carry over.

A

But if so, we will continue to work on those as well, and then, of course, once we do partition the table, we want to make sure we can compare the performance of the partitioned audit log table with a baseline to determine what the actual performance impact, if any, or potential improvement there was from partitioning. So that is a key focus area here for us. As you have probably heard, we have been talking about partitioning for some time now, and, as I said, in 13.2 we might have our first table partitioned.
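As a rough illustration of partitioning by year, the sketch below generates PostgreSQL range-partition DDL, one partition per calendar year. The parent table name `audit_events_partitioned` and the partitioning column `created_at` are assumptions made for the example, not names confirmed in the talk, and this is not the tooling the team describes.

```python
from datetime import date
from typing import List


def yearly_partition_ddl(parent: str, first_year: int, last_year: int) -> List[str]:
    """Return one CREATE TABLE ... PARTITION OF statement per calendar year."""
    statements = []
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1)
        end = date(year + 1, 1, 1)  # the upper bound is exclusive in PostgreSQL
        statements.append(
            f"CREATE TABLE {parent}_{year} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical parent table, assumed to have been declared with
    # PARTITION BY RANGE (created_at).
    for stmt in yearly_partition_ddl("audit_events_partitioned", 2019, 2021):
        print(stmt)
```

Yearly ranges keep the partition count small: three years of data means three partitions, which fits the concern about PG 11 and high partition counts.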

A

Moving on from there, we are doing some analysis as well on the growth, in particular, of GitLab.com's Postgres database. Being the largest database instance, it serves as a great place for us to do investigation, and we will be continuing to understand the goals around how we can reduce the total size of the database, and you can see some issues here. We've identified some improvements, and we are also hopefully reducing the overall database growth rate as well.

A

The idea here is that as self-managed and .com continue to grow and we add more features, we make sure that the PG database continues to scale well and perform well for our users and our customers.

A

So this is part of that issue here as well, where we identify in particular some areas that are high impact as far as the workload on our database, and we're also working to further optimize performance by reducing unnecessary queries. One of the biggest offenders, frankly, is actually on .com.

A

Half of all database queries are SELECT 1, so every other query is effectively a no-op, and we get around 4,000 of these per second on GitLab.com, and so, overall, it's actually having a pretty heavy impact on the database's performance. So we are looking to potentially cache some of these things, so we don't have to have quite so many. It's causing a lot of context switches, and in some testing on a relatively small database it actually had a pretty high impact, as you can see here from some of the pgbench testing work that we have done. I'm excited for how much optimization and additional headroom

A

this can provide for databases everywhere.
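The talk does not say where the SELECT 1 queries come from or how the caching will be built. As a minimal sketch, assuming they are connection liveness checks, one way to avoid repeating a no-op query is to remember the last successful check for a short time. The class name, the `ttl` parameter, and the injected `clock` are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, Optional


class CachedLivenessCheck:
    """Skip a repeated no-op check if one succeeded within `ttl` seconds."""

    def __init__(
        self,
        check: Callable[[], bool],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._ttl = ttl
        self._clock = clock
        self._last_ok: Optional[float] = None  # time of the last successful check

    def is_alive(self) -> bool:
        now = self._clock()
        if self._last_ok is not None and now - self._last_ok < self._ttl:
            return True  # a recent check succeeded, so no query is sent
        ok = self._check()
        self._last_ok = now if ok else None  # a failure is never cached
        return ok


if __name__ == "__main__":
    # sqlite3 stands in for a real Postgres connection in this demo.
    conn = sqlite3.connect(":memory:")
    checker = CachedLivenessCheck(
        check=lambda: conn.execute("SELECT 1").fetchone() == (1,),
        ttl=5.0,
    )
    print(checker.is_alive())  # runs SELECT 1
    print(checker.is_alive())  # answered from the cached result
```

The trade-off is that a connection that dies inside the `ttl` window is only noticed on the next real query, so the window should stay short.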

A

From there, continuing on the performance and efficiency of the database, we're also working on rolling out automatic re-indexing of standard indexes. When this was opened two months ago, we had around 500 gigs of index bloat. Currently it's up to around one gig, actually, as you can see here, and, for example, just as Andreas had mentioned, keeping all of this in memory is actually beyond the available memory that we have on the instances for GitLab.com.

A

So it is a problem, and right now we can run pg_repack on these manually, but it requires some oversight, and we would like to actually automate this process. There are some particular improvements in PG 12 which can make this more automatic, but since we're not on PG 12 just yet and we aren't requiring it, we have to do more work on our side to make this happen. The benefit here, of course, of automating

A

it is that all of our self-managed customers will see the benefits of this as well, and these folks may not have been running database repacking if they weren't aware of it. So this will allow our customers to see potentially some significant performance improvements and reductions in the overall size of the indexes, by virtue of having this automated and built into the application. So I'm really excited about this for its impact on .com, but also for making the database of self-managed more autonomous and, effectively, self-optimizing.
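The talk does not describe how the automatic re-indexing will work. One common way to rebuild a bloated index without blocking writes on a version before PG 12 is to build a replacement concurrently, drop the old index, and rename the new one. The sketch below generates those three statements. The function name and the `_rebuild` suffix are hypothetical, and `index_def` is assumed to be in the format returned by `pg_get_indexdef`.

```python
from typing import List


def rebuild_index_statements(index_name: str, index_def: str) -> List[str]:
    """Return SQL that replaces an index with a freshly built copy.

    `index_def` is assumed to look like
    'CREATE INDEX name ON schema.table USING btree (col)'.
    """
    temp_name = f"{index_name}_rebuild"
    marker = f"INDEX {index_name} ON "
    if marker not in index_def:
        raise ValueError("index definition does not match the index name")
    # Build the copy under a temporary name, without blocking writes.
    create_sql = index_def.replace(marker, f"INDEX CONCURRENTLY {temp_name} ON ", 1)
    return [
        f"{create_sql};",
        f"DROP INDEX CONCURRENTLY {index_name};",
        f"ALTER INDEX {temp_name} RENAME TO {index_name};",
    ]


if __name__ == "__main__":
    # Hypothetical index on a hypothetical table.
    definition = "CREATE INDEX idx_events_on_author ON public.events USING btree (author_id)"
    for stmt in rebuild_index_statements("idx_events_on_author", definition):
        print(stmt)
```

This sketch does not cover indexes that back a primary key or unique constraint, which cannot simply be dropped. PostgreSQL 12 added `REINDEX ... CONCURRENTLY`, which does the rebuild in a single statement and may be the improvement referred to above.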

A

If you will. The last major item we're working on here in the database team in 13.2 is that we have found that Sidekiq always uses the primary node. If you have multiple Postgres nodes, this is not always required, because if the Sidekiq job is not actually writing any data, it doesn't need to be, and actually shouldn't be, hitting the primary. Instead it should be hitting a secondary, which is read-only, and it can essentially offload and reduce the load on the primary.

A

So we're going to go through and actually annotate all of the workers, determining which ones actually are read/write and which ones are read-only, and from there we can then have the job select whether the primary or a secondary should be used, based on whether the job itself requires write access. So this will again give another nice performance improvement and optimization for larger GitLab instances that have multiple Postgres nodes. So those are the key features for the database team in 13.2.
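The annotate-then-route idea can be sketched as follows. Sidekiq itself is a Ruby library and the talk gives no implementation details, so this Python sketch only illustrates the routing logic. The `writes_data` attribute, the worker classes, and `NodeRouter` are all hypothetical names.

```python
import itertools
from typing import List, Type


class Worker:
    # Conservative default: an unannotated worker is assumed to write.
    writes_data = True


class UpdateProjectWorker(Worker):  # hypothetical read/write job
    writes_data = True


class ExportReportWorker(Worker):  # hypothetical read-only job
    writes_data = False


class NodeRouter:
    """Pick a database node for a job based on its worker's annotation."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        # Rotate through the replicas; None means there are none to use.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def node_for(self, worker_cls: Type[Worker]) -> str:
        if worker_cls.writes_data or self._replicas is None:
            return self._primary
        return next(self._replicas)


if __name__ == "__main__":
    router = NodeRouter("primary", ["replica-1", "replica-2"])
    print(router.node_for(UpdateProjectWorker))  # write job goes to the primary
    print(router.node_for(ExportReportWorker))   # read-only job goes to a replica
```

Defaulting to the primary keeps unannotated jobs on their current behavior. A real implementation would presumably also need to account for replication lag on the secondaries, which this sketch ignores.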

A

I'm really excited about all of these and the continued efficiency and performance improvements that we will see as a result, not only for us on .com, but also, of course, for all of our customers. So we are excited to get 13.2 into your hands in July, and stay tuned as well for 13.3, which will be happening in July, and we'll tell you all about that on July 18th. Thanks.