GitLab 15.1 GitLab Kickoffs, 17 May 2022

From YouTube: GitLab 15.1 Kickoff - Enablement:Database

Description

Kickoff for the Database Group for the GitLab 15.1 release

Planning issue: https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/252

Database Group Past Kickoff Videos: https://youtube.com/playlist?list=PL05JrBw4t0KqP3MYrcoQHrqPUqn_jJZSN

Presentation by: Yannis Roussos, Sr. Product Manager, Memory and Database Groups

A

Hi, I'm Yannis Roussos, the product manager of the Database group, and I'd like to take you through what we're planning for 15.1, which is scheduled to be released on June 22nd, 2022.

A

Let me take you through our top priorities. The first one is the long-term initiative that we're running for batched background migrations. This is the way that we perform large-scale data updates in all GitLab instances.

A

For a few years now, we have extended the way the GitLab Rails application performs large-scale data updates by allowing them to be scheduled and executed asynchronously in the background; those are the background migrations. Now we're extending this framework, which we have built internally at GitLab, with batched background migrations.

A

In this way, we provide a stable, reliable framework for executing the most complex and error-prone database operations out there; I'm talking about updating billions of records at GitLab.com scale. 15.0 marks a major milestone for us: the internal general availability of batched background migrations. Batched background migrations are now available to all of GitLab engineering, and we have made them the default way to perform background migrations.
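To make the idea of batching concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in database. It walks a table by primary-key ranges and commits each range separately, which is the core idea behind updating a huge number of rows without one giant statement. GitLab's actual framework is Rails code with its own job scheduling, progress tracking, and retries; the function name `batched_update`, the `issues` table, and the batch size are hypothetical.

```python
import sqlite3


def batched_update(conn, table, column, value, batch_size):
    """Set `column` to `value` on every row of `table`, one primary-key range at a time.

    Each range is updated in its own short transaction, so no single
    statement rewrites the whole table. Returns the number of batches run.
    """
    # Table and column names are trusted constants here; only values are bound.
    min_id, max_id = conn.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    if min_id is None:
        return 0  # empty table: nothing to migrate
    batches = 0
    start = min_id
    while start <= max_id:
        end = start + batch_size - 1
        with conn:  # commit this batch on success, roll it back on error
            conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE id BETWEEN ? AND ?",
                (value, start, end),
            )
        batches += 1
        start = end + 1
    return batches


# Demo on a hypothetical table with 1,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, migrated INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO issues (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()
print(batched_update(conn, "issues", "migrated", 1, batch_size=250))  # 4 batches
```

Keeping each batch in its own short transaction bounds how long locks are held and makes the work resumable: if one batch fails, only that ID range needs to be retried.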

A

We expect this to be a major milestone that will help the stability, availability, and reliability of large and small instances out there, and will also help those operations run as seamlessly and as quickly as possible. But general availability is not the end goal for us. We have to keep fixing the issues that we find, adding more features, making the whole process more stable and seamless, adding auto-tuning for self-managed instances, and more, so we will keep working on it.
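Auto-tuning can be illustrated with a simple feedback rule: measure how long the last batch took and scale the next batch toward a target duration. This is a hypothetical policy for illustration, not GitLab's tuning algorithm; `next_batch_size`, the 2-second target, and the size limits are all assumptions.

```python
def next_batch_size(current, elapsed_s, target_s, min_size=100, max_size=50_000, max_step=2.0):
    """Scale the next batch size so that a batch takes roughly `target_s` seconds.

    Faster-than-target batches grow the size, slower ones shrink it. The
    per-step change is capped at a factor of `max_step` to avoid oscillation,
    and the result is clamped to [min_size, max_size].
    """
    factor = max_step if elapsed_s <= 0 else target_s / elapsed_s
    factor = max(1 / max_step, min(max_step, factor))
    return max(min_size, min(max_size, int(current * factor)))


# Simulate a table where each row costs 1 ms to update and the target is 2 s per batch.
size = 500
for _ in range(5):
    elapsed = size * 0.001
    size = next_batch_size(size, elapsed, target_s=2.0)
    print(size)
```

Capping the step size trades convergence speed for stability: a single unusually fast or slow batch cannot swing the batch size wildly.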

A

We will continue with batched background migrations in 15.1 and for a few releases after that. Our next top priority is implementing a throttling mechanism for large data changes. This is also related to what I just discussed about batched background migrations, but it is about data changes in general. The problem with large data changes, for example updating a billion records, is that they can work adversarially against your system.

A

If you go too fast or update too many things, you can put too much pressure on your system. So this effort is all about adding the capability inside GitLab to monitor production database clusters in real time and respond to changing conditions, whatever those may be: for example, increased traffic, increased load on a primary database server, issues with replicas, or an incident where something goes down.

A

We want to add the capability to monitor for those problems, get early warnings, and have mechanisms in place so that we can respond to those issues by, for example, throttling down the rate of data updates or even pausing those updates if necessary.
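A throttling mechanism of this kind boils down to reading health signals and mapping them to an action. The sketch below assumes three signals drawn from the examples in the talk (replica lag, primary load, an open incident); the `HealthSnapshot` fields, the thresholds, and the doubling of the delay are hypothetical choices, not GitLab's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    SLOW_DOWN = "slow_down"
    PAUSE = "pause"


@dataclass
class HealthSnapshot:
    replica_lag_s: float    # worst replication lag across replicas, in seconds
    primary_cpu_pct: float  # CPU utilisation of the primary database server
    incident_open: bool     # whether a production incident is in progress


def decide(s, max_lag_s=60.0, busy_cpu_pct=70.0, max_cpu_pct=90.0):
    """Map one health snapshot to a throttling action (thresholds are illustrative)."""
    if s.incident_open or s.replica_lag_s > max_lag_s or s.primary_cpu_pct > max_cpu_pct:
        return Action.PAUSE
    if s.replica_lag_s > max_lag_s / 2 or s.primary_cpu_pct > busy_cpu_pct:
        return Action.SLOW_DOWN
    return Action.CONTINUE


def next_interval_s(action, base_interval_s):
    """Delay before the next batch; None means stop scheduling until health recovers."""
    if action is Action.PAUSE:
        return None
    if action is Action.SLOW_DOWN:
        return base_interval_s * 2
    return base_interval_s


print(decide(HealthSnapshot(replica_lag_s=45, primary_cpu_pct=50, incident_open=False)))
```

Separating "observe" (the snapshot) from "decide" (the pure function) keeps the policy easy to test and lets new signals be added without touching the batching code.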

A

And our final priority is automated database testing using production clones. This is also a very major milestone for us: in 15.0 we finished and released full support for background migrations, which is related to what I was just discussing. We now fully test all regular background migrations and batched background migrations against a production clone, a clone of the GitLab.com production database, and this is super major for us. After working on this project for more than a year, we now cover and test against all types of database migrations.

A

So we now cover 100% of all scheduled database updates. That means that any database schema update (adding an index, changing a column, or whatever) and all scheduled data operations are covered and fully tested inside a pipeline, in the merge request, against a clone of the GitLab.com production database, and this is major for us. We expect this to help a lot with catching various issues that we may not otherwise know about.
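Testing a migration against a production clone can be sketched at small scale: copy the database, run the migration on the copy, and report errors and duration without touching the source. This version uses sqlite3's `Connection.backup` as a stand-in for cloning; GitLab's pipeline runs PostgreSQL migrations against clones of the GitLab.com database, and the `run_migration_on_clone` helper, the table, and the time budget are hypothetical.

```python
import sqlite3
import time


def run_migration_on_clone(production, migration_sql, budget_s):
    """Apply `migration_sql` to a throwaway copy of `production` and report the outcome.

    The source database is only read (via the backup API); every change
    lands on the in-memory clone, which is discarded afterwards.
    """
    clone = sqlite3.connect(":memory:")
    try:
        production.backup(clone)  # page-by-page copy of the source database
        started = time.perf_counter()
        try:
            clone.executescript(migration_sql)
            error = None
        except sqlite3.Error as exc:
            error = str(exc)
        elapsed_s = time.perf_counter() - started
    finally:
        clone.close()
    return {
        "ok": error is None and elapsed_s <= budget_s,
        "elapsed_s": elapsed_s,
        "error": error,
    }


# Hypothetical "production" database and migration, for illustration only.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
prod.executemany("INSERT INTO projects (name) VALUES (?)", [(f"p{i}",) for i in range(1000)])
prod.commit()
print(run_migration_on_clone(prod, "ALTER TABLE projects ADD COLUMN archived INTEGER DEFAULT 0;", budget_s=5.0))
```

On a toy table the timing means little; the point of running against a full-size clone is that both the errors and the durations reflect real data volumes.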

A

There are issues in data processing that you cannot see unless you run it against terabytes of data. GitLab is helping the world shift left its ability to test code, and we're doing the same inside GitLab by shifting left our ability to test and catch database-related issues.

A

But this is, of course, not the end for us. Even though we reached our major goal here, we are now expanding our scope: we also want to do automated query analysis inside merge requests for changes to GitLab itself. That means that we also want to start automatically testing regular queries and updates that result from user interactions, not just scheduled large updates.

A

Think of a user visiting a page that generates a few queries, adding a comment, or changing a label. This will help us find, test, and refine all types of queries that can cause performance issues on a GitLab instance.
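Analyzing the queries behind a user interaction starts with knowing which queries that interaction issues. As a minimal sketch, sqlite3's `set_trace_callback` can record every statement a simulated page view sends to the database; `view_issue_page` and the schema are hypothetical, and a Rails application would typically rely on its framework's own query instrumentation instead.

```python
import sqlite3


def capture_queries(conn, action):
    """Run `action(conn)` and return the SQL statements it sent to the database."""
    captured = []
    conn.set_trace_callback(captured.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)  # stop tracing even if the action fails
    return captured


def view_issue_page(conn, issue_id=1):
    """Hypothetical stand-in for a user opening an issue page."""
    conn.execute("SELECT title FROM issues WHERE id = ?", (issue_id,)).fetchone()
    conn.execute("SELECT COUNT(*) FROM notes WHERE issue_id = ?", (issue_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE notes (id INTEGER PRIMARY KEY, issue_id INTEGER, body TEXT);
    INSERT INTO issues (title) VALUES ('Slow issue search');
""")
for sql in capture_queries(conn, view_issue_page):
    print(sql)
```

Depending on the Python version, the recorded text may contain the `?` placeholders or the expanded bound values, which is one reason captured queries need to be normalized before they can be compared.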

A

These queries cannot cause problems as major as a huge data migration can, but they affect every single interaction of a GitLab user, and with this effort we want to move towards full coverage of all types of database operations. So this will be the start of this new, expanded initiative.

A

This is a very difficult problem, because the queries we generate have parameters. The parameters depend on the data that we have stored, and there are many, many permutations of a single query. For something like searches over issues, you can change the group, the project, the issue, and other parameters.
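The combinatorics are easy to underestimate. The sketch below takes a handful of hypothetical filter values for an issue search and counts the parameter combinations with `itertools.product`; it also shows that optional filters change the SQL text itself, not just the bound values. The filter names and values are illustrative assumptions.

```python
from itertools import product
from math import prod

# Hypothetical filter values for an issue search; None means "filter not applied".
filters = {
    "namespace": ["gitlab-org", "gitlab-com"],
    "project": ["gitlab", "gitaly", "omnibus-gitlab"],
    "state": ["opened", "closed", "all"],
    "label": [None, "bug", "database", "performance"],
}


def parameter_combinations(filters):
    """Yield one dict per combination of filter values (their Cartesian product)."""
    names = list(filters)
    for values in product(*(filters[name] for name in names)):
        yield dict(zip(names, values))


def build_query(params):
    """Build SQL text plus bound values; filters set to None are left out entirely."""
    clauses = [f"{name} = ?" for name, value in params.items() if value is not None]
    values = [value for value in params.values() if value is not None]
    return "SELECT id FROM issues WHERE " + " AND ".join(clauses), values


total = prod(len(options) for options in filters.values())  # 2 * 3 * 3 * 4 = 72
shapes = {build_query(p)[0] for p in parameter_combinations(filters)}
print(total, len(shapes))  # 72 parameter sets, but only 2 distinct SQL texts
```

Each extra filter multiplies the count, so exhaustive testing of concrete queries quickly becomes infeasible; grouping by SQL shape is what makes the problem tractable.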

A

If you multiply them, you can generate a lot of different queries from a single query template, and we want to work towards covering that. We will start by identifying all queries so that we can surface them to reviewers inside GitLab while they review new changes, and then we will try to find ways to generalize those query templates and automatically test them against a clone of the production database.
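One common way to generalize concrete queries into templates is fingerprinting: replace literal values with placeholders, collapse variable-length lists, then group identical results. PostgreSQL's `pg_stat_statements` extension groups statements in a similar spirit by normalizing away constants. The regex-based `fingerprint` below is a simplified, hypothetical sketch, not GitLab's tooling, and it would miss cases that a real SQL parser handles (comments, dollar-quoted strings, and so on).

```python
import re
from collections import Counter

_STRING = re.compile(r"'(?:[^']|'')*'")               # 'opened', 'it''s'
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")             # 42, 3.14 (not digits inside identifiers)
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")   # (?, ?, ?) after literal replacement
_SPACE = re.compile(r"\s+")


def fingerprint(sql):
    """Reduce a concrete SQL statement to a template by replacing literals with '?'."""
    text = _STRING.sub("?", sql)
    text = _NUMBER.sub("?", text)
    text = _IN_LIST.sub("(?)", text)  # IN lists of any length share one template
    return _SPACE.sub(" ", text).strip()


captured = [
    "SELECT * FROM issues WHERE project_id = 42 AND state = 'opened'",
    "SELECT * FROM issues WHERE project_id = 7 AND state = 'closed'",
    "SELECT * FROM issues WHERE id IN (1, 2, 3)",
    "SELECT * FROM issues WHERE id IN (9)",
]
for template, count in Counter(fingerprint(q) for q in captured).items():
    print(count, template)
```

Once queries are grouped by template, each template can be explained or tested once with representative parameters instead of once per concrete query.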

A

I'm very excited about this. It will increase our coverage and improve the availability and reliability of GitLab.com and of all GitLab instances out there, from large to small. That's it for 15.1. Thank you for watching, and talk to you next milestone.