GitLab GitLab Group Kickoffs - Enablement:Database, 16 Feb 2022

From YouTube: GitLab 14.9 Kickoff - Enablement:Database

Description

Kickoff for the Database Group for the GitLab 14.9 release

Planning issue: https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/237

Database Group Past Kickoff Videos: https://youtube.com/playlist?list=PL05JrBw4t0KqP3MYrcoQHrqPUqn_jJZSN

14.8 Kickoff video with a more detailed presentation of features: https://youtu.be/eek6Sn0p5gc

Presentation by: Yannis Roussos, Sr. Product Manager, Memory and Database Groups

A

Hi, I'm Yannis Roussos, the product manager of the Database group, and I'd like to take you through what we're planning to achieve in GitLab 14.9, which is scheduled to be released on March 22nd, 2022.

A

As a quick reminder, the core mission of the Database group is to build the application code, tools, and frameworks that allow every GitLab feature to interact with the database in the most reliable and performant way possible. We also build tools and product features that enable any GitLab team member to efficiently develop code that interacts with the database, test against production-grade data sets, and make informed, data-driven decisions before shipping any update to the GitLab product.

A

As such, most of the topics that we're working on, including the ones that I'm going to discuss in this video, are not directly user facing, but they are behind most user interactions or automated processes that run in GitLab, and they affect how performant, reliable, and scalable each GitLab instance is, from the smallest ones to GitLab.com.

A

So let me start with the top priorities for 14.9. Our top priority for the past few milestones has been updating GitLab to work with multiple databases. That was in order to support the Sharding group in decomposing GitLab's database.

A

That's very important for very large instances like GitLab.com, where there is not much room to further vertically scale the primary database, no matter how many replicas we may have. So in order to keep on scaling, you want to decompose, you want to split the database into multiple database clusters. In our example here, we want to move all the CI tables to a second cluster and support that for very large instances.

A

So we are in a very good position. That was our main focus in 14.8, and all the top priority tasks have been successfully completed, so the Sharding group is unblocked to keep working on the next phases of the decomposition effort. In 14.9, we have to perform a few remaining follow-up tasks and make sure that there are no remaining tasks that may block the future of the decomposition effort.

A

Our next top priority is the general availability of batched background migrations. Background migrations are large data operations, mostly updates, that we run in the background using workers. This is a new framework, and our plan is to make the batched background migrations framework the standard for all data-related operations and replace the existing way of performing them.

A

We anticipate that this will be a major step forward for the stability and availability of GitLab instances of any size, while at the same time making most background data operations complete considerably faster, in most cases up to 20 or 30 times faster.
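To make the idea of a batched data operation concrete, here is a minimal Python sketch. It is not GitLab's framework: the function names and the in-memory `rows` dictionary are hypothetical, and it only illustrates splitting one large update into bounded id ranges that a worker can process one at a time.

```python
from typing import Callable, Dict, Iterator, Tuple


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) id ranges that together cover [min_id, max_id]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def run_batched(min_id: int, max_id: int, batch_size: int,
                update_range: Callable[[int, int], None]) -> int:
    """Apply update_range to each batch in turn and return the number of batches run."""
    batches = 0
    for start, end in batch_ranges(min_id, max_id, batch_size):
        # In a real system this would be one bounded UPDATE statement.
        update_range(start, end)
        batches += 1
    return batches


# Hypothetical in-memory "table": id -> migrated flag.
rows: Dict[int, bool] = {i: False for i in range(1, 11)}


def mark_migrated(start: int, end: int) -> None:
    """Stand-in for the per-batch update: flag every row in the id range."""
    for row_id in range(start, end + 1):
        rows[row_id] = True


batches_run = run_batched(1, 10, 4, mark_migrated)
```

Keeping each batch small bounds how long any single update runs, which is what makes it possible to tune or pause the work between batches.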

A

So we have a very thorough epic linked on the planning issue. If you are interested in more details, please check it out, as well as my previous kickoff videos, especially the one for 14.8, where I go into more detail about what we're planning to work on.

A

Our plan for 14.9 is to continue towards general availability: make batched background migrations work with multiple databases, and add some updates to our batch optimizer, our auto-tuning algorithm, our documentation, and that kind of stuff that will move us closer to general availability, which we expect to be a few milestones from now. Our next top priority, also related to what I just discussed, is having a throttling mechanism for large data changes.

A

So during the past year, we have had a few database-related incidents on GitLab.com that were directly impacted by the rate of updates on specific tables. This is one of those cases where moving fast, being able to do a lot of updates, works against you and may affect the health of the primary database server. So we want to implement a throttling mechanism that will address those problems and reduce the risk of database incidents, which can of course reduce the availability of large GitLab instances.

A

So our plan here is to start with updating the way we auto-tune batched background migrations and, in later steps, use a throttling mechanism for all data changes. That will allow us to be smarter about throttling, increasing or decreasing the rate of updates based on various signals.

A

We can monitor those signals on a live database server: how much traffic it has as a whole or on specific tables, whether there are maintenance operations happening, like autovacuum on a PostgreSQL database, and other parameters. We can then respond to those signals by throttling how fast we update the data, or even stopping the update and performing some additional operations, like a manual vacuum, for example.
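The signal-based throttling described above can be sketched roughly as follows. This is an illustration only, not the mechanism GitLab implemented: the `DbSignals` fields, the thresholds (0.7 and 0.9), and the halve/double policy are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DbSignals:
    """Hypothetical health signals sampled from a live database server."""
    load: float               # overall traffic as a fraction of capacity, 0.0 to 1.0
    autovacuum_running: bool  # whether autovacuum is active on the target table


def next_batch_size(current: int, signals: DbSignals,
                    min_size: int = 100, max_size: int = 10_000) -> int:
    """Return the batch size for the next step, or 0 to pause the updates."""
    if signals.load >= 0.9:
        # Assumed "unhealthy" threshold: stop updating for now.
        return 0
    if signals.autovacuum_running or signals.load >= 0.7:
        # Back off: halve the batch size, but never go below the minimum.
        return max(min_size, current // 2)
    # Healthy: speed up, but never exceed the maximum.
    return min(max_size, current * 2)
```

A caller would sample the signals before each batch, skip the batch while the function returns 0, and otherwise run the next batch with the returned size.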

A

Our third priority is automated database testing using production clones. In general, this is a long-term initiative for us. We are shifting left our ability to preemptively find database-related regressions and performance issues by testing all database updates against a production clone of the GitLab.com database. With every feature we add, we move one step closer to GitLab being more performant, and we lower the risk that code may be deployed that will cause incidents that affect the performance and availability of GitLab.com or self-managed GitLab instances.

A

So we have completed all our work for regular and post-deployment migrations, and we are now focusing on testing background migrations. This is the same theme as everything else I have discussed. The main focus of the Database group in general is background migrations: how can we optimize and test those very large data updates that run on any GitLab instance?

A

This is not trivial. The problem is that background migrations are meant to take a long time, because you want to access and update a lot of data, so they can take tens of minutes, hours, or even days at GitLab.com scale. So the question is: how can you test those updates in the CI pipeline, where we want to validate an MR in no more than a few minutes?

A

Let's say 10 minutes. So we are adding the tools, and we are trying to figure out how to statistically sample and test those background jobs while at the same time offering as much coverage as possible for all scenarios. This support for background migrations is our main focus for 14.9. As a next step, in future versions, we may also add the ability to record and analyze logs so that we can catch deadlock scenarios early and prevent them.
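One way to picture testing a long-running migration inside a short pipeline is to run only a random sample of its batches that fits the time budget. The sketch below assumes that approach; the function name, the per-batch time estimate, and the fixed seed are hypothetical and are not taken from GitLab's tooling.

```python
import random
from typing import List, Tuple


def sample_batches(batches: List[Tuple[int, int]], est_seconds_per_batch: float,
                   budget_seconds: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Pick a random subset of batches whose estimated run time fits the budget."""
    if est_seconds_per_batch <= 0:
        raise ValueError("est_seconds_per_batch must be positive")
    max_batches = int(budget_seconds // est_seconds_per_batch)
    k = min(len(batches), max_batches)
    rng = random.Random(seed)  # seeded so that a test run is reproducible
    return sorted(rng.sample(batches, k))


# 1000 batches of 100 ids each; assume about 2 seconds per batch and a 10-minute budget.
all_batches = [(start, start + 99) for start in range(1, 100_001, 100)]
chosen = sample_batches(all_batches, est_seconds_per_batch=2.0, budget_seconds=600.0)
```

Sampling across the whole id range, rather than taking only the first batches, gives some coverage of regions of the table where data volume or shape differs.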

A

Our last top priority is a smaller initiative, and it's fully internal to GitLab. It's about adding a database dictionary and the ability to label tables by the group that owns them. This may not sound user related.

A

But GitLab now has more than 400 tables, and being able to find the owner of each table very quickly will let us respond more efficiently to database-related incidents or bug and regression reports, by finding the group that has the domain knowledge to address the problem and responding in a timely fashion.

A

That means minutes if there is an incident on GitLab.com, or as quickly as possible if there is a bug report. So that's it for 14.9. Thank you for watching, and talk to you in March.