GitLab Sharding Working Group, 15 Jul 2021

From YouTube: GitLab 14.2 Kickoff - Enablement:Sharding

Description

Kickoff for the 14.2 release for the Sharding team.

Planning issue: https://gitlab.com/gitlab-org/sharding-group/group-tasks/-/issues/1

A

Hello, everyone. My name is Fabian Zimmer. I'm a product manager in GitLab's Enablement stage, and I will walk you through our plans for the Sharding group for 14.2, our next release. So, to remind everyone:

A

What we're trying to accomplish here: the Sharding group is particularly interested in improving GitLab's database scalability, and right now we are actually decomposing GitLab's database. So rather than having one large main database where all of the data lives, we're going to move some parts of the database to a physically separate server to improve our ability to scale. We're focusing on CI tables at the moment, and the reason for this is that CI tables, so CI,

A

Continuous integration, are a significant part of our database size but, more importantly, they actually account for a very large portion, roughly 50 percent, of our writes, and writes usually don't scale particularly well via read replicas; you only scale reads with those. So this is what we're doing. It is a project that also involves many other groups inside GitLab, such as Distribution and Infrastructure, and I wanted to highlight that we have this progress

A

Single source of truth, so if you're particularly interested in understanding where we are at, this is the place to go. Craig, the engineering manager and directly responsible individual for this group, is doing a fantastic job updating all of this, so do check it out.

A

I'm going to go through a few things that the Sharding group itself is going to do in 14.2, and I'll stay at a relatively high level because there are a lot of things going on. I want to give you a broad overview of where we are at. The first thing that the Sharding group is going to continue to work on is supporting many databases in GitLab. As I elaborated earlier, right now we have a single database that we need to manage, and with decomposition we will have more than one.

A

So there are many different areas of the product that we will have to make aware of many databases. For 14.2, we're hoping to add support for multiple databases in GitLab, specifically the first iteration to handle explicit database schemas.
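To make the idea of a product that is "aware of many databases" more concrete, here is a minimal Python sketch. It is not GitLab's implementation: the `Cluster` and `Router` classes, the host names, and the rule that tables prefixed with `ci_` belong to a separate `ci` cluster are all assumptions made for illustration. It shows writes going to a cluster's single primary while reads rotate across read-only replicas, which is why moving write-heavy tables to a second cluster relieves the main primary.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, List


@dataclass
class Cluster:
    """One database cluster: a single primary plus read-only replicas."""
    name: str
    primary: str
    replicas: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the primary if there are none.
        self._readers = cycle(self.replicas or [self.primary])

    def host_for(self, write: bool) -> str:
        # Writes can only go to the primary; reads are spread over replicas.
        return self.primary if write else next(self._readers)


class Router:
    """Picks a cluster per table, then a host per operation (hypothetical)."""

    def __init__(self, clusters: Dict[str, Cluster], default: str = "main") -> None:
        self.clusters = clusters
        self.default = default

    def cluster_for(self, table: str) -> Cluster:
        # Assumed rule for this sketch: "ci_" tables live on the "ci" cluster.
        if table.startswith("ci_") and "ci" in self.clusters:
            return self.clusters["ci"]
        return self.clusters[self.default]

    def host_for(self, table: str, write: bool = False) -> str:
        return self.cluster_for(table).host_for(write)


router = Router({
    "main": Cluster("main", "main-primary", ["main-replica-1", "main-replica-2"]),
    "ci": Cluster("ci", "ci-primary", ["ci-replica-1"]),
})
print(router.host_for("ci_builds", write=True))  # write lands on the CI primary
print(router.host_for("projects", write=True))   # write lands on the main primary
```

With a single cluster, every write in this model would hit the same primary no matter how many replicas were added; the second cluster is what splits the write load.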

A

There's also work going on by the Distribution team to support Omnibus configurations for multiple databases, specifically for GitLab.com, and also Cloud Native GitLab, often referred to as CNG, for GitLab.com.

A

So that's what we're up to. We hope to accomplish most of this in 14.2, but there's likely going to be a little bit of extra work

A

That actually needs to happen following that. We're working on the decomposition of CI tables, and so this is the specific area of the database, or functionality, that we're interested in. What we're trying to do here, and you can see there's a lot going on in this epic, is to essentially look at our specific merge requests and try to actually make the tests pass when we have this in multiple databases. While doing it, we're generating quite a few interesting follow-up items. But the team is sort of swarming

A

On this specific area to make sure that we can actually get this POC merge request into a shape that allows us to ultimately merge it. Then there's a couple of other things that I would like to highlight. We're also really interested in creating all of the necessary observability tools for multiple database servers. So you can imagine: now we have one server. We need to monitor it.

A

We need to be aware of what is going on to be able to address any concerns that may come up, and now, with potentially more than one database cluster being available, we need to make sure that we have the right dashboards in place, we collect the correct metrics, and we have all of the visibility for these additional components that we require. Then there are two items here specifically to do with infrastructure. This is actually being worked on already, and we are working on it at speed.

A

So for one, we need to be able to benchmark this new way of running GitLab, because we need to ensure that there are no performance regressions and that everything works as expected when we have more than one database. In order to do this, we need to create these environments to actually measure this before we roll it out into production. Sort of tied to that is the ability to actually create clusters of databases in a repeatable manner. So, you know, there's more complexity behind this.

A

This is a second database cluster with a single primary and then multiple read-only replicas to actually establish high availability. So, rather than having this snowflake sort of architecture where we did it once, we want to be able to, you know, repeat this process over and over. This will become much more important when we are considering maybe decomposing further or when we are actually looking at different sharding strategies.

A

So this is planned for 14.2. There are also other activities going on right now in other areas, for example, fixing security features for cross joins across this specific table. There are also a few other identified broken features with multiple databases, and this work happens in other groups. Check out this table; it's a great overview. I'm excited about the progress that the team is making, and if you're particularly interested, the epics are the single source of truth.

A

So you can always look at the specific issues that folks are working on.
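On the cross joins mentioned a moment ago: assuming this refers to queries that join a table left on the main database with one moved to the CI database, the sketch below shows why such features break and one common way to rework them. It uses two in-memory SQLite databases as stand-ins for the two servers; the table names, columns, and sample rows are made up for illustration and are not taken from GitLab's schema.

```python
import sqlite3

# Two separate in-memory databases stand in for the "main" and "ci" servers.
main_db = sqlite3.connect(":memory:")
ci_db = sqlite3.connect(":memory:")

main_db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
main_db.executemany("INSERT INTO projects VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

ci_db.execute(
    "CREATE TABLE ci_builds (id INTEGER PRIMARY KEY, project_id INTEGER, status TEXT)"
)
ci_db.executemany(
    "INSERT INTO ci_builds VALUES (?, ?, ?)",
    [(10, 1, "success"), (11, 1, "failed"), (12, 2, "success")],
)


def join_across_databases_fails() -> bool:
    """A single SQL join no longer works once the tables live on different servers."""
    try:
        main_db.execute(
            "SELECT p.name FROM projects p JOIN ci_builds b ON b.project_id = p.id"
        )
    except sqlite3.OperationalError:
        return True  # the main database has no ci_builds table
    return False


def failed_build_project_names() -> list:
    """Replace the join with two queries, one per database."""
    # Step 1: ask the CI database which projects have failed builds.
    rows = ci_db.execute(
        "SELECT DISTINCT project_id FROM ci_builds WHERE status = 'failed'"
    ).fetchall()
    project_ids = [row[0] for row in rows]
    if not project_ids:
        return []
    # Step 2: look those IDs up in the main database.
    placeholders = ",".join("?" for _ in project_ids)
    rows = main_db.execute(
        f"SELECT name FROM projects WHERE id IN ({placeholders}) ORDER BY name",
        project_ids,
    ).fetchall()
    return [row[0] for row in rows]


print(join_across_databases_fails())
print(failed_build_project_names())
```

The two-query version trades one round trip for two and moves the matching logic into application code, which is the kind of follow-up work a decomposition tends to generate.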

A

Thank you very much for listening. This is the outlook for 14.2. Stay tuned.