GitLab Database Scalability Working Group, 15 Apr 2021

From YouTube: 2021-04-15 Database Scalability Working Group

A

Just FYI, a number of folks were in a stand-up that was just in the process of ending, so we might get started here in a minute.

B

Jerry will not be joining today, so I'm going to host the meeting. Hello, everyone. Today is April 15th. This is the Database Scalability Working Group weekly meeting. So let's get started with important updates. Jerry is preparing the PG12 upgrade on staging, so he is unable to attend. Then let's move on to what's been done, quickly.

A

So all the mechanics for having a team, sorry, the sharding group, have been set up: the weekly meeting is scheduled and the agenda is linked there. The epics are created as well, and Camille's going to talk about the details of those down below, so we'll not steal his thunder there. Fabian, you've got the next item.

C

Yes, I met today with Camille and Janis. We spoke about the success criteria for the working group, but also hashed out what the success criteria need to be for any solution that we implement. I think one area that requires additional work is that any solution that we implement to scale our database cannot have a negative impact on our user experience, and that needs to be quantifiable.

C

So I think we need to make sure that we can assess and measure that appropriately. That is something that we also need to continue and establish ahead of time. So if we end up POC-ing more than one solution, we can actually say: okay, these are the impacts to our users. Because if things end up being really slow in some instances, I don't think that's going to be.

B

And Camille, you have the next two bullets.

D

So I think that I finished the composable codebase proposal, and I see Sid is writing something. Sid, maybe you can voice your comments. It's going to be better than doing it concurrently.

E

Oh, yeah. I was just talking about the naming of it. I think you're saying we shouldn't confuse our goals with the implementation.

E

I do think that the current naming carries a lot of risk of misinterpretation. So if you're pretty sure you're going to go for Rails engines, I would just call it the Rails engines proposal, to make it very clear to people what to do. But if there's still a risk of confusing the goals and the outcome, then I'm sensitive to that.

D

One of my goals in using more generic naming was to not impose the solution. It was to present it as one of the solutions. The problem is that this was the only solution that we had validated at that point.

D

So my idea behind the proposal in the form it has now: I'm not going to be doing that work at this moment because of the other engagement, so whoever is going to be doing it will have to read it and decide whether Rails engines are exactly the direction.

D

I think the answer is probably yes, but at least from my perspective I would not like to impose this, even though it's heavily documented and mentioned in a number of places right now, and in even more places after your comment, that it was motivated by and based on Rails engines, as this is the most compelling solution for this proposal. Okay.

E

I'm afraid of a whole bunch of crazy microservices ideas and stuff like that. And just like your pg solution became known as the zip solution, I think it's fine if this becomes known as the Rails engines solution. And then, we're an iterative company: if there's a better solution, happy to discuss that, but.

D

Okay, so let's do something like that. I'm just going to add an extension to the title, "Rails engines", and let's keep it like that. It's going to be pretty apparent, and whoever then looks at it can change the course of that if they find, like, the comparison.

D

So I also started talking with people, and I spent a healthy amount of time writing up different approaches to this particular problem that we have. I also started writing some design goals, in anticipation that we're going to be testing some of these solutions.

D

I mean, testing some of these proposed approaches to see how complex they are and what consequences they have. I probably spent most of the time on application-level sharding, for various reasons, and also spent some effort thinking about how a potential proposal could look, and the iterations for that proposal. And I mean, it's very raw. It's very preliminary.

D

It was done today, but it follows your idea, Sid, of co-locating top-level groups, and seeing how we could maybe use this to iterate on that.

D

I'm expecting that there are going to be more proposals to evaluate. I'm expecting that once more people start working with the team, we'll start validating these things: how complex they actually are to implement in GitLab. Because, to be honest, it's great on paper, but I'm curious how expensive it is to do database-level partitioning and what would be broken, and the same for application-level sharding.

D

How complex would it be to have many databases, many shards, many logical or physical databases supported from within a single application? So that's at least the work in progress from my side.

E

Cool, thanks. Great to see that language there.

E

I'm sold on sharding Gitaly per project. That was the right decision, so thanks for your explanation there last time. I'm not sold on sharding Elastic per project yet. I think we should shard per top-level group there. So I'd love to see what the implications of sharding per project are: if you search in a group, how do we bring that together? And I'd also like to see what the implication would be if we switched that to top-level namespace.

D

So I may be able to add to that. I think, Sid, it's not only a problem of Elasticsearch. I think it's more a problem of all the different data stores that we have, which also includes, for example, Redis. If we improve the resiliency of the database with, let's say, application sharding, then we're going to have a single point of failure being Redis. So then I guess my question really is.

D

I think we should consider how we shard the data holistically, which takes into account all the data stores that are associated with a given entity. Because I know that we use Redis extensively right now. We shard Redis by function. We have different Redis instances for the persistent.

D

Data. But it also means that if there is a problem with Redis at any point, even if we now start sharding the application and build the logic so that we are fine with the degradation of one of the databases that we connect to, the noisy neighbor can still affect all other customers. So I think my perception of sharding in this context is to figure out exactly how we could.

D

How we could handle this: a way to contain all the data stores for a given group, to ensure that if only one part of GitLab is misbehaving, let's say a part that's ten percent of the groups, it doesn't truly affect any more than that, and GitLab can still function.

D

That's at least my perception after thinking about database sharding, but I think it has long-lasting implications for all the other sharding techniques as well.

E

Well, yeah, I totally agree. I think we're going to end up sharding on top-level namespace, and then we should shard everything: not just the database, but also Elastic on that level, and also Redis on that level.

E

Also the application servers: we're going to have specific application servers. We're going to allow you to also bind certain shards to certain regions.

E

Yes: "I don't want to leave the European Union with any of that data."

E

And I think we've shown that with a single database we can handle hundreds of thousands of users, so that's great. I think for a top-level namespace it's okay.

E

If you can't have more than a hundred thousand people in a top-level namespace, it's limited to that. If you're a bigger organization, you need to use multiple top-level namespaces. But that will allow us, on that namespace level, to have analytics and statistics and everything else within the database, without having to join across databases, without having to join search results across Elasticsearch clusters, without having to join Redis results across various clusters, greatly simplifying this effort.
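A minimal sketch of the idea described here, under stated assumptions: each shard bundles one logical database, one Elasticsearch cluster and one Redis instance, a top-level namespace is placed on exactly one shard in a chosen region, and the "hundred thousand" figure is treated as a hard cap. The class names, store names and the least-loaded placement rule are hypothetical and are not taken from the discussion or from GitLab's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Figure quoted in the discussion; treating it as a hard cap is an assumption.
MAX_USERS_PER_NAMESPACE = 100_000


@dataclass(frozen=True)
class Shard:
    """Every data store that serves the namespaces placed on this shard."""
    name: str
    region: str
    database: str       # hypothetical logical database name
    elasticsearch: str  # hypothetical Elasticsearch cluster name
    redis: str          # hypothetical Redis instance name


@dataclass
class Placement:
    shards: List[Shard]
    by_namespace: Dict[str, Shard] = field(default_factory=dict)

    def _load(self, shard: Shard) -> int:
        # Number of namespaces currently placed on the given shard.
        return sum(1 for placed in self.by_namespace.values() if placed == shard)

    def place(self, namespace: str, users: int, region: str) -> Shard:
        """Assign a top-level namespace to one shard in the requested region."""
        if users > MAX_USERS_PER_NAMESPACE:
            raise ValueError("too many users; split into several top-level namespaces")
        candidates = [s for s in self.shards if s.region == region]
        if not candidates:
            raise LookupError(f"no shard in region {region!r}")
        # Choose the candidate that currently hosts the fewest namespaces.
        shard = min(candidates, key=self._load)
        self.by_namespace[namespace] = shard
        return shard
```

Because the database, search cluster and Redis instance are looked up from the same `Shard` object, group-level analytics, search and caching for a namespace never need to combine results from several shards in this model.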

D

So I was actually thinking about that, and this is maybe why my perception was about the project versus the group, and maybe where this confusion came from. If we define the smallest shardable entity as a project, we have possible iterations in the future if we really need them. Let's say the first possible iteration could be that we shard on the top-level group.

D

But then, if we notice this to be a contention point, we can then shard on the group, which could be a subgroup of this top-level group, and do aggregation. And then, if this is not enough, we could move to sharding by project, with the intent that we may read data from many shards. It could be one, two, three, some limited number of shards, but still perform the aggregation.

D

So maybe this is why I was always talking about the project, because the project is the smallest, but still very big, entity that we look at that may be useful in the future. But we can now make this simplification about the affinity of the data, because then we maybe don't really have to change too many aspects of the application today, starting from a very simple, very broad concept and moving to something more defined over time, when it's really needed.
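The trade-off being discussed can be sketched as follows, assuming in-memory SQLite databases stand in for logical databases and a single hypothetical `issues` table. When a group's projects are colocated, a group-level count is one query. When they are spread over several shards, the application has to query each shard and aggregate the results itself.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory database holding (project, state) issue rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE issues (project TEXT, state TEXT)")
    db.executemany("INSERT INTO issues VALUES (?, ?)", rows)
    return db


def open_issues(db, projects):
    """Count open issues for the given projects inside a single database."""
    projects = list(projects)
    if not projects:
        return 0
    placeholders = ",".join("?" for _ in projects)
    sql = (
        "SELECT COUNT(*) FROM issues "
        f"WHERE state = 'open' AND project IN ({placeholders})"
    )
    return db.execute(sql, projects).fetchone()[0]


def open_issues_across(shards, projects_by_shard):
    """Application-level aggregation: one query per shard, summed in Python."""
    return sum(
        open_issues(shards[name], projects)
        for name, projects in projects_by_shard.items()
    )
```

Both functions return the same number for the same data. The difference is that the second one issues one round trip per shard and moves the aggregation logic (and any sorting or pagination a real feature would need) out of the database and into the application.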

E

I think this would be helpful to record. Are you okay with that? Are we recording already? I can't live stream. I think this is exactly the conversation that we should be having, so thanks for that, Camille.

E

I think that saying, hey, we're going to shard on the project because it's the most atomic level, and then we have that optionality: I'm worried about three things. First of all, YAGNI: you ain't gonna need it. Don't do something to have future flexibility, because it comes with a cost, and that cost is immediate. There's an immediate negative impact. If you shard per project now, most of the time the projects will be on the same server, but in your queries you can't really rely on that.

D

I think we misunderstood each other to some extent. I'm saying that we allow the smallest possible entity to be a project if we look at transferring that entity, but we truly define the current sharding scheme, in one of these proposals, as the top-level group. So we guarantee from the application level that if you access the hierarchy up to the top-level group, all of the data is guaranteed to be colocated.

D

Yes, this contract can break later, but the contract that we make today is that this is the top level. The project is the smallest possible entity that we move, because we need this entity to be movable: for example, you might move projects between groups, or groups between groups. This is the smallest entity that we consider a movable entity.

E

I don't understand how that would look. Suppose you move a project: what does that look like, moving it from one database server to another database server? I can see all of the projects being on the same database server but in separate databases, and then I think you still have the negative impact: you still have to do a cross-database query to get top-level metrics. So I'm not sure I'm understanding your proposal.

D

It's very preliminary, we didn't validate it, but at least my perception is like that.

D

If we shard by the top-level group, we always ensure that your project is on the shard of its top-level group. The project move right now is immediate, because we only change the reference in the database, in the model. When we say that your project is colocated, we effectively need to move the data from one database to another, which can be fine, because it is a conscious operation by the user: the user goes to the settings and clicks "transfer project". Right now, it's immediate.

D

If this is the model that we take, maybe it's not immediate. Maybe it takes one minute, or maybe three minutes, however long it takes to transfer the data. But it's a conscious decision, maybe with downtime, made by the user, and we need to rebalance the data because our sharding scheme says that your data is on a different shard. So we always maintain this locality of the data. We never break that contract, to ensure that we don't ever execute cross-database queries, because they will misbehave.
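The contract described above can be sketched with a toy in-memory model. It assumes every top-level group is pinned to one shard and that transferring a project to a group on a different shard physically moves the project's rows, so that a group's data is always colocated. `ShardedStore` and its methods are hypothetical names used for illustration only.

```python
from typing import Dict, List


class ShardedStore:
    """Toy store where each top-level group is pinned to one logical database."""

    def __init__(self, shard_of_group: Dict[str, str]) -> None:
        self.shard_of_group = dict(shard_of_group)   # top-level group -> shard
        self.group_of_project: Dict[str, str] = {}   # project -> top-level group
        # shard -> project -> rows owned by that project
        self.rows: Dict[str, Dict[str, List[str]]] = {
            shard: {} for shard in set(self.shard_of_group.values())
        }

    def shard_for(self, project: str) -> str:
        """A project always lives on the shard of its top-level group."""
        return self.shard_of_group[self.group_of_project[project]]

    def create_project(self, project: str, group: str, rows: List[str]) -> None:
        self.group_of_project[project] = group
        self.rows[self.shard_of_group[group]][project] = list(rows)

    def transfer_project(self, project: str, new_group: str) -> bool:
        """Move a project to another group; return True if data had to move."""
        source = self.shard_for(project)
        target = self.shard_of_group[new_group]
        moved = source != target
        if moved:
            # Physically relocate the rows so the colocation contract holds.
            self.rows[target][project] = self.rows[source].pop(project)
        self.group_of_project[project] = new_group
        return moved
```

In this model a transfer between groups on the same shard stays a cheap reference change, while a transfer across shards becomes a data copy whose duration depends on the project's size, which matches the "maybe one minute, maybe three minutes" caveat above.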

A

This helps resolve it. So what I hear you saying is that we shard on the project because it's the smallest element, but what that allows you to do at a higher level is, as a business rule, say: all projects are going to be guaranteed to be in the same physical database server. So you have the atomicity that you're looking for, but you're still addressing Sid's concern of not having things spread across physical databases, where queries suddenly take 10 minutes.

E

My concern is having things spread across different databases, not across different database servers, because then a lot of the stuff that people pay us for would be cross-database queries, and those are slower.

A

Yeah. And what I think Sid has heard, and this came up in our one-on-one a week ago, is that we are proposing a database per project. But I'm not hearing Camille say a database per project.

D

No, I'm saying that there is a potential that the smallest entity that we may move is a project. But we design the system in a way that guarantees that the top-level group and all the projects of the top-level group are on the same database, because we assume that most of the features that people use are on the group level, and there is only a handful of features that they use on the instance-wide level, which could be to-dos, the explore function, user issues, user merge requests. But the majority of things that they do.

D

They are really on the group level, like issue search across all the projects, or some other features. That may, of course, change to some extent, because we are now considering some features that may be cross-group.

D

I think there was some mention of creating relations between issues in different epics that could be living in different groups. But in one of these proposals, it would be the main design rule that, okay.

E

So everything so everything.

D

That is of the top group.

A

The only time someone's data would be spread across physical databases, where suddenly things take a long time is actually.

E

What is the physical data they need to.

A

Migrate data, and they know that their data is unavailable while we're migrating it. So their data is never spread across physical servers where queries need to be aggregated.

E

You cannot use database and database server interchangeably. Those are very different things.

A

So, what do we want to say, just to distinguish, like, a database in us versus a database and your.

E

Database. A database server is like a Postgres server, something like that. A database is within a database server, and you can run multiple databases on one server.

A

Right. So I haven't heard people propose that we run multiple databases on a database server with one project each. I haven't heard that. But that was your concern a couple of days ago, Sid, so maybe just clarify whether or not that's part of your problem.

D

Okay, this is a very first proposal. I prefer to make this clear distinction: physical databases and logical databases. You can have many logical databases on one physical server. And effectively this is, for example, one of the suggestions that I received from the carrier today: that we may really want to use a smaller number of physical servers to colocate more logical databases, because in the case that these logical databases grow significantly.

D

We move them completely to the physical servers. I'm okay with what.
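The physical versus logical distinction can be sketched as a simple mapping. This assumes a logical database is the sharding unit, a physical server can host many logical databases, and a logical database that has grown can be relocated to another server without changing which groups live in it. The `Topology` class and all server and database names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topology:
    # logical database name -> physical server name (all names hypothetical)
    server_of: Dict[str, str] = field(default_factory=dict)

    def databases_on(self, server: str) -> List[str]:
        """List the logical databases currently hosted on one physical server."""
        return sorted(db for db, host in self.server_of.items() if host == server)

    def relocate(self, database: str, new_server: str) -> None:
        """Move a whole logical database to another physical server."""
        if database not in self.server_of:
            raise KeyError(database)
        self.server_of[database] = new_server
```

The application would only ever route by logical database name, so packing several logical databases onto one server at first and spreading them out later changes this mapping but not the sharding scheme.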

E

What I care about is that what customers pay us for is the list under g3: audit events, audit reports, code analytics, insights, portfolio management, security dashboards.

E

I think it is really important that we have everything in a group. Many of these features are group level. Everything in the group should be in the same logical database, so that all of this works really simply and well.

D

That's exactly one of the proposals that I'm discussing.

E

Yeah, but then I don't understand how that is compatible with shard per project. Then I don't understand what shard per project means. Of course, I'm for moving projects. I think that moving is just a moving functionality, but I don't understand why it would be called shard per project if you have the entire group in the same logical database.

D

So maybe it's more a problem of my semantics, and of the potential possibility that, if in the future we outgrow a single database, we may consider sharding not by this top-level entity but by a sub-level entity. So, to be more clear, this is more a semantic problem, I guess, in one of these proposals. Okay, because.

D

It really describes your proposal: we colocate our data for the top-level group. We ensure that if you move a project, we move its data physically from one logical database to another logical database, to fulfill the expectation that all data.

E

Of course, you should be able to move projects, so everyone's in agreement there. Zooming out a bit, I would say: look, if we have top-level namespaces in different logical databases, we have two orders of magnitude improvement in performance.

E

I think our biggest top-level namespace is taking up less than one percent of our database queries, or at least I hope so. Sharding per project would be another two orders of magnitude, and I think it comes with a huge additional cost in user experience and code complexity. I really want to caution us against biting off that much if it's just not needed now, and it's not needed even when we're at 10 million users.

D

I agree with this statement, and this is more about taking into consideration, if it is needed, what the path would be compared to the currently proposed architecture.

E

Yeah, I think we should look at that. But what I'm also seeing is that we're sharding Elasticsearch per project instead of per group, and I think it's a mistake. I'm just really worried that people are going to default to project, and that's what I'm really trying to warn against.

C

Yeah, so this is Fabian. My understanding is we're actually all in agreement that any solution that gets evaluated needs to be evaluated against the functionalities that Sid put in point three (I don't quite know how to pronounce it), because those are the things that need to continue to work at least as well as they do now, right?

C

Ideally, they actually get better. So I think the proposals that Camille made, and I think there's maybe more than one way ultimately to accomplish this, should be evaluated against this, because if they're not performing and they're degrading that user experience, I don't think we're winning anything, right? We need to solve our scalability issue and keep that in mind.

C

I think that's probably the most important evaluation criterion, at least to me. Obviously there's the technical complexity and other things as well, but that's how I understand the requirements here at a high level.

A

Thanks for that. Sid, yeah, I agree: me saying "physical database" and "database" is not a good terminology set. So I asked Chen to update the glossary to Camille's terminology, physical database and logical database, and if we always use those, then it's very clear which one we're referring to. But I do think we agree more than it seems from that discussion.

A

So, hey Sean, can you take a follow-up action as well to address Sid's concern around Elasticsearch and sharding by project? It feels like you did not answer that question well, and we need a tighter answer on what our plans are, as well as how we think about sharding for Elasticsearch.

B

Yes, I will work with cz to get the information assembled on what we are planning to do.

E

Thanks, thank you so far, appreciate it. I think the outcome of that will be, now that we think about it, that the group results are slightly worse because we can't really have Elastic do that; we need to do it at the application level. And I don't think that's a good trade-off. But let's do the investigation; I'm looking forward to those results.

C

We have an issue about the sharding strategy for Elasticsearch. Happy to communicate asynchronously or even synchronously. We can also summarize our current investigation in a more detailed form and share it with the group.

A

Andres, I think you had an issue where you researched top-level namespace sharding a while ago, so I just want to refresh my memory on the pitfalls. If you link that in the doc, that would be helpful.

A

Yeah, it's in the handbook. I think I have it right here.

B

Okay, we are one minute over. Let's quickly go through the blockers. Jerry is working on the PG12 upgrade today on staging and this weekend on the production environment, so that's a blocker he is dealing with. Other discussion items: do we want to discuss them today or move them to the next meeting?

D

I'm fine with moving mine.

B

Okay, we'll move them to the next meeting. What's happening next: Camille will continue to work on refining the design docs, and Camille will also talk to Nick. Fabian will work on understanding how our existing QA tools measure performance.

B

Okay, guys, that's the meeting for today. Thank you, everyone. Talk to you next Monday.

E

Thank you, bye.

A

Bye. Hey Stan, I just linked it on 20, hello, x. There, all right! Thank you. Yep.
