GitLab Sharding Working Group, 4 May 2020

From YouTube: 2020 05 04 Database Sharding Working Group

A

You don't see my screen? I'm not sharing my screen. All right.

A

I haven't seen anything that indicates that we are not on track. We are on track at the moment, because they are already executing the latest on staging. All right.

A

So, what's been done this week: Andreas did a lot of partitioning demos, just the coding part of it, showing what it's going to take to do an actual partitioning scheme with migrations. You can watch all the videos there. They're very short, between 5 and 11 minutes each, and a great, well-rehearsed presentation. Thank you, Andreas, for that. The merge request is out there for everybody to take a look at. And Andreas, do you want to give a quick overview and verbalize your comments there?

B

I just want to make sure this is summarized in the sense that it's not a workable solution. We've just been hacking on this for a day or so, basically using Postgres partitioning and combining that with foreign data wrappers. That's given us good insight as well, but it's not a ready-to-use sharding solution, since Postgres doesn't support native sharding.

B

This is one of the roads that we were investigating, but it's not a ready-made solution. I just want to make sure this is clear: it's not a drop-in solution for us, and we are going to need to change the application quite a bit. Even once we have that solution, it still requires application changes, so that's important to note.

A

And we have an issue, which I will comment on here, where we can start tracking what application changes we need to make as we continue with these prototypes and with exploring partitioning.

C

So, a quick question. It looks like we have decided on a technical approach. Going down the road, out of the possible technical approaches, this is how we are going to solve the partitioning issue, right?

C

Yeah, I just want to confirm that this is the way we're going to pursue sharding or partitioning.

B

So I think we're pretty clear on the partitioning side. That's well understood, and it is a tool that we can use going forward. The sharding side, though, so basically the ability to distribute those partitions across many servers, is not really clear.

B

There is a way of doing that using foreign data wrappers, but it's totally unclear whether that is the path we should be taking, and even if we did, there are still a lot of challenges left in the application before we benefit from it. It sort of replicates what Citus does; it's a similar pattern in some aspects, I think. And what we also know is that Citus is also not a drop-in solution for us. So we're only exploring this at this point.

B

We don't know if that's the right solution for us.
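
As an illustration of the combination B describes (declarative partitioning with some partitions living on other servers through foreign data wrappers), here is a minimal sketch that only generates the DDL. It is not the code from the demos or the merge request. The table name `events`, the key `project_id`, and the server names are hypothetical, hash partitioning is an assumption, and the sketch assumes the foreign servers were already defined with `CREATE SERVER` (for example via `postgres_fdw`) and that matching tables exist on them.

```python
def partition_ddl(table, key, local_partitions, remote_servers):
    """Build DDL for a hash-partitioned table whose partitions are
    partly local tables and partly foreign tables on other servers.

    All names passed in are illustrative; nothing here comes from the
    actual demo code.
    """
    modulus = local_partitions + len(remote_servers)
    statements = [
        f"CREATE TABLE {table} (id bigint NOT NULL, {key} bigint NOT NULL) "
        f"PARTITION BY HASH ({key});"
    ]
    # Partitions stored on the same server as the parent table.
    for remainder in range(local_partitions):
        statements.append(
            f"CREATE TABLE {table}_p{remainder} PARTITION OF {table} "
            f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {remainder});"
        )
    # Partitions stored on remote servers, reached through a foreign data wrapper.
    for offset, server in enumerate(remote_servers):
        remainder = local_partitions + offset
        statements.append(
            f"CREATE FOREIGN TABLE {table}_p{remainder} PARTITION OF {table} "
            f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {remainder}) "
            f"SERVER {server};"
        )
    return statements


if __name__ == "__main__":
    for stmt in partition_ddl("events", "project_id", 2, ["shard_a", "shard_b"]):
        print(stmt)
```

The sketch only shows the schema side. As B stresses, the application still has to be changed to issue queries that work well with this layout.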

C

Okay, so now we are working on a partitioning solution, and for sharding it is still to be sorted out how to proceed with a sharding solution. Yes?

B

In a sense, yes. The partitioning step is, I think, a good iteration in the right direction, and I think we should start without the sharding topic. Sharding can build on top of that, and it can also benefit from the findings of this first iteration, because we already start tackling issues in the application to support partitioning better, and that also benefits the sharding approach later.

D

I would like to add that it's hard to imagine a situation where sharding will not require heavy changes to the application. Sharding is, after all, the last resort, once all the vertical scaling capabilities have been exhausted, right? So we need to assume that there are going to be changes to the application. Actually, I would say the more changes we make to the application, whatever the cost and time factors may be, the better, because we will be able to do things smarter, maybe, long term.

D

This is not a desirable goal, but since there is no drop-in solution, the more we accept this, probably the better it will be. Actually, this is a very good approach, and I welcome these demos a lot. Watch the videos, they're cool. And this is expected: this is exactly what partitioning looks like with Postgres, right, with foreign data wrappers and partitions. Now, the key here, in my opinion, is that this part should be the minimal part of the partitioning itself. I mean, it's kind of the worst case.

D

The worst case is that you really need to go through the FDW to reach partitions on remote servers. But the idea, in my opinion, is that for 80%, 90%, whatever high percentage it can be, the application will talk directly to the shard it needs to get the data from, or, if we don't want to make the application so much more complicated, we write a proxy layer that will intercept the query and know which partition to route it to. But if most queries go through the FDW, it's not only

D

the challenges that you have found that will happen, but also performance problems, because, for example, foreign data wrapper scans are not parallel. So if you need to scan, let's say, 8 shards, Postgres will do this in sequence, and the lag that way will be pretty high. So, first of all, I would say the application needs to be changed, and all the queries that touch more than one partition need to be carefully redesigned and understood, or else we assume that those will incur a higher lag, right?
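
To make the arithmetic behind D's point concrete, here is a minimal sketch, assuming a hypothetical 20 ms response time per shard (the meeting gives no timings). If remote partitions are scanned one after another, the per-shard times add up. A query routed to a single shard pays for one shard only.

```python
def sequential_scan_ms(per_shard_ms):
    """Total time when remote partitions are scanned one after another."""
    return sum(per_shard_ms)


def single_shard_ms(per_shard_ms, shard_index):
    """Time when the query only needs the one shard that holds its data."""
    return per_shard_ms[shard_index]


# Hypothetical figures: 8 shards, each answering in 20 ms.
per_shard = [20] * 8

print(sequential_scan_ms(per_shard))  # 160
print(single_shard_ms(per_shard, 0))  # 20
```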

D

But if we can change the application, or analyze the application, in such a way that, again, 80%, 90%, whatever percentage of queries can target only a single shard, then, either through a proxy or by talking directly to the shard, we will avoid most of these problems. There are definitely challenges in this, but maybe we need to face them anyway, because there's no drop-in solution, no magical thing that you drop in and that will shard everything for you transparently. That doesn't exist, right?
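
A minimal sketch of the routing idea D describes, whether it lives in the application or in a proxy layer: given the partition key of a query, decide which shard to talk to, and fall back to all shards only when the key is unknown. The shard names, the use of a numeric key such as a project ID, and the hash-based mapping are all assumptions made for illustration; the meeting does not settle on a routing scheme.

```python
import hashlib

# Hypothetical shard list; the meeting names no servers or shard count.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(partition_key, shards=SHARDS):
    """Map one partition key (for example a project ID) to one shard.

    A stable hash keeps the mapping the same across processes and restarts.
    """
    digest = hashlib.sha256(str(partition_key).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def shards_for_query(partition_keys, shards=SHARDS):
    """Return the set of shards a query has to touch.

    None means the query carries no partition key, so every shard
    must be consulted (the expensive case D warns about).
    """
    if partition_keys is None:
        return set(shards)
    return {shard_for(key, shards) for key in partition_keys}


if __name__ == "__main__":
    print(shards_for_query([42]))   # exactly one shard
    print(shards_for_query(None))   # all shards
```

The design goal in the discussion is that most queries fall into the first case, so the cross-shard path stays the exception.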

B

Yeah, I really agree with that. We already see that when we talk about partitioning just one table in GitLab: we already notice that there are different access patterns, and you want to think about those, because some of them work fine with a partitioning approach and others conflict with it. So how do you deal with that? We picked a small example where we went through that exercise earlier. I think it's crucial for us to,

B

you know, figure out what problem we're really trying to solve: pick a specific feature where we know it has scaling issues in the database and where we expect more data to come, use that example, and then provide a solution for it. Maybe partitioning, or likely partitioning, is a good way of approaching that. But I think so far we've been on a very general level, discussing sharding solutions, be that Citus or be that using foreign data wrappers and stuff like that.

B

But in order to iterate on that and to get it going, I think we really should pick a feature that we care about, that we know has scaling problems, and then start with a partitioning implementation and figure out the problems along the way.

B

So I think this is also a product question: where do we want to invest, or which feature do we know would benefit from that approach? We're currently discussing in that direction as well.

A

Yeah, and that's what's happening next: continuing to work on isolating access patterns. I believe some of those have been added. I have a list of them that I will add to this issue so we can iterate further. Last week was a light week: we had the holiday, and most of the database team was in training half the week. So we'll continue to iterate on isolating the access patterns and get a good example

A

that we can do the next iteration on, and see if there are any other unknowns while we work on partitioning.

A

Let's jump over to the container registry. Dan, you had some questions?

E

I guess some of the conversation we just had has been helpful in answering where we're at and what's going on, so thanks, everyone, for discussing that. The consideration I'm trying to weigh here is that it could be a fair amount of effort to do two proofs of concept: one with a fully integrated schema in the existing database, and one with the idea of having a separate database.

E

Andreas asked about that somewhere in here. That was actually discussed directly with Sid, and it sort of runs afoul of the idea that we should be a single application with a single database, which is why there was some contention around whether we go and create a proof of concept to represent one or the other.

E

The design that I think you linked to here, Andreas, which came out of lots of discussion with Roman and Halley and everyone on container registry, had led to the point where we decided to go and create a separate database. So that's the thing we had already planned out and were executing against, up until we started having discussions about sharding and, long term, application extraction and all of that. My understanding from the discussion we had with Sid last time... I don't remember exactly when it happened. Craig? Sorry.

E

And so, with the idea of a single application and a single database, having a separate database doesn't really do that. I get that there are other applications, and I get that there is other stuff going on, and that's fine, but we're about to go and create this.

E

So I was trying to work out whether we had capabilities that would support an integrated schema, and whether an integrated schema would satisfy that. By integrated schema I mean, sorry, a schema in the same database that is separated out just for the container registry.

E

There should be no querying across it. But my question fundamentally was: is the sharding of databases containing multiple schemas supported in Postgres 11? And Andreas, do you want to verbalize that?

B

Yeah, so Postgres itself doesn't do any native sharding. What you can do is partitioning, and that's unrelated to using schemas or databases. So I think this is a cross-cutting concern.

B

So perhaps we can talk about the container registry for a moment. We're saying that we want a single application and a single database; that's kind of a strategy formulation for us. We talked a lot about the database part of that, where we say that, yes, we want one database, and I think there is still a bit of confusion going on with terms. But setting that aside for the moment, if we say single application, single database,

B

doesn't that also mean that we put everything into one application? Because so far we've been talking about the database, saying that everything should go into one database, but at the same time we're sort of heading towards having the container registry separate. I'm not saying that's wrong; I think that's the better approach. But we should decide which direction we want to go. Is it one application, one database, or can it be one application with multiple services that each have their own logical database?

B

I think that makes the most sense, but I think there are two different directions that we are discussing.

E

And I think that's a good point, and I agree with that concern. I think we already have separate applications: we already have Gitaly, we have Workhorse, we have the container registry, we have Runner. We have a bunch of separate applications that are actually written in Go. Did I say all in Golang? I think all of this stuff is Go; it's all written in Go for a reason.

E

Just for a bit of background: we had considered writing a container registry replacement inside the Rails app. That's not the way we ended up going. There are a bunch of factors as to why, and I think we can get into them if people want, but we ended up choosing to take the Docker Distribution registry and fork that code into our own project.

E

That being said, I think at this point the concern is that we can continue diverging from the single application, single database. It is already something we're using, and it's pluggable, so you can use a different container registry solution; you don't have to use ours. There are a bunch of other factors to it. I think I'm summarizing Sid's concerns as best I can, but if we continue diverging,

E

it becomes harder and harder in future to scale. We've got all these separate databases; it's harder to maintain, harder to support for our customers, and it's not as easy a solution as we want it to be. Because it's new, and this database that we're trying to create doesn't exist yet, that's kind of why it's like: let's not add to this problem by creating yet another one. Let's look at a way we can actually move forward. Go ahead, Andreas. Sorry.

B

I was going to say, regarding the scaling issue, I think it's actually the opposite. If we don't cram everything into one database, we have much better scaling options in the future, because we can scale them individually. And what we discussed is making that easy for customers: if they use Omnibus, for example, they wouldn't even notice those separate databases, because they run on the same database cluster anyway.

B

So it's actually no difference for a customer in terms of complexity, but it would allow us on GitLab.com, for example, to scale those databases individually. So we would have a chance of scaling the Rails database, for example, differently from the container registry, which is a huge opportunity.
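
A minimal sketch of the layout B is arguing for, assuming made-up service, host, and database names: each service keeps its own logical database, and the deployment decides whether those databases share one server or are moved onto separately scaled servers. Nothing here reflects actual GitLab or Omnibus configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseTarget:
    host: str    # database server the service connects to (hypothetical)
    dbname: str  # logical database owned by the service (hypothetical)


# Small install: separate logical databases, all on one server.
SINGLE_CLUSTER = {
    "rails": DatabaseTarget("db.internal", "app_production"),
    "container_registry": DatabaseTarget("db.internal", "registry"),
}

# Large install: the same logical databases, each on its own server.
SCALED_OUT = {
    "rails": DatabaseTarget("db-rails.internal", "app_production"),
    "container_registry": DatabaseTarget("db-registry.internal", "registry"),
}


def servers_in_use(layout):
    """Count the distinct database servers a layout needs."""
    return len({target.host for target in layout.values()})


if __name__ == "__main__":
    print(servers_in_use(SINGLE_CLUSTER))  # 1
    print(servers_in_use(SCALED_OUT))      # 2
```

Because the services never join across each other's data, moving from the first layout to the second changes only connection settings, which is the scaling option B refers to.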

A

The other piece: scalability and supportability was a concern of Sid's, but the biggest concern for him was the IACV and selling the single application, single data store. It's actually in that video, and I'll find the link again, but it was about 14 minutes into the video with Sid from the Friday before last.

A

We really need to push that single database, single application, and it sounds like we need to get him in the same room with the majority of us, so we can talk about what the trade-offs are going to be. To Dan's and Rom's point in that issue about the container registry:

A

If we need to integrate it into GitLab to fully support that single database, single application, let's quantify how much work that's going to be, because it sounded like it was on a scale of months, and correct me if I'm wrong, Dan, if it needs to be integrated into GitLab. And let's talk about what it means to have a separate application calling into a single... thank you for the video link... a separate application calling into the same database, and what the trade-offs are there.

A

He said it once, and I know he'll probably say it again: we can talk theoretically, but until we actually have data or samples of how this actually works, it's still just theoretical.

E

Well, that's a hundred percent a good question. I did pose that; there is an issue we created to discuss it, and we got a really nice explanation there of why that's problematic. I think it comes down to this: we could do that with minimal impact on our deliverables in terms of getting online garbage collection implemented inside the container registry. Just for everyone's reference here, container registry storage is at 4.1 petabytes right now, we have no way to clean it up, and it's unbounded growth.

E

So that's what online garbage collection is supposed to solve. We've been targeting mid-year; it will probably end up being August. We have a very small team of people. This would delay that by however long it took to implement this as a POC. And so my concern here is that we can implement it very quickly in the same database, but it really doesn't feel like it is in the spirit of what he's asking for. I think it's doable: we basically have Rails own the schema,

E

and have it read-only inside the Rails application, because there's no overlap here. This is not data shared between the container registry and what our Rails code would be looking at. It's straight-up, totally separate internal data structures for the container registry's online garbage collection plan. And so from that perspective it's a perfect example, or I should say, I think it's a great example of something where we do application extraction.

E

But if we really do want to tease out what it would cost and how much effort it would take to implement it inside, there's the version where we just put it inside and have Rails manage the schema and have it read-only. That's not a huge deal, but it just means there are all these extra tables floating around everywhere, and it doesn't really feel like it's properly integrated to me.

E

So I'm really concerned that that would end up being just a throwaway: yeah, it's integrated, and then you go, why did we do this? Because it's not really adding any value at that point. Let me find that issue that Jerome created, for everyone's reference.

E

That is actually the one you shared above, Andreas, I think, now that I'm looking at it. Yeah.

B

I think it was you, yeah.

E

No, you're right, I apologize. Andreas, go ahead.

B

We already know that there is no overlap between those two datasets, the Rails application's and the container registry's, so those services don't need data from each other and don't need to cross-join it in the database.

B

I think that's a very clear point for making a separation there, and perhaps also the reason why we would extract that into a separate service in the first place. That typically scales great when you put them in separate databases and don't try to stuff them into the same database, so I don't see a reason for not doing that, right?

E

And that's a hundred percent right, and I agree with that, but that does not address Sid's concerns, right? And those are completely valid. As a company, I think it's very important that we remember that we need to make money to be a company and have jobs. I don't mean to be too goofy with that statement, but it's really important. So if we really feel like this is tied directly to our IACV, then we ought to go look at it.

E

What I'm getting at is the amount of effort to get that happening. The smallest possible thing we can do there feels like it's just, oh yeah, we chuck the schema in. It doesn't help with scale. It is contrary to the other efforts we're trying to push, perhaps, and I would defer to you on that, Andreas, because you know that better than I do. But that's kind of my big issue here. It's a great case.

E

That's why we ended up choosing that path forward, but it is directly in conflict with the concern around IACV. So how do we square that? That's kind of my point. I'll stop rambling, sorry.

A

I get Andreas's point. You know, if we do put the schema into the single database, is it still considered a separate application, because the container registry is running in Go and is not part of the larger Rails application? Does that still fit within Sid's vision of single application, single data store?

E

Great, totally agree. What I was thinking, and this only really happened later last week, I think, or maybe I'm just misremembering six days ago, so midweek last week, was evaluating the need to integrate and what would be required, and then evaluating whether a proof of concept makes sense, in spite of the valid concerns you raised, Andreas. I don't mean to dismiss those at all.

E

What would actually be involved, and how much energy would that mean? And then on the application integration side, I guess I have a similar question to the one you raised, Andreas, which is: do we then integrate it? Do we throw out this idea of having a separate container registry and a separate Runner and a separate this?

E

If we don't have a case to be made for those things being separate, then they would not be separate, I suppose, if that's in line with the IACV efforts that Sid is being quite clear about. And if that's the case, that maybe needs to be a broader evaluation. In this particular case, it is a really good example of extracting and being separate, because it is totally separate.

B

Good, sorry. So what is the impact on our IACV for either of those solutions? Why is one better than the other: having a single database for them versus putting them in separate ones?

E

It's IACV, which is incremental annual contract value, which is our ongoing... and sorry, I use acronyms and we're kind of not supposed to, because it can be confusing. It is for me often, so I apologize for that. I haven't got the quantified numbers from Sid. I think we could get those numbers if we really want to weigh out what it would cost.

E

I think that's a fair question. I don't have an answer for it other than that, Andreas.

B

Perhaps we can clarify that one. I feel that we might be making too much of a problem out of whether it goes into a separate logical database or lives in a schema in the same database, because technically that's not much of a difference really. It's a good way of separating things, but I'm not even sure that it comes with additional effort if we wanted to do that. So I guess I'm saying I'm not fully understanding the concern that we have about that.

A

I'd recommend going back and watching the video link that Chen added there. He really emphasized how much the single application, single data store really drives sales, and if we need him to come in and communicate that again, then I can certainly ask him. For now, I would say let's ping him on the issue here and ask that question.

A

You know, we've got a few different permutations, right? If we insert it as a schema into the existing database, but it still lives in its own Go service, that seems like the simplest iteration that we can do right now. Right? Sorry, I need to see Dan. Is that the simplest iteration, the least amount of work: just putting it in the existing database as a separate schema? Would that cause your team the least amount of work? You're on mute.

E

Thanks. Yeah, that's the simplest version, one that doesn't require a heap of re-architecting of this solution. It wouldn't necessarily require integrating it fully, but it just means that we then have a bunch of potential issues around naming conventions and changes around stuff like that. So,

E

a separate schema would be best. Second best would be that sort of read-only integration with Rails, with the read-only models, and it requires a whole heap of extra maintenance and blah blah blah. So, yeah.

A

We can either start a thread on the CEO channel or actually comment in this issue and get Sid's feedback on which direction he would like us to go. I would recommend the issue so that everyone can see it, because Alvaro and the OnGres folks won't be able to see any feedback in the CEO channel. So I would lay out the different work streams that would need to take place, with just a high-level order of magnitude of how much work each is going to be, and ask for Sid's feedback on those. Like:

A

is the simplest path okay, where we're talking about a separate database schema but it still lives in its own application? Does that still meet our value of single application, single data store? And we can continue the conversation asynchronously rather than trying to set up a time to get him involved in this conversation. If nobody else wants to take it, I certainly can. But again, can you add that one? Yeah? Okay, thanks.

E

We're at time, yeah.

A

All right. Andreas, yeah?

B

There is a scalability concern with putting everything into the same database, obviously. So a follow-up question, if we wanted to do that, is: how do we deal with the fact that this database is growing and seeing a lot more traffic? Because if we don't have the ability to put that in separate databases but put everything into the same database, adding services that use the same database, there's a scalability concern there as well.

A

I think, from Sid's perspective, that's what sharding solves. And if that's not what sharding solves, we should find a way to prove it.

A

Okay, we'll take that conversation offline, the container registry one. Thank you, Dan. If there are any other comments or conversations, add them in here and I will follow up asynchronously. Thanks, everybody. Thanks. All right.