From YouTube: Database Office Hours 2020-11-18
C: Office Hours for Wednesday, the 18th of November. I'm going to share my screen with the agenda.
C: Does anyone want to talk about what they proposed? We got some additional proposals on reviews etc., or should we take it offline?
A: Just a comment: I had a similar proposal that can maybe work for the reviews. That one, I don't know, I don't want to spend too much time on it. If you have time, just give it a read. It's still kind of fuzzy, so it's not entirely worked out, but yeah.
E: Yeah, what happens is that I usually ask: hey, is anyone else using this? Usually the answer is no, it's the only code using it, and that's fine. You can, of course, search for it, but finders are also a bit difficult in that they build up a query at runtime. Sometimes the input comes from the user interface, like a sorting column; in that case there's a cloud of possibilities, possible queries which may come up, and we have no idea which one is affected in particular.
E: One short thing about query performance: about one year ago I reduced it from 10 seconds, or whatever it was in the document, to one second. In short, what should we aspire to?
C: I think that the fact that we have it in the guidelines is a good start, so we can just link to the guidelines, and the guidelines are setting the rules. From the team perspective, I can tell you that we already have an update: we are already measuring the performance of GitLab against the 100 milliseconds, and the 50 milliseconds for gitlab.com. So in general we don't want to have queries that go above 100 milliseconds. There are always exceptions, so we can live with a couple of exceptions there, but it's important to try and lower the execution times of most, if not all, queries.
E: Yeah, that's a very good result, because one year ago I tried to push it under one second and it was already hard, and now we did 100 milliseconds. We also know that on any screen we should have fewer than 100 queries, and the total of them should also be less than 100 milliseconds. So it's coming to an interesting point: let's say you have a few dozen queries on a page.
E: Apparently most of them are simple selects, like selecting a username with LIMIT 1, but there might be one or two search queries which are finding a set of records or a list of records. Those are allowed to be maybe around 100 milliseconds or less, but the rest should be even less than 10 milliseconds, which is almost always true for writing a single object or finding a single row.
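The per-page budget E describes (fewer than 100 queries per screen, under 100 ms in total, most individual lookups under 10 ms) can be sketched as a small check. This is only an illustration: the function name and the idea of feeding it timings, for example collected from the performance bar, are assumptions, not an existing GitLab helper.

```ruby
# Checks a page's query timings against the assumed budget thresholds.
# `timings_ms` maps each SQL statement to its duration in milliseconds.
def page_budget_violations(timings_ms, max_queries: 100, total_budget_ms: 100.0, per_query_ms: 10.0)
  violations = []
  violations << "too many queries (#{timings_ms.size})" if timings_ms.size >= max_queries

  total = timings_ms.values.sum
  violations << format("total %.1f ms over budget", total) if total > total_budget_ms

  timings_ms.each do |sql, ms|
    violations << "slow query (#{ms} ms): #{sql}" if ms > per_query_ms
  end
  violations
end

timings = {
  "SELECT username FROM users WHERE id = $1 LIMIT 1" => 1.2,
  "SELECT * FROM issues WHERE project_id = $1"       => 42.0,
}
page_budget_violations(timings)
# => ["slow query (42.0 ms): SELECT * FROM issues WHERE project_id = $1"]
```

Only the second query trips the per-query limit here; the page total (43.2 ms) is still within budget.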
C: It's the performance of GitLab as a whole: if a page takes three seconds to load because of the queries in there, because there are 10 queries and they're taking 500 milliseconds, this is a problem, and we should try not to increase the response times. So, on our end, what we can do is keep all queries as low as possible.
E: Does it make sense to ask the authors of MRs with UI queries to use the performance bar? If they are working on a user-facing screen, they could get a list of the queries from the performance bar and put it in the MR description at some point, so that we slowly build up a picture of every screen.
C: Yeah, this is food for thought. That would be great, but you know that most of the time we are not exposed to the front-end changes, or the controller or view changes. This is also the discussion we have on having larger MRs that include both the database updates and the application code update; that would allow us to see things like that.
C: So there is this discussion on not breaking merge requests and updates into multiple MRs any more, so that the database update is on its own. Including the database update together with the application update means the database reviewers are able to see the effect of the update. So this has two parts, and we would be able to understand what happens there. Sometimes you can see a change in a finder, or a table added, and you don't...
C: You cannot get the context without also checking the application code. And the second part is what we were saying: if you can see the controller, for example, or the model, you can also see how this relates to other parts of the application.
C: Yeah, they won't affect the initial load time. So asynchronous requests won't affect, for example, how fast the page loads, say from an SEO perspective, but we also don't want those API requests to take 10 seconds.
D: Right, yeah, I was going to say: I was wondering if we should add a section about expectations on the initial load time versus any other types of requests that occur after that. That could be helpful when we are dealing with an entire front-end page that we'd want to look at.
C: I think that what we proposed and merged covers most of the parts on our side. So if we require that queries participating in API responses, or any response, don't take more than 100 milliseconds, at least we try to do our part on that end. On the other end, there could be a loop that goes and runs 100 queries in a controller.
E: The GraphQL challenge, at some point, may also be something which will increase the number of queries in front-end MRs, which will be hard to detect. You won't know that GraphQL now touches 20 queries, with N+1 risks, and since it's a front-end MR you don't even notice that there's a query change. So that's a challenge for reviews; if anyone has any ideas or experiences there, that would be great.
A: Just a comment: we are already facing these issues. The new UIs I've seen in my team are mostly using GraphQL, and it's very easy to write N+1 queries; chasing them down and solving them at the GraphQL level is quite challenging. What I've seen done to mitigate this is introducing costs: each node that you are looking up has a certain cost, and within one request you can basically have, let's say, a maximum of 10.
A: Well, you can reach a maximum cost of 10 and then, after that, you get an error. So we have some sort of limits we can set up based on the complexity of the GraphQL query, but we don't really use that at this point; I've seen a few fields already having this. But I think the main point is the N+1 queries, because you can load up merge requests and say, for each merge request, give me the diffs, the diff files, or the diff stats, and you can set a page size of 200.
C: Great comment, Adam, thank you. So, the next thing to discuss:
C: I want to bring to everyone's attention the effort to test with production data, for anyone not knowing about that. This is an epic; it will take some time to work through it. So, on top of anything else we're doing to provide database maintainers access to production-like data through Postgres.ai, we're also working on setting up servers in GCP so that we can run migrations there.
C: So our plan at the moment is to start with a setup of a single server that will allow us to run them safely, without any connections to the outside world.
C: It won't be able to send emails, etc., and it will run migrations for any branch. Our plan is to start with a single server, create an image, and then figure out a way to provide a multi-user server to all maintainers. The idea about maintainers is that all database maintainers already have access to read production data, so we don't have to worry about other issues with security, etc. Once that step is done, everyone will be able to test migrations in the cloud.
C: And then we will have to think about how we can automate that, and there are two possible roads there. One path is to anonymize production data: when we create those thin clones, we also anonymize the data.
C: But that means we would have to put in the effort to mark the PII information, the columns that we would have to analyze and then anonymize. That would mean we have a clone with anonymized data, so we might be able to provide access to that clone to more people who don't have access to the data. Another path is maybe setting up, in the future, a private runner that would be able to run CI tests against production data, depending on whether we are using anonymized data or not.
C: Maybe we don't return anything and don't have any logs in the job, but at least we make our lives easier, so the common case would be just running the test.
B: Yeah, in my experience, we tried to anonymize data before and, as far as I can tell, it was not easy to do in a reliable way where the data still makes sense. You could just change all descriptions on issues, for example, to some fixed words, but then you no longer get the varied data you have in a regular database. And then, yeah, you have to be sure you're anonymizing everything.
B: I'm also not sure how we want to keep that database up to date with the schema. Would we be re-anonymizing the production database over and over again, or would we run migrations on that database to keep it in sync? Yeah, I see a lot of things that can go wrong with that.
C: I totally agree, and we have more than 300 tables, so even marking the columns that need anonymization is a huge task. Then we would have to set up a process in all database reviews for any new table and any new column to go through security review. And there are all the other problems, like: we have JSON data, and we don't want to destroy the JSON data, so what happens there?
C: The second part that worries me with anonymization is that you mess with the data, so you change how indexes work. If you anonymize names and emails or whatever else, the behavior of the index changes; you no longer have the same data as production, and I don't know what the consequences are. You also add another layer between you and your production data.
C: You end up testing against the anonymization process, so if the anonymization process has bugs, you test against those sorts of bugs. But yeah, this is an open discussion; please comment if you have any thoughts.
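As one concrete illustration of the trade-off being discussed, a deterministic pseudonymizer (an assumption for illustration, not the actual pipeline under consideration) can keep emails unique and joinable across tables while discarding the real values. It also demonstrates exactly the concern raised above: the hashed values have a completely different distribution from real emails, so index behavior and query plans will differ from production.

```ruby
require "openssl"

# A per-clone secret; the point of HMAC is that without it the mapping
# cannot be reversed or recomputed.
SECRET = "rotate-me-per-clone"

# Same input always yields the same fake address, so uniqueness constraints
# and foreign-key joins on email survive anonymization.
def pseudonymize_email(email)
  digest = OpenSSL::HMAC.hexdigest("SHA256", SECRET, email.downcase)
  "user-#{digest[0, 12]}@example.com"
end

pseudonymize_email("alice@gitlab.com")
# deterministic: repeated calls return the identical fake address
```

Even with determinism preserved, every value now shares the same prefix and domain, which is precisely the "you mess with the data, you change how indexes work" problem.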
C: On the database gem, if you have any additional comments there: the idea is creating a database gem that will reside outside of the core repository.
E: We had discussed that it's too early, and there's also the issue initially linked there. It makes a lot of sense, especially for the versions app, the customers app, and the license app, where I'm also working. We sometimes face the same challenges, and we just copy the current version of a certain database helper from the GitLab code, and it's becoming tedious. It would be great.
E: One problem will be that we have some GitLab-specific utilities and some generic ones, so we may move, I think, the Postgres-specific ones to a certain gem, or we may just have a generic one. I mean, my question is: do we want to contribute to the open source community? In that case, we have to think differently about packaging the gems, and we may need two or three gems. But if we just want to have something like a GitLab gem, that would be great.
E: The only risk I see there is that the license, versions, customers, and some other applications are running on version 9, 10, or 11 of Postgres. So that will be a challenge.
F: Hello everyone, first time here. I have this issue that I'm working on, and it's not even close to the verification step yet, but I just want to make sure that I'll be able to do it when it's time. I'm wondering if there is any specific or accepted approach to verify self-managed changes when you are concerned about database performance issues.
F: Locally there's no problem, no issue, because obviously there's not enough data in the local or self-managed installation that I have.
F: I was wondering if there is something specific that you do, or if we just turn the feature flag on on gitlab.com and then try it there, or whether it's enough to do it on staging, which I know doesn't have a lot of data like gitlab.com does. I'm just wondering how you approach a new scenario like this.
F: That is the idea, because it will block new user signups. It's a new configuration, so I don't think we want to do that on gitlab.com; I think we should actually add the .com check to not do it on .com. So then, if that is not available on .com, how would I go about verifying the change?
E: So actually our database schema is common across the Core Edition, the Community Edition, the Enterprise Edition, self-managed, and gitlab.com, so first of all that makes gitlab.com a very good example for every other instance.
E: Secondly, when I checked different self-managed instances, the largest self-managed instances from which we could get a usage ping had 15 times fewer users than gitlab.com, 35 times fewer CI builds, and around 30-40 times fewer issues. So I checked issues, CI builds, and users as examples.
E: So what I believe is that any query or any code which runs well on gitlab.com, which is actually very challenging and sometimes too challenging, will run well on practically any self-managed instance. The second thing is that Omnibus installations have some more relaxed settings.
B: Okay, but one question though: our database is a lot larger, but, compared to customers, do we have better infrastructure, better processors for the database instances compared to those machines? Customers might even have databases running on hard disks instead of SSDs and such.
E: I totally think we do. I was able to check three or four customer installations, and even the best customers have quite a bit less than what we have, a lot less.

C: I think there is a very nice dashboard in Periscope with data from self-hosted instances, with the usage data, and you can check memory, CPUs, and everything.
F: Okay, I think that sounds good. Just another thing: do you think it's then not necessary to have the "don't run this on .com" check, or to just run the long query in this case manually by a maintainer? In that case, would I just be verifying that the query does not run long enough to block my process?
C: Yeah, so if the difficult part there, the expensive part, is a query, you can test against production data using Database Lab. If the only concern is the query, you can do that: follow the guidelines on testing queries, use Database Lab, and also involve a database reviewer and maintainer if you need additional checks, or if there are additional parts of the update that need verification.
E: I will find out. I've benefited from some other examples which are using bulk insert to generate, like, 10 million rows of something.
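The bulk-insert approach mentioned here, generating millions of seed rows, typically batches many rows into a single multi-row INSERT statement rather than issuing one INSERT per row. A minimal sketch of the statement generation (the table and column names are illustrative, not the actual seed scripts):

```ruby
# Builds multi-row INSERT statements in batches of `batch_size` rows,
# quoting string values and escaping embedded single quotes.
def bulk_insert_sql(table, columns, rows, batch_size: 10_000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |row|
      "(#{row.map { |v| v.is_a?(String) ? "'#{v.gsub("'", "''")}'" : v }.join(', ')})"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')};"
  end
end

rows = (1..25_000).map { |i| [i, "issue #{i}"] }
statements = bulk_insert_sql("issues", %w[id title], rows)
statements.size
# => 3 statements (round-trips) instead of 25,000
```

In a real seed script you would feed each statement to the database connection; the point is that the per-row round-trip cost disappears, which is what makes 10-million-row seeds practical.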
C: Yeah, but we cover only, if I recall correctly, 25 or so tables, and I think we fill in a lot of data for only 20 of those, like issues and users.
C: So we have this, and you can generate a local environment with a lot of data, but it will be artificially generated data for projects, issues, notes, comments, and that kind of stuff. If what you want to test is not on those 20 core entities, then you're out of luck there.
F: Okay, one follow-up question on Database Lab; I'm not sure if this is known. How long does it take for a new index to be applied to what we use on Database Lab? Because it's been, I think, like four days, and an index that I was expecting to be there was not there. Well, I haven't really checked today, but I checked yesterday and it wasn't there. It was not my index, it was actually worked on by someone else, but I was hoping it would be there.
C: Okay, we have an issue right now: Database Lab is lagging a little bit with respect to production. In general, you should be able to see production updates within a day.
C: But there is an issue, and I know because I've seen it somewhere in a comment and I checked, so we are lagging behind production at the moment. If you don't mind, add a comment in the database issue.
C: Remember that everything that you do in Database Lab is for the duration of your session.