From YouTube: Database Office Hours 2021-01-27 - Thin cloning demo, Database Lab, Database Migration Testing
Description
Agenda doc (internal): https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit#
We give a demo of accessing database thin clones using psql, talk about Database Lab, and discuss our plans to implement fully automated database (migration) testing.
B: All right, so this is the Database Office Hours call, and we can start right off. Today I put a topic on the agenda about the thin cloning that we're doing using postgres.ai and Database Lab.
B: I wanted to give a quick demo of how that looks with psql, and we can talk about how we plan to use it going forward. I don't think there are any other topics on the agenda; feel free to add some.
B: If there are more... I think everybody's using postgres.ai right now, right? Has anybody used psql access to a thin clone already? So basically, you can use the postgres.ai Database Lab: you go to the UI, you log in, you use Database Lab, you can run queries, you can get query plans, and that's all great.
B: You can explore the data there, but you're directly connected to a production replica, which means it's part of the production cluster. You don't want to mess with that too much, and it's also read-only, so you can't even create temporary tables, or tables for that matter, or any indexes.
B: You can't mess with the data, you can't change anything, and that is sort of a limitation. With thin clones, what you can do is actually grab a thin clone, make it your own, and then you have a fully read-write database cluster. You can use psql to access it, you can create indexes, you can change data, or you can export data any way you want. Then you start over: you create another thin clone and start fresh, and that's all within seconds.
B: Would it make sense to run through a quick example of how that looks? Cool. This isn't very well documented yet, so that is something we still have to do; otherwise we would just point to the documentation for this. I'm just going to share my screen, just a second.
B: So basically, we currently have one GCP instance that runs Database Lab. It's not exposing anything on the public network except for SSH, so what we have to do is a bit of SSH port forwarding to get there. There are basically two things that we want to do. One is talk to the API; it has a nice command-line...
B: ...tool that you can use, and a nice API that you can use to create thin clones. In order to do that, we still have to expose the API somewhere, because it's not on the public network. So what I do here, and this is what you can do once you're set up with your SSH key and all that, is basically just forward the API port to my local machine, so the app is available on that port on my localhost.
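A rough sketch of that port forward (the hostname, user, and port number here are placeholders, not the actual internal values):

```shell
# Forward the Database Lab API port from the private GCP instance to
# localhost over SSH; host, user, and port 2345 are all placeholders.
ssh -N -L 2345:localhost:2345 dblab-user@dblab.internal.example.com
```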
B: You bet. All right, thanks for noting that. So this is just the SSH port forward for the API. That gives us the one instance that we're talking to, and then it's really as easy as using the dblab CLI tool: you can do dblab clone create, and then you basically specify the postgres user name and password that you want to use.
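A sketch of that invocation (the flag names follow the dblab client; the user name and password values are placeholders):

```shell
# Create a thin clone, specifying the postgres credentials that will be
# created for connecting later; values are placeholders.
dblab clone create --username dblab_user --password super_secure_password
```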
B: So this is the postgres account that's going to be created for you to connect with later, and that's basically it. What you're not seeing here is that there is a token that you have to configure, but that is basically all you need to do to create a clone. And this is the time it takes: about 10 seconds, and you have the thin clone available.
B: What you're getting back there is the connection information. So basically, now on the Database Lab instance we have a full postgres cluster running on this port, and in order to connect to it we can use the user and password combination we specified before, with the super secure password. And we have to remember the port here, because this is what we have to forward again: it's only exposed locally on the Database Lab instance, so you can't connect to it from the outside.
B: I forward that to my local port, using the same instance again, and then what I can do is just use my psql client, or whatever you like, any UI tool. You can connect to that local port, and you're connected to a full read-write thin clone of the production database.
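Sketched out, the second forward plus the connection might look like this (host, port, and credentials are again placeholders; the clone's actual port is the one reported back by the clone creation):

```shell
# Forward the clone's postgres port to the local machine, then connect
# with psql; 6000 stands in for the port the clone was assigned.
ssh -N -L 6000:localhost:6000 dblab-user@dblab.internal.example.com
psql "host=localhost port=6000 user=dblab_user"
```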
B: So you can see that here. This is the actual copy of the production database, terabytes in size. It's been renamed: gitlabhq_production is the actual production database name, and upon thin cloning it is renamed, because you want to have some indication that you're not working on a production instance.
B: Otherwise it's very easy to mess up and drop tables in the wrong console, I guess, so the renaming helps in that regard. And then you can do anything you want with that instance; it's fully your own. I can create indexes...
B: I can drop tables, update data, everything, and then I can just recycle it, or create another thin clone and start over, basically.
B: That is a good question; it's something we are figuring out currently. I think it's an access request, and I don't know yet about the routing, who takes that currently. This is the...
C: I guess, because it's production data, it needs to go through the usual path. I mean, we need to make sure that only people who are allowed production data access get this access.
B: Yes, I think so too, for the SSH key setup. For postgres.ai, anybody can basically start using it with a GitLab email address: you can log into the product on the site and start using it. But then you can only access the UI; you can only use Database Lab, where you get the query plans and all that, but you don't have a way of accessing the data directly like we just saw in the demo. So yeah, for...
B: All right, and then just to show a quick use case we had this week: there was a request for changing a bit of data in the database, and we basically used our Database Lab and a thin clone to prepare the data, because what you can do is create some tables, which is something you can't do on a replica. And then, on the local machine...
B: I have a CSV file there, with some data in it that I want to import, and I can just go in and use \copy for that, which basically allows me to copy the CSV file into that table. That goes from my local machine to the postgres cluster.
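Both directions use psql's client-side \copy; a sketch, with made-up table and file names and the placeholder connection details from before:

```shell
# Import a local CSV into a table on the thin clone, then export query
# results back out; \copy runs client-side, so the files stay local.
psql "host=localhost port=6000 user=dblab_user" \
  -c "\copy my_table FROM 'data.csv' WITH (FORMAT csv, HEADER)"
psql "host=localhost port=6000 user=dblab_user" \
  -c "\copy (SELECT * FROM my_table) TO 'out.csv' WITH (FORMAT csv, HEADER)"
```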
B: I can now work with that, and then you can do all sorts of things. We ended up running a bunch of queries and then exporting the data again using \copy; it works the same way in the other direction, copying from the cluster to your local machine and exporting CSV. That's really useful for preparing those changes, but you can also...
B: I use it basically on a daily basis: whenever I want to interact with the database, I don't connect to the production replica anymore.
B: Cool. Then basically the only thing left is the lifetime of that clone: if you're not using it anymore, it's going to be recycled and destroyed after a couple of hours. I think there is a setting where you can prevent that from happening, much like the termination protection in GCP. Other than that, it's going to destroy itself after a while, or you can...
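A clone can also be removed explicitly instead of waiting for the automatic cleanup; a sketch with a made-up clone ID (the subcommand follows the dblab client's clone commands used above):

```shell
# Tear down a thin clone by its ID when you're done with it
# (the ID is a placeholder).
dblab clone destroy my-clone-id
```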
D: I think I recall, when this was first being discussed as a possibility, that every time you make a clone it takes up more and more space. Are there still any concerns about that type of limit if a lot of people are making these clones?
B: Yes, this is still a concern. The disk space is obviously not unlimited, and the longer you leave a clone around, the more space it takes, roughly. And obviously, when you're making changes to the clone, that also takes more space. But it's not the case that an additional clone is a full copy of the data; it's incremental, in a sense, and at the moment I wouldn't be concerned about it. If we run into those limitations, then yeah, we'll need to do something about it.
B: And then the usual caveats apply as well, just like with Database Lab: this is not a production instance, so it has a couple of differences. In terms of instance type it's much smaller, and it's also based on ZFS, for thin-cloning reasons. That gives you different characteristics, so the performance can be much different compared to production.
B: There is interesting work going on where Nikolai is proposing to implement an estimator: based on the performance that we see on Database Lab, what would we expect to see in production? We would be able to estimate that for timing numbers, for example, but that is still ongoing work right now.
B: I think you can do both. I can only tell from my own usage of this: I've grown very used to just creating a thin clone and working with that. You're more flexible with what you can do, and you're not at risk of running gigantic queries and breaking the production replica.
B: On the flip side, it sometimes takes longer; some queries are just very slow compared to the production replica. But if you can manage that... personally, I use the thin clones very, very...
B: Exactly, and for analytical queries, in situations where you don't expect to change anything and so don't really need a writable cluster, there is always the option to use the archive replica, where we don't have those statement timeouts either, most of the time. I would expect that to be faster than working with Database Lab for those queries, but whenever you need the flexibility of being able to change things, Database Lab is the only option you have.
B: Yeah, and going forward, what we would love to do, and what is a very natural thing to do with those thin clones (we've talked about this before), is connecting your development environment to one and running migrations on a thin clone. This is something that we're driving forward right now. Connecting your development environment is still not recommended, and it's probably never going to be recommended, because of the security concerns associated with that.
B: But what we are going to have is an environment where, for example, database migrations are kicked off automatically. This is a locked-down CI environment that automatically runs those migrations for you, and you get some feedback on the MR.
B: So this is what we're currently working on getting going. There's a very minimal product out there; I linked an example with the feedback that you would be getting back. The workflow is basically: you push a change with a migration, that kicks off CI, which picks up another pipeline in a locked-down environment. That pipeline runs the migrations for you: it grabs a thin clone, like we just did, runs the migrations, and gathers some statistics.
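One common way to wire up that kind of handoff is GitLab's pipeline trigger API; this is only an illustrative sketch (the host, project ID, token, and ref are placeholders, not the actual setup):

```shell
# From the regular CI pipeline, trigger the locked-down testing
# pipeline on the private instance; all values are placeholders.
curl --request POST \
  --form "token=$TRIGGER_TOKEN" \
  --form "ref=master" \
  "https://ops.example.net/api/v4/projects/1234/trigger/pipeline"
```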
B: Right now it's just runtime, so it only reports back the run time for those migrations, but we have a lot of discussions going on about what we can add. You can do query statistics, you can do lock observations (what kind of locks does this migration take, and are those dangerous or not), and stuff like that, and all of it would be reported back on the MR. So you get a comment with all those details, and for database maintainers...
B: Okay, so this is in the database team group; there's this gitlab-com database testing project.
B: We have just renamed it from migration testing to database testing, because migrations are a huge part of what we can do, but we can also do more: we're thinking, for example, about automatically getting query plans for your changes. So this is more than just migration testing, at least in the idea.
B: We can see how we get there. There is a bit of a README, but the basic problem that we're solving... I mean, taking migrations and running them on a thin clone is not very difficult, right? We've just seen how we create the thin clone; you can connect to it using psql and, of course, you can also configure your GDK environment, or GitLab environment, to run those migrations against it. So that's not super difficult to do.
B: What is most concerning in that context is the security aspect, and the fact that we're working with a full copy of the production database. This is considered red data, the most important data for us, the data we have to protect the most. As such, what we can't do is just add a job to our regular CI pipeline.
B: If you, you know, got creative about that, you would be able to inject something into an environment where it runs against the production data, observe the output of that, and potentially also, I don't know, copy that data somewhere.
B: The regular pipeline has no limitation with regards to network isolation and those kinds of things, so we can't have it in that open way, and that's why we need a more locked-down setup. What we're basically working on right now is having a separate project; this is what you can see here. It's being mirrored to the ops instance: we have the ops.gitlab.net instance, which is a private instance that we run, and that has a mirror of the project where those pipelines execute.
B: But basically, the idea is that we have this builder. It's a standard shell executor that builds Docker images for you; it doesn't do anything else. It doesn't run any code; it just builds Docker images and pushes them to the registry that lives on the other runner. The worker runner executes from that registry, and it doesn't have any other network connectivity, so it can't really connect to the outside world except for its own local registry.
B
So
it's
basically
similar
to
the
idea
of
how
do
you
assign
ssl
keys,
where
you
can't
have
any
network
connectivity
used?
You
build
something
on
a
regular
network
and
then
you
inject
that
into
a
locked
down
environment
where
you
do
the
signing
of
the
running
and
the
more
security
related
stuff,
and
this
is
the
environment
that
you
control
more.
This
is
what
we're
doing
here
as
well
internally.
This
is
spinning
up
a
couple
of
services,
so
one
is
redis,
there's
a
standard
radius.
B: It exposes the postgres port, but in fact it just forwards to the Database Lab instance, and that is the only hole in the network here. Other than that, this container basically runs GitLab Rails, and on top of all the isolation that we already have, so on top of limiting the network for this runner, we also have iptables rules going on in this container.
B: So there is no network connectivity except for the local Docker network talking to those services, and this is actually where the migrations execute. It can't talk to anything other than the postgres and the Redis; that's it. It executes the migrations and basically produces an artifact, which is a JSON file with all the statistics. Right now that's just the information about which migrations ran and how long they took, basically, but we would drop in all those query statistics and everything else we want to report on.
B: We can also communicate back to the gitlab.com merge request: using the JSON file, we push a comment, or whatever makes sense, to the original merge request with those statistics and reports that you can see here. Yeah, what we currently have is the runtime of the migrations; this can become much more.
B: Yeah, I think that's basically the idea. And yeah, the biggest concern is really the security side: how do we make sure that nothing escapes, and that we have some controls over what kind of code runs on production data and who can see the output of that code.
B: Cool, really excited about that as well. I think it's a major step for us to run this automatically and get the feedback. We hope it's going to be useful.
B: And we just added that, or we're just about to merge it; I think it is being triggered already. So there's going to be a job on the regular CI pipeline soon that triggers those testing pipelines. It's probably not going to be available for all MRs.
B
So
we're
still,
you
know
early
phase
testing
that,
but
what
I've
already
seen
is
that
this
is
also
very
fast
in
terms
of
how
fast
you
get
feedback
from
that
the
job
kicks
off
very
quickly,
so
pretty
much
in
the
beginning,
beginning
of
the
regular
pipeline
we
triggered
the
other
pipeline
and
given
that
most
of
the
things
are
being
cached,
so
the
docker
cache
is
pretty
good,
there's
not
much.
We
need
to
do
on
that,
and
the
thin
cloning
is
very
fast
as
well.
It
takes
like
10
seconds
to
get
that
clone.
B: Oh yeah, and in addition to that, there is the blueprint for database testing; I also linked it on the agenda. It's basically a summary of what we just talked about. If you want to leave feedback on it, that's also much appreciated.
B: You have a nice shirt, Steve, but I can also only see half of the tanuki. It's the standard GitLab shirt, right? Nice.
B: We just got some swag as well. It's a bit hard to put on.