From YouTube: WG Data Protection Meeting 20200909
A
All right, today is Wednesday, September 9th, and this is a Kubernetes Data Protection Working Group meeting. Let me share my screen with you. This web client is so hard to use.
Great. A couple of discussions happened in the past two weeks; some of them I was part of. The major thing we discussed is to cover existing, very representative applications and how they conduct snapshot and backup, and we worked through some of those applications. I personally worked on the Kafka one and the MySQL one, among others. Today, the first item is to go through that. What do we have?
One of the big purposes is that we put all these use cases into our white paper and try to provide, from our perspective, what we think about how those applications should be working. That's one, and the other aspect is really to see whether the container notifier, aka application execution hook, can support those scenarios.
So that's the second item, and I will go through what the updates on the container notifier are. I think we have reached a point, more or less, where we're ready for filing a KEP. The last things we are doing at this moment are to adjust the KEP document slightly, and once we have that done (I think, Xing, our plan is to file the KEP as soon as possible, right) we will open the discussion to the group. Any one of you, John, Tom, want to get started?
C
Yeah, let me grab the document that we were working off of; give me a moment. Sure. So here, can everyone see my screen? Okay, yep, great. So this is the doc. We're talking about quiesce hooks.
I think the high-level approach that we wanted to take was to look at different applications and see what kind of hooks they require to take application-consistent backups. So really, our motivation here is to help drive the primitives we need in Kubernetes and, specifically in this context, the container notifier API.
There are obviously many approaches to backing up applications, so we decided not to make this an exhaustive list. I think there are other sections of the main white paper that we'll add that will go into that in a little more detail. There are also many applications that provide their own data protection. It can be things that are outside of Kubernetes; one example would be cloud data services like RDS, which may have their own backup mechanisms. We decided not to include that in this section, and there are very application-specific things; an example here would be Kafka, which Sean and I looked at.
The other thing we wanted to talk about was the scope of these hooks. We wanted to figure out what kind of commands we'd actually be running in these hooks. You can imagine very general commands, doing anything from issuing arbitrary APIs to applications, or you can make them very specific: you can have a very specific volume freeze and unfreeze, or you can make it very specific to an application, like FLUSH TABLES WITH READ LOCK, for example.
We went through (and we'll go through this later) what examples we wanted. We discussed the different mechanisms; this is, I think, better outlined in the container notifier design doc that we have going, so I'll skip over that section.
We did discuss execution controls, but I think again that will fall into the domain of the container notifier. If we figure out some applications will need specific controls around retries or timeouts or those kinds of things, I think they'll have to be part of the container notifier API as well.
So after that, we really just categorized the different types of databases, and then we went into specifics from there. We had different sections, so we can start with Kafka, because that's the one that Sean and I worked on, and then we can jump into the other ones. But just really quickly, at a high level: we talk about relational databases, things like MySQL and Postgres. We talked about time series databases, so InfluxDB, Prometheus, that kind of thing, and I think someone was assigned to look at that one. We also talked about key-value stores like Redis.
That one was assigned to me, but I didn't get to it. We talked about Kafka (Sean and I took that one) and distributed databases like MongoDB, so it's in here. This is great; actually, I realize everyone updated this document with the hooks. Sean and I worked in a separate document, but we'll go into that.
So with that, let's go into the specific databases. Maybe each person can talk; I can still present if that's easier, but maybe each person who worked on each section can talk about what it takes to back them up, and we can talk about how that applies to the various hooks. So I guess, Fung, do you want to talk about MySQL here?
D
Yeah, I can talk briefly about MySQL. Basically, almost every database would have a similar workflow, in the sense that first you have to quiesce the database operations in one way or another, then you back up all the persistent volumes being used by that application. In the approach that we are using, we just take a snapshot of the PVCs, but some other application might use some other way.
Then we unquiesce. Those are the general steps for almost all the databases that we interact with. But MySQL has a special thing: if you connect to a MySQL server and take a lock, as soon as you disconnect, the lock will be automatically released, so you have to do a little bit of a trick there. So if you look into my quiesce section, you're going to see that I propose a trick there.
That means that instead of running a quiesce command directly, you run a script, and that script you push into the background with a sleep, so that it will keep the session running. So even when the container notifier command exits, the script will keep running in the background; then the controller goes on and takes a snapshot, and the unquiesce, which is another script, searches for the quiesce script and kills it.
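The "background sleep" trick described here can be sketched as a pair of hook scripts. This is a minimal sketch, not the actual scripts from the meeting: the mysql connection details are placeholders, and `LOCK_CMD` is a hypothetical override point so the pattern can be exercised without a real server.

```shell
#!/bin/sh
# Sketch of the quiesce/unquiesce trick: MySQL releases FLUSH TABLES WITH
# READ LOCK as soon as the client session disconnects, so the quiesce hook
# pushes a long-lived session into the background, and the unquiesce hook
# finds it and kills it.

PIDFILE=${PIDFILE:-/tmp/mysql-quiesce.pid}
LOCK_CMD=${LOCK_CMD:-mysql_hold_lock}   # overridable for testing

mysql_hold_lock() {
  # Host and credentials are assumptions; SLEEP keeps the session
  # (and therefore the lock) alive after this hook returns.
  mysql -h "${MYSQL_HOST:-mysql-svc}" -u root \
    -e 'FLUSH TABLES WITH READ LOCK; SELECT SLEEP(300);'
}

quiesce() {
  # The locking session outlives this hook's exit.
  "$LOCK_CMD" &
  echo $! > "$PIDFILE"
}

unquiesce() {
  # Killing the background client closes its session, releasing the lock.
  kill "$(cat "$PIDFILE")" 2>/dev/null
  rm -f "$PIDFILE"
}
```

The controller would invoke `quiesce`, take the volume snapshot, then invoke `unquiesce`, matching the sequence described here.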
That is just the general workflow for that one. Then, after the quiesce and unquiesce, you go to a cleanup phase. Because we are using container notifier in this case, for each of the quiesce and unquiesce we issue a notification; then, in the cleanup, we have to delete these notification objects.
In that case, you cannot guarantee that everything will be successful. In case of a failure you might have to do some additional cleanup: let's say you have three snapshots and only two of them succeed. You cannot go on and back them up, right, so you have to somehow go back and clean up the two that you already created, so that they will not take resources from the system. So that is it for MySQL.
C
And I guess what's interesting here, from a requirements perspective, is that you'll have to run the hooks in parallel with the snapshot, right?
D
Yeah, that's how you can guarantee app consistency for MySQL. Otherwise, MySQL operations are still going on, and so on and so forth. The quiesce is very simple: it's a FLUSH TABLES WITH READ LOCK. As soon as you take that lock, users can continue reading from the database, I think, but they will not be able to write to it, which satisfies app consistency from my point of view.
C
With the container notifier proposal, can you have that kind of command run in parallel?
B
Yeah, I'm not sure; this one's a little weird. But I did do prototyping with MySQL using the execution hook, the CRD approach. No, we haven't really gotten to the coding part yet, but for the initial prototyping we are going to do exec anyway. So that's going to be similar; we're not going to start inside the kubelet when we're doing prototyping.
D
Yeah, I have not implemented it using the container notifier yet because, obviously, it's not available for us to use, but I'm using pod exec from Velero, and it works.
B
Yeah, that's what I was using. So I'm saying that for the prototyping for container notification we will also be using the exec approach, actually, so it's going to be similar. But once we really switch to the kubelet, I'm not sure, because the kubelet internally is also doing an exec, so I don't know if there's any difference there. If we do a PoC and it works with the controller approach but doesn't work with that, we'll have to see.
F
Jing, I didn't understand the issue here. It seems, as far as the steps go, that quiescing, taking PVC snapshots, and unquiescing are all done serially. What's different is that the semantics of quiesce are different for MySQL, in the sense that the application is not completely quiesced; it allows read access, but that's harmless as far as backup is concerned. So I think the workflow is actually quite generic and can be applied to other applications; it's just that the semantics of quiesce may be different for different applications.
B
Yeah, I think, from the application point of view, I'm just not quite sure; I'm not that familiar with the kubelet. Since eventually we want to do this in the kubelet, I'm not sure if there's anything extra there. That's what I'm trying to say, since I haven't really coded directly inside the kubelet, and the PoC we're going to do, even with the container notification, is going to be in a separate controller to start with. So we'll have to check with the SIG; they probably understand more from this point of view.
B
Because normally the external controller will have to check the status of the notification. So in this case, I'm not sure if the kubelet or this other controller will be able to update the status yet. That's, I think, the only thing.
B
When I was doing prototyping (this was some time ago, with the execution hook), I basically just did it immediately, because if you keep that command running, you wouldn't know; the kubelet or the execution controller is not going to update the notification status, and then how is this external controller going to know that it's really done?
B
It would check the notification status, but in this case, if the command keeps running, it's not going to return a status. I think that's the problem here with this particular one. But let's not get stuck on this first one; I think MySQL is very special. We can come back and talk about this. Can we maybe move on to the other databases? We can come back; I think MySQL is really special.
D
What we deal with here is the most complicated.
D
We can go through the details offline, and as for showing the exact status, how you know that it is successful: we can do that. It's just that I did not show all the details here.
C
Yeah, that sounds good. And even if you close your session, you can still take a best-effort snapshot afterwards, which will at least have all the data up to the point when you ran the FLUSH TABLES, which I think is what happens with this.
I
Can anyone hear me? Yeah, okay. So what I did was research three different time series databases: NuoDB, Prometheus, and InfluxDB. At a high level, what I felt was that these databases are relatively newer than MySQL, Postgres, and other relational or distributed databases that have been around for a while, and these databases are mature enough that they provide backup and restore utilities which not only quiesce the application but actually create the backup object for the user as well. As a result, the data protection challenge is greatly simplified from a vendor perspective: it's as easy as running those commands for taking a backup, doing a restore, and pointing the database to that restored archive when you are restoring the contents. It's as simple as that. So we can go in order.
I
What I've done here is look at each of these databases in four segments: the overview, the backup strategy to use, the restore strategies to use, and a set of commands that will give you the full backup, full restore, incremental backup, incremental restore, and any kind of transaction log or journal copying of that nature.
I
So, speaking of NuoDB: this one has all the bells and whistles. You can do offline backups if you want to shut down your database and take a backup, or you can do it online while the database is running. It supports doing a full backup of the entire database, incremental backups of the database, and also copying of the transaction logs. All of these are just single-line CLI options that you can run against the database, and it creates all that for you.
I
Okay, so yeah: all the full backups, incremental backups, and the journal copies are online activities.
I
What happens is there is a directory that is created where all these items get placed, depending upon the command that you use, whether it's a command for a full, incremental, or journal backup. Similarly, when you restore, you just have to point to that backup set where these items are placed, or point to an individual full backup or incremental backup, and the target cluster (the restored application) would restore it and start using that data for its activities and configuration going forward.
I
So it's very straightforward; I didn't find any challenges in terms of what is required. The documentation is pretty large, but I've summarized it in terms of how a successful backup and restore should happen.
C
How do you map this to Kubernetes? The primitives we have right now are essentially volume snapshots, and then we're talking about adding the quiesce hooks. So I think...
I
The difference that I see is this: with traditional databases, we would quiesce the database and then take a snapshot. Here, I would say we would probably just depend upon the application to do the quiesce and create the backup copy; we would still take a snapshot of the persistent volume, and the hooks (or whatever the injection commands are) would run those application commands.
I
Because it is happening at the file level, there are definitely these pointers and everything which they are using to create that snapshot object. From an API perspective, if I understand the question correctly, I don't think there should be any difference, because as long as we are capturing the PV using any kind of snapshotting technology, via CSI or something, we should be able to capture all the file-level items without any issues.
B
But what is the sequence then? Who is creating those API objects?
I
The API? You mean the API objects? So basically, what is happening is that when we are taking the full backups or full restores, it is creating the backup object and putting it on the persistent volume itself that was used by the database. So as long as you take that persistent volume, map it to another instance of the database, and make that instance point to the backup object after restoring it, it should be good enough to come up.
I
No, I wouldn't say it's double, because it's a file-level thing; it must be using pointers, and it doesn't double the capacity or something like that. It does use space-efficient mechanisms under the covers. So while there is some amount of data that will be created, it's not going to double it right off the bat.
I
So when you are doing a restore, there is a restore command that needs to work on the backup object to restore the contents into a format that the new or target database can understand.
I
Yes, and that's why you need to map it. The live data is not the consistent data; the consistent data is the backup object that you created.
I
So while both of them are on the same PV, you will want to change it: you can delete the original data if you need to and just map to the backed-up data, but you technically do not need the original data after you do the restore. Right, and the next example that I have is exactly that, where the snapshot, the steps and everything, are pretty much the same.
I
There is a snapshot command that you run, which creates that snapshot on the same PV where the Prometheus database is pulling all its information from. Now, when you're restoring it, all you need to do is map that storage.tsdb.path (the time series database path) to the snapshot that you had just collected.
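For reference, the Prometheus flow just described maps onto its TSDB admin API. This is a hedged sketch under assumptions: the server must be started with `--web.enable-admin-api`, and the service name below is a placeholder.

```shell
#!/bin/sh
# Prometheus snapshot sketch: a POST to the TSDB admin API creates a snapshot
# under <data-dir>/snapshots/<name> on the same PV; restore is done by
# pointing --storage.tsdb.path at that snapshot directory, as described above.

PROM_URL=${PROM_URL:-http://prometheus-k8s:9090}   # service name is an assumption

prom_snapshot_url() {
  echo "$PROM_URL/api/v1/admin/tsdb/snapshot"
}

take_prom_snapshot() {
  # Response looks like: {"status":"success","data":{"name":"..."}}
  curl -s -XPOST "$(prom_snapshot_url)"
}
```

After the snapshot call succeeds, a volume snapshot of the PV captures the consistent copy; on restore, Prometheus is started with `--storage.tsdb.path` pointing at the restored snapshot directory.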
I
So it's very straightforward, because all the quiesce and consistency handling is taken care of by the application, which is putting it into the same PV that you were already using; and from a backup/restore perspective, it's about mapping to the right consistent object upon restore.
A
Okay, where are the quiesce hooks executed?
I
When you say where are the quiesce hooks: from the application's perspective, or from our Kubernetes perspective? From Kubernetes, there are no specific hooks (sorry, there is no quiesce piece of a hook) required here, other than the fact that you should be able to hook into these CLI libraries or the command API libraries that these applications provide.
I
So as I went through these, that was my understanding. It seems like these newer databases have seen the challenges and the problems that have been faced with all the other databases we've been dealing with so far, and they've just provided very simple, straightforward ways to do these things. NuoDB, with the full backup, incremental backup, and journal copy, resembles Microsoft SQL Server a lot in terms of how it provides its backup capabilities.
I
Not really; I can do some more digging around it, but I'm not 100% sure. I would assume it's just CBT (changed block tracking) kind of stuff, and they may have their own handlers to figure out what the changed blocks were.
I
Yes, so what I'll do is update the CLI command reference as well. But yeah, you need to provide the... are you talking about the backup, or are you talking about the restore?
A
Okay, so whenever you run this command, the database will create an incremental or full backup, right? Then the size might actually be much bigger than the database itself, because you might end up running this command multiple times, no? Correct.
I
Yes, those are things that we definitely need to keep track of, and the other thing that I did not add here is that NuoDB and InfluxDB also provide policies. They have their own policy schedules that can manage these incremental backups, deleting the oldest one when the latest one is taken and so on, so the user could leverage those.
I
Right now, as I said, the backup directory is a directory or folder structure on the persistent volume itself, but you can configure additional controllers to map it to an external S3 repository or something as well. I'm not sure if I'm answering your question correctly.
A
No, no, the question is about where that data lives: how many PVCs do they use?
I
I'm assuming that it would be placing it on some master controller object where everything is stored together. I can take that one for you and figure it out; yeah, it's a good question. I did not look into it.
I
That's a good point. And I think for InfluxDB, if we scroll a bit up or down, that's exactly what they're also doing: they basically create these three objects. You can have multiple shards, but it just creates a single shard object as part of the backup item that it creates.
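As a reference point, the InfluxDB utility being described here is the 1.x `influxd` tooling. A hedged sketch (the database name and paths are placeholders): the functions emit the command lines rather than running them, which keeps the sketch checkable without a server; pipe the output to `sh` to actually execute.

```shell
#!/bin/sh
# InfluxDB 1.x backup/restore sketch. `influxd backup -portable` writes the
# manifest plus the per-shard backup objects mentioned above into the target
# directory; `influxd restore -portable` reads them back.

influx_backup_cmd() {
  db=$1; dest=$2
  echo influxd backup -portable -database "$db" "$dest"
}

influx_restore_cmd() {
  src=$1
  echo influxd restore -portable "$src"
}
```

For example, `influx_backup_cmd mydb /backups | sh` would run the backup against a local instance.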
I
Yes, yeah. I looked into it; there's no single way of just quiescing the data and then taking snapshots of the PVs. This ability to do snapshotting plus a dump by the application was the only method that I saw across these three databases.
I
I mean, it's more about this: if you're not using a centralized data protection solution and you're just using the application's own pieces to protect it, then that would definitely make sense. I think the strategy from a data protection solution perspective should be having the ability to run these API commands before we take a backup.
I
Yes, and especially, the thing that we need to keep in mind is that if we are moving these applications from one site to another site and we are quiescing them, the application is still in a quiesced state when we are going to be restoring it. So upon the restore, there also needs to be an unquiesce command that might need to be executed.
I
You've got to look at it at two levels: the applications like these time series ones that provide everything built in, versus databases like MySQL and so on, which just provide up to the quiesce part. I think that's how the high-level workflows would be differentiated, in my opinion.
I
So everything follows the concept of a backup set. If you're running an incremental backup and you're pointing to that hot-copy directory, and that hot-copy directory was a backup set, then it will use the data within that directory to do the change block tracking and figure out what the incremental pieces are that need to go into that particular directory. You can have multiple directories for the same kind of backup as well, so it just follows the hierarchy structure of a backup set, and everything gets put into that backup set.
I
I see, yeah. Overall, I would say that definitely simplifies life from a recovery and migration perspective. Just another point I'll add: InfluxDB also has the capability of doing continuous disaster recovery. You could have two InfluxDB instances, and you can just keep importing data from one instance into the other instance and provide a time range for it.
A
Can we cover MongoDB next?
G
Yeah, I can go through it quickly. So I spent the time to look at MongoDB, and for MongoDB, as opposed to MySQL, where you have to keep that session running during the quiesce command, MongoDB provides a similar command to run the quiesce and unquiesce, but for this one you don't have to keep the session running. So once you run the quiesce command, which is db.fsyncLock()...
G
...you can check the status of the command, and it also keeps track of the lock count: MongoDB increments the count for every db.fsyncLock() command that you run, so there should be a corresponding unlock command that decrements that lock count, and for the database to be unlocked, that lock count has to be zero.
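The lock-count behavior can be sketched as a pair of hook functions. `mongo_eval` wraps the mongo shell (the host name is an assumption), and because the unlock loop only depends on that wrapper, it can be exercised with a stub instead of a live server.

```shell
#!/bin/sh
# Sketch of MongoDB quiesce/unquiesce. Every db.fsyncLock() increments the
# server's lock count, and writes resume only when matching db.fsyncUnlock()
# calls bring the count back to zero, so the unquiesce hook loops.

mongo_eval() {
  # --quiet prints just the eval'd value; the host is a placeholder.
  mongo --quiet --host "${MONGO_HOST:-mongodb-0.mongodb}" --eval "$1"
}

mongo_quiesce() {
  mongo_eval 'db.fsyncLock().lockCount'
}

mongo_unquiesce() {
  # Assumes the database is currently locked at least once; keeps unlocking
  # until the count reaches zero, in case other tooling took its own lock.
  while :; do
    n=$(mongo_eval 'db.fsyncUnlock().lockCount' | tr -dc '0-9')
    [ "${n:-0}" -eq 0 ] && break
  done
}
```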
G
So for a sharded cluster, the data is distributed among the shards, and before you take the backup, you first have to stop the balancer, because if there is any data migration happening between the shards during the backup, you might end up with an inconsistent backup. So it's very important to stop the balancer before the backup.
C
Yeah, so it's interesting here, from the perspective of the container notifier, that you have to do a little bit of coordination between what happens on each replica. It seems like, for sharding, you have to execute commands on each replica first, and then you can do the fsync and then the snapshot.
G
Right; actually, from the documentation, it seems you just need to connect to any of the mongos instances, and you can run the stop-balancer command, and this will stop the cluster balancer. MongoDB also provides two functions that you can use to query the status of the balancer: you can use getBalancerState, and there's another one, isBalancerRunning. These will let you verify the status of the balancer.
G
So once you verify that the balancer is disabled, you need to back up the config database; I think this is where MongoDB stores the metadata for the sharded cluster. Then it depends: there are some options where, if you need precise consistency, you run the quiesce command to basically lock the entire cluster, running the db.fsyncLock() command on, I believe, each replica.
G
This is prior to doing the volume snapshot, so you lock basically the entire cluster. Alternatively, you can just run the lock command on the secondary member of each shard's replica set; this is in case you don't want to have the whole cluster locked.
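Putting the sharded-cluster pieces together, the sequence might look like the outline below. This is a sketch under assumptions: the pod names (`mongos-0`, `cfgsvr-0`, `shard0-secondary-0`) and the snapshot step are invented placeholders, and `DRY_RUN=1` just prints each step so the ordering can be checked without a cluster.

```shell
#!/bin/sh
# Ordering sketch for backing up a sharded MongoDB cluster, per the steps
# above: stop the balancer, verify it stopped, lock the config server and
# one secondary per shard, snapshot, then unwind in reverse order.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

backup_sharded_mongo() {
  run kubectl exec mongos-0 -- mongo --eval 'sh.stopBalancer()'
  run kubectl exec mongos-0 -- mongo --eval 'sh.isBalancerRunning()'    # must be false
  run kubectl exec cfgsvr-0 -- mongo --eval 'db.fsyncLock()'            # config metadata
  run kubectl exec shard0-secondary-0 -- mongo --eval 'db.fsyncLock()'  # one secondary per shard
  run snapshot-pvcs-in-parallel   # placeholder for the CSI volume snapshots
  run kubectl exec shard0-secondary-0 -- mongo --eval 'db.fsyncUnlock()'
  run kubectl exec cfgsvr-0 -- mongo --eval 'db.fsyncUnlock()'
  run kubectl exec mongos-0 -- mongo --eval 'sh.startBalancer()'
}
```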
G
Right, so this will be run in the shell, for example with pod exec on the shell of the MongoDB container.
G
Right, yeah. I tested this using the Velero approach that Fung mentioned, with pod exec. So this one, I was about to get it running with pod exec; but yes, if you open a shell to one of the containers, then you can run this same command.
G
Right. From the deployments I tried, I saw there's just one PVC per shard; each shard is a StatefulSet, and you have the primary member and any number of secondaries for each shard.
I
And the snapshotting and everything can happen in parallel for each of those PVCs, right?
G
Right, so I think once you quiesce, depending on whether it requires the whole application or each secondary member, then the PVCs need to be snapshotted in parallel, I think, just to make sure that they are consistent.
J
Yeah, I have one question: when you do fsyncLock, that means all write operations will stop, right?
As for the write operations: if, during the backup period, anybody is trying to write, what will happen to those writes?
G
They'll be queued, yeah; the writes will be queued, and then, once you unlock, that's when the writes will be committed.
J
In general, MongoDB also suggests taking backups using Ops Manager; they have a tool, right, and it takes the backup to S3 or other storage, and you can restore to a point in time. I'm not sure, by taking snapshots, whether we can do point-in-time recovery.
C
That's like a logical-level dump, and then it uses the oplog to do point in time, I think.
The journal, basically. Okay; from my experience, the logical dumps are very slow for any reasonably sized database. We've only been able to use volume snapshots, but that's my anecdotal experience.
J
No, but if you use Ops Manager, they have the backup and recovery service (they call it backup recovery service); they take a snapshot every 15 minutes or one hour, and then they tail the oplog and upload it, which captures the incremental changes, so you can restore to any point in time. It creates a timeline.
G
Oh yeah, I could check this. So yeah, MongoDB provides a call, I think db.isMaster(), which will return true if that specific instance is the primary, and so you can use that call to check if that instance is the primary node. And then for this one, for the approximate point in time: this is from the documentation I linked down there, from the manual, where MongoDB recommends it for taking a backup of the sharded cluster.
G
It's okay if you don't know; yeah, I'm not sure. Okay, thank you; I can look into it. And also, one thing: I found the MongoDB documentation to be very comprehensive, so I tried to link some of it down there. You can also look into it after the meeting.
A
Thank you. All right, Xing, we've only got five minutes left. We won't be able to go through the execution hooks, but this is all good discussion.
A
We've still got Kafka; we will make it short.
C
Yeah; Sean, you did most of the heavy lifting here, but I can give a quick summary. Kafka is interesting because it's pretty tough to actually do data protection for Kafka. Kafka has multiple broker nodes, each of which is primary for some subset of the topics, which is kind of similar to the shards in Mongo we were just talking about. Backing it up is really a challenge, because you really have to take down the brokers; there's not a simple quiesce hook that you can execute.
Really, it's a matter of stopping the service itself. The other thing is that there are two main data services you need for this: one is the thing that serves the actual topics and messages, and the other one is ZooKeeper. ZooKeeper contains two important bits of information. One bit is the consumer offsets: if I have a client, I actually do have some server-side state that tracks where I am.
It also contains topology information for who is currently the leader among the brokers for the various topics. So we did mock up what backups of this would look like in Kubernetes.
It's definitely best effort, I would say. This would be possible to execute, but it would involve some downtime. In fact, when you freeze and unfreeze the ZooKeeper volumes, you would not be able to make progress in your consumer offsets stored in ZooKeeper, which is problematic; that doesn't fit with one of the goals here, which was to have as little downtime as possible. And then you could just take the normal volume snapshots on a per-broker basis.
I think our recommendation is basically that, similar to what we looked at with NuoDB, maybe this is not a great fit for this type of quiesce/unquiesce hook; you might need some kind of application integration to get this correct.
A
Yeah, I also want to bring this topic to the community. It is interesting: Kafka is one of the great examples where there are actually multiple applications within this offering. Kafka itself depends on ZooKeeper, or, if you treat ZooKeeper itself as an application: what makes things complex is that ZooKeeper can serve Kafka, but it also can serve others.
So we put this together, again, as best effort, because, at least based on our very limited research, Tom and I haven't found an effective way of doing this in a Kubernetes environment. That's one. The other interesting thing is that, as previously mentioned, there are already pretty rich tools around those applications to do either an in-place backup, a snapshot within the environment, or an asynchronous copy of the data to another cluster.
That's actually how most end users achieve DR these days. For example, MirrorMaker 2 is a very popular tool that is widely used in the Kafka community to do asynchronous replication.
You get some drawbacks from that. And more interestingly, I think, for almost all these distributed systems, they offer very high availability guarantees, and with Kafka you can even do a stretch cluster, which means that your Kafka can run in two different data centers. Even if you lose one data center, you still have the other one up and running without affecting your business.
So in this case, with all these complications feeding in here, I'm not sure we are ready to propose anything in this community for these very complex applications per se.
One thing I think I took away from today's session is that maybe it's not a bad idea to look into those applications' supporting tools for taking backups. If those tools already support application-consistent snapshot or backup functionality, we as a community can maybe just wrap a control layer on top of them and utilize those functions.
We could use those functions directly without even invoking a webhook to do the quiesce; the webhook becomes a mechanism to issue a signal, and that actually fits well with the concept of the container notifier. That's pretty much what I have today. I will sync up with Xing later on. I really appreciate all the effort everyone put into this; it's a lot of effort in a very short period of time. Xing, unfortunately, we don't have enough time to go through our agenda today, I guess.
A
Yeah, and Alice offered to cover the etcd operator from OpenShift. Maybe we can sync offline on that as well. Thanks, everyone, for today's meeting; any last-minute questions or suggestions?
C
I think we'll have to figure out whether we still want to follow up, maybe have more meetings, and figure out what's left here. Maybe, Xing and Sean, you can drive that.
A
Yeah, I will set up a follow-on to this discussion. Anyone who is interested in joining that follow-up meeting (we're going to mostly focus on what we discussed today, in more detail), please send me an email. All right, thank you, everyone. Have a nice rest of your day. Thank you.