Description
Cloud-native applications comprise various components, including data services, storage systems, and related Kubernetes objects. Each component requires its own data protection tools, strategy, and domain expertise. A robust solution aligned with business requirements often involves complex workflows. What if there was a way to coordinate the implementation of these workflows while optimizing how backups are moved into storage? During this talk, Prasad will demonstrate how two open-source tools, Kanister and Kopia, work together to optimize backup and recovery for Kubernetes applications.
Hello everyone, my name is Prasad Ghangal, and I'll be talking about application-level backups with Kanister and Kopia.

About me: I work at InfraCloud Technologies. My main interests are Kubernetes, Go, and open source. I'm also a maintainer of BotKube and Kanister. Besides work, I also like trekking and playing cricket.

About today's session: I'll be talking about data management in general, the challenges we face, how the Kanister framework helps overcome those challenges, and how Kanister helps protect your application data on Kubernetes.
Your data management approach basically depends on the infrastructure you're using and the kind of applications you have deployed. If we talk about the general approaches to data management, the first one is storage-centric snapshots, where the underlying storage system provides a way to snapshot the volume. That is crash consistent, but the snapshots don't interact with the data services at all.

That's why there is a second approach people follow, which is storage-centric along with data services. For example, some applications need you to freeze the data before you perform a snapshot, so people follow the storage-centric snapshot approach with hooks that allow them to freeze and unfreeze the data service.
So obviously there is no single solution to this problem, because data management, and backups in particular, depend on a lot of factors: the infrastructure you're using, the different provisioners, and the different types of applications. Each application has its own way of doing data management.
Even if we talk about just backups, there are different ways of taking them: volume snapshots, logical backups, provider-based API calls like RDS snapshots, or calling operator APIs to perform a snapshot.
Then applications might have their own specific concerns: an application might need to scale down and back up before and after the backup and restore, or freeze and unfreeze the data. And your backups might have different target requirements, like different types of object store, which could be vendor specific as well. So when we talk about protecting stateful application data on Kubernetes, there are a lot of things we need to consider, and there won't be a single workflow we can follow for all the apps.
So ideally, the solution would be a framework that allows us to combine the different approaches and build a workflow that can be executed to perform application-level backups.
That's where Kanister comes into the picture. It's an open source framework to manage data at the application level. The way this is achieved is through Blueprints: Kanister has a resource called a Blueprint, in which you define a workflow, and then you can execute that workflow. We will talk about it in more detail.
Talking about the Kanister framework components, there are four main ones: the Kanister controller, the Blueprint, the ActionSet, and the Profile.
The Kanister controller is basically a custom controller responsible for performing operations based on the creation of custom resources (CRs). The different CRs involved in the Kanister framework are the Blueprint, the ActionSet, and the Profile. The Blueprint is where you define the workflow for backup, restore, or delete operations.
The ActionSet is basically the trigger for the actions defined in the Blueprint. The Profile is the CR where you define the destination for your backups or, in the case of restore, the source for your restores. And to manage all these CRs you obviously need a custom controller, which is the Kanister controller: it performs the operations based on CR creation.
A
Then
there
are
two
tooling
clis:
canister
provides,
one
is,
can
cattle
another
one
is
candle,
can
Catalyst
to
it
can
calculate,
helps
you
creating
the
CRS
like
action
set
and
profiles
can
do
used
within
container
to
push
your
push
and
put
data
from
the
object
stores
of
your
choice,
all
right.
So this is what a Blueprint looks like. A Blueprint consists of a list of actions. This particular Blueprint is for a MongoDB application: you can see there is a backup action, and within each action there can be multiple phases. In this case we see only one phase, which consists of a function and its arguments.
A Kanister function defines how the operation is going to take place. In the case of the KubeTask function, what Kanister does is run a container with the given image and execute the given commands inside that container.
A
If
you
have
requirement
like
you,
want
to
exec
into
a
container
and
then
execute
command,
you
can
use
Cube
exact
function,
so
there
are
list
of
canister
function.
Depending
on
your
use
case.
You
can
use,
we
will
talk
about
it
more
in
in
the
next
few
slides
but
yeah.
This
is
like
in
the
each
phase.
You
define
how
you
want
to
perform
those
operations
and
you
basically
basically
build
the
workflow.
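To make the structure concrete, here is a minimal sketch of such a Blueprint; the names, image, and command are illustrative placeholders, not the exact values from the slide:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint      # placeholder name
  namespace: kanister
actions:
  backup:                      # action name, referenced by the ActionSet
    phases:
    - func: KubeTask           # runs a new pod with the given image
      name: takeBackup
      args:
        image: mongodb-tools:latest   # illustrative image
        command:
        - bash
        - -c
        - |
          # dump the database and stream it to the object store with kando
          mongodump --archive | kando location push \
            --profile '{{ toJson .Profile }}' --path backup.archive -
```

The KubeTask phase above spins up a pod from the given image and runs the listed commands, exactly as described.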
Then you create an ActionSet. In the ActionSet you reference the Blueprint and the action you want to run from the actions defined within that Blueprint. You also pass the object reference, the resource on which the Blueprint action will be performed. And then the Profile: the Profile holds information about the object store where you want to push the backup data to, or pull the data from, and you pass the Profile reference as well. Once the ActionSet is created, Kanister runs the operations, and based on the operation status it updates the status field of the ActionSet.
A
So,
in
this
case
you
can
see
it
has
said
some
output
artifacts,
and
that
is
the
path
to
which
the
backup
artifacts
are
pushed.
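As an illustrative sketch (all names and paths are placeholders), an ActionSet and the status the Kanister controller writes back might look roughly like this:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: backup-actionset
  namespace: kanister
spec:
  actions:
  - name: backup                   # action name defined in the Blueprint
    blueprint: mongodb-blueprint   # Blueprint to run the action from
    object:                        # resource the action is performed on
      kind: StatefulSet
      name: mongodb
      namespace: mongodb
    profile:                       # object store destination for the backup
      name: s3-profile
      namespace: kanister
status:                            # filled in by the Kanister controller
  state: complete
  actions:
  - name: backup
    artifacts:
      backupLocation:
        keyValue:
          path: backups/mongodb/backup.archive   # illustrative output path
```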
A
This
is
the
example
of
profile
profile
holds
the
credentials
and
the
object
store
information
like
in
this
case.
We
are
using
S3
compliant
Object,
Store,
objects.org,
S3
compliant
location
type,
with
bucket
a
canister
backup,
and
these
are
the
credentials
defined
to
interact
with
that
bucket
cool
all
right.
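A hedged sketch of such an s3Compliant Profile; the bucket, endpoint, and Secret names are placeholders:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: s3-profile
  namespace: kanister
location:
  type: s3Compliant
  bucket: kanister-backups           # placeholder bucket name
  endpoint: https://s3.example.com   # placeholder endpoint
  region: us-east-1
credential:
  type: keyPair
  keyPair:
    idField: access_key_id
    secretField: secret_access_key
    secret:                          # Kubernetes Secret holding the key pair
      apiVersion: v1
      kind: Secret
      name: s3-credentials
      namespace: kanister
```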
So that is how Kanister works in theory. Now it's time for the demo, in which we will be showcasing how a PostgreSQL application can be protected.
Cool, so let's list all the entries. Okay, so now we have two entries in the database. Now let's perform a backup of Postgres. I have already installed the Kanister operator in the kanister namespace, and the operator is up and running. The next thing we'll create is the Blueprint, so before creating it, let's go through it.
This is a Blueprint for protecting the PostgreSQL application. If you go through the actions, you can see we have defined a backup action, and within an action there can be multiple phases. In this case there is a single phase for the backup action, and it uses the KubeTask function. That means it will run a new pod with this image and execute the commands defined here.
A
So
in
the
commands
you
can
see,
we
are
building
the
host
name
from
the
object
passed
and
we
are
executing
PG
Temple
command
and
then
we
are
using
can
do
location,
push
to
push
the
Dom
to
the
object
store,
and
then
we
are
setting
the
output
artifact,
which
is
basically
the
path
to
which
we
have
pushed
the
data.
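The backup phase just described could be sketched roughly as follows; the image, paths, and template expressions are illustrative assumptions rather than the exact demo values:

```yaml
actions:
  backup:
    outputArtifacts:
      pgBackup:
        keyValue:
          path: '{{ .Phases.pgDump.Output.path }}'  # set by kando output below
    phases:
    - func: KubeTask
      name: pgDump
      args:
        image: postgres:14           # illustrative image
        command:
        - bash
        - -c
        - |
          # build the host name from the object passed via the ActionSet
          host="{{ .Object.metadata.name }}.{{ .Object.metadata.namespace }}.svc"
          path="backups/postgres/pg_dump.sql"
          # dump the database and stream it to the object store
          pg_dumpall -h "${host}" -U postgres | kando location push \
            --profile '{{ toJson .Profile }}' --path "${path}" -
          # record the location as the phase output / output artifact
          kando output path "${path}"
```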
Then, in the restore phase, we are fetching the data from the location we defined during backup and running the psql command to restore the data. And in the delete action, we are just deleting the dump that was pushed to the object store. Cool, all right. So let's create the Blueprint.
Next we will create an ActionSet. If you look at the command, we are specifying the backup action from the postgres Blueprint which we created, we are passing the postgres StatefulSet as the reference object on which the action will be performed, and then the Profile name, which is the s3 Profile with a random suffix. Cool. So that is how we create an ActionSet which will perform the backup action. We can check the status using the kubectl describe actionset command.
A
All
right
so,
in
the
events
you
can
see,
the
status
is
complete
and
if
you
see
the
artifacts,
it's
saying
the
backup
has
been
pushed
to
this
location.
Let's
quickly
verify
that
okay,
so
we
have
this.
A
To
which
the
to
which
canister
has
pushed
the
data
could
all
right
so
now
we
are
done
with
backup.
Now let's cause a disaster: let's delete the database we had just created. I will again run the kubectl exec command to get the psql CLI, and drop the database.
Okay, so to restore, instead of passing all the information again, you can just refer to the rest of the information, like the Blueprint and the artifacts, from the backup ActionSet, so we'll use the --from argument.
Good, so you can see the deleted data has been restored correctly, with two entries as expected. So yeah, this is how you can use a Blueprint to define the backup and restore workflow, and then use ActionSets to run the actions from the Blueprint. All right, moving back to the slides. Let's see how this whole thing happened.
So if there is a database workload whose data you want to protect, the first thing you need to do is define a Blueprint: you define the workflow for how you want to perform the backup and restore operations. Once you have the Kanister controller pod running and the Blueprint created, you can use an ActionSet. When you create the ActionSet, you define the action you want to run from that Blueprint; the Kanister controller will then fetch the Blueprint and the action from it, and will run that workflow.
A
So
we
use
canister
functions
to
Define
how
you
want
to
perform
those
operations
and
then
again
using
can
do
we
push
the
artifacts
to
object
storage,
and
this
is
how,
once
everything
is
done,
the
action
status
is
updated
with
the
required
information
cool.
So we talked about Kanister functions. There are different types of Kanister functions you can use while building the workflow, depending on your requirements. If you want to execute some commands or add some custom logic, you can use the KubeExec or KubeTask functions.
A
If
you
want
to
scale
down
scale
up
or
scale
down
the
workloads,
you
can
use
the
scale
scale
workload
function.
There
are
a
few
functions
for
PVC
operations
like
backup
everything
from
from
the
PVC
resistor
data
to
and
from
the
PVC.
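For instance, a phase that scales a workload down before a volume-level operation might be sketched like this (the phase name and template expressions are illustrative):

```yaml
phases:
- func: ScaleWorkload
  name: shutdownApp
  args:
    namespace: '{{ .StatefulSet.Namespace }}'
    name: '{{ .StatefulSet.Name }}'
    kind: StatefulSet
    replicas: 0    # scale to zero before the operation; a later phase scales back up
```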
There are also a few functions you can use for taking CSI volume snapshots and AWS EBS snapshots. kando supports different types of object store, and provider-based snapshots are also supported by Kanister through specific Kanister functions. The complete list of Kanister functions can be found in the Kanister docs; I won't go through the whole list here, because there are lots of functions.
A
So
moving
back
like
how
we
push
and
pull
data
from
and
to
the
object
store,
we
use
kopia.
We
used
to
use
a
stick,
but
recently
we
also
used
to
copia
for
all
the
objects
to
related
operation.
The
reason
is,
it's:
it's
more
secure
and
reliable.
It's
it
provides
different
types
of
encryption.
Algorithms
and
the
duplication
is
very
efficient.
It
sits
way
faster
than
the
district.
A
It
supports
multiple
computation,
algorithms
and
basically
have
lesser,
maybe
footprints
and
it's
and
it
supports
lots
of
object
stores,
including
S3,
GCS,
Azure
buckets
and
all
so
it's
it's
very
faster,
reliable,
secure
than
District.
So
we
have.
We
have
like
a
switch
to
copia
for
almost
all
the
operations.
For now, the way you enable Kopia for the object store related communication is this: you mention a Kopia snapshot in the output artifact of the action, and you have to create a Kanister Kopia server with the repository backend of your choice, which could be S3, GCS, anything. Then, when you create the Profile for your ActionSet, you specify the credentials of the Kopia server rather than the object store directly.
A
Instead
of
you
know
specifying
the
credentials
for
Object
Store,
so
copier
server
acts
as
intermediate
a
server
between
your
Object
Store
and
canister
operations,
and
through
profile
you
will
you
can
you
will
be
communicating
with
copia
server
and
that
also,
you
know,
provides
you
fine-grained
security
configuration
basically,
instead
of
using
the
credentials
of
Object
Store,
you
create
a
copier
server
and
use
copier
service
credentials
in
the
action
set
or
profile
to
to
trigger
the
operations.
A
I.
Let
me
give
you
an
example:
how
copia
profile
would
look
like?
So
this
is
how
copia
profile
looks
like
you,
you
defined
the
location
will
be
of
type
copia
and
you
specify
endpoint
of
the
copia
server,
and
then
you
specify
Utilities
information
and
username
password
for
for
authentication
with
copier
server
and
then
canister
will
use.
You
know
we
will
push
the
artifacts
to
The
copier
server.
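As a rough sketch of such a Profile (the endpoint, usernames, and Secret names are placeholders, and the exact schema may differ between Kanister versions):

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: kopia-profile
  namespace: kanister
location:
  type: kopia
  endpoint: https://kopia.kanister.svc.cluster.local:51515  # Kopia server endpoint
credential:
  type: kopia
  kopiaServerSecret:
    username: kanister-user              # placeholder user
    hostname: kopia.kanister.svc         # placeholder host
    userPassphrase:                      # password for authenticating to the server
      key: passphrase
      secret: kopia-user-passphrase
    certAuthority:                       # TLS information for the server
      key: tls.crt
      secret: kopia-tls-cert
```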
A
Yeah
I
think
we've
already
talked
about
this,
so
if
you,
if
you
specify
the
copia
snapshot
in
the
artifact,
you
you
have
to
Define
The
copier
credentials
in
the
profile
and
then
chemistry
control
will
communicate
with
copia
server
to
push
the
artifact
and
fetch
the
artifacts
for
backup
energy
store
all
right.
So as of now the Kopia server creation part is manual; we are in the process of automating it, and that is something you can expect in future releases.
A
The
the
new
features
new
upcoming
features
in
the
future
images
are,
we
were
trying
to
improve
the
user
experience
or
blueprint
authors
experience
to
build
the
blueprint,
we'll
be
adding
more
canister
functions
to
to
support
the
operated,
specific
snapshot
operations
like
Kate,
Sandra
and
other
operator
based
databases.
A
You
can
expect
more
examples
in
the
community
blueprints
and
yeah
The
copier
server
creation,
which
is
manual
as
of
now
few
resources.
You
can
refer
canister,
you
can
find
all
the
canister
dogs,
including
the
canister,
different
types
of
chemistry
function
you
can
use
for
building
blueprint
at
docs.com.
There is also a Slack workspace at kanister.slack.com; feel free to reach out there, and we'll be happy to help if you have any doubts or any issues. All right, so yeah, that's all from my side.