From YouTube: How to protect your cloud-native data 101
So for today, what I have in store for you is this: we're going to take a look at the definition of cloud-native data and its characteristics, and at what the different options may be for you to build data within the cloud — because cloud native is not necessarily only about Kubernetes. But then we will switch specifically to Kubernetes, talking about different concepts involving data, and also extend to…
But what it doesn't say is that, you know, we should consider this whole cloud-native database concept as an iceberg. At the top of the iceberg, this is more or less the care-bear world where the narrative is told by the cloud service providers: everything is easy, everything is naturally consumable via APIs, and you basically pay for what you consume.
You might look at, maybe, Google to run Kubernetes, maybe AWS to run your storage and your buckets, etc. So you may want to have a specialized cloud for a specific set of functions you want to provide to your business. And then the challenge becomes that, as you move between different clouds and repeat, you know, database migrations, or operate databases in those different clouds — even though, at the end, it may be a relational database or a NoSQL database…
A
The
fact
is
in
terms
of
the
operation,
those
are
operated
in
different
way
because
you
are
using
different
cloud
providers.
So
it's
not
exactly
the
same
api,
but
protecting
your
data
will
happen
in
a
different
way
and
how
you
combine
this
data
with
your
overall
application
architecture
will
also
be
different,
and
typically
that
means
that
you
need
a
broader
scope
in
terms
of
the
skill
sets
of
your
engineering
teams.
As you probably know, most cloud databases are replicated and highly available within a particular availability zone. As soon as you want to recover into a different availability zone, it does incur some downtime. Potentially you have to restore from snapshots, and that also means you have to schedule and manage the lifecycle of those snapshots by yourself — and chances are that your snapshot may not be…
If you consume this database-as-a-service model from different clouds, you will have to repeat the same sort of automation and extended operations over multiple clouds. And, you know, automating snapshots and testing the restore of all those snapshots in different clouds will also involve different skill sets, because they are using different APIs and different SDKs depending on the provider.
So let's first take a look at the cloud-native features we could expect from a Kubernetes environment, whether it's running on premises or in the public cloud, managed or unmanaged. First, it's all about scalability. Over the last couple of years I've seen the rise of auto scaling for pods, but also auto scaling for nodes. This means that, as your application requires more power, you can also deploy more nodes in Kubernetes, as you would do with an auto scaling group in the public cloud providers — so not only for nodes, but also for the application itself.
So you can scale your application to be able to take, you know, some of the peaks during high-usage periods — such as promotional sales, if it's a commercial application, or during Black Friday, for example — where potentially you need more power for your application: more web servers, maybe more database nodes to facilitate reads, but also more Kubernetes compute nodes as well. Then there is elasticity.
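As a concrete illustration, pod-level auto scaling is usually expressed with a HorizontalPodAutoscaler. This is a minimal sketch, not something from the talk itself — the Deployment name `web` and the thresholds are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20          # headroom for peaks such as Black Friday
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU passes 70%
```

Node-level scaling (for example, the Cluster Autoscaler) then adds nodes when the newly created pods cannot be scheduled on the existing ones.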
So even though things may not succeed the first time, maybe the second time a controller will try to do something, once all the prerequisites are met — you know, if it depends on other controllers — and in the end, eventually, everything will converge. It's a sort of, you know, self-healing.
And, I would say, at no extra cost — so this is also a very important factor. So much for the basics, the foundation of Kubernetes, and the kind of requirements and capabilities it provides. But how about persistent data and storage? Let's say you want to build your own database in Kubernetes and run it in production. Then, first off, of course, it needs to be distributed: you cannot run a single-pod database on a single node.
You want replication to happen as well, because by default — and we're going to see later some of the Kubernetes primitives — the data itself is not replicated by the platform, meaning that there are two main solutions.
A
Encryption
is
also
something
you
have
to
consider,
especially
if
you're
running
database
that
help
that
are
holding
sensitive
data.
End-To-End
encryption
is
really
important
in
kubernetes
and
you
have
to
find
the
right
solution,
which
is
not
necessarily
relying
only
on
the
cloud
provider
for
encryption.
You
may
also
want
to
encrypt
your
data
directly
inside
kubernetes
so
that
no
one
can
get
access
to
your
queries.
Volume
if
someone
were
to
you
know,
read
it
from
from
kubernetes
itself.
Individual teams are responsible for, you know, a set of microservices, and each one of these teams will run their own queuing or messaging system and their own databases. In the, let's say, cloud-native philosophy, you simply cannot rely on developers waiting to consume and provision their databases.
They need to be deployed on demand. You cannot afford, you know, waiting two, three, four days or even multiple weeks to get a database up and running, in an environment where code updates and new releases are typically deployed to production multiple times a day. So self-provisioning is a very important concept when it comes to deploying and managing databases in Kubernetes — and Kubernetes has all the fundamental prerequisites to enable this kind of paradigm.
And again, you have two solutions. Either you could use your cloud service provider's native services, such as Azure Pipelines and others — and of course you will be subject to the same drawbacks, I would say, that we've seen before, in terms of different clouds having different APIs and different ways of implementing those DevOps pipelines. Or, alternatively, you can choose to stay within Kubernetes and use a Kubernetes-native DevOps tool such as Tekton, which gives you the ability to develop your DevOps pipelines without leaving Kubernetes.
The Kubernetes documentation will give you the full list of supported volumes. Some of them are legacy, I would say, because it also includes the deprecated in-tree drivers; but Kubernetes has moved away from in-tree drivers to a more modular approach, where every storage vendor or provider develops its own driver, called a CSI, or Container Storage Interface, driver.
It's a pluggable architecture where only the required CSI driver will be installed by the user, when you need it. So, for example, if you're using Amazon EKS, you can install the EBS CSI driver, and then you will be able to take advantage of the variety of features that come with that CSI driver — all the Kubernetes primitives, and on top of that additional capabilities as well.
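Once the EBS CSI driver is installed, volumes are typically requested through a StorageClass that points at the driver. A minimal sketch — the class name and parameters here are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com          # the EBS CSI driver
parameters:
  type: gp3                           # EBS volume type
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # provision in the consuming pod's AZ
reclaimPolicy: Delete
```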
The main volume providers you're going to be using are displayed here on the screen. The first and most obvious one is the persistent volume claim. A persistent volume claim, or PVC, is a request for a back-end persistent volume that matches specific criteria, such as, you know, the size and the storage class of the volume you want to create.
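A PVC sketch, assuming a storage class named `standard` exists in the cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce            # mountable read-write by a single node
  storageClassName: standard   # assumed storage class
  resources:
    requests:
      storage: 10Gi            # the requested size
```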
Another important consideration is how the pods will access the PVCs. If you have a single pod, you can have a PVC that is locally attached; if the node fails, then, unfortunately, you will also lose the data. Now, if you create a higher-level construct such as a Deployment, then you will have to use a shared file system, because in the definition of your Deployment you will specify a single PVC — meaning that if the PVC is a locally attached file system, then only the first pod will be able to consume it, right?
The other pods that will potentially be residing on the same host won't be able to access it, because it's already been claimed by the first one; and pods that are residing on other nodes — well, they won't have access to the locally attached volume at all, right? So the only solution is to have a shared network file system.
So, in terms of the access definition, it means that if you want to use a PVC within a Deployment and every pod needs to write to that PVC, you will need to use ReadWriteMany access, backed by something like an NFS share.
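The access mode is declared on the PVC itself. A sketch of a ReadWriteMany claim that all the pods of a Deployment could share, assuming an NFS-backed storage class called `nfs-shared`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany              # every pod of the Deployment can write
  storageClassName: nfs-shared   # assumed NFS-backed class
  resources:
    requests:
      storage: 5Gi
```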
Other volumes that can be used include emptyDir, which is a scratch directory, typically mounted from the root file system or from RAM on the node.
It starts empty and, of course, the pod may write data to the directory that will be mounted into it; but when the pod is restarted, the data that is located there is also scratched. Then there is hostPath, which identifies a particular path on the Kubernetes node that will be mounted as a volume into the pod. It is typically avoided in production, as it has some security implications, but also because it's only valid for naked pods.
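Both can be declared inline in the pod spec. A sketch — the image and paths are illustrative, and hostPath, as noted, is best avoided in production:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch   # wiped when the pod is recreated
        - name: host-logs
          mountPath: /host/logs
  volumes:
    - name: scratch
      emptyDir: {}                  # node-local scratch space
    - name: host-logs
      hostPath:
        path: /var/log              # a path on the node itself
```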
Then there are ConfigMaps, which can be mounted to your pod as environment variables, but also as a volume into the pod; your application can then get access to this information just by reading the files that will be present in your mount point. Secrets are sort of similar to ConfigMaps, except that they are encoded in base64 — but not encrypted by default. This is really important.
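A sketch of mounting a ConfigMap and a Secret as volumes — the object names are hypothetical, and remember that the Secret data is only base64-encoded unless encryption at rest is enabled:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: app-config
          mountPath: /etc/app       # each key appears as a file
        - name: app-secret
          mountPath: /etc/secret
          readOnly: true
  volumes:
    - name: app-config
      configMap:
        name: my-config             # assumed ConfigMap
    - name: app-secret
      secret:
        secretName: my-secret       # assumed Secret; base64-encoded, not encrypted
```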
Then we have the Downward API, which can be very useful because it provides contextual information to your application running inside your pod. The Downward API allows you to define, in YAML again, inside your pod spec, references to particular fields you want to inject into your running pod. It can be things like, you know, your pod IP, the amount of your CPU requests, the limits for CPU and memory, etc.
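A minimal Downward API sketch that exposes the pod IP as an environment variable and the CPU limit as a mounted file:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        limits:
          cpu: "500m"
      env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP    # injected at runtime
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: cpu_limit
            resourceFieldRef:
              containerName: app
              resource: limits.cpu       # readable at /etc/podinfo/cpu_limit
```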
A
So
give
essentially
a
lot
of
contextual
information
for
your
application,
as
opposed
to
you
know,
hard
code,
those
information.
And
finally,
we
have
also
a
firm
ephemeral
volumes
which
are
a
bit
more
recent
than
the
the
others,
and
they
have
been
created
to
meet
the
requirement
of
specific
use
cases
where
applications
don't
really
care.
If
the
attached
volumes
are
persistent
or
not.
So, for example, it may be a caching application, where the data, you know, can easily be scratched when the pod gets restarted and the application doesn't really care about that; or it also gives you the ability to pre-populate data as input for the application.
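A sketch of a generic ephemeral volume: the volume claim template lives inside the pod spec, so the PVC is created and deleted together with the pod (the storage class name is assumed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo
spec:
  containers:
    - name: cache
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: cache-vol
          mountPath: /cache
  volumes:
    - name: cache-vol
      ephemeral:
        volumeClaimTemplate:             # PVC shares the pod's lifecycle
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: fast-csi   # assumed CSI-backed class
            resources:
              requests:
                storage: 1Gi
```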
But essentially the main difference is that the lifecycle of the volume is the same as that of the pod, meaning that the pod can get restarted on a node where the volume didn't previously reside — as opposed to a PVC, for example, since a PVC, once it's been claimed, will basically reside forever on a particular node. That means the pod is tied to the specific node where the PVC resides, so it cannot be restarted on another node.
Here the difference is that pods can be restarted on whatever node. Also, in addition, ephemeral volumes can be supported by CSI providers to deliver some additional capabilities, such as snapshotting, cloning, resizing and storage capacity tracking for those ephemeral volumes, because these are fundamentally CSI capabilities.
Okay, so now let's focus a little bit more on the basic CSI capabilities. What does a CSI driver need to deliver to Kubernetes, at the bare minimum? It is a standard defined for storage plugins in 2018, when Kubernetes moved away from in-tree driver development — where, for every modification, the whole Kubernetes system had to be re-released.
Now, when it comes to data protection, the CSI driver delivers multiple functions that are represented as an extension of the Kubernetes APIs. Snapshots are effectively represented as CRDs, or Custom Resource Definitions, and are composed of three main objects: first the VolumeSnapshot, then the VolumeSnapshotContent and, finally, the VolumeSnapshotClass.
So the VolumeSnapshot is comparable to a PVC, in the sense that it is actually a request for a snapshot. And the real snapshot that is taken is similar to a persistent volume, in the sense that it is effectively the physical snapshot — and the corresponding object is the VolumeSnapshotContent.
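A VolumeSnapshot request sketch, assuming a snapshot class named `csi-snapclass` and a source PVC named `db-data`; the matching VolumeSnapshotContent object is created for you by the snapshot machinery:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed snapshot class
  source:
    persistentVolumeClaimName: db-data     # the PVC to snapshot
```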
The volume snapshot machinery is composed of a snapshot controller as well as a validation webhook, and is effectively delivered by the CSI driver. So the snapshot controller watches both VolumeSnapshot and VolumeSnapshotContent objects, and it's the component responsible for the creation and the deletion of volume snapshot objects.
On the other side, the CSI snapshotter sidecar is the component that watches VolumeSnapshotContent objects, and that triggers CreateSnapshot as well as DeleteSnapshot operations against a particular CSI endpoint. And finally, the validation webhook is nothing more than an HTTP callback that is there with the goal of tightening the validation of volume snapshot objects.
A
And
finally,
we
also
have
the
volume
snapshot
class,
which
specifies
different
attributes
belonging
to
a
volume
snapshot.
It
is
sort
of
similar
to
a
storage
class.
So, obviously, snapshots are asynchronous. That means they represent the content of the data at a particular point in time; it's not synchronous replication happening continuously over time, and that may be an issue in case the RPO needs to be equal to zero. RPO, or Recovery Point Objective, is the representation of the data that you can afford to lose in case of a failure, right? So if you have an RPO equal to zero, it means that you need something more synchronous than a snapshot.
Basically, you need a continuous representation of your data over time — and this is the type of thing that cannot be represented directly, or is not available directly, in Kubernetes, but that particular CSI drivers can actually provide as a feature, on top of the additional functions that are required by the Kubernetes API. So the CSI driver can itself deliver synchronous replication. This is the case of the CSI driver that is represented here on the screen, but other open-source CSI drivers like OpenEBS can also support replication. It's just to give you an example of how it can be delivered.
Okay, so far we've seen different paradigms: snapshots, asynchronous and synchronous replication for a zero RPO. But fundamentally there is also something else, which is creating backups from your snapshots. Your snapshots as such are living within Kubernetes, so in case of failure, of course, if you want to restore, you need to restore from a storage repository that is still available. So, typically, you want to externalize your snapshots and copy the data into an external storage repository like AWS S3 or Google Cloud Storage.
Then, as another set of custom resources, we have ActionSets, which define actions that can be triggered by the creation of the corresponding, you know, custom resource manifests. So, typically, if you want to do a backup or a restore action, you will do that by creating those manifests — and to help with the lifecycle of those custom resources…
…you can also use a command-line tool called kanctl, which can be used in dry-run mode to generate the different manifests; those manifests can then be applied to the Kubernetes cluster using kubectl. Or you can just use kanctl without the dry-run option, and it will directly create those CRDs in your Kubernetes cluster. So here we have an example for the Elasticsearch application.
Then, once the profile has been created, you're going to create the blueprint — which is available, you know, publicly, is really specific to Elasticsearch, and defines how to perform actions on that particular application. Then we can use kanctl in dry-run mode to generate the manifest for the backup ActionSet and later apply it with kubectl; or, as in the example here, we just use kanctl without the dry-run mode, and that will directly create and push the manifest into your Kubernetes cluster.
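For reference, a backup ActionSet manifest of the kind kanctl generates looks roughly like this. This is a sketch: the blueprint, profile, namespace and object names are hypothetical, so check the Kanister documentation for the exact fields:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-
  namespace: kanister
spec:
  actions:
    - name: backup                        # action defined in the blueprint
      blueprint: elasticsearch-blueprint  # assumed blueprint name
      object:                             # the application to back up
        kind: StatefulSet
        name: elasticsearch
        namespace: elastic
      profile:                            # location profile (e.g. an S3 bucket)
        name: s3-profile
        namespace: kanister
```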
So once the ActionSet has been created and the manifest pushed to Kubernetes, the controller will react to that and effectively trigger a backup, which you can monitor, in terms of its status, using kubectl as well. So, just by monitoring that particular custom resource, you will see it updated once the backup has been completed. And then, in case of disaster, when you want to restore the content from the remote location, then again you can just use…
…kanctl, as displayed here on the screen: specify the namespace and create the ActionSet — this time the action is restore — from the backup name, which is basically the name that was returned by the previous command when triggering the backup ActionSet. And again, it's a CRD: you can monitor the progress of the restore by using kubectl to check the status of that particular CRD, and at some point the initial data will be restored in the right place.
A
So
that
concludes
our
presentation
for
today.
Hopefully,
you
learned
something
and
it's
been
useful.
A
couple
of
key
takeaways
before
moving
on
kubernetes
is
ready
for
stateful
application
with
cloud
native
data.
This
is
a
very
important
point.
It
has
evolved
over
time.
So
now
it's
not
only
about
cattle.
You
can
also
run
pets
in
cobilities,
but
the
key
is
to
make
sure
that
you
can
reach
the
right
level
of
availability,
scale
and
performance,
and
we've
addressed
today.
If you want to learn more about data on Kubernetes and how to run your stateful applications and your stateful workloads in Kubernetes, please join the DoK, or Data on Kubernetes, community — you have the link there. I'm personally running the DoK London meetup, so if you're local to the UK, you can go and subscribe to the meetup page, so that you are always up to date when it comes to the next dates for our meetups. The next one will be in September, so if you're local, don't hesitate to join us.