Cloud Native Computing Foundation Kubernetes Community Days Bengaluru 2021, 10 Jul 2021

From YouTube: Building Stateful Workloads in Kubernetes

Description

Kubernetes Community Days Bengaluru'21

It's day 2. Kubernetes is running. You have your deployments and services set. Now how do you migrate the data store? Let's journey together on this code-focused tour through ConfigMaps, Secrets, Persistent Volumes, Persistent Volume Claims, and StatefulSets. We'll craft and launch a strategy to care for your users' data in this new container world. You can power your business on Kubernetes: stateless or stateful.

A

Hi, welcome to Kubernetes Community Days. Today we're going to talk about building stateful workloads in Kubernetes. Here's the part where I tell you I'm definitely going to post the slides on my site tonight.

A

I've been that person chasing the speaker, and it didn't work out for me either. So you can head to robrich.org right now, click on Presentations here at the top, and here's Building Stateful Workloads in Kubernetes. The slides are online right now. While you're here on robrich.org, let's click on About Me and take a look at some of the things that I've done recently.

A

I'm a Microsoft MVP, a Friend of Redgate, and a Cyral developer advocate. And let me tell you about AZ GiveCamp. AZ GiveCamp brings volunteer developers together with charities to build free software. We start building Friday after work, and Sunday afternoon we deliver the completed software to the charity. Sleep is optional; caffeine provided. If you're in Phoenix, come join us for the next AZ GiveCamp, or if you'd like a GiveCamp in your neighborhood, hit me up on email or Twitter and let's get a GiveCamp in your neighborhood too.

A

Some of the other things that I've done: I do a lot of Docker and Kubernetes training, and one of the things I'm particularly proud of is that I replied to a .NET Rocks! podcast episode. They read my comment on the air and sent me a mug, so there's my claim to fame: my coveted .NET Rocks mug.

A

So let's dig into building stateful workloads in Kubernetes.

A

We talked about this guy.

A

Let's first talk about stateless Kubernetes; that's the easy part. We have a user. The request goes into an ingress, and that ingress routes by host name and forwards off to the service, which acts as a load balancer in front of all of our pods that have the containers. Yeah, we've got this. This is easy. This is what we do all the time: we have stateless web services, and running them inside of Kubernetes is easy. But our application probably doesn't look like that. It looks like this.

A

We've got a user, we've got an ingress, we've got a service, we've got the pods with the containers in them, but we also have, well, stateful pieces. We have an SSL certificate that we need to terminate. We have a database cluster. We have some secrets and configuration that help us discover that cluster, and we have file storage.

A

So how do we add this state to our application? That's what we'll do today. First, let's talk about state. What is state? Not these states. State is the things that live beyond the current lifetime. If the current lifetime is the request, then state is the things that live beyond the request. If the current lifetime is the function call, then maybe it's class variables. If the current lifetime is the machine, then maybe it's keeping things beyond a machine restart.

A

That's the type of state we mean: things that live beyond the current lifetime. So how do we handle state inside Kubernetes?

A

Well, let's look at a bunch of different types of state: configuration, secrets, data stores, file stores, and singleton services. Let's double-click into singleton services and take a look.

A

We may have a service that needs to look at the entire state of the system to identify where it needs to go next. Maybe we have a month-end report that needs to look at all of the customers to write that totals line. Or maybe we have a database table that needs to create an auto-incrementing primary key; it needs to know about all of the records in the database so that it knows which one is next. These singleton services are stateful in the same way that our other stateful things are.

A

They need to know about the entire thing, and we need to have, well, exactly one of them. So that's state: we have singleton services, we have file and data stores, and we have secrets and configuration. Let's look at how to do each inside of Kubernetes. First, kubectl get all.

A

Now, kubectl get all is great for getting all of these stateless resources, but it very specifically does not fetch stateful resources. As you grab these slides from robrich.org, click into each of these blue links, and you can learn more about issues that people have filed saying, "Hey, this stateful resource isn't returned when I run kubectl get all."

A

To that end, this last one is a proposal to deprecate kubectl get all because it doesn't return the stateful resources. Now, they've done this on purpose. kubectl get all very specifically returns the things that can be easily recreated. It specifically doesn't return secrets or config maps or other data that might get destroyed if you stop and restart it.

A

Okay, that's the design decision they made, but we're looking for stateful resources. How do we find them? Well, here's a great GitHub comment on one of those issues that creates a mechanism for getting all of the things. In this script, we run kubectl api-resources. api-resources returns the list of all of the resource types in your cluster, including CRDs.

A

Let's awk that to grab the first column, the name; remove the first line, the header; and then sed out the line breaks, replacing them with commas. Now we've got that script in place.

A

We'll take the result, which is now a comma-delimited list of all of the resource types, and we'll kubectl get that. Now that's perfect: we have a script that is able to get all of the resources.
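For illustration, here is a minimal Python sketch of that same pipeline. It assumes the default table output of kubectl api-resources, with a header row first and the resource name in the first column; the SAMPLE text is a hypothetical, abbreviated listing, and the real list depends on your cluster and its CRDs. It only builds the command rather than running it.

```python
def resource_names(api_resources_output: str) -> list[str]:
    """Mimic awk '{print $1}' plus dropping the header line: first column of each row."""
    rows = [line for line in api_resources_output.splitlines() if line.strip()]
    return [row.split()[0] for row in rows[1:]]  # rows[0] is the NAME ... KIND header


def get_all_command(api_resources_output: str) -> list[str]:
    """Build the argv for kubectl get <comma-delimited list of resource types>."""
    return ["kubectl", "get", ",".join(resource_names(api_resources_output))]


# Hypothetical, abbreviated output of `kubectl api-resources`.
SAMPLE = """\
NAME                     SHORTNAMES   APIVERSION   NAMESPACED   KIND
configmaps               cm           v1           true         ConfigMap
persistentvolumeclaims   pvc          v1           true         PersistentVolumeClaim
secrets                               v1           true         Secret
statefulsets             sts          apps/v1      true         StatefulSet
"""

if __name__ == "__main__":
    # prints: kubectl get configmaps,persistentvolumeclaims,secrets,statefulsets
    print(" ".join(get_all_command(SAMPLE)))
```

To run it for real, you could capture the output with subprocess.run(["kubectl", "api-resources"], capture_output=True, text=True) and pass the resulting argv back to subprocess.run. Some versions of this script use kubectl api-resources --verbs=list -o name instead, which drops the header and skips resource types that can't be listed.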

A

Well, you probably want to create an alias for this, or maybe a command. Now the cool thing is: if you create an executable file called kubectl-get_all on your PATH and put this script in it, then you can run kubectl get-all and you'll get all of the things. So kubectl get all does not actually get all of the things, and you'll need to specify the specific type of thing that you want.
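The plugin trick relies on kubectl's naming convention: an executable on your PATH whose file name starts with kubectl- becomes a command, dashes in the rest of the name separate subcommand words, and underscores stand in for dashes within a word. This small Python function only illustrates that rule; it is not kubectl's own implementation.

```python
def plugin_command(file_name: str) -> str:
    """Return how kubectl invokes a plugin executable with the given file name."""
    prefix = "kubectl-"
    if not file_name.startswith(prefix):
        raise ValueError(f"{file_name!r} is not a kubectl plugin file name")
    # Dashes separate subcommand words; underscores become dashes within a word.
    words = [word.replace("_", "-") for word in file_name[len(prefix):].split("-")]
    return "kubectl " + " ".join(words)


if __name__ == "__main__":
    print(plugin_command("kubectl-get_all"))  # kubectl get-all
```

After marking the script executable (chmod +x), kubectl plugin list should show whether kubectl found it.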

A

Okay, let's look at each of the stateful things. First, let's look at volumes. Volumes are file storage. If we save things into our container, when the container restarts, that file is gone because, well, the container file system is gone. So how do we store files beyond a container restart?

A

That's where we use volumes. Now, a volume is essentially a symlink. I may have my content off in an NFS drive or maybe in a cloud blob store, and I just want to symlink that into a particular folder inside my container. The beautiful thing here is that the application doesn't know that this is a symlink off to a more durable storage location; it's just a local file. The downside is that it's actually remote file access now, so file locks don't work.

A

The local operating system believes that it locked the file, but it only locked it local to that container. So any other containers reading this file system may clobber the file. Use different paths: maybe a path that includes the hostname, the function of this application, or the user ID, something to make it unique so that we don't end up clobbering the same paths.
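A minimal Python sketch of the "make the path unique" advice, using the pod's hostname (which by default is the pod name) as a subfolder. The function name pod_scoped_path and the /tmp/data mount path are made up for this example.

```python
import socket
from pathlib import Path
from typing import Optional


def pod_scoped_path(mount_dir: str, file_name: str, hostname: Optional[str] = None) -> Path:
    """Return mount_dir/<hostname>/file_name so pods sharing a volume write to separate folders."""
    # Inside a pod, the hostname defaults to the pod's name, which is unique per pod.
    folder = Path(mount_dir) / (hostname or socket.gethostname())
    folder.mkdir(parents=True, exist_ok=True)
    return folder / file_name


if __name__ == "__main__":
    # Hypothetical mount path; in the cluster this would be the volumeMount's mountPath.
    target = pod_scoped_path("/tmp/data", "report.txt")
    target.write_text("written by this pod only\n")
    print(target)
```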

A

Here's an example YAML file that creates file storage. Notice here in the pod definition (this pod definition may be part of a deployment or other mechanism) I have my list of containers, and in this case I have an nginx alpine container. I may have other settings like resource limits or other parameters, and here I have a volumeMounts section. In the volumeMounts section, I have the path in the container to that directory.

A

This is the path where my application can just read and write files, and those files will get stored off to this volume named the-volume. So out here, outside my containers list, I have a volumes section, and here's my name, the-volume, and I specify hostPath, the path to that directory.
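The slide's YAML isn't reproduced in this transcript, so here is a hedged sketch of an equivalent pod built as a Python dict and printed as JSON, which kubectl apply -f also accepts. The pod name web, the mount path, and the host path /tmp/the-volume are assumptions for illustration.

```python
import json


def pod_with_host_path(mount_path: str, host_path: str) -> dict:
    """Pod that mounts a hostPath volume; fine for single-node testing, not for production."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "web"},
        "spec": {
            "containers": [{
                "name": "web",
                "image": "nginx:alpine",
                # Where the application reads and writes; must name a volume below.
                "volumeMounts": [{"name": "the-volume", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "the-volume",
                # Directory on whichever node runs the pod; created if missing.
                "hostPath": {"path": host_path, "type": "DirectoryOrCreate"},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests too: python pod.py | kubectl apply -f -
    print(json.dumps(pod_with_host_path("/usr/share/nginx/html", "/tmp/the-volume"), indent=2))
```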

A

Now here I'm specifying the type of driver that I'm using. In this case, I'm choosing hostPath. hostPath will create a path on the node that is running this container. Now, if I'm running in a multi-node cluster, this isn't great.

A

I probably want to swap this out for cloud storage or a network share, but for the sake of testing, when I'm using Docker Desktop, hostPath is perfect. That's why I've included it here: as you're testing, hostPath is a great way to avoid configuring a whole bunch of other stuff. In production, use something more durable.

A

So we have this YAML file that includes the details of how to spin up this volume, but now we're stuck in a quandary: who owns this? Does the developer own this? Does operations own this? The developer owns the rest of this YAML file, so it might make sense that the developer could own this part, but operations really wants to control where the files are stored, to ensure that they're backed up correctly and that there's proper authentication to this folder. So who owns this file?

A

Well, let's split it in half. Let's create storage; this is the actual drive that we're using. Now, let's carve it up into pieces, and we'll call each of these pieces a persistent volume.

A

Now, let's grab one of these persistent volume pieces. We will claim it, and we will use that persistent volume claim in our container: storage, persistent volume, persistent volume claim, container. The beautiful thing here is that ops can own the storage and the persistent volume, and development can own the persistent volume claim and the container.

A

These are four separate YAML files. Now that's perfect: we have an elegant separation of dev and ops. If we don't need that separation, maybe we're in a really small business where the same people do everything, then we can use the simpler approach. But when we need that separation of concerns, Kubernetes has this mechanism built in to do exactly that.

A

Here's a YAML file for that persistent volume. In the persistent volume we still have that hostPath, specifying the folder on the physical storage medium. Then we specify other characteristics. In this case, my access mode is ReadWriteOnce. I could also use ReadWriteMany, where many nodes can mount it read-write at the same time, or ReadOnlyMany, where they can read it but they can't write. I'm also specifying a storage capacity, in this case 10 gigs, and a storage class name.

A

Finally, I'm giving it a name; in this case I named it pv-volume. Here's the persistent volume claim. As a developer, I now want to claim one of those persistent volumes.

A

Now, I'm going to specify the characteristics of the volume that I want to claim: storage class manual, access mode ReadWriteOnce, and a storage request of 3 gigs. I'll be able to claim the persistent volume that we created previously because the storage class name matches, the access mode matches, and the volume is at least this big. That's perfect. Notice that I don't need to match the persistent volume by name.
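The matching rule described here can be modeled in a few lines of Python. This is a simplified sketch, not the real persistent volume controller: the actual binder also considers things like label selectors, volume mode, and node affinity, and among matching volumes it picks the smallest one that fits. The quantity parser below only handles common suffixes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Common storage suffixes only; real Kubernetes quantities support more forms.
_MULTIPLIERS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity like '10Gi' or '500M' into bytes."""
    for suffix, multiplier in _MULTIPLIERS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)


@dataclass
class PersistentVolume:
    name: str
    storage_class: str
    access_modes: set[str]
    capacity: str


@dataclass
class PersistentVolumeClaim:
    name: str
    storage_class: str
    access_modes: set[str]
    request: str


def can_bind(pv: PersistentVolume, pvc: PersistentVolumeClaim) -> bool:
    """Simplified match: same class, all requested modes offered, enough capacity."""
    return (
        pv.storage_class == pvc.storage_class
        and pvc.access_modes <= pv.access_modes
        and parse_quantity(pv.capacity) >= parse_quantity(pvc.request)
    )


if __name__ == "__main__":
    volume = PersistentVolume("pv-volume", "manual", {"ReadWriteOnce"}, "10Gi")
    claim = PersistentVolumeClaim("pv-claim", "manual", {"ReadWriteOnce"}, "3Gi")
    print(can_bind(volume, claim))  # True: the names never enter into it
```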

A

I just need to match it by characteristics, and this claim gets the name pv-claim. When we create the pod definition, our containers definition is identical. We still identify this as the-volume, and the path in the container to the directory is the same. But now, in the volumes section, instead of listing the hostPath driver, I list persistentVolumeClaim and identify that claim name. The pod will use this persistent volume claim, and this persistent volume claim will claim this volume.
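Swapping the driver is a small change to the pod spec. In this Python sketch only the volumes entry changes; the container's volumeMounts stays exactly as it was. The names the-volume and pv-claim and the mount path are assumptions for illustration.

```python
import json


def claim_volume(volume_name: str, claim_name: str) -> dict:
    """Volume entry that points at a PersistentVolumeClaim instead of a hostPath."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


pod_spec = {
    "containers": [{
        "name": "web",
        "image": "nginx:alpine",
        # Same volume name and container path as with hostPath; the app can't tell the difference.
        "volumeMounts": [{"name": "the-volume", "mountPath": "/usr/share/nginx/html"}],
    }],
    "volumes": [claim_volume("the-volume", "pv-claim")],
}

if __name__ == "__main__":
    print(json.dumps(pod_spec, indent=2))
```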

A

So ultimately we get that really elegant separation of concerns: operations can own the storage and the persistent volumes, ensuring they're backed up carefully and that access modes are as expected, and as a developer, I can claim one of those persistent volumes and attach it into my container in a really elegant way. That's file storage.

A

Next up, let's look at config maps. Config maps are a great way to do configuration inside of Kubernetes. These are read-only configuration details, and I can expose them either as environment variables or as files.

A

So here's an example configuration YAML file. I have my pod definition; maybe this is part of a deployment, or maybe it's a separate pod definition. In my containers list, I specify my containers, maybe with other details like a volume, and now I have an env section for the environment variables. I give each one a name and a value, a name and a value.

A

Now I've specified a lot of environment variables. Inside of my application, I can read the environment variables in whatever way makes sense for my app and my programming language to get at those values.
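Here is what reading those environment variables might look like in Python: a hedged sketch assuming hypothetical variable names API_URL and LOGGING_ENABLED. Taking the mapping as a parameter keeps it testable without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    api_url: str
    logging_enabled: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build settings from environment variables set by the pod's env section."""
    env = os.environ if env is None else env
    return AppConfig(
        api_url=env["API_URL"],  # required: a KeyError here surfaces a misconfigured pod early
        logging_enabled=env.get("LOGGING_ENABLED", "false").strip().lower() in ("1", "true", "yes"),
    )


if __name__ == "__main__":
    # Hypothetical values; inside the pod you would call load_config() with no argument.
    print(load_config({"API_URL": "https://api.example.com", "LOGGING_ENABLED": "true"}))
```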

A

Now again we're left in that quandary: who owns this file, developers or operations? Maybe operations wants to keep production secrets secret, or maybe the details are specific to each environment and they want to be able to roll those keys easily.

A

Well, we have an elegant separation here with a config map. A config map specifies the keys and values for each of the configuration details, and we name it so that we can reference it later. So in this case we have the name of the config map.

A

We have a key for our secure database, and we're using Cyral to do so. Our logging is enabled, and here's our API URL. That's our config map. Now in our pod definition, maybe as part of a deployment or a separate pod, we can specify a bunch of other details, and then we specify envFrom with a configMapRef. I'm naming my config map, and so all of the keys from that config map are now exposed as environment variables inside my container.
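A sketch of the config map and the container's envFrom reference, as Python dicts rather than the slide's YAML. The config map name app-config, the key names, and the values are hypothetical; keys used with envFrom should be valid environment variable names.

```python
import json

config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config"},
    # ConfigMap values are always strings, even "true".
    "data": {
        "SECURE_DATABASE": "true",
        "LOGGING_ENABLED": "true",
        "API_URL": "https://api.example.com",
    },
}

container = {
    "name": "web",
    "image": "nginx:alpine",
    # Every key in the referenced ConfigMap becomes an environment variable.
    "envFrom": [{"configMapRef": {"name": config_map["metadata"]["name"]}}],
}

if __name__ == "__main__":
    print(json.dumps(config_map, indent=2))
    print(json.dumps(container, indent=2))
```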

A

Now that's perfect: I'm specifying the config map name, and now I have all of the details.

A

I can also mount it as files. In this case I have a volumeMounts section with a folder inside my container, and I have a volumes section. Instead of using hostPath or another driver, here I'm specifying configMap and naming that config map. So inside this directory, I will have three files: one for the secure database, one for logging, and one for the API URL.

A

I can open each of those three files and read their contents to get at those values. Now that's perfect: I have that great separation of concerns if I need it, and if I don't need it, I can use the simpler approach. Now, some of these configuration details might be secret. So how do we store secrets? Secrets are built into Kubernetes and have been for quite some time, but they have always been stored base64 encoded.
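Reading a config map mounted as files could look like this Python sketch. One detail worth knowing: Kubernetes also places bookkeeping entries whose names start with two dots, such as ..data, in the mounted directory, so the sketch skips those. The mount path /etc/app-config is hypothetical, and the same approach works for text secrets mounted as files.

```python
from pathlib import Path


def read_mounted_config(directory: str) -> dict[str, str]:
    """Read a ConfigMap (or Secret) mounted as a volume: one file per key."""
    values = {}
    for entry in sorted(Path(directory).iterdir()):
        # Skip Kubernetes' own bookkeeping entries such as ..data.
        if entry.name.startswith(".."):
            continue
        if entry.is_file():  # follows the symlinks Kubernetes uses for each key
            values[entry.name] = entry.read_text().rstrip("\n")
    return values


if __name__ == "__main__":
    # Hypothetical mount path from the pod's volumeMounts section.
    print(read_mounted_config("/etc/app-config"))
```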

A

Since Kubernetes 1.13, I can choose to enable encrypting these secrets at rest. I do need to opt into it, and admittedly, that's kind of late in the game. I may instead choose to move my secrets into a key store like HashiCorp Vault or Azure Key Vault.

A

Otherwise, these are stored at rest base64 encoded, and they are copied to each node that runs a container needing them. So if you lose a node in your cluster, you can probably assume that you've lost the secrets it held, and you should go roll those keys.

A

So here's how we can create a secret. We have this YAML file defining the secret: here's the name, and here are the keys and values that I want in this secret. Here I'm specifying them as stringData. If I chose to specify them as data instead, they would need to be base64 encoded, which is a great way to include other content like JSON objects.
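The difference between stringData and data is just base64 encoding, which this Python sketch makes concrete. The secret name db-connection comes from the talk; the username value is a made-up example. Note that base64 is an encoding, not encryption: anyone who can read the manifest can decode it.

```python
import base64


def to_data(string_data: dict[str, str]) -> dict[str, str]:
    """Turn stringData-style plain values into the base64 strings the data field expects."""
    return {key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in string_data.items()}


secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-connection"},
    "type": "Opaque",
    "data": to_data({"username": "admin"}),
}

if __name__ == "__main__":
    print(secret["data"])  # {'username': 'YWRtaW4='}
    # Decoding is trivial, which is why base64 is not protection on its own.
    print(base64.b64decode(secret["data"]["username"]).decode("utf-8"))  # admin
```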

A

Now the cool part is I've got a YAML file. The downside is that it's really easy to check this into source control. Let's not. Instead, let's create this secret on the command line, where we know that this content won't get checked into source control. So we'll create this secret: here's the secret name, db-connection, and then the keys and values that I want in this secret.
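The command-line approach is kubectl create secret generic with one --from-literal flag per key. This Python sketch only assembles that command; the key names and values are placeholders, and you could hand the resulting list to subprocess.run against a real cluster.

```python
import shlex


def create_secret_command(name: str, literals: dict[str, str]) -> list[str]:
    """argv for kubectl create secret generic, one --from-literal flag per key."""
    command = ["kubectl", "create", "secret", "generic", name]
    command += [f"--from-literal={key}={value}" for key, value in literals.items()]
    return command


if __name__ == "__main__":
    # Placeholder values only.
    cmd = create_secret_command("db-connection", {"username": "admin", "password": "changeme"})
    print(shlex.join(cmd))  # quoted so it is safe to paste into a shell
```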

A

Now that's great. I can create a secret in each environment, specifying the secrets specific to that deployment environment, and all of the containers that need them can reference those secrets.

A

Again, I could use them as files or as environment variables. In this case, we'll use files, so I'll specify secret and that secret name. Inside this directory, I now have the database user and database password files. I can open each file and get at that secret.

A

I could also use environment variables. Last time we specified that all of the environment variables should come from a particular config map; in this case we'll enumerate them. Here's the name of the environment variable that I want to create, and I'm going to get that value from the secret: here's the secret name and the secret key.

A

And here's the second one, again with a secret name and a secret key. Now I have two environment variables that I can use in my application.
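Enumerating environment variables from secret keys follows the env, valueFrom, secretKeyRef shape. Here is a hedged Python sketch that builds those entries; the variable names DB_USER and DB_PASSWORD and the key names are assumptions.

```python
import json


def env_from_secret_keys(mapping: dict[str, tuple[str, str]]) -> list[dict]:
    """Build container env entries, each pulling one key out of a named Secret."""
    return [
        {"name": env_name, "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}}}
        for env_name, (secret_name, key) in mapping.items()
    ]


env = env_from_secret_keys({
    "DB_USER": ("db-connection", "username"),
    "DB_PASSWORD": ("db-connection", "password"),
})

if __name__ == "__main__":
    print(json.dumps(env, indent=2))
```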

A

We were able to do configuration and volumes, which is great. Now if I have configuration or secrets, or I need to store data more durably in files, I can do so with Kubernetes-native features. But what if I need lots of pods running together? Maybe I have a cluster, like a database cluster.

A

Here's where I use stateful sets. A stateful set is a bundle of machines, and I'm misusing that term machine because they're pods with containers, but it creates those pods with predictable names. I can think of it like a deployment, but instead of getting randomly generated names, I get names that I can expect.

A

So maybe I have a Kafka cluster or an ELK stack, or I need to create a mechanism where many pods run in concert, syncing data between them or discovering each other and holding elections to pick the primary.

A

I have each of my containers, and instead of the pods being randomly named, here it's pod-0, pod-1, and pod-2. That means I can discover them. I have this headless service, through which my container can reach out to another container to transmit data or synchronize configuration.

A

If that pod is unavailable, the next pod that gets spun up will be named exactly the same, so I know which machines are in my stateful cluster. Now, this does need a headless service.

A

Here's a stateful set, and it's pretty much a deployment. We have a template with all of the container details, and those are as we would expect. The only differences from a deployment are that the kind is StatefulSet and that we're specifying this headless service name. That headless service, then, is pretty much a regular service.

A

We have the ports, and we have the selectors. What makes it headless, which a stateful set requires, is that we set clusterIP to None. Once we set clusterIP to None, it's a headless service, and we can use that headless service to communicate between the different pods in our cluster.
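Here's a sketch of that pair as Python dicts: a headless service with clusterIP set to None, and a stateful set whose serviceName points at it. The names, port, and image are placeholders, and a real database would need more configuration than this.

```python
import json


def headless_service(name: str, app: str, port: int) -> dict:
    """A regular Service except for clusterIP: None, which makes it headless."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": app},
            "ports": [{"port": port}],
        },
    }


def stateful_set(name: str, service_name: str, app: str, image: str, replicas: int) -> dict:
    """Looks like a Deployment, plus serviceName naming the governing headless service."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                "spec": {"containers": [{"name": app, "image": image}]},
            },
        },
    }


svc = headless_service("dbservice", app="db", port=5432)
sts = stateful_set("db", service_name="dbservice", app="db",
                   image="registry.example.com/db:1.0", replicas=3)

if __name__ == "__main__":
    print(json.dumps(svc, indent=2))
    print(json.dumps(sts, indent=2))
```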

A

So now, if I'm coming into that headless service, its name resolves to all of the pods, so requests can round-robin across all the machines. I can just curl dbservice.

A

But if I want to get to a particular machine, I take my dbservice name and prefix it with that pod's name. So I can use DNS to discover all of the different machines in my application and replicate data across them.
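The naming scheme that makes this discovery work is predictable: stateful set pods are named after the set plus an ordinal starting at 0, and the headless service gives each one a DNS name of the form pod.service.namespace.svc.cluster-domain. This Python sketch assumes the default namespace and the common default cluster domain cluster.local; yours may differ.

```python
def stateful_pod_names(set_name: str, replicas: int) -> list[str]:
    """Pods of a StatefulSet are named <set-name>-<ordinal>, ordinals starting at 0 by default."""
    return [f"{set_name}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_name(pod: str, service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Stable per-pod DNS name provided by the governing headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


peers = [pod_dns_name(pod, "dbservice") for pod in stateful_pod_names("db", 3)]

if __name__ == "__main__":
    for peer in peers:
        print(peer)  # db-0.dbservice.default.svc.cluster.local, then db-1 and db-2
```

From another pod in the same namespace, the shorter db-0.dbservice usually resolves as well, thanks to the DNS search path.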

A

Now, I may choose to let my headless service accept inbound traffic and load balance across all these machines, but maybe in my stateful set I want only a subset of those machines or a subset of those ports available. So I probably won't reuse this headless service for inbound traffic, but rather create a second service that accepts traffic only for the particular details that I need.

A

Okay, so we've looked at a lot of different mechanisms built into Kubernetes to facilitate stateful workloads. Let's go back to our application diagram and identify the pieces we've discovered.

A

We had the ingress, the service, and the pods with containers inside them. That was our stateless environment, and yeah, that was easy. We have a domain name; perhaps that's a config map in our ingress that identifies this domain name so it can be environment-specific. We have our HTTPS certificate; perhaps this is a secret. The beauty of it being a secret is that, well, it's secret.

A

We have our containers that have configuration details to discover the database cluster. Perhaps these are environment variables or files stored in config maps or secrets. For our database cluster, we need multiple machines to pull this off, so we'll use a stateful set so that each of these pods can discover the other pods and replicate data between them.

A

We'll also have file storage. We'll use volumes, or persistent volumes and persistent volume claims, for this, and we may need some configuration to discover those volumes. Persistent volumes, persistent volume claims, stateful sets, config maps, secrets.

A

These are all tools built into Kubernetes to help us get to that next level of using stateful resources inside of Kubernetes. All of these things are built in, and what's great is that we can replace some of them with more advanced options. Maybe we need a specific driver to get to our SAN, or maybe we need very specific encryption details.

A

We can replace each of these with standard CRDs.

A

This has been fun, showing you stateful resources inside of Kubernetes. Join me in the space the conference has designated for Q&A, or if you're watching this later, hit me up on Twitter at @rob_rich, and grab these slides right now at robrich.org.
