From YouTube: Persist Your Data In an Ephemeral K8s Ecosystem
Hello, everyone, and welcome to a brief discussion on persisting your data in an ephemeral ecosystem. My name is Eric Zitlow, and I'm going to just be going through a couple of things here today.
Who am I? Why should you listen to me? Well, I'm a Data on Kubernetes Community ambassador, I'm the director of developer relations at MayaData, one of the biggest contributors to the OpenEBS project, and I'm also an open source software contributor and committer on a couple of other projects as well, including things like the Apache Cassandra project.
I've been a distributed systems solutions architect in a previous life. I've been a network engineer and a software engineer; I've kind of run the gauntlet. So I have a fair bit of experience with different aspects of this, and I was definitely here throughout the rise of Kubernetes as a technology.
So why does this matter? Why should we care? Well, the CNCF did a survey back in 2020 that was really, really fascinating. They were basically asking people questions about containers, containerization, and what people are actually using in production, and from the time they started the survey back in 2016 to 2020, they saw a 300% uptick.
Now, for some of you, that might be kind of a mind-blowing situation, because containers typically are not used for stateful things. They're typically an ephemeral object, so having a stateful workload running in a container generally requires some finessing, some extra technology, and today we're going to be taking a look at that, and maybe the right way to do it. All right.
So, typically, there are many, many solutions, and there actually is kind of an entire sub-industry of the storage industry that has sprung up: things like StorageOS, Gluster, Longhorn, and, of course, OpenEBS, which is what we're going to be talking about today. Now, as I mentioned before, OpenEBS was majorly contributed to by MayaData. It's actually one of the reasons I took a job at MayaData. It is one of the better solutions.
It's a branch: Longhorn and OpenEBS actually have the same roots in the same project; they're both basically a fork. OpenEBS itself is, I'd say, the more open, open-source community version of that, and it's really grown into its own thing and does a lot of really cool stuff, so we'll get into that today.
Okay, first off, what is elastic block storage? So EBS, quite simply, is just that: elastic block storage, which means it's a block storage device. It's not a file system, it's not anything else. It's just a block storage device, and things like AWS EBS volumes, if you've ever used those, are exactly the same thing: they're EBS volumes, elastic block storage. This is just an open source implementation of the EBS technology, so pretty simple. If you've ever used AWS, or Azure, which has its own version, same with GCP, pretty much every major cloud provider has some variation on the EBS volume.
This is something you can run yourself, though, OpenEBS being an open source implementation. Okay, container attached storage is something I'm going to be saying a lot, so I figured I should define it. Container attached storage simply means a storage device that is closely associated with, and basically bound to, a container. So in this case we're binding a storage device, an actual physical piece of storage, to a pod, and we're doing that in a persistent way, so we're creating a container attached storage architecture.
That's what happens when we do that. Okay, simple Kubernetes. Just really quickly going to run through this: we have a master, we have workers; everyone understands this. The master could be a control plane, or it could be a single node, that's fine. Workers: there could be one, there could be very, very many workers. That's all cool.
When we have our master, we have a couple of different parts. We have our API server; the API server is actually how we, as the operator, communicate with the Kubernetes cluster. We have our controller manager and scheduler; all of the tasks of actually getting work done are set up and delegated by these two components. Then we have etcd. etcd is actually kind of the secret sauce, or part of the secret sauce, of Kubernetes. It allows us to reference other Kubernetes components, so networking, other pods, by names, tags, all sorts of things.
It allows for a lot of obfuscation of the complexity that we normally have to mess with when we deal with a full microservices architecture. For example, you can obfuscate a lot of that into etcd using labels and the names of various things, so it's a very, very powerful tool if used correctly. Okay, workers have a kubelet, which is basically the Kubernetes process.
That's running cAdvisor, which is a local resource management and monitoring tool; it basically keeps track of everything on that specific worker. And then kube-proxy, which is the networking everything when it comes to Kubernetes. So all the communication between nodes, all of the people going in and out, say to hit a web page that you're hosting, those sorts of things: kube-proxy is involved in all of that. Then we have our pods. Now, you can have many, many pods on a worker, and pods are basically the containers for containers.
If you want to think about it that way. Pods themselves are actually ephemeral. This is where the problem comes in when we start talking about data and persistence and statefulness and all those things. A pod being ephemeral means that when that pod goes away, the power goes down, someone shuts that pod down, kubectl shuts that pod down, whatever it had, whatever it was doing, whatever it was storing, is gone. It's just gone. There's no way to get that back.
That means we have to start thinking about other solutions when we want to actually store data, for example, if we were running a database inside Kubernetes. So, simply put, our most simple possible Kubernetes architecture looks something like this: we have our master.
We have our workers, all these different components working together to create whatever the application was, whatever the workloads were. It gets them running, it gets them running cleanly, and hopefully we don't have to manage too much. Now, as I mentioned earlier, kube-proxy helps us talk to the outside world, and, as I also mentioned, we don't have data storage inside this Kubernetes cluster. So what are our options? Well, as a Kubernetes-native workload, what we generally, historically, have had to do is delegate all of our data outside.
So whether that was a database as a service, maybe we were hosting something alongside our Kubernetes cluster, or maybe we were using S3 buckets, if, say, it's a web page and it's grabbing, you know, pictures or something for the website. Generally, all of that would be hosted outside of Kubernetes, and while that's very cool, it still adds complexity. You know, it'd be really nice if we could manage everything in one place, because we'd essentially be simplifying our overall architecture when we do so. So what does that mean?
Well, we're going to need a way to make pods unique, and to do that we use this thing called a StatefulSet. Now, StatefulSets are kind of a large topic all unto themselves, but simply put, just think of one for now as a way to give pods a unique identifier that persists through the pod's life cycle. So even if we spin a pod down and spin that thing back up, it allows us to bring it back with the same unique identifier, with any updates we've made to that pod.
Persisting through that power cycling: so basically we're removing the ephemeral nature of the pod. Now, we're not adding any storage yet, but we're giving the pod something where, if we were to bind something to it, which we'll actually do in a moment, that binding is not going to go away just because the pod disappeared. So it's a really, really key piece. StatefulSets are the first piece of the puzzle when we're solving for persistence in an ephemeral Kubernetes world. All right.
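The StatefulSet idea described above can be sketched as a minimal manifest. This is an illustrative sketch, not something from the talk; the names (`demo-db`, the `mysql:8.0` image) are hypothetical placeholders, and the manifest is built as a Python dict so the stable-identity behavior is easy to see:

```python
# Minimal sketch of a StatefulSet manifest as a Python dict.
# All names here are hypothetical examples.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "demo-db"},
    "spec": {
        # serviceName gives each replica a stable network identity
        # that survives pod restarts.
        "serviceName": "demo-db",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "demo-db"}},
        "template": {
            "metadata": {"labels": {"app": "demo-db"}},
            "spec": {
                "containers": [{"name": "db", "image": "mysql:8.0"}],
            },
        },
    },
}

# Each pod gets a stable ordinal identity derived from the set's name:
# this is the "unique identifier that persists" from the talk.
pod_names = [f"{stateful_set['metadata']['name']}-{i}"
             for i in range(stateful_set["spec"]["replicas"])]
print(pod_names)  # ['demo-db-0', 'demo-db-1', 'demo-db-2']
```

If `demo-db-1` is deleted, the controller recreates a pod with that same name, which is what lets storage bindings outlive any individual pod.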
Second, we have these things called persistent volumes, and they are associated with persistent volume claims. Or, I should say, the pod associates itself with a persistent volume claim, which then gloms onto a persistent volume that satisfies the claim's requirements. Now, this does not just happen automatically in Kubernetes. You actually have to set this up. You have to set up pathing; there's some complexity that's introduced, actually a lot of complexity that's introduced, when you do this, and because of that, we're really going to want a system to manage it.
Otherwise we're basically going to have to set up a spider web of our own management solution, trying to bind volumes to the right places and make sure things stick around the way they're supposed to.
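The claim-to-volume matching described above can be sketched in miniature. This is a simplified illustration of the matching idea, not Kubernetes' actual binder; the field names and the `storage_gi` shorthand are assumptions for readability:

```python
# Hypothetical sketch of how a claim "gloms onto" a volume that satisfies it:
# a PV is a match when its capacity and access modes cover the claim's request.
pvc = {
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage_gi": 5}},  # simplified units
    },
}

pvs = [
    {"name": "pv-small", "capacity_gi": 1,  "accessModes": ["ReadWriteOnce"]},
    {"name": "pv-big",   "capacity_gi": 10, "accessModes": ["ReadWriteOnce"]},
]

def satisfies(pv, claim):
    """True when the volume covers the claim's size and access modes."""
    need = claim["spec"]["resources"]["requests"]["storage_gi"]
    modes_ok = set(claim["spec"]["accessModes"]) <= set(pv["accessModes"])
    return pv["capacity_gi"] >= need and modes_ok

bound = next(pv for pv in pvs if satisfies(pv, pvc))
print(bound["name"])  # pv-big
```

The real binder considers more (storage classes, node affinity, binding modes), which is exactly the complexity the talk says you want a system to manage for you.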
So how do we do that? Well, that's where OpenEBS comes in, because OpenEBS basically takes all of the complexity of setting up your storage, binding your storage, and maintaining your storage, and puts it all in one nice little basket.
Now, you'll notice here we've added this thing called the OpenEBS control plane. We'll get into what it does in just a moment, but for right now understand that it actually runs on the master, and it changes, or really improves, the way that storage is handled. So what are the different parts? We have the node disk manager operator, which is an operator just like any other operator in Kubernetes.
It's basically, if you want to think about it in kind of simple terms, like a custom control loop working with the Maya API server, which is associated with the API server, the thing the operator, the user, talks to, maybe by submitting kubectl commands to it. The Maya API server extends that. And then we have the local PV provisioner.
So the key point: the Maya API server, just like I said, extends the Kubernetes-native API server. So when I'm interacting with OpenEBS, I'm not having to run a bunch of side commands in a different process; I'm actually running kubectl, and I'm running my commands all through there. I'm setting up my storage classes, I'm setting up my persistent volumes, I'm setting up all those different pieces through kubectl. Really, really powerful.
Any other tool you'd use to interact with the cluster, and obviously kubectl is just one, has access to that same API, so we're not having to custom manage anything; it's all just built in as an extension of the normal API server.
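The "set up my storage classes through kubectl" step might look something like the following. This is a sketch, not from the talk: the class name `openebs-local` is made up, and the `openebs.io/local` provisioner string is my assumption for the OpenEBS local PV flavor, so check the OpenEBS docs for your release before using it:

```python
# Sketch of a StorageClass you would normally write as YAML and apply with
# kubectl, expressed here as a dict. Names are illustrative assumptions.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "openebs-local"},
    # Assumed provisioner string for OpenEBS local PVs; verify in the docs.
    "provisioner": "openebs.io/local",
    # Delay binding until a pod is scheduled, so the volume lands on the
    # same node as its consumer, the usual setting for local storage.
    "volumeBindingMode": "WaitForFirstConsumer",
}
print(storage_class["metadata"]["name"])  # openebs-local
```

Written out as YAML, that dict is exactly what `kubectl apply -f` would take, which is the point the talk is making: it's all the normal API, no side tooling.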
Okay, the local PV provisioner is actually really cool. What it does is work with NDM, which we'll talk about in just a second. NDM creates a pool of devices; the local PV provisioner pulls from that pool and creates the consumable resources.
So when we use a persistent volume claim to glom onto a device, that device was created by the local PV provisioner. That device can be all sorts of things; the local PV provisioner basically obfuscates it, so all we see is: I have a storage device, I use the storage device, the storage device is associated with a pod. That makes it really, really simple. Okay, NDM, or node disk manager, is a DaemonSet that runs on each node.
And what it does is essentially keep track of the physical devices that are attached to each node, and it has a way to filter through those devices. It's actually configurable, so you can filter based on criteria, and it will create a pool, and that pool is your list of valid devices. It's all the places your local PV provisioner can pull from to get a device, to then associate with a claim, to then be used by a pod. So we're creating a resource pool that is actually referenceable through kubectl; Kubernetes handles it as a Kubernetes resource. That's really important.
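The filtering idea behind NDM can be sketched like this. To be clear, this is an illustration of the concept, not NDM's actual schema or filter configuration; the device fields and the criteria are hypothetical:

```python
# Hypothetical sketch of NDM-style device filtering: walk the devices
# attached to a node, drop the ones that fail configurable criteria, and
# the survivors form the pool the local PV provisioner draws from.
devices = [
    {"path": "/dev/sda", "size_gb": 500,  "has_filesystem": True},   # OS disk
    {"path": "/dev/sdb", "size_gb": 1000, "has_filesystem": False},
    {"path": "/dev/sdc", "size_gb": 2000, "has_filesystem": False},
]

def eligible(dev, min_size_gb=100):
    """Example criteria: unformatted disks above a size threshold."""
    return not dev["has_filesystem"] and dev["size_gb"] >= min_size_gb

pool = [d["path"] for d in devices if eligible(d)]
print(pool)  # ['/dev/sdb', '/dev/sdc']
```

The point of making the pool a Kubernetes resource is that this list is something you can inspect with kubectl rather than a private data structure.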
Essentially, when I say Kubernetes-native, it means we're not doing anything wild with Kubernetes; we're simply using Kubernetes in the way it's intended to be used to manage our resources. So NDM allows us to do that by creating this device pool. It also enables some cool hot-swapping stuff and some other really, really nifty features. Okay, putting it all together.
So, as you'll see here, the pathing changed a little bit, because now we actually have a pod here on the left that is associated with a persistent volume claim, which is associated with a persistent volume, and the local PV provisioner is still preparing other persistent volumes, ready to go for when another persistent volume claim comes through.
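The final wiring just described, pod to claim to volume, can be sketched as a pod spec. Again a hypothetical illustration: the pod name, image, mount path, and claim name are all placeholders:

```python
# Sketch of the final wiring: a pod mounts a volume backed by a PVC, so
# the data outlives the pod. All names are hypothetical placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db-0"},
    "spec": {
        "containers": [{
            "name": "db",
            "image": "mysql:8.0",
            # Mount the claim-backed volume where the database keeps its files.
            "volumeMounts": [{"name": "data", "mountPath": "/var/lib/mysql"}],
        }],
        "volumes": [{
            "name": "data",
            # The pod names only the claim; the claim finds a matching PV,
            # so the pod never needs to know about the physical device.
            "persistentVolumeClaim": {"claimName": "data-claim"},
        }],
    },
}

claim = pod["spec"]["volumes"][0]["persistentVolumeClaim"]["claimName"]
print(claim)  # data-claim
```

Delete `db-0` and recreate it, and the same `claimName` reattaches the same volume, which is exactly the persistence story the talk is building toward.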
So this is essentially, in simple terms, the right way to do data on Kubernetes, and it is one of the ways you could then run, for example, a MySQL or Cassandra database in those pods that have those persistent volumes, without the worry of losing your data. It's a really, really powerful tool; I highly recommend you check it out. As I mentioned before, it is an open source project. The GitHub code is out there; you can go download it, modify it, contribute to it. Please, if you do contribute, you know, put a shout out.
Tag me on LinkedIn; it's Eric Zitlow on LinkedIn. I would love to help promote it. The other thing that's really a good way to get involved is the OpenEBS newsletter; go ahead and subscribe to that. It is curated by us at MayaData; we are trying to put that out as a resource for the community, kind of all things storage related around Kubernetes. All right, with that, I'm going to leave you. Have a great day.