Description
Download Presentation:
https://www.snia.org/sites/default/files/SDC/2017/presentations/Containers/Kerns_Daniel_Advancing_Clustered_Storage_Architecture_with_Kubernetes.pdf.pdf
Abstract:
Internally, storage systems are actually complex distributed systems, and implementers spend significant resources managing the cluster. Now that we have maturing container orchestration systems, much of that work can be delegated to the orchestrator, allowing storage developers to concentrate on storage. In this presentation, we will talk about our experiences building Rook.io on top of Kubernetes and demonstrate how container orchestration changes perspectives on storage cluster implementation.
I'm Dan Kerns. I run the Rook project, and I also work for Quantum. What I want to do today is talk to you about how, by leveraging an environment like Kubernetes, you can orchestrate very complicated storage systems and save yourself a lot of work, and consequently spend more time building storage systems than building distributed computing systems. I'm going to give you a demo of Rook running on Kubernetes, and I'm going to talk about some of the architecture we use inside of Rook to support Ceph on Kubernetes.
OK — so, I think everybody understands (I'm going to fall off the stage) — I think everybody understands that distributed storage systems are really complicated distributed systems. There's all kinds of stuff going on: there can be thousands of instances of different process types, and there can be thousands of nodes. They're actually very complicated systems, and storage developers spend a lot of time working on the lifecycle of the various processes and daemons that support the storage system. It's a common cause of failure.
It's not just about the data structures that represent the storage; it's about keeping the system alive and healthy, and that's where Kubernetes comes in and makes an enormous difference. In a Ceph system there are probably ten different services — maybe more than ten — and thousands of instances of those services in the storage cluster. You have to keep track of them all, you have to manage their lifecycle, and you have to do whatever you need to do to keep them all healthy. I don't know how much people know about Kubernetes.
Kubernetes is one of several container orchestration systems. The goals are to drive up utilization in cluster computing systems and to move to a more declarative model, rather than a procedural model, for managing the various applications that run in the system. It has all kinds of features — there's lots of goodness — and it seems to be picking up steam in the marketplace. At Quantum we're firm believers in Kubernetes, and so we looked at how we could use it.
The thing that really jumps out about Kubernetes is the declarative nature of what you do. You generally don't write code to tell Kubernetes what to do; you basically write a spec: this is what should happen. "I should have three instances of this application running in my cluster," and Kubernetes goes and makes sure that constraint holds at all times. There will always be three instances, so if one of them dies for whatever reason, a new instance is created.
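As a sketch of what such a declarative spec looks like — a hypothetical Deployment, not one from the talk (the name and image are made up for illustration):

```yaml
# Minimal Deployment: declare the desired state (3 replicas) and
# Kubernetes keeps that constraint true, replacing any replica that dies.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # hypothetical name
spec:
  replicas: 3               # "I should have three instances running"
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: nginx:1.25   # any stateless image works for this sketch
```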
Likewise, you can say: I want enough instances that the response time of my application is such-and-such, and Kubernetes will do the scale-up and scale-down of that application for you. There's also a bunch of support for deploying applications to nodes. If you've ever built a system like this, deployment seems like the simplest thing in the world — except that it's not, right? There's actually a lot of trouble that goes into deploying applications in distributed systems. Kubernetes actually makes that simple, and likewise updates.
When it comes time to update an application, you generally don't want to take your service offline; you generally want some sort of online update plan, but of course those are hard to build. Again, Kubernetes has platform support for doing those kinds of online, or rolling, updates — there are several different strategies. And of course, in order to do all this, it's monitoring the health of the system, so there's a great deal of information available about the health of the various application instances running in the system.
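Those rolling-update strategies are declared on the workload spec itself; a hedged sketch of the relevant fragment (the numbers here are illustrative, not recommendations):

```yaml
# Fragment of a Deployment spec: roll pods over gradually instead of
# taking the whole service offline during an update.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down at a time during the update
      maxSurge: 1         # at most one extra pod created during the update
```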
I sort of mentioned the declarative thing. It supports lots of different service models — I talked about n instances and load-balanced instances, but there are all kinds of different service models — and there's this concept of a thing called a pod, which is a collection of services that work together. So you can put a bunch of services in a pod — say, microservices — and they will be scheduled together. It has an extensible control plane, and this is one of the biggest things.
That's a big benefit for us. In the current version of Kubernetes these things are called CRDs; you define CRDs and can interact with the control plane through them. And it provides complete application lifecycle support, which is a great deal for a storage cluster, because a storage cluster has to run forever.
Now, let's look at storage in Kubernetes. If each one of these vertical things is a node in Kubernetes, you may have some number of applications running on these nodes, in whatever layout is necessary for that application.
What that means in Kubernetes-speak is that if this application is connected to some storage out here, and that application happens to be rescheduled onto this other node, the volume attach points will be moved to that other node seamlessly, and the application can continue running. So if it dies here and is scheduled there, or if it's rescheduled multiple times, Kubernetes will handle that transition for you, and that's actually pretty cool. In Rook — Rook is an open-source project that we sponsor at Quantum, and we have lots of contributors.
A
You
should
go
to
github
too,
to
look
to
check
it
out.
Our
idea
is
gee
what
if
we
actually
brought
the
storage
system
into
the
cluster?
What
if
we
ran
the
storage
processes
inside
the
kubernetes
cluster?
This
gives
you
a
lot
of
flexibility.
First,
it
means
you
don't
need
to
have
that
external,
a
separate
storage
box
that
you
bought
from
somebody
else.
A
You
can
use
disks
attached
to
the
various
nodes
in
the
cluster,
but
it
also
means
that
the
applications
that
make
up
the
storage
system
can
benefit
from
all
this
goodness
that
I've
been
talking
about
kubernetes
I.
Think
that's
one
of
the
the
major
benefits,
so
we
orchestrate
south
in
ruk
ruk
is
an
exclusively
set.
Will
do
other
storage
backends
over
time.
That
stuff
is
interesting
for
tons
and
tons
of
reasons
and
there's
been
quite
a
number
of
Ceph
Doc's
at
this
conference.
All of those apply here. We have a pod called the Rook operator, and you can think of it as a system operator or something like that — you could think of it as the operator pattern, if you're familiar with that. The idea is to manage the storage cluster, and so there's a bunch of code in the Rook operator that manages the lifetime of the cluster.
So you can build a Kubernetes cluster where the applications and the storage all run together in the cluster — and that has certain benefits — or, if you're building a standalone storage device, you can just not have any applications running except the storage applications. It gives you quite a bit of flexibility in how you end up deploying the storage. So our architecture looks a little bit like this: the same picture as before, except the Rook processes are running throughout the cluster.
What we get from Kubernetes, on the other hand, is support for all the things in this box, and some of those things are pretty hairy — like the upgrade and the deployment I mentioned, but also things like security and the scheduling of these processes. We're using Kubernetes security for controlling access to the cluster in this case, and so we're hiding the Ceph level: Ceph has its own security system, and we're hiding that, because a Kubernetes user already has to deal with Kubernetes security — why would they want to deal with the Ceph stuff too?
We have the Rook agent, which is a Rook component that runs on every node in the cluster and handles persistent volume mounting, and the Rook API, which presents a RESTful API if you're not using the Kubernetes CRD extensions. I'm going to go into each of these in turn and talk about how they're managed a little bit differently in the Rook cluster.
So first off: the operator, the Ceph manager, and the Rook API are pretty much single-instance — you generally run one instance of each of these.
A
You
may
run
two
instance
or
two
or
more
for
high
availability,
but
it's
a
it's
a
master/slave
relationship.
It's
a
you'll
fail
over
to
the
other
node.
If
you
need
to
so
they're
very
easy
to
manage
in
the
SEF
environment.
As
a
cluster
comes
up,
we
bring
up
the
rook
operator
and
then
we
support
multiple
stuff
clusters
on
a
kubernetes
cluster.
So
then,
you'll
specifically
after
you
bring
up
the
operator,
you'll
specifically
say
I'd
like
to
create
a
soft
cluster
named.
The mons control the Ceph cluster in a very, very real way. There have to be at least two running at all times; in our environment we only support three, because we never saw a need for more than three, but you can have any odd number of three or more, and the monitors maintain a quorum.
Each mon maintains a copy of the state of the cluster, and there's a quorum election underneath that has to run, so it's really important that we run exactly three of these and that their names are really well known. And one thing that happens in Kubernetes is that pods can get rescheduled over time: a node fails, or a pod dies or crashes.
What we actually chose to do is this: if mon three dies and gets rescheduled on another node, we're using Kubernetes service IP addresses, so that the IP address remains consistent regardless of where in the cluster the mon comes up. That turned out to be really, really productive for us. Everything in a Kubernetes cluster has to be able to fail, and that includes the mons, and so we spent a great deal of time here understanding failure in the cluster and understanding how we recover from failure in the cluster.
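The stable-IP trick for the mons can be sketched with a plain Kubernetes Service: clients address the Service's cluster IP, which stays fixed no matter which node the mon pod lands on. The names, labels, and port below are illustrative, not Rook's actual manifests:

```yaml
# One Service per mon: the ClusterIP survives pod rescheduling,
# so the mon's address stays consistent across failures.
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mon3   # hypothetical name for "mon three"
spec:
  selector:
    mon: mon3            # matches a label assumed to be on that mon's pod
  ports:
  - name: msgr
    port: 6790           # illustrative Ceph monitor port
```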
It's important — it's really surprising to people who have used Ceph before that there are no ceph.conf files here. They do exist under the covers, but nobody using Rook ever sees these files; they're generated on the fly by Rook and put in the right places in the pod, so the Ceph daemons can run unmodified.
They take a long time to start up, and so there are parameters you can use to control how often they're restarted, what the transition point is, and so on. We also have the opportunity to do SSL termination at the Kubernetes load balancers if we want, instead of terminating farther inside the Ceph system, and that actually simplifies a bunch of the deployment and configuration of Ceph.
So what do we get from Kubernetes? Again, this lifecycle support is a really big deal — if you've built a distributed system, processes just die, right, and you don't want to wait to find out — and we get this lifecycle support at the node, pod, and process level. We have automatic failover of services: Kubernetes remaps IP addresses and makes failures look fairly invisible. We get support from Kubernetes directly for upgrades and rolling upgrades.
It has support for secrets built in — basically, think of them as encrypted things that you don't want other people to see. We use Kubernetes secrets to store the Ceph secrets: the Ceph keys for authentication, encryption, and that sort of stuff are all stored in Kubernetes secrets. So again, we're leveraging the Kubernetes authentication and access control system to manage the Ceph secrets; in the demo you won't see any of them. And it has an extensible API.
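A Kubernetes Secret holding a Ceph key might look like this sketch — the name, the data key, and the base64 payload are all made-up placeholders:

```yaml
# Opaque Secret: authorized pods can mount it, but access is controlled
# by Kubernetes RBAC rather than by Ceph itself.
apiVersion: v1
kind: Secret
metadata:
  name: rook-ceph-admin-key   # hypothetical name
type: Opaque
data:
  keyring: UExBQ0VIT0xERVI=   # base64-encoded Ceph keyring (placeholder value)
```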
It just runs watch over and over in that window, and it makes it easy to see what's going on in the cluster. So again, we can see there are three nodes running; currently there are no pods being scheduled in the cluster, no persistent volume claims, and no services. So I go and create the Rook operator — and I'll show you my cheat sheet here; I definitely have to cut and paste Kubernetes commands — and the Rook operator has been created, so this YAML file has essentially been installed in Kubernetes.
At that point, this Rook operator pod over here has been fired up, and that's all that's happened, right? It's just a process running somewhere, and it's not really doing anything — we haven't created a storage cluster yet, but that is what the Rook operator does. So we go over here and we create a storage cluster, and I'll show you what that looks like.
Basically, any Kubernetes selector here could be used to identify the nodes or the devices. In particular, I'm going to use the BlueStore storage backend with a couple of parameters, and that's the whole cluster spec.
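A cluster spec along those lines can be sketched as below; the field names follow early Rook releases from around this era and may differ from the exact file shown in the talk or from current Rook versions:

```yaml
# Custom resource handed to the Rook operator: "create a Ceph cluster,
# use every node and device you find, and format OSDs with BlueStore."
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook                  # the cluster's name
  namespace: rook
spec:
  dataDirHostPath: /var/lib/rook
  storage:
    useAllNodes: true         # any node selector could go here instead
    useAllDevices: true
    storeConfig:
      storeType: bluestore    # the BlueStore backend mentioned above
```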
Now, in the background here, we've installed Ceph — right, an entire Ceph installation — and the monitors have just come up. There are three mons running on the three different nodes; they have an anti-affinity, so they never run on the same node.
In a couple of seconds here, the OSDs and the Ceph manager will come up, if we're lucky... there it is. So there are three OSD pods running, one on each of the three nodes, and each of those OSD pods is just running one OSD, because I only have one drive on these VMs; if there had been ten drives on the VM, they'd be running ten OSDs. So now an entire Ceph cluster is up, and I'm going to show you how it interoperates with Kubernetes.
What's really cool now, because it's Kubernetes, is that I can go fire up applications and use the Ceph storage that's been created. This is just a standard WordPress demo application, and I'll show you the file in a minute. But in the background here you can see that a WordPress and a MySQL app have come online; the MySQL app is connecting to that replicated pool we created, and MySQL is starting up. So let's have a look at that.
This is just very traditional Kubernetes YAML: we're going to create a service called WordPress that's going to have an endpoint; we're going to create a persistent volume claim called mysql-pv-claim and map that to the Rook storage cluster; we're going to fire up a MySQL instance — that's what this is — telling it to use that persistent volume claim; and then we're going to move forward and fire up a WordPress instance. And so, if we're lucky, we can actually come over and find out what port it's running on.
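The persistent volume claim in that WordPress demo looks roughly like this; the StorageClass name is whatever the Rook block pool was registered as, and `rook-block` here is an assumption drawn from that era's examples rather than from the talk itself:

```yaml
# The claim MySQL mounts for its data directory. Kubernetes binds it to
# a Rook-provisioned volume via the named StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-block  # assumed name of the Rook-backed StorageClass
  accessModes:
  - ReadWriteOnce               # one node mounts it read-write at a time
  resources:
    requests:
      storage: 20Gi             # illustrative size
```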
The MySQL and WordPress descriptions are really very vanilla. Now, if you're building a storage appliance — if you're not interested in building a hyper-converged system, if you're just building a storage appliance — realize that running a storage system is a lot more than just running Ceph, right? You have a bunch of other applications running on the cluster; in fact, you have databases and other sorts of metadata services running in the cluster, too, to run the support applications for the storage cluster.
A
And
the
cool
thing
you
can
do
once
this
finally
starts
up
is
you
can
tell
that
my
sequel
instance
that
you
know
gee
I
want
you
to
reschedule
another
note
if
we
can
just
kill
it
or
something
like
that,
and
it
will
automatically
kubernetes
will
pick
another
note
for
it
to
run
on
reconnect
the
persistent
volume
claim
to
that
instance.
It'll
still
have
all
of
its
data,
and
the
application
WordPress
in
this
case
will
continue
working.
A
I
literally,
can
never
in
a
bra
cute
milkman.
What
does
it
say
here.