From YouTube: Towards CNI v2.0 - Casey Callendrello, Red Hat
Description
Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon North America 2021 in Los Angeles, CA from October 12-15. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Towards CNI v2.0 - Casey Callendrello, Red Hat
CNI v1.0 is out! In this talk, we'll discuss some directions the CNI project could take as we look to define the next generation of container networking.
We'll look at:
- Where CNI is today
- What it does and doesn't do well
- How we might improve it
- How can Kubernetes better use CNI?
Hello there, and welcome to the CNI maintainer session here for KubeCon 2021. Whether you're watching this live and joining me afterwards for Q&A, or joining afterwards on YouTube at one-and-a-half-times speed, I'd like to thank you for taking some time out of your day to join me. So, let's get started.
The maintainer track sessions at KubeCon are where CNCF projects can give updates and talk about what's next for those projects. In this session, I'll be doing that for CNI, and what we'll be doing here today is starting the conversation about what CNI 2.0 might look like. First, a brief introduction: my name is Casey Callendrello.
I've given similar talks to this before, but those were mostly about CNI 1.0; today this is a new talk, about what we might want to do for CNI 2.0. What are we going to talk about today? We'll talk about an update on what the project has done so far, namely releasing CNI 1.0; we'll talk about some pain points and considerations that we're thinking about as we look to the future; and we'll look at some possible directions the project might take.
So the first agenda item is CNI 1.0, but this slide is kind of boring. I think we need some more word art there. That's better. By the time you're watching this talk, CNI 1.0 will have been cut, and this is a pretty cool achievement, right? This is a standard that started from the community about five years ago, and it's finally time for us to declare 1.0: that we've reached a stable specification.
Honestly, it's been almost five years now, so it's appropriate for a project as mature as CNI to declare a stable release. As an aside, I'd like to thank the CNCF for donating time and resources to help us set up a website, which we didn't have the resources to do ourselves. Now you can find everything you need to know about CNI at our shiny new website, cni.dev.
CNI is responsible for configuring a network interface, more precisely an attachment, inside a container. That is to say, it mediates the interaction between a network plugin and a container runtime. CNI, the protocol, is an execution protocol, and CNI is additionally a configuration format. There is a reference implementation for consuming the configuration and executing the protocol, known as libcni, that is used by many plugins and many container runtimes. libcni is maintained by the CNI project itself.
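To make that concrete, here is a minimal sketch, in Go, of a CNI network configuration and of a runtime driving it through libcni. The network name, subnet, plugin directory, and netns path are invented for the example; only the libcni calls themselves come from the real library.

```go
package main

import (
	"context"
	"fmt"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// A network, in CNI terms, is a single configuration like this
	// (normally a .conflist file on disk; all values here are made up).
	confBytes := []byte(`{
	  "cniVersion": "1.0.0",
	  "name": "examplenet",
	  "plugins": [{
	    "type": "bridge",
	    "bridge": "cni0",
	    "ipam": {"type": "host-local", "subnet": "10.10.0.0/16"}
	  }]
	}`)

	conf, err := libcni.ConfListFromBytes(confBytes)
	if err != nil {
		panic(err)
	}

	// libcni finds and executes the plugin binaries in these paths.
	cninet := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	// An attachment: this container, this netns, this interface name.
	rt := &libcni.RuntimeConf{
		ContainerID: "example-container",
		NetNS:       "/var/run/netns/example",
		IfName:      "eth0",
	}

	// ADD attaches the container to the network; DelNetworkList and
	// CheckNetworkList cover the other two verbs.
	result, err := cninet.AddNetworkList(context.TODO(), conf, rt)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```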
Let's talk briefly about the abstract, or logical, components of the CNI model. On the left we have a container, which is managed outside of CNI. CNI is only one aspect of bringing a container up, and CNI makes no commentary on how a container itself is managed; that is supposed to be handled by a container runtime engine.
Then there is a network, which in the CNI world is represented by a single CNI configuration file, and then you have an attachment of a container to a network. This is CNI's picture of the world. To be a little bit more accurate, the CNI model allows for multiple attachments in a single container, it allows for multiple networks, and it even allows for multiple attachments of the same container to the same network. As an aside, the fact that Kubernetes only understands a single attachment is a limitation, or a decision, within Kubernetes itself.
Other CNI runtimes, such as Podman, natively support multiple interfaces, and CNI works with them in this regard. The execution protocol has only three methods; it is a simple sort of RPC world. It has three methods: ADD, DEL, and CHECK. That's it. ADD concerns itself with creating an attachment; it says: please attach this container to this network. DEL is the inverse of that. CHECK reports whether a particular attachment is still functional, that is to say, it asks a plugin to please validate that everything is still configured appropriately.
An important distinction about CNI, though, is that plugins are executable binaries. When we say RPC, many people think gRPC or JSON-RPC or REST. No, that's not the way CNI works: plugins are executable binaries, and each RPC call is a new execution of that binary. It's a little bit different from what you may be used to, but that's it; that's the whole protocol. It's not particularly complicated.
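As an illustration of that execution model, here is a hedged sketch of what a single ADD boils down to: the verb and attachment parameters travel in environment variables defined by the spec, the network configuration arrives on stdin, and the plugin prints a JSON result on stdout. The plugin path, container ID, and netns are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

func main() {
	conf := []byte(`{"cniVersion":"1.0.0","name":"examplenet","type":"bridge"}`)

	// Each RPC call is one execution of the plugin binary.
	cmd := exec.Command("/opt/cni/bin/bridge")
	cmd.Env = append(os.Environ(),
		"CNI_COMMAND=ADD", // the verb: ADD, DEL, or CHECK
		"CNI_CONTAINERID=example-container",
		"CNI_NETNS=/var/run/netns/example",
		"CNI_IFNAME=eth0",
		"CNI_PATH=/opt/cni/bin",
	)
	cmd.Stdin = bytes.NewReader(conf)

	out, err := cmd.Output()
	if err != nil {
		panic(err)
	}
	// On success the plugin prints a JSON result describing the
	// interfaces and IPs it configured.
	fmt.Println(string(out))
}
```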
So now that we understand the basics of the CNI protocol, let's look at some problems, or, let's just call them sub-optimalities, that users of CNI and developers of CNI plugins experience with real-world use. The first sort of wart (well, we'll call it a wart) is executing binaries. What's bad about this? Well, there are a couple of things. First of all, it's a security risk. What do we mean by this? Well, we are deploying binaries executed as root in the host context, which is obviously an extremely privileged position to find oneself in. So that's a bit of a security risk.
It's also annoying in containerized deployments: for example, you're installing a binary that's executed on the host, but it's built in a container. It's a good thing that Go and Rust make it very easy to build statically linked binaries; otherwise this might be a quite difficult problem to solve.
A little thing is that many plugins today are in fact thin shims to daemons, so we find ourselves in the somewhat comical situation of a daemonized container runtime, such as the kubelet or containerd, talking to a daemonized container network plugin, such as OVN-Kubernetes, by executing a very small binary, which then talks back to that daemon. So it's a bit strange that we have to put this sort of adapter in. That said, we chose binary execution as part of the protocol for a couple of very real reasons.
The first is that it solves a real problem with Go and namespaces. If you don't know about this, then consider yourself fortunate, but Go and network namespaces don't necessarily get along so well, and executing binaries is one way to mitigate the damage that can be done by that. It's also extremely useful for daemonless runtimes: not all runtimes have a running daemon, for example Podman, or rkt, which CNI came out of. Additionally, executing binaries ensures that plugins don't cheat by not checkpointing state to disk.
How do you tell Kubernetes that a node is configured and ready for pods to be scheduled to it? The answer is that you write a configuration file to disk. You write your own CNI configuration file, which is a bit strange, because it's the same configuration file you're supposed to use to configure yourself. So this is a strange catch-22, and CNI needs a better way for network status to be reported; right now we only have attachment status. Network status doesn't exist in the CNI model right now. So that's wart number two.
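You can see that catch-22 in how runtimes decide readiness today: they scan the CNI configuration directory and treat networking as ready once any configuration file appears. A simplified sketch, using libcni's helper and the conventional default directory:

```go
package main

import (
	"fmt"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// libcni.ConfFiles lists candidate CNI configuration files in a
	// directory; a node is treated as "network ready" as soon as one
	// exists, even though that file is also the configuration itself.
	files, err := libcni.ConfFiles("/etc/cni/net.d", []string{".conf", ".conflist", ".json"})
	if err != nil {
		panic(err)
	}
	if len(files) == 0 {
		fmt.Println("network not ready: no CNI configuration found")
		return
	}
	fmt.Println("network ready:", files)
}
```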
Wart number three concerns itself with configuration management. Writing files to disk is a bit troublesome. It's inconvenient in containerized deployments, because you need to bind-mount something to disk, which is also an interesting privilege concern. It's not easily discoverable either: it means that anybody who wants to know anything about the network configuration needs to have the same directory bind-mounted in containerized deployments. That's pretty awkward.
It is also a bit too dynamic: if you have the same network configuration, and you like the same network configuration, across all of your nodes in some cluster or fleet, why do you have to deploy a DaemonSet to copy a file to disk? It's kind of silly that you need to write a file to disk that's identical across your cluster, and that you can't use any of the existing methods you may have for doing this. Simultaneously, configuration files are also not dynamic enough for some use cases.
If you have a configuration file that is otherwise entirely constant, except for, say, the IP pools allocated to a node or network, why do you have to template it, doing string manipulation to splice in some IP pools? As far as you are concerned, your configuration is entirely constant except for the addressing pools. So these files don't really support that use case super well, since you need to do some sort of templating or meta-CNI configuration management, and that's a bit awkward.
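To make the meta-configuration problem concrete, here is roughly what that templating looks like in practice: the file is constant except for one per-node value, so operators end up string-templating JSON and shipping the result to every node. The names and values are invented for the example.

```go
package main

import (
	"os"
	"text/template"
)

// The configuration is constant except for the node's pod CIDR, yet
// it still has to be rendered and written to disk on every node.
const conflist = `{
  "cniVersion": "1.0.0",
  "name": "examplenet",
  "plugins": [{
    "type": "bridge",
    "ipam": {"type": "host-local", "subnet": "{{ .NodePodCIDR }}"}
  }]
}`

func main() {
	t := template.Must(template.New("cni").Parse(conflist))
	// In a real cluster, a DaemonSet would run this per node and
	// write the output to /etc/cni/net.d.
	if err := t.Execute(os.Stdout, map[string]string{"NodePodCIDR": "10.10.3.0/24"}); err != nil {
		panic(err)
	}
}
```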
That said, what's the good thing about files? The good thing about files is that they're simple: tooling for them is very easy, and they're obviously easy to script. And, let's just say, it hasn't really been a problem so far: everybody can figure out how to write a file to disk, and then you move on with your life and go solve much bigger problems. But in any case, that's wart number three: configuration management.
You can actually watch a talk from KubeCon EU 2021 by my colleagues Billy McFall and Adrian Moreno, where they talk about their effort on something called the device information spec, which is an attempt to bring some order to the madness around hardware initialization, networking, and Kubernetes. The takeaway from this is two things. First of all, hardware is complicated, really, really, really complicated, and we don't necessarily want to specify every last little bit of that. But the other takeaway is that CNI is doing them absolutely no favors.
CNI makes it difficult for a single plugin on a single network (that is, the same plugin) to share a given resource between multiple attachments or multiple containers. What do I mean by shared resources? It doesn't need to be that abstract: a shared resource can be as simple as a bridge, the same bridge that every container is attached to. And it's difficult to share these resources, for a couple of reasons.
The first is that there's not necessarily good information about addressing: it's hard to aggregate things, or to know when you can aggregate things, based on IP addresses. There are also no timing guarantees; CNI explicitly makes no guarantees other than that you will get a DEL after an ADD. So you need to do locking between multiple instances of your plugin.
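In practice, that locking tends to mean serializing plugin executions on disk. A minimal sketch, assuming an advisory file lock on a well-known path (the path and helper are illustrative; the spec says nothing about how plugins should lock):

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// withLock serializes concurrent executions of a plugin around a
// shared resource (e.g. a bridge) using an advisory file lock.
func withLock(path string, fn func() error) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		return err
	}
	defer unix.Flock(int(f.Fd()), unix.LOCK_UN)
	return fn()
}

func main() {
	err := withLock("/var/run/cni-example-bridge.lock", func() error {
		fmt.Println("safe to create or configure the shared bridge here")
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```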
The effect of this is that plugins leave their shared resources around forever, even if the network is done, even if you're not going to be using this bridge anymore, because, generally speaking, not tearing down the bridge is better than potentially tearing down a bridge and interrupting an in-flight operation or affecting something else. There is no notion of teardown in CNI other than tearing down an attachment, and that sometimes makes things a bit awkward for users and developers of CNI.
Okay, so those were some of the problems, or, let's just say, sub-optimalities, with CNI as it's been adopted and used in the real world. I want to briefly talk about some of the considerations we need to keep in mind; with those, we can then move forward and think about what we want to do next. So, a quick dive into some considerations.
Not all consumers of CNI want to create a Kubernetes-style, sort of single logical network across multiple nodes. For Podman, that doesn't even necessarily make sense, and there are lots of CNI runtimes; we need to design a specification that doesn't preclude them from doing what they need to do. CNI is vendor-neutral, and that's a good thing.
And lastly, it's always useful to be wary, extremely wary, of the so-called second-system effect. This is the unfortunate tendency of version two of a particular system to try to solve all problems perfectly, and thereby solve no problems well, winding up bloated and unusable. This is not a new problem in software engineering; it's probably the first problem in software engineering, and it was discussed by Fred Brooks as far back as 1975. We really need to keep this in mind as we look forward.
By the way, a brief aside: I think that Kubernetes deserves a lot of praise for avoiding some of the temptations that cause the second-system effect. I encourage all of you to watch the talk at this KubeCon 2021 about reimagining the Ingress API; that team deserves a lot of praise for avoiding the same temptations around the second-system effect. They've worked very hard to design something that is not bloated and is also not over-engineered or over-specified.
A
That
means
we
need
to
keep
things
simple,
composable
understandable.
Any
of
the
keep
true
to
what's
enabled
cnn
1.0
success
right
and
another
thing
that's
also
critical.
Is
we
don't
want
to
over
specify
every
interaction
right?
If
we
write
a
protocol,
that's
really
rigid
and
overspecified,
then
you
don't
leave
room
for
unanticipated
uses
and
you
just
make
a
protocol
that's
difficult
for
anybody
to
use
even
in
slightly
divergent
manner.
So the first thing I'd like to think about for CNI 2.0 is some potential lifecycle improvements. I've shown here the logical diagram from before and how the three CNI methods fit into it: you can add, delete, and check an attachment, and that's all you can do. So let's imagine: what if you could do the same verbs for all three of the logical components within CNI? What might that look like? What if you could manage networks the same way you manage attachments, just for discussion's sake?
A
I
think
we
can
probably
come
up
with
better
verbs.
So
let's
do
that
right
here
and
you
can
see
here.
We
have
a
similar
life
cycle
for
networks
and
containers,
as
we
have
attachments
just
rename
rename
things
a
little
bit.
So
it
makes
a
bit
more
sense
and,
let's
think
about
like
what
would
a
network
and
container's
life
cycle
look
like
within
the
context
of
cni
as
we
have
it,
so
you
can
imagine
a
network
having
some
sort
of
ad
which
we
in
this
case
we're
calling
init.
You could imagine network plugins creating shared resources, such as bridges and firewall rules, when the network itself is created. Likewise, what does it mean to check a network? Well, you could say checking a network means checking whether the network is configured and ready to accept ADDs, and that's a real problem we know we have right now: there's no way to check a network's status. Additionally, DESTROY, or delete, for a network would be a way to say this network is no longer needed.
So what about container lifecycle, knowing that CNI itself is not actually involved in the creation or deletion of containers or network namespaces or any other sort of isolation domain? What would it mean to add a container? Well, one thing that comes to mind is some sort of ADD that is akin to a finalize, which is to say: this container is fully attached. Calling a container plugin, as opposed to an attachment plugin, is saying: hey, this container is fully attached, and here are all the interfaces that are configured in it.
A
Please
make
some
sort
of
additional
super
level
or
higher
level
aspect
of
a
container,
please
orchestrate
or
configure
that
after
all
attachments
are
done.
The
use
case
for
this
might
be
something
like
tweaking
routing
tables
or
adjusting
ctls
or
an
interesting
one
is
adjusting
some
internal
firewalling
right
now
you
can
have
an
istio
network
plugin,
which
fits
into
cni
and
hooksen,
but
is
a
bit
of
a
cheat
because
it
doesn't
actually
create
any
interfaces.
This
is
a
perfect.
A
This
would
be
perfect
for
that
particular
use
case,
which
is
to
say,
I
don't
want
to
touch
any
interfaces.
I
don't
even
particularly
care
how
many
interfaces
there
are.
I
just
need
a
container's
networking
state
to
look
something
like
this
after
everything
else
is
configured,
so
that
would
be
an
interesting
thing
to
add
to
cni,
2.0
and
then
check
and
delete
would
sort
of
match.
This
check
would
see.
Please
verify
that
your
changes
are
correctly
applied
and
delete
would
undo.
It
would
be
something
along
the
lines
of
undo
what
you
did.
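Pulling those ideas together, here is a purely hypothetical Go sketch of the three lifecycles side by side. None of these interfaces exist in the CNI project today; the verb names are just the ones used in this discussion.

```go
package cni2

import "context"

// Placeholder types so the sketch is self-contained.
type (
	Network    struct{ Name string }
	Container  struct{ ID, NetNS string }
	Attachment struct{ IfName string }
	Result     struct{ Interfaces []string }
)

// AttachmentAPI is what CNI 1.0 already gives us: add, check, and
// delete a single attachment of a container to a network.
type AttachmentAPI interface {
	Add(ctx context.Context, net Network, c Container, att Attachment) (Result, error)
	Check(ctx context.Context, net Network, c Container, att Attachment) error
	Del(ctx context.Context, net Network, c Container, att Attachment) error
}

// NetworkAPI is the imagined extension: INIT creates shared resources
// (bridges, firewall rules), CHECK reports whether the network is
// ready to accept ADDs, and DESTROY says the network is done.
type NetworkAPI interface {
	Init(ctx context.Context, net Network) error
	Check(ctx context.Context, net Network) error
	Destroy(ctx context.Context, net Network) error
}

// ContainerAPI is the imagined per-container hook: ADD acts as a
// finalize after all attachments are done (routing tweaks, sysctls,
// Istio-style firewalling), CHECK verifies those changes, and DEL
// undoes them.
type ContainerAPI interface {
	Add(ctx context.Context, c Container, results []Result) error
	Check(ctx context.Context, c Container) error
	Del(ctx context.Context, c Container) error
}
```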
In other words, should we offer gRPC? Absolutely. gRPC is the standard choice; it is the expected solution for this particular corner of software engineering, and there's really no compelling reason for CNI, going forward, to avoid gRPC. However, can we require that all plugins and all administrators run daemons? The answer is definitely not: the administrative overhead for simple plugins is way too great, and it would be asking way too much of administrators in that particular context.
So the first solution we've come up with as maintainers is to think about offering CNI 2.0 as both: as both daemonized and non-daemonized plugins. In other words, we should support interactive RPC over a socket file as well as direct execution, à la CNI 1.0. We could define some relatively simple fallback rules, and it should be pretty seamless.
A
Additionally,
we
can
implement
most
of
this
in
libcni,
so
that
plug-in
authors
and
people
who
are
administrators
really
don't
necessarily
need
to
see
the
complexity
and
they
can
pick
and
choose
which
ever
works
better
for
them.
I
think
that's
a
pretty
clear
new
direction
that
we're
going
to
want
to
take
cni
2.0
and
we
need
to
offer
demonization
and
we
need
to
not
make
it
the
only
choice.
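Here is a minimal sketch of what such a fallback rule might look like, assuming a socket-path convention that does not exist today. In the real design this choice would live inside libcni, so callers would never see it.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
)

// invoke applies the hypothetical fallback rule: if the plugin exposes
// a socket, talk to the running daemon over it; otherwise execute the
// binary exactly as CNI 1.0 does today. Both helpers are stubs.
func invoke(ctx context.Context, plugin string, conf []byte) ([]byte, error) {
	sock := filepath.Join("/run/cni", plugin+".sock")
	if _, err := os.Stat(sock); err == nil {
		// Daemonized plugin: interactive RPC (e.g. gRPC) over the
		// socket; no process is spawned per call.
		return callSocket(ctx, sock, conf)
	}
	// Daemonless plugin: one execution per RPC, as in CNI 1.0.
	return execBinary(ctx, filepath.Join("/opt/cni/bin", plugin), conf)
}

// Stubs standing in for the two transports.
func callSocket(ctx context.Context, path string, conf []byte) ([]byte, error) {
	return nil, fmt.Errorf("socket transport elided in this sketch")
}

func execBinary(ctx context.Context, path string, conf []byte) ([]byte, error) {
	return nil, fmt.Errorf("exec transport elided in this sketch")
}

func main() {
	_, err := invoke(context.Background(), "bridge", []byte(`{"cniVersion":"1.0.0"}`))
	fmt.Println(err)
}
```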
We're here at KubeCon, and Kubernetes is obviously an extremely important consumer of CNI. So, to take a step back: CNI configuration is a file written to disk by some unknown process, and everybody does it differently. All anyone needs to do is write a file to disk, and this has resulted in thousands of... well, that's an exaggeration; this has resulted in many different solutions, and everybody does it differently. And this isn't necessarily really the Kubernetes way.
The Kubernetes way is to have discoverable, authoritative, validated, declarative configuration that is managed by some sort of central service. Which is to say: this sure seems a lot like every other API object, right? What is Kubernetes but a really excellent CRUD over etcd? I'm sort of cheating, sort of joking, when I say that, but network configuration doesn't have a compelling reason why it should be different from any other type of configuration within Kubernetes.
Now, it's not immediately obvious how this all fits together, however. The container runtime engine in a Kubernetes cluster, which is to say containerd or CRI-O or anything like that, talks to the kubelet via something called the CRI, the Container Runtime Interface. And it is by design, and really important, that the CRI is not only for Kubernetes: it is abstract, it is a standard and an abstraction boundary, and containerd and CRI-O and all of the CRI runtimes don't talk to the API server.
So our first sort of straw-man proposal as CNI maintainers is that network configuration management should be first-class within the CRI itself. There's no need for this to be CNI-specific: in the same way that you can have GCP and vSphere volumes, it would be really cool if you could have CNI or non-CNI network configuration managed, configured, and lifecycle-managed by the kubelet over the CRI.
Putting the network configuration in the CRI has a couple of advantages: it means that you can have simple cluster-wide network deployment and, really importantly, it ends this sort of bizarre Kubernetes status catch-22, where having a configuration file means that you're already configured; that's always been a bit strange. The problem is that it complicates per-node configuration or, more specifically, it complicates cases where network configuration is not uniform across a cluster. That would require some careful thought, but Kubernetes solves these problems pretty well.
Kubernetes has the notion of label selectors, and a node selector would be a pretty interesting thing to add to that object. Also, making network configuration a first-class concept within the CRI opens the door for future improvements like multiple-interface support and even, potentially, dynamic attachment.
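For discussion's sake, here is one hypothetical shape for such an object, sketched as Go API types: cluster-wide network configuration with a node selector for the non-uniform case. No such API exists in Kubernetes or the CRI today.

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// NetworkConfiguration is a hypothetical cluster-scoped object that
// would replace per-node files in /etc/cni/net.d.
type NetworkConfiguration struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec NetworkConfigurationSpec `json:"spec"`
}

type NetworkConfigurationSpec struct {
	// NodeSelector scopes the configuration to matching nodes,
	// covering the case where configuration is not uniform across
	// the cluster.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	// Config is the network configuration itself (for example, a
	// CNI conflist), delivered to the runtime over the CRI rather
	// than read from disk.
	Config runtime.RawExtension `json:"config"`
}
```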
Okay, I'm running a little bit short on time, so I just want to wrap this up with a few parting words as the presentation comes to a close.
This is obviously very early days in the saga that will be CNI 2.0, and this is nothing if not a community effort. We, the CNI maintainers, want to hear from you; we want to make sure this is a worthy exercise for all of us in the community, and we really welcome your involvement. CNI is not a project that is supported by any one company; it really is intended to be some form of encapsulation of community consensus. So please, I encourage all of you:
If you have opinions about this, please meet us on the CNCF Slack, in the cni and cni-dev rooms, and if you'd like to start talking about this, we have a label on our GitHub; we're starting to open up issues to discuss and look at ways forward. So thank you very much. This needs more word art... I'd like to thank you very much for watching, thanks for taking the time, and thanks for watching me at 1.5x speed.