Description
Deep dive into Calico's new eBPF dataplane, which is now GA. You'll learn about its advanced features, including its high-performance Kube-proxy replacement that preserves source IP all the way to the pod. The talk will also touch on Calico's other dataplanes; Calico now supports Windows nodes (including in open source!) and there's a fast-maturing VPP port in the works.
A: Oh, there we go. Yep, so I'm going to talk about Calico's eBPF data plane, and I have a couple of slides at the end about its friends. We have more than one data plane, and some of them you may not have heard of before, so I thought I'd drop a couple of slides in at the end to tell you about the other data planes that Calico has and why you may or may not want to use them.

A: That's the agenda for the talk. Mostly I'm going to talk about the eBPF data plane; that's what I've been working on for the last year or so, and I'm going to take you through what it's all about, why we did it, and how fast it is (everyone's always interested in that), and then there are just a couple of slides at the end about the others.
A: So, without further ado, I'll dive in. What is Calico's eBPF data plane? It's an alternative data plane for Calico. Calico is the most widely used networking and security solution for Kubernetes; there are hundreds of thousands of clusters out there using it. There are a few parts to Calico: we have our data model, which is stored up in the Kubernetes API server.

A: We've got our calculation logic that takes all the policy and distils it for every host, and then we have the actual implementation: how do we get packets around through the Calico network, how do we secure them, how do we drop the bad ones and let the right ones through? That's the data plane, and we've had pluggable data planes in Calico for a while.
A: I'll talk about a couple of the other ones later on. eBPF is the one that we've recently added, and it's what it sounds like: we use eBPF instead of the standard Linux networking technologies that the standard Calico data plane is based on, mainly iptables, Linux routing and those kinds of things.

A: Instead of that we're using eBPF, and I've got a slide coming up that explains eBPF if you're not super familiar with it. So why eBPF? Well, we can do things with eBPF that we can't do in the old world, in the standard Linux data plane. We've got some eBPF-only features, which I'm going to cover in more detail.
A: We can also give you great performance, so I'm going to cover the sorts of trade-offs that we make there and what we can do, later on. So: whizzy new features that we can't do in the old data plane, great performance, and, as a word of warning, a little caveat.

A: It has some new features that we don't have in the iptables world, but it also lacks a few features that we do have in the iptables world. Some of those are due to fundamental differences between the two. I have a list of them later, but for example the iptables LOG action is not available from eBPF, because it's an iptables-specific feature, so we can't use that.
A: So eBPF: what's it all about? You've probably heard of eBPF already, but just to recap, it's a virtual machine that runs inside the Linux kernel, a bit like the Java virtual machine, and it runs its own type of bytecode. The name means extended Berkeley Packet Filter, but it's not only used for packet filtering these days, so the name is a bit of an anachronism; it just so happens that Calico is using it for packet filtering.

A: So it can be a little bit confusing when you hear of other uses, like monitoring and that kind of thing.

A: A key thing to know about eBPF is that the mini programs we can put in the kernel, which run on this virtual machine, are event-driven: they're triggered by something happening in the kernel. It's not like a user process that just runs and runs and can do things, set timers, display an animation or whatever; it has to be triggered by something happening.
A: Some examples: a packet arriving, which is the key one for Calico. A packet arrives, the eBPF program runs, and maybe it drops the packet; that's one of the things that that particular eBPF hook can do. Maybe it allows the packet through. Maybe it decides to turn the packet around, swap its headers and respond with an ICMP message; there are a couple of places where we do that in the Calico eBPF data plane.

A: So it has some flexibility in what it can do. It can mangle the packet in basically any arbitrary way, it can drop it, or it can allow it through for normal processing, but what it can do is constrained to the packet and the particular hook it's attached to. Similarly for a packet being sent: that hook could drop it, allow it, or encapsulate it and send it down a tunnel or something like that.
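To make that concrete, here is a minimal sketch of a tc-attached eBPF program (deliberately simplified: IPv4 and TCP only, no IP options, and not Calico's actual code). It runs when a packet hits the hook and returns a verdict: drop, or carry on through the normal stack.

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("tc")
    int ingress_filter(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;

        /* Every packet access has to be bounds-checked, or the verifier
         * rejects the program at load time. */
        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;                  /* not IPv4: leave it alone */

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
            return TC_ACT_OK;

        struct tcphdr *tcp = (void *)(ip + 1); /* assumes no IP options */
        if ((void *)(tcp + 1) > data_end)
            return TC_ACT_OK;

        /* The verdict is simply the return value: drop TCP/8080 here,
         * let everything else continue into the normal stack. */
        if (tcp->dest == bpf_htons(8080))
            return TC_ACT_SHOT;
        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";

A program like this gets compiled to BPF bytecode and attached to an interface's tc hook (with the tc tool or libbpf); the same structure extends to rewriting headers or redirecting the packet somewhere else entirely.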
A: There's a hook for every syscall that a program makes, and people are using that to police syscalls and generate audit logs of what every program does, and maybe block programs from accessing certain files, that kind of thing. There are hooks for deciding which socket a packet goes to when it arrives at the host, for choosing the source IP when a packet is leaving the host, and all kinds of things across all the different subsystems.

A: Since running arbitrary code in the kernel would be dangerous (if you load a kernel module or something like that, it runs in kernel space and can do anything), the eBPF programs that we load are all subjected to a verification process. The kernel has a very rigid verifier.
A: It ensures that your BPF program cannot access memory that it's not allowed to, and that it cannot run forever: it's not allowed to tight-loop or run more than a certain number of instructions. That's how they ensure it terminates. It's also just generally quite locked down: the functions it can call within the kernel are limited to a specific allow list. So it's fairly safe, although of course if we're dropping packets and we drop essential ones, then obviously that could cause problems.
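As a rough illustration of what the verifier does and doesn't accept (a sketch, not an exhaustive list): packet memory can only be read after an explicit bounds check, loops need a bound the verifier can prove, and only the helpers allowed for that hook can be called.

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    SEC("tc")
    int verifier_friendly(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;

        /* Rejected at load time: reading *(__u8 *)data without first
         * proving data + 1 <= data_end. Accepted: the checked version. */
        __u8 first_byte = 0;
        if (data + 1 <= data_end)
            first_byte = *(__u8 *)data;

        /* Accepted: a loop with a fixed, provable bound. An unbounded
         * loop (e.g. while (1)) is rejected, as is blowing through the
         * overall instruction budget. */
        __u32 sum = first_byte;
    #pragma unroll
        for (int i = 0; i < 8; i++)
            sum += i;

        /* Only allow-listed helpers for this hook may be called; you
         * cannot call arbitrary kernel functions from here. */
        bpf_printk("first byte %u, sum %u", first_byte, sum);

        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";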
A: So why choose eBPF? I'm going to contrast it with the iptables data plane and explain why it's different. The iptables data plane is baked into the kernel as the netfilter subsystem. It does get a lot of active development, but because it's C code that's part of the kernel, its development cycle is quite slow. If they're adding a new feature there, they have to worry about backwards compatibility with absolutely everything that's out there.

A: A feature may go in today, but that kernel will not be available in, say, Ubuntu until two, three, four years down the line, when that particular kernel gets rolled out. The great thing about that is that it's generally very stable and battle-hardened and has been around for years.
A: Even nftables, which is kind of the version two of iptables, has been around three years at this point. So: stable, but slow-moving, with a slow development process. It's also very general, so it can handle all the things that the Linux kernel is capable of handling: tunnelling packets, IPsec, bridging, routing; all that sort of stuff is integrated into it.

A: That's what this diagram on the right-hand side shows: it's a diagram of netfilter and the whole networking stack, but there's only one path through it for any particular packet. It comes in on the left-hand side and goes through these stages one by one; some of them make choices and send it up into a different layer, and some of them loop it round, but you're very constrained in that world.
A: You can't do anything that's super creative. You can't bypass a big chunk of it if you can spot early on that this packet doesn't need all of that extra processing, and if you want to jump from one place to another to do something interesting, you don't have that capability without patching the kernel, which is a non-starter for most products.

A: With eBPF you can do things like add a custom encapsulation header, switch the MAC address and make the packet go straight out of an interface. That's the kind of thing you can do in BPF land, and that's great if you're trying to squeeze the most performance out of a system. That's one of the ways I like to think about eBPF: it lets you trade generality and compatibility for performance.
A: A lot of the performance that we get in the BPF data plane comes from doing exactly what this red line is doing. We take a packet that's come in on the left-hand side, and when it hits the qdisc box on the left (don't worry about the eye chart), we can pick it up and send it directly to a local Kubernetes pod, or, if we're load balancing at ingress for a node port or something in Kubernetes, handle that too.

A: We can turn it straight around and send it back out of the same interface, and we don't have to go through all of the blocks in the diagram and pay the price that they carry. The counterpoint is that some of the blocks in the diagram may be useful in your particular scenario, like the boxes in the top right of the diagram where it loops around on itself.
A: Those are the boxes that handle IPsec traffic. So if you do this bypass, you can't do IPsec, because you've bypassed the IPsec subsystem. That's fine for a lot of use cases, but it's something you need to be aware of. There's no free lunch: the C code in the kernel is pretty fast for what it does.

A: But if you bypass a big chunk of it, you can get some good wins, and you can do some much more flexible, creative things that solve problems you're not able to solve in the other world. That's my take on it. When other folks have done micro-benchmarks, and when I've done the same, if you run really tight micro-benchmarks for certain operations in iptables and BPF, sometimes the BPF one is slower.
A: Sometimes it's faster. They're both pretty well optimized for what they do, but it's really the flexibility, and the ability to make these interesting trade-offs, that you get a lot from. So it's good for different users and different use cases. Obviously BPF is a feature of newer kernels, so if you're on a nice stable older kernel, you can just stick with iptables with Calico.

A: That's no problem. So let's talk a bit more about some of the flexibility that it brings.
A: One of the pain points in Kubernetes networking for a long time: when you're using kube-proxy and you have some external traffic, then in order for kube-proxy to take in traffic from a node port and send it on to a backing pod, it ends up needing to SNAT the traffic. The traffic arrives at the first host, let's call it the ingress host, and the host detects that it's a node port using iptables rules, or IPVS if you have that turned on.

A: So the packet comes in and the host does a DNAT, which is what it really wants to do, and then it has to do an SNAT as well, changing the source IP to be the host's IP. That means that, for all of your web server logs and for network policy, the source IP you see is really not very useful: you see the host IP where the packet arrived rather than the original source IP from outside.
A: With BPF we can deal with this problem, kind of break some of the rules that are in place in the standard Linux data plane, and do something a little bit different. For example, a packet comes in to a node port and the BPF program handles it; this is how Calico works.

A: We replace kube-proxy with code in our BPF programs. We do the load balancing there, and then, rather than doing an SNAT, we encapsulate the packet, keeping it exactly as it is inside, but we stick a VXLAN header on it. We send it to the correct node, the backing node that has the backing pod on it. We have a BPF program there that catches that packet, decapsulates it, and does the DNAT.
A: The packet then goes to the pod, and the pod will be expecting packets with that IP address. The packet arrives at the pod still carrying the original source IP address; we just didn't change it. And the way we've made sure that we're on the reverse path is that we did the DNAT on the backing host rather than the ingress host.

A: So when the pod responds, we have a BPF program there that can catch the response packet and reverse it all the way back along the chain. That's the kind of creative thing that you can do in BPF land that you couldn't really do in iptables land.
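To make the shape of this more concrete, here is an illustrative sketch of the kind of map that could drive it (the names and layouts are hypothetical, not Calico's real data structures): the ingress node resolves the node port with a single hash-map lookup, picks a backend, and encapsulates the still-unmodified packet to the backend's node instead of SNATing it.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Key: the node port the client hit. Value: the chosen backend. */
    struct svc_key {
        __be32 dst_ip;     /* node/service IP the packet was addressed to */
        __be16 dst_port;   /* node port */
        __u8   proto;      /* IPPROTO_TCP or IPPROTO_UDP */
        __u8   pad;
    };

    struct svc_backend {
        __be32 pod_ip;     /* backing pod to DNAT to (on another node) */
        __be16 pod_port;
        __u16  pad;
        __be32 node_ip;    /* node hosting the pod: outer VXLAN destination */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 65536);
        __type(key, struct svc_key);
        __type(value, struct svc_backend);
    } nodeport_map SEC(".maps");

    /* Conceptual flow (backend selection and the VXLAN encap/decap elided):
     *
     *   ingress node:  backend = bpf_map_lookup_elem(&nodeport_map, &key);
     *                  if (backend) vxlan_encap_to(backend->node_ip);
     *                  // the inner packet keeps the client's source IP
     *
     *   backing node:  decap, DNAT to backend->pod_ip/pod_port, and record
     *                  the reverse mapping so the pod's reply can be
     *                  un-DNATed and sent back along the same path.
     */

Because the per-packet work is essentially one hash lookup, the cost also doesn't grow with the number of services in the way a long iptables chain does.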
A: I believe IPVS does allow this kind of thing, but it's subtly wrong for Kubernetes, just because of how it assumes the backing pod would receive the traffic: the encapsulated packet would have to go all the way to the backing pod, which means your backing pod has to be rewritten, and it kind of changes the networking model.

A: At least, I think so, if I understand that correctly. So yeah, that's the sort of thing we can do there, and I guess I should do a demo and show some of this stuff.
A: Hopefully that means you can all see a couple of windows. On the left I have the Google Cloud microservices demo, which is a great test app for a new data plane because it runs lots of services. It uses Kubernetes services to load balance between them all, and each of them is written in a different language, so you get to see all the different networking quirks of the different languages and all of their different DNS behaviour and things like that. I started this a while ago.

A: So hopefully it's still running. I can page around, I can buy a vintage typewriter and place my order, and it will create an order in its database. My cluster right now is running with kube-proxy and Calico in iptables mode. So if I do this...
A: There we go. These are the services running in the default namespace, all the pods that make up the microservices demo. We've got calico-node running in iptables mode, we have kube-proxy running, and that's all working. I've got an nginx running as well, and the reason I've got that is that I want to show some access logs.
A: I'm about to disable kube-proxy, which means that Calico takes over from kube-proxy. In order to bootstrap the whole system, we have to tell Calico the piece of information that kube-proxy normally knows, which is how to reach the API server directly, not through the Kubernetes service IP. I applied that already and restarted the Calico pods, and now I'll take a look at the nginx log.
A: Do I have the right notebook? OK, I gave it a refresh, and now it's working. The thing to note is that I'm hitting one of the nodes, and the IP address that ends up in the logs is 10.128.136, which is the IP of the node that I'm coming through, and that's not really very useful: it's not useful for the log and it's not useful for policy either.
A: So what am I doing here? I'm doing a kubectl patch of the kube-proxy DaemonSet (this is all in our docs), adding a node selector to it that makes it only run on nodes that are explicitly tagged with non-calico.
A: I haven't tagged any nodes with non-calico, so kube-proxy just won't run anywhere. It's a nice simple way to disable kube-proxy temporarily in this case. So if we do that... I think, because I set this cluster up on GCP, it has some auth going on in the background. There we go. Now everything should still work, because I haven't churned anything, so kube-proxy's rules will still be in iptables (this one's flaking a bit); kube-proxy's rules should still be in iptables and nothing has deleted them yet.
A: So if I turn that on, then the Felix configuration is patched, and now everything should still work if I refresh... hopefully.
A: What... earlier? So, in theory, with the latest Calico you shouldn't even need to do a hard refresh there; it should carry on working. But I've done a hard refresh and it has come back, so maybe I'll investigate that later. But yeah, that's still working, and we should be in BPF mode now.
A: Apart from that little bit of disruption we saw, which I'd say we shouldn't really have seen, the source IP that nginx is seeing now is my real source IP (so don't try and hack me or anything). Now, when I access it, we're seeing the real source IP from all the way outside the cluster, and if I refresh a few times it should just stay consistent. OK, yeah, I think that's all I have for a demo; let me switch my share back to the other one.
A: So that's the demo, but if I go to the next slide I can tell you how we build on this even further. The next step after this, if your network supports it, is that we can implement a feature called DSR, or direct server return. This is another option that you can turn on. It all starts the same way: the packet comes in to the node port and we encapsulate it.
A: We send it to the correct backing node, it gets decapsulated, and the pod sees exactly the same packet as it did before. But then, when the pod responds, the BPF program running on the backing node is able to just respond directly, rather than doing encapsulation to get the reply back to the original node and sending it back along that safe path where it's guaranteed to work.

A: Instead, we can essentially spoof the packet, pretend that we are the first node, and send it directly back to the client.
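Below is a very rough sketch of what that reply path might look like (illustrative only: hypothetical map and field names, IPv4/TCP only, no IP options, and not Calico's actual code). It looks up reverse-NAT state recorded when the request was DNATed, restores the original node-port address and port as the reply's source, fixes the checksums, and pushes the packet straight out towards the client.

    #include <stddef.h>
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    /* Reverse-NAT state, recorded when the request was DNATed (illustrative). */
    struct flow_key {
        __be32 src_ip;
        __be32 dst_ip;
        __be16 src_port;
        __be16 dst_port;
    };

    struct rev_nat {
        __be32 nodeport_ip;    /* address the client originally targeted */
        __be16 nodeport_port;  /* the node port itself */
        __u16  pad;
        __u32  out_ifindex;    /* client-facing interface on this node */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_LRU_HASH);
        __uint(max_entries, 65536);
        __type(key, struct flow_key);
        __type(value, struct rev_nat);
    } rev_nat_map SEC(".maps");

    SEC("tc")
    int dsr_reply(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
            return TC_ACT_OK;

        struct tcphdr *tcp = (void *)(ip + 1);    /* assumes no IP options */
        if ((void *)(tcp + 1) > data_end)
            return TC_ACT_OK;

        struct flow_key key = {
            .src_ip = ip->saddr, .dst_ip = ip->daddr,
            .src_port = tcp->source, .dst_port = tcp->dest,
        };
        struct rev_nat *rn = bpf_map_lookup_elem(&rev_nat_map, &key);
        if (!rn)
            return TC_ACT_OK;                     /* not a DSR reply flow */

        /* Restore the original node-port address and port as the source. */
        __be32 old_ip = ip->saddr, new_ip = rn->nodeport_ip;
        __be16 old_port = tcp->source, new_port = rn->nodeport_port;
        __u32 l3_off = ETH_HLEN + offsetof(struct iphdr, check);
        __u32 l4_off = ETH_HLEN + sizeof(struct iphdr) + offsetof(struct tcphdr, check);

        bpf_l4_csum_replace(skb, l4_off, old_ip, new_ip,
                            BPF_F_PSEUDO_HDR | sizeof(new_ip));
        bpf_l4_csum_replace(skb, l4_off, old_port, new_port, sizeof(new_port));
        bpf_l3_csum_replace(skb, l3_off, old_ip, new_ip, sizeof(new_ip));
        bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, saddr),
                            &new_ip, sizeof(new_ip), 0);
        bpf_skb_store_bytes(skb, ETH_HLEN + sizeof(struct iphdr) + offsetof(struct tcphdr, source),
                            &new_port, sizeof(new_port), 0);

        /* Send the reply straight towards the client, skipping the trip
         * back through the ingress node. */
        return bpf_redirect(rn->out_ifindex, 0);
    }

    char _license[] SEC("license") = "GPL";

The interesting part is the last line: instead of re-encapsulating and bouncing the reply via the ingress node, the program redirects it directly out of the client-facing interface.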
A: Your network has to allow this exact type of spoofing. If you're on-prem, in a particular layer 2 network where this works quite nicely, you can arrange for it to work well and you cut off that extra hop. If you're in the cloud, it works within the same subnet on AWS and GCP, but it doesn't work with load balancers, so there is a little bit of a limitation there. But that's just to give you a feel for the kinds of things you can do with BPF; you wouldn't be able to tell the iptables data plane to do that.
A: So you just flip on the WireGuard switch and you have encryption, but you trade away some performance. And if you use our IPIP encapsulation option, it drops down to six gigabits per second. What I think has happened is that since we originally developed Calico a few years ago, when we picked IPIP as our standard out-of-the-box encapsulation because it was the fastest at the time, VXLAN has had a lot of love in the kernel.
A: WireGuard, too, is highly tuned, and they've just overtaken IPIP. So if you want to use encapsulation, I recommend VXLAN, especially with eBPF mode, because there are some specific incompatibilities with IPIP that slow it down; they don't break it, but they slow it down a bit.

A: Slicing that same data a different way, taking the data from the same test, we can measure the CPU per packet instead. If you look at it that way, we're saving something like 50% of the CPU at the smaller MTU size, and if you bump up to a 9k MTU we still save a little bit. But overall, the CPU used to send a 9k packet is mostly spent shifting the 9k of data, and the bits we're doing are a small part of it.
A: So you see a much bigger difference at smaller packet sizes, but you do see a CPU improvement in both cases. One reason I like to slice it this way is that not everybody cares about 27 or 40 gigabits of traffic, but most people would rather use less CPU. So if you're moving any significant amount of traffic, it should reduce the amount of CPU you use.
A: One of the things that comes with the Calico eBPF data plane is the kube-proxy replacement, and this isn't optional in our data plane. Some of the features that the Calico data plane has, like host endpoint protection and that kind of stuff, meant that we really had to take this over to make sure everything happened in the right order inside the kernel. So we've taken over from kube-proxy when you're in eBPF mode, and our implementation is faster.

A: It's faster than iptables mode all the time, and it's faster than IPVS mode, although with IPVS it's kind of splitting hairs, you know, a fraction of a millisecond. With iptables the performance varies a lot depending on how many services you have; as the number of services increases, iptables really slows down.
A: So if you're talking about 10,000 services, you really want to be using IPVS or eBPF, because those both scale roughly O(1) in the number of services and keep their performance even with really high numbers of services.
A: We measured the real time-to-first-content in that setup, and we saw iptables mode at just above 1.5 milliseconds in our test, IPVS mode in kube-proxy at about 1.5 milliseconds, and BPF with the non-direct return, where the reply goes back via the first node, beats that by a little, down to about 1.3 milliseconds, and then with DSR...
A: Yeah, so that's the performance section of the talk. A little word on limitations: it's IPv4 only at the moment. We wanted to get the data plane out there, get it into people's hands and implement a broad set of Calico's features before we tackled things like IPv6. One of the key pieces of advice we had about building an eBPF data plane was that you have to cover a broad set of the features you want.

A: Otherwise you can micro-optimize it and end up going down a blind alley, where it's very fast for the one feature you implemented but ends up slow because you've made some poor choices elsewhere. So we wanted to do a broad base and get it out there. We're x86-64 only at the moment; the main reason for that is just doing the cross-builds.
A: Our infrastructure wasn't quite set up for it: we can cross-build Go binaries quite easily, but cross-building the BPF binaries is a little more fiddly. I think arm64 would be fairly straightforward to do, but for the other architectures that we have some support for, like PowerPC, we might need to flip the endianness and so on, and we're working on that right now.
A: All nodes in the cluster must run the BPF data plane, and that's because the creative, encap-based external traffic solution that I talked about requires a BPF program to catch the packet on the other end. I think over time we'll add support for running hybrid clusters; we'll basically add SNAT support to the BPF data plane so it can interwork with the other types of cluster.
A: I believe the BPF support for updating packet checksums doesn't cover SCTP, and that makes it very difficult to do the kube-proxy function, the NAT, for SCTP. And we don't support the log action; I mentioned that right at the start. The log action in our policy is implemented by an iptables LOG action, and the iptables LOG action isn't available from BPF because it's just a totally different point in the kernel.
A: So we'd need to do something else to get logs if we implement that feature, or just accept that it's a difference between the two data planes that can't easily be resolved. In version 3.18, which came out just a few weeks ago, host endpoints are now supported; before that we released with workload endpoint support only, so that's pods and not hosts.
A: We've now added host endpoint support; that was quite a big piece of work. Host endpoints is the feature that intersects with kube-proxy, and with being able to interwork with the vanilla kube-proxy versus needing to do our own. If we had to do our own, we wanted to make it better, but we sort of had to do our own because we planned for host endpoints down the line.
A: I think we are going to do it, but I think it will end up meaning something a bit different in BPF land. I think it will go towards being implemented in XDP, as a sort of really early pre-filter, like the pre-filter we have in iptables mode.
A: We have a little sprinkling of XDP to do some of that function today, but I think we can do a more thorough implementation of it on the base of the new BPF data plane. For a long time the eBPF data plane wasn't available in Calico Enterprise, but I'll do a little plug: we've introduced it in Calico Enterprise 3.5, and we've added a bunch of our enterprise features there, like flow logs and enhanced types of policy such as policy tiers. So that's a tech preview in Calico Enterprise 3.5.
A: That's all I have about BPF, so here are my couple of slides on the other data planes that Calico has. From the beginning of Calico, or certainly from quite early on, we made the data plane pluggable, and a big driver for that was that we started off in Python. We were part of the OpenStack ecosystem, in Python, and when we saw Kubernetes coming along, we thought that was the thing to really engage with.
A: We decided that was the time to rewrite it in Go, so we split the product into two parts, all the brains separate from the data plane, and we rewrote one in Go and then the other. We ended up with quite a nice split between the two: you could run the Golang backend with the Python data plane and then switch to the Go data plane, and we could test them against each other and make sure they were really robust.
A: So we ended up with this API, and that was really convenient when Microsoft came along and contributed a Windows port of Calico. The first data plane we added from outside was the Windows data plane, and it was open sourced in 3.16.
A: In the latest version we now have BGP support, and we're adding support for a bunch of platforms: OpenShift, EKS, AKS and Rancher are all on the supported list now. On AKS it's kind of being baked in, so you can try it as a tech preview, where it's a tick-box option and you can just enable it on your Windows nodes in AKS.
A: I've got the docs links at the end. So yeah, the Windows data plane is there, and we have an enterprise version of the Windows data plane as well that supports a bunch of our enterprise features. And the new kid on the block is the VPP data plane. VPP is an open source project from Cisco, part of the FD.io project (which is why that's the logo on the right there), and the VPP team have contributed a data plane implementation for Calico.
A: It's based on the same API that we have, and it recently passed its first round of conformance tests, so we're moving it to a Calico-owned repo. They're working on it in there as part of the official Calico release, and they're working towards a tech preview release where you'll be able to enable it very easily.
A: It's all very clever: it runs in user space, and they have various ways of getting packets up into it. It runs all the protocols in user space, implements policy and everything, and then fires the packet on into your application.
A: I think the next milestone for it is tech preview; it's passed some conformance tests, but it's not thoroughly baked yet. But it's really interesting, and it's great to have such a big contribution from outside the team, so I'm interested to see how this one pans out and how it competes with the BPF data plane. I'm sure we'll find that out.
A: That's the end of my talk. I've put some links up here for the eBPF docs, getting started with Windows, and VPP; they have a how-to for turning it on in your cluster. Thanks very much.
B: Yeah, hi, thanks for the talk, it was great. It seems like we do have some questions, if you want to take a look in the chat as well. One seems to have been answered already, but I think it's OK if we go over it again: does eBPF give better performance than IPVS, and does Calico eBPF skip conntrack?
A: Yes. It gives better latency than IPVS; I'm not sure we've measured the throughput, and performance is a multi-faceted thing, but the latency versus IPVS is certainly better. That was the graph that I put up. IPVS is pretty fast, though, and our data plane is, you know, a fraction of a millisecond faster in our tests in terms of latency, so the big win for either of those is versus iptables.
A: It does skip the kernel's conntrack for workload flows, but, and this is the thing I sort of alluded to without really mentioning it, in version 3.18, when you go from iptables mode to BPF mode, we cooperate with Linux conntrack in order to make sure the upgrade isn't disruptive: existing flows we just kick out to Linux conntrack and let it handle them, but new flows we handle in our own conntrack table. Which has me thinking about why that might not have worked earlier.
A: I flipped my cluster backwards and forwards from BPF mode today, and one guess is that the flip from BPF mode back to iptables mode is disruptive, because we can't do anything on the iptables side to make it less disruptive; iptables isn't flexible enough to handle it going that way. So it's possible that I messed it up by flipping back and forth, but I'll have to dig into it.
B
A: OK, so iptables is the Linux kernel's built-in firewall and load balancing solution, and the main thing to know about it is that it's structured into chains of rules. A rule might be something like: if the packet is going to this IP address and this port, then drop it, or then rewrite the packet and send it to this service instead. The rules are structured as one big long list, so you have a chain of rules.
The
first
rule
is
processed.
If
it
matches
it
wins
and
it
does
its
thing.
Otherwise
it
goes
to
the
next
one.
Otherwise
it
goes
to
the
next
one
and
when
q
proxy
programs,
the
services
into
that,
if
you
have
10
000
services,
then
you
get
10
000
rules
in
a
row
and
if
yours
is
right,
if
your
service
that
you're
accessing
is
right
at
the
bottom,
that's
10
000
rules
you
have
to
go
to
and
each
rule
costs
about,
0.5
microseconds.
A: So when you've got 10,000 of them, they add up to milliseconds. That's iptables, and that's how it implements, say, NAT for kube-proxy. IPVS is a separate subsystem, the IP Virtual Server subsystem, and it's basically a faster way of doing that. It has efficient ways of grabbing the traffic before iptables gets a look at it, and then it uses a more efficient load balancing technique than having a thousand rules in a row.
A: It does a hash lookup to figure out what the right backend is, and away it goes. eBPF is this virtual machine that's very flexible, and one of the things we've implemented is a load balancing solution that's very similar to what IPVS is doing there: we take the incoming packet and do a hash lookup in a table, which is very fast, to figure out: is this a node port? Is this a Kubernetes service?
B: OK, thanks. And then I guess we have another one, and that is: can this be enabled with a CNI multiplexer?
A: You could multiplex between them. Suppose you use Multus to add two interfaces to every pod, and one of them was the Calico interface, which would run in BPF mode; you could then have a second interface using some other CNI, say some DPDK special thing or something. You could do that, and I don't think the Calico side would have a problem with there being a second CNI on there.
B: OK, thank you again for the great talk and for all the answers. If there are no other... I guess there is another one, it just came in: what are the performance benchmarks when using IPVS and eBPF?
A: I think I covered that already: the latency with eBPF is lower. That was one of the graphs that I put up. I think IPVS was around 0.5 milliseconds and, if I remember the graph correctly, the BPF data plane was 0.4 milliseconds. But I don't have numbers on hand for service throughput, like how much throughput you get through a service, so I don't have numbers for that.