Internet Engineering Task Force 104, 24 Mar 2019

From YouTube: IETF104-IEPG-20190324-1000

Description

IEPG meeting session at IETF104
2019/03/24 1000

https://datatracker.ietf.org/meeting/104/proceedings/

A

Good morning. Just... still... okay. A little closer; I can't get too much closer. This is the IEPG meeting at IETF 104. Again, if you're not at IETF 104, you're in the wrong room, because this'll be fun.

A

My name is Chris. That's Warren.

A

No, it's not on surround, I don't think. Oh, it's just echoey. Okay, for the presenters: please stand on this pink thingy on the floor, the X. Please, no dancing. Use the mic like this. Please don't!

A

Okay, exactly, very good. Okay, there are some presenters that sent slides and some that didn't. Perhaps they don't actually want to present. Here's the agenda.

A

Okay, is Stephen around? Yep?

A

No, Mr. Farrell is not here. Okay, Simon, you're up. Also, there's a pointy thing for the clicking and for the laser bits.

C

Morning. My name is Simon Leinen. I work for SWITCH, which is the Swiss national research and education network. I haven't been to IEPG in a while. It's cool that so many people show up; it used to be smaller. Okay, and I'm here to tell you about buffers. There was sort of a plea for agenda items that were neither DNS nor BGP, so I try to fill that. I don't have nice slides, unfortunately, but I hope that I can interest you a little in this awesome, exciting topic of buffer sizes. So this is about the sizes of the buffers in routers and switches.

C

Well, you probably all know why we need buffers. The Internet is packet-switched, and the flow control is mostly done end-to-end by transport protocols such as TCP, and you need buffers to accommodate fluctuations in traffic rate and bursty traffic. You need them when you go from fast links to slow links, or when you aggregate many links at the same speed. But I don't have time to go into the details, because it's a huge space of situations where the buffer needs are very different.

C

So how big should the buffers be? It's a slightly confusing picture. Many of us have sort of grown up with this historic recommendation, which I think is also in an RFC, 3439, which ended up recommending to have buffer space in the router, at least the bottleneck router or switch, that corresponds to the overall end-to-end round-trip time (the average, I think) times the bottleneck bandwidth. So on today's networks, on wide-area paths, that can result in quite big buffers. This recommendation was, I think, implemented by many people in the ISP space.

C

Certainly, though maybe not near the edges anymore. So in 2004, some people set out to challenge those rules, and most of them were in Nick McKeown's group at Stanford. One of the early modifications to this recommendation claims that you can actually divide this buffer space rule by the square root of the number of parallel TCP flows that you have running across your bottleneck link.

C

So that can be a big saving if you have hundreds of thousands of these, or millions. That was sort of put out there and researched and validated in simulations and so on. So the question is: how does that translate into practice on the Internet?
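
The two sizing rules described above can be written down as a short sketch. The function names and the example path (100 ms round-trip time, 100 Gbit/s bottleneck) are illustrative assumptions, not figures from the talk.

```python
import math


def rule_of_thumb_buffer_bytes(rtt_s: float, link_bps: float) -> float:
    """Classic rule: buffer = round-trip time x bottleneck bandwidth."""
    return rtt_s * link_bps / 8.0  # bits -> bytes


def small_buffer_bytes(rtt_s: float, link_bps: float, n_flows: int) -> float:
    """Revised rule: divide the classic buffer by sqrt(number of parallel flows)."""
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return rule_of_thumb_buffer_bytes(rtt_s, link_bps) / math.sqrt(n_flows)


if __name__ == "__main__":
    rtt_s, link_bps = 0.1, 100e9  # assumed example path, not from the talk
    for n_flows in (1, 10_000, 1_000_000):
        megabytes = small_buffer_bytes(rtt_s, link_bps, n_flows) / 1e6
        print(f"{n_flows:>9} flows: {megabytes:10.2f} MB")
```

Under these assumed numbers the classic rule asks for about 1.25 GB, while a million parallel flows would bring the revised figure down by a factor of a thousand.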

C

There was some additional work to push this to even smaller buffers, making some trade-offs, for example when you don't insist on being able to load the bottleneck link at 99.9-something percent, or when you can assume some pacing of the connections. I won't get into this. Anyway, this depicts quite a big spectrum of what we operators are told about how we should dimension our buffers.

C

So in practice, how big are the buffers in the switches? I'm looking specifically at these new commodity switching chipsets, like Broadcom or Mellanox or these kinds of things, that are used by many vendors these days. Basically, there are two types. There are switching ASICs that have the memory on chip, or maybe on package in the future, and those have very limited buffers. It's usually on the order of two-digit megabytes. I know some of the early ASICs had, like, nine; I think that's the least. And the biggest ones have around 64.

C

If I'm not mistaken. Those are the on-chip buffers, and note that these switches have, like, 32 times 100-gig ports, so they have a lot of bandwidth. Usually they're deployed in places where people don't think the network is the bottleneck, but of course, due to things like incast or other situations that happen in the data center, that is not always true.

C

And then there are chipsets, often very similar, for example the Jericho chipsets from Broadcom, that can accommodate external memory for the buffers, and those switches have vastly bigger buffers. One reason is that you need a lot of bandwidth between the switching chipset and this external memory, so the chipset, I think, uses something like four or so parallel GDDR5 connections between the switching chipset and the memory.

C

So you need at least four channels, or four sets of DIMMs, and the smallest DIMMs you get for memory are already quite big. So that's how you end up with much bigger buffer memory. But of course there's a cost, because on the chipset you need about half the pins and the SerDes, or whatever, for the external memory bandwidth of the chip.

C

You now need them for this memory channel and you can't use them for ports, so you get about half as much switching bandwidth if you do that. Also, building these systems is hard; it is very hard handling these channels to external memory. And there are MMUs, memory management units, on the switching chipset now that have to somehow accommodate the bandwidth and latency limitations of this memory. So it gets very tricky, and many ASIC designers don't like that.

C

They would rather build simple switches with on-chip memory, but there the size is very limited. So of course you may ask, why do I care? Yeah, if your buffers are too big, you're probably wasting money, and there is also the potential for interfering with TCP's, or whatever transport's, rate adaptation. That is actually bad for performance; it will often be better to drop packets earlier. I haven't put this on the slides, but we all know these buffers should be managed actively, right? That's active queue management, which is also not widely deployed in the Internet, I think.

C

Maybe there are some parallels to this discussion. If your buffers are too small, then you drop packets. Of course, a burst may arrive at your buffer which doesn't fit, so you have to drop packets.

C

This is okay, because TCP is built on the assumption that the network will drop packets to tell me to slow down. But of course the dropping of packets is often a bit insensitive and can hit innocent flows, or flows that are just in a delicate phase of startup or something, and that can lead to real performance limitations.

C

Also, for distributed systems like you find in the cloud, people are often interested in these high percentiles, the so-called tail latency. I don't know, when you type a search query into Google, it sends it out to, I don't know, a thousand servers, and you only get the results when the last server has sent its response. It is not exactly like this, but you get the basic idea. So tail latency is important. And the big thing I found in the discussions is that people optimize their networks for very different goals, of course.

C

Traditionally, like 20 years ago or so, when links were super expensive, people wanted to utilize them to the absolute maximum. Ideally, the links should all be running full at 100% all the time and people should still buy my service. But of course, today most people don't run their links full. Bandwidth has become a bit cheaper.

C

It's become sort of an accepted approach to throw bandwidth at performance problems. And the metrics that people optimize for are quite different. For many of the cloud or content providers, the figure of merit is page load time, and a page today is a very complex thing, of course, that can include many, many TCP connections and software interactions. Some people may be interested in low latency, if you do gaming or other interactive AR stuff or connected cars. So there's no consensus on how you should build your network, because people build networks for different purposes.

C

On the Internet, an ISP sort of has to be a bit agnostic of this. We have to build networks for just any application, and that picture always changes.

C

So it's an interesting tussle space, isn't it? One of the things I'm struggling with is that if we, like the research community, want to encourage operators to sort of modify their thinking or experiment with new recommendations on buffer sizes, we have to give them the possibility to see how much they need, right?

C

So I don't know about the ISPs in the room, whether any one of you thinks you have a good idea of how your buffers are used. And it's not just for ISPs, but also for cloud service providers or CDN operators or exchange point operators.

C

I think exchange points are a particularly interesting case, because they have even less control than other people, probably, over what people send them. So I found it a bit frustrating when I tried to study this problem. I found that even the commodity chipsets that are in all the products now have some instrumentation of what's going on in the buffers. Broadcom has some sort of high-watermark recording, where you can see, okay, in the last period the buffer went up to this size.

C

And Mellanox has a nice feature where you can actually split the buffer into slices of equal size, and the chipset, in the fast path, would count how many times the buffer was encountered in each size range. I found that very nice. The frustrating thing is that the software we use on those switches doesn't allow me to look into this instrumentation, although it's there in the hardware. So that's a bit frustrating.

C

In the past, twenty years ago, I would have said, well, it's clear: the IETF needs to define a MIB for accessing buffer statistics. These days, people don't necessarily implement this as MIBs anymore, but I think this whole story of how to give operators visibility into what the buffers are used for, or how much they are used, is quite interesting.

C

I wonder whether someone in the IETF would like to work on that. Of course, it's not just how full your buffers are, although I think that is sort of the critical question. If you're running a network and someone tells you that you don't need all these large buffers that you have, you really want to know, okay, how much of the buffers am I actually using, and how often?

C

I don't know, maybe the average utilization of the four-gigabyte buffer in my switch is only 32 kilobytes, but that doesn't tell me it's safe to reduce the buffer to anywhere near that, because sometimes there will be peaks that go to 2 megabytes or something. Also, maybe not as much for ISPs, but you also want to know what kind of traffic, what flows, occupy that space. That's a much harder problem, to really look more into the structure of the stuff in the queues.
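
To illustrate why an average alone is not enough, here is a minimal sketch of the two kinds of counters mentioned above: a high watermark, and a histogram over equal-size slices of the buffer. The sample values, the 4 MB buffer and the four slices are made-up assumptions chosen to keep the output readable; this is not any vendor's actual interface.

```python
def occupancy_histogram(samples_bytes, buffer_bytes, n_slices):
    """Count how often the queue depth fell into each equal-size slice."""
    slice_size = buffer_bytes / n_slices
    counts = [0] * n_slices
    for depth in samples_bytes:
        # A completely full buffer is counted in the last slice.
        index = min(int(depth // slice_size), n_slices - 1)
        counts[index] += 1
    return counts


def summarize(samples_bytes):
    """Return the mean depth and the high watermark of the samples."""
    return {
        "mean": sum(samples_bytes) / len(samples_bytes),
        "high_watermark": max(samples_bytes),
    }


# Hypothetical samples: mostly shallow queues plus one rare 2 MB peak.
samples = [32_000] * 99 + [2_000_000]
print(summarize(samples))
print(occupancy_histogram(samples, buffer_bytes=4_000_000, n_slices=4))
```

In this toy data the mean stays near the typical 32 KB depth, while the high watermark and the histogram both expose the rare 2 MB peak that a smaller buffer would have had to drop.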

C

I think that could be addressed either by some active measurements, or by this in-situ measurement that some switching ASIC people also implement in their hardware, or by programmable forwarding things.

C

So, some developments that might make it interesting to look at this problem again. Other than these modern chipsets and design problems, and links always getting faster, which is not new, there are interesting approaches to limiting the queuing in the network on the transport protocol side, with things like Data Center TCP or other attempts at making TCP smoother, if you like, and limiting some of these bursts. Some of this is going on here in the IETF. Super promising.

C

And also, people are motivated to work on this again.

C

So if I've actually interested you in the topic, which I hope, there are a few pointers here where you can look at everything that people studied over the last years. The reason I'm standing here is that two or three weeks ago there was a small workshop at Stanford, organized by the people who basically initiated this work like 12 years ago, and they're willing to do more research with operators.

C

There were a few very interesting presentations, which are not really online yet, and I apologize for that. I need to hunt the organizers; I hope they will make them public. (I should stop playing with this thing.) So there was a lot of interesting content from operators of content distribution networks and cloud networks, and also some more traditional ISP people, about measurements they did, for example by artificially limiting the buffers in their switches and seeing whether they got performance loss or even performance improvement in some cases. All the things that interest them, which were all different.

C

The goal of the workshop was to organize a wider workshop for the wider community, maybe in fall 2019. So, as they say, watch this space. I don't know what the space is, but if you're interested, then you could have the opportunity to talk more about this towards the end of the year. And of course, if you have ideas for measuring buffers, buffer utilization, impact on transport protocol performance and so on, then there are people interested in this, certainly a few groups. You can also contact me if you want pointers.

C

So thanks a lot. That's all I have. Any opinions?

E

Jeff Haas. I am not an expert in transport, though I spend an awful lot of time looking at TCP.

E

One of the things you should think about for the next version of this presentation is to include pointers to the older presentations that ran through the IETF on bufferbloat, because too much buffer, as you said, causes problems. It's not the fact that the box has too many buffers; it's that TCP, using that without any bounds, tends to have really bad back-pressure problems. So, like you mentioned, paced TCP is really what you're trying to optimize for, rather than burstiness.

E

The second bit, speaking with a vendor hat on: measuring buffers in an abstract way, for models, is not going to work for you. The problem you're going to run into is that for easy software-based systems, like, say, a Linux box or something like that, sure, you could actually get an easy exposure of those buffers and map that to resources used. But the minute you start getting to chipsets that have different types of buffer resources, based on how they're trying to do different types of forwarding, you're going to find that you can't get a useful abstract model.

E

Existing modeling stuff, like YANG, protobufs, etc., gets you nice modeling for what the resources are on the box, but even then, mapping the resources to how they are utilized will be tricky. So, taking Broadcom as an example, they have a host table that has certain types of resources for fast hits on host-level routes, and different resources for everything else.

E

So just getting that exposed can help in some circumstances, but even then you're down to one specific chipset for one specific line card for one specific implementation. So your visibility, if you get it: be ready for, you know, big-data-type analysis. Yep.

C

Thanks, Jeff. The bufferbloat thing is interesting, because I see that as something where the community has made a lot of progress in a few years. Many of the problems have actually progressed over the past years, which makes me hopeful that for the buffers, too, it's time to reconsider this.

C

In comparison, I mentioned this AQM story, which is sort of a sad story, I think, in our community, because there were a lot of research results and also some IETF work, and in the end there is very little deployed in practice. About the difficulties of modeling buffer utilization, you're right. If you look into how the buffers are managed, it gets very complex.

C

What I didn't mention is that these chipsets, with many ports and a blob of memory, have to allocate that memory to all the different traffic flows between ports, and that's very tricky. So it's hard to do the modeling, I agree, but I'm just frustrated that I can't look at these statistics at all, although they are being taken. So even a bad model, I would consider that big progress. I know, maybe it confuses people, but so...

E

Not that I'm saying there shouldn't be any model. I'm just saying, don't try to push this through the IETF unless you're looking for a very broad abstraction on things, because you're going to end up with per-vendor, per-chipset models. And the pressure that we're seeing as a vendor right now from some of our large customers, especially the data center guys, is that they want to see everything.

E

The fact that you may want to shut off most of it in many cases is perfectly reasonable, because in many cases you want this information only when you're trying to troubleshoot things. So if a specific line card is having issues, you can turn on monitoring for the resources, and that can give you useful information. You just have to realize that you're at a stage where you're looking at the resources used for a specific route. Another thing...

F

Geoff Huston, APNIC. This is a really old topic, as you're well aware. I seem to recall one of the seminal pieces of work was actually by Doug Comer back in the late 80s, where he had a 150-megabit ATM switch with no buffer and managed to get a max of about two megabits per second out of it, and, you know, the consequent investigation as to what was going on. Part of this is the connection between the way TCP traditionally rate-controls and how switches work.

F

For most of our lives, we've worked on loss-based rate control mechanisms. I don't understand that a queue is building up until the queue has overflowed. When the queue has overflowed, i.e. the buffer is full and is dropping packets, I am too fast. And if you think about it (this was a huge amount of work by Van Jacobson), by the time I get the loss signal, one RTT has gone past and I'm probably going faster.

F

To what extent do I need to rate-adapt in order to drain the buffer? The crude mechanism that we came up with, which seems to have worked for the last 20-odd years, is times two. So I rate-halve, and then I slowly build up again until effectively I'm filling up the buffers. When I get the signal, I rate-halve again. That's metastable, and it probably needs to be, because the world runs CUBIC. For that, you need big buffers.

F

If you don't want to leave bandwidth on the table, you need buffers that are equal to the delay-bandwidth product, because of this halving property and doubling property. The real issue is actually the work on BBR. There was old work on latency sensitivity that never quite worked. BBR has taken a different view, and it is possible to run BBR extremely fast with extremely small buffers, because your signal is the formation of a queue in the buffer, not the overflow of the buffer.
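
The link between rate halving and a buffer of one delay-bandwidth product can be checked with a little arithmetic. The sketch below assumes a deliberately idealized model that is not from the talk: a single loss-based flow whose window at the moment of loss equals the pipe plus the full buffer, and which then multiplies its window by a factor `beta` (0.5 for halving). The function and parameter names are illustrative.

```python
def utilization_after_backoff(bdp: float, buffer: float, beta: float = 0.5) -> float:
    """Fraction of the link kept busy right after one multiplicative decrease.

    bdp and buffer use the same unit (packets or bytes).
    """
    window_at_loss = bdp + buffer       # pipe full and buffer full
    window_after = beta * window_at_loss
    return min(1.0, window_after / bdp)


def min_buffer_for_full_link(bdp: float, beta: float = 0.5) -> float:
    """Smallest buffer for which beta * (bdp + buffer) still covers the pipe."""
    return bdp * (1.0 - beta) / beta


if __name__ == "__main__":
    bdp = 1000.0  # assumed delay-bandwidth product, in packets
    for buffer in (0.0, 500.0, 1000.0):
        print(buffer, utilization_after_backoff(bdp, buffer))
    print("buffer needed with halving:", min_buffer_for_full_link(bdp))
```

With halving, the window after a loss only covers the pipe if the buffer was at least as large as the delay-bandwidth product; in this model any smaller buffer leaves the link partly idle until the window grows back.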

F

Now, if I was building chipsets, I'd be pushing BBR like crazy, because all of a sudden I don't need large buffers if I've got TCP being latency-sensitive. And if I was running large applications on today's network, I would use BBR. Hell, I am. Why? Because it knocks all those loss-based systems off the table, because they basically flood the network. So is there an evolution going on? Yes. Loss-based systems, are they losing it?

F

Yes. Why run CUBIC? It just seems to be a totally self-defeating property, with BBR version one at any rate. Version two got too polite; version one just simply did it better with extremely small buffers, if that's where you were.

C

Yeah, it's an interesting evolution.

G

Jana Iyengar. Thanks for that talk. I was actually going to step up and say something quite similar to what Geoff said, in terms of the interaction between the congestion controller and the queue management system. Ultimately, what we're talking about here is not simply buffer management in the network, but the whole system of sender congestion controller, rate control, how buffers are managed in the network, and what those signals of drop or ECN mean to the endpoints. So whoever wants to look at this problem, I strongly encourage them to look at the entire system.

G

Now, coming to the entire system and congestion controllers: Geoff is right that BBR does a better job of managing the buffers, but I would push back gently on one thing: BBR does not actually kill CUBIC. It's super important to say that. I want to look at those traces for sure, because it can; there are certain conditions under which that can happen.

G

And as he pointed out, BBR v2, which you may be interested in, will be presented at ICCRG this week, and you're welcome to talk to the people who are working on it. That will be friendlier, is my understanding. I haven't actually seen exactly what's been happening over the past several months. But in general, a better congestion controller, not just BBR, is something that we want.

G

So to reduce the bufferbloat or buffer management problem, the two sides of this are basically going to be: how do you do good AQM, and how do you do good congestion control? I don't think BBR is the final answer in this, but I think it's in the right direction.

A

Sorry, is Stephen in the room? No. Jana, you're up.

G

I just moved from one mic to the other. Well, thank you for waking up early and showing up to this session. This is a much bigger and more crowded room than I anticipated, but I'm going to try and stand here so I don't block the view there.

G

My name is Jana Iyengar. I work at Fastly, and I am an editor in the QUIC working group, and I am here to talk to you about QUIC observability. How many of you have wondered what to do with QUIC once it gets deployed in your network?

G

Well, a handful of people, not bad. So let's talk about it. I'm not going to be able to answer all your questions. I'm not going to be able to give you all the bits, but I'm going to tell you what the current state of affairs is, and then the floor is open. You can come, you know, beat me up after that.

G

So what are we talking about? Well, for people who've been under a rock for the past 20 years, that's the HTTP stack. For people who've been under a rock for the past three years, that's what QUIC replaces. So QUIC is basically replacing all of TCP. It doesn't replace TLS, but it subsumes TLS, so TLS sits inside of QUIC in a slightly strange way, and then there's a new HTTP that's being built to run on top of QUIC, and that's HTTP/3.

G

So the surface that's exposed to applications on top will remain the same, but HTTP/3 is basically the HTTP mapping over QUIC. TLS 1.3 sits in this strange way right there because QUIC uses TLS, as HTTP does in the other stack, but QUIC uses TLS to do the exchange, and then, when the keys come out, QUIC uses the keys to encrypt its own headers in addition to application data. So, as compared to the previous stack, where only application data was encrypted and the transport headers were visible in the network, in the QUIC world the transport headers are also encrypted and not visible in the network, most of them anyway.

G

So those are the drafts. If you enjoy reading drafts, those are available. Before I move on: there are several problems that QUIC set out to solve, but the most relevant one to this conversation is the one around middleboxes.

G

Presumably everybody here in this room is familiar with this, so I won't go into the details of what exactly I'm talking about, but the problem that we're trying to solve is that of ossification: we have been unable to change TCP in quite a while. You've seen TCP Fast Open, you've seen MPTCP, a number of extensions to TCP come out of this community. How many people would claim that if you put a probe in the network today, you would see a TCP Fast Open SYN?

G

Are you claiming that you would see TCP Fast Open SYNs if you put a probe in the network? Yes? It must be in your private network, because there isn't very much out in the Internet. Yes, that was my point. Sorry, I didn't make it very well.

H

Use a mic. Hi Gernot, yes, I do see some of these. Thank you.

G

I would love to talk to you more later.

H

Thanks. I do run a global backbone, so yes, you can talk to an operator about the implications that QUIC has for operators, because that doesn't seem to have been a piece of what was considered as part of the working group.

G

It was, and this is not what I'm here to litigate. I'm here to talk about the tooling that's available for QUIC. We've had this discussion. But in general, right now, with TCP Fast Open, when I try to deploy it, I'm unable to deploy it. And if you see TCP Fast Open packets on the wire, I'd love to see what fraction of TCP is actually using TCP Fast Open, because that's a number that we are actually interested in. To be clear, I'm interested in it.

G

I've been pushing for TCP Fast Open for quite a while. It's super difficult to deploy. Apple's found it hard to deploy, Microsoft's found it hard to deploy, Google's found it hard to deploy, and it's not because we've not been trying. We've tried, and we've tried, and we've tried, and middleboxes still break it in various insidious ways, and we're unable to deploy it. So.

H

So what I'll comment on is that originally, when ECN was implemented, there were certainly a lot of boxes, these middleboxes, that did go and black-hole connections that had the ECN bits set when TCP was set up to initially do that.

H

Being an early adopter of some of these technologies, I've absolutely seen the types of behaviors that you've documented as part of this and that you're listing up here on the slides. Yes, these have caused me operational problems in the past. It is a fact that people have non-standard devices deployed in the Internet. The way to solve that isn't necessarily always to create a new standard or a new protocol or a new method.

H

Sometimes you have to go and do what, for example, Mark Andrews over here has done with the DNS protocols, where he has taken the time, he's researched, he's identified which TLDs, ccTLDs, etc. are not compliant with the standards, publishing the list, publishing the research, and actually doing the work to go and clean up some of these devices, as opposed to saying, "Huh, fixing these is too hard, so I'll do a new protocol instead."

G

I do, I do. I let you have the last word on that point, and you did. I'm going to move on, because we're not going to relitigate this point here. The point here is that this was one of the core motivations for why we did develop QUIC, and it is deployed, and it has taken less than nine years to see any bits on the wire.

G

So it's a strategy that has worked. One of the arguments I'm going to make, and this probably plays into the point that you just heard, is this. A while ago, Google's version of QUIC, which is just called gQUIC, was deployed. It has been deployed for a while now.

G

This is a few years ago. Google's version was deployed, and this is before Fastly, when I was at Google. The first byte of this packet header was the flags byte, and you all know what a flags byte means, right? A flags byte means it has bits that you turn on and turn off. We decided to flip a bit, because we wanted to change a particular behavior; we wanted the header to change a bit. We changed it, and this is before QUIC was even published as an Internet-Draft.

G

It's before any of this had happened publicly, but Google QUIC was deployed in Chrome. This byte was unencrypted, and we had left it at seven for a while, and we flipped a bit and everything went to hell. Basically, we had calls coming in going, "Users can't reach any Google property using Chrome, and by the way, this is Chrome's problem, because guess what, we can using Firefox." And so we had to go find out what the problem was.

G

We found it was when using QUIC that this problem showed up. Long story short:

G

This was basically a firewall that we found deployed on enterprise networks that allowed the first packet in both directions to go through. Chrome basically had a race, like Happy Eyeballs: I'm going to try QUIC, and if it doesn't work, I'll fall back to TCP. And that "doesn't work" was concluded based on whether the handshake packet came back or not. So Chrome would send out a handshake packet.

G

This firewall would let it through, the response would come back from the server, the firewall would let it through, and then, when Chrome decided QUIC's working, the firewall would black-hole everything. So this was exactly the perfect mismatch between these two mechanisms. Of course, we spoke to the firewall vendor. We spoke to them in detail about exactly what had happened.
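
The mismatch between the two mechanisms can be reproduced in a toy model. Everything below is a hypothetical sketch: the class and function names are invented for illustration, and neither the firewall nor Chrome's fallback logic is modeled beyond what is described above.

```python
class FirstPacketOnlyFirewall:
    """Hypothetical middlebox: passes the first packet each way, then drops."""

    def __init__(self):
        self.passed = {"out": 0, "in": 0}

    def forward(self, direction: str) -> bool:
        if self.passed[direction] == 0:
            self.passed[direction] += 1
            return True
        return False  # everything after the first packet is black-holed


def pick_transport(firewall) -> str:
    """Client-side race: use QUIC only if the handshake round trip succeeds."""
    handshake_ok = firewall.forward("out") and firewall.forward("in")
    return "quic" if handshake_ok else "tcp"


def delivered_after_handshake(firewall, n_packets: int) -> int:
    """Count how many later outbound packets actually get through."""
    return sum(1 for _ in range(n_packets) if firewall.forward("out"))


firewall = FirstPacketOnlyFirewall()
print(pick_transport(firewall))                 # handshake passes, so QUIC is chosen
print(delivered_after_handshake(firewall, 10))  # but no later packet is delivered
```

The handshake check sees exactly the two packets the firewall lets through, so the client never falls back to TCP even though all subsequent traffic is dropped.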

G

We actually talked with the team that implemented the code that did this, and they told us the way they determined what QUIC traffic was. They wanted to ship a feature that did QUIC blocking, and the way they determined what QUIC was is that they looked at tcpdump for a little while and found that the first byte was unchanging. And the way they implemented a filter for QUIC was basically this.

G

I don't want to put too fine a point on this, except to note that the one byte that didn't change on the wire ended up becoming the QUIC identifier. And I will say this, for what it's worth: we still don't know what's going to happen when we change this byte tomorrow. Right now, the Google QUIC that's deployed doesn't have seven, it has nine in the first byte. That is the bit we flipped, by the way; the second bit is the bit we flipped.
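
A heuristic of the kind described, keyed on one unchanging byte, might look like the sketch below. The function name is hypothetical and the payload bytes after the first one are placeholders; this is not the vendor's code and not the real gQUIC packet format. Only the values seven and nine come from the talk.

```python
OBSERVED_FIRST_BYTE = 0x07  # the value that stayed constant on the wire for a while


def naive_quic_filter(udp_payload: bytes) -> bool:
    """Hypothetical classifier: call it QUIC if the first byte equals 0x07."""
    return len(udp_payload) > 0 and udp_payload[0] == OBSERVED_FIRST_BYTE


# Placeholder payloads: only the first byte matters to this heuristic.
old_packet = bytes([0x07]) + b"\x00" * 10
new_packet = bytes([0x09]) + b"\x00" * 10

print(naive_quic_filter(old_packet))  # matched by the heuristic
print(naive_quic_filter(new_packet))  # no longer matched once the byte changes
```

A filter like this encodes an accident of one deployment as if it were part of the protocol, which is why changing the byte changed how the traffic was treated.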

G

It is nine right now, so I don't know if the filter has changed to "equals seven or equals nine", or if it's changed to something broader. So we have to see what happens next. However, this was a pretty strong reinforcement of something we already knew: the only way we could actually protect a protocol from getting completely ossified, and be able to, you know, deploy without having to wait for 10 years as we waited with TCP Fast Open, was to encrypt. Yes?

I

This is a great story. Aaron Falk. But basically, if I understand what you're saying, you were running a protocol in the network that was not published anywhere, and so somebody used a heuristic to figure out what it was.

G

It's a fair point, so so looked a couple of things they could have done. One is so it was published as a public document. It wasn't. A standard I should have actually said that clearly the the document is available, but they hadn't even looked at the min they hadn't even googled, quick to be very clear. We may ask them: they had no idea the document even existed, and then they asked us the next time. You make a change.

G

Can you please let us know? And we said no. So we don't know what happened after, but yeah, that was the state of affairs.

G

It's, yeah, but if you want to ship a feature, I would expect that there'd be some amount of due diligence. It's not to, you know, I'm not putting them on the spot. They're probably serving their business purpose adequately well, to continue to survive and exist, right, and to thrive even; it's a big company. But incentives are different in different places, and they aren't incentivized to go look up the spec.

G

That's a pretty strong signal to us. We can write specs all we want, but we have to figure out how to incentivize these folks to go look at the specs. So at any rate, this was a lesson that was reinforced.

G

So what's the status of this work? We've been working at the IETF for the past two years on standardizing this; it's two and a half years now. And we understand, everybody understands, that there are obviously privacy and security implications, as well as network operational and management concerns, and it's unfair to say that those haven't been considered. Those have absolutely been considered. It's,

G

It's just that I think the community is worn down by the inability to move transport or end-to-end bits, and we are trying really hard to figure out how to make this work. This means that there might be fundamental changes in how operational work happens going into the future.

G

I don't think that should be off the table, and I think that sort of work is encouraged now. So we've had a very strong focus on avoiding ossification, because we want to paper over the network as it is, and we want to avoid middleboxes looking at the headers. If there are strong reasons why middleboxes need to get some information, there's a very high bar for how to demonstrate that, and it has to be a demonstrated need. I think that is critical; I think that should have been established 25 years ago.

G

Unfortunately, we didn't have the experience that we have now. So as an operator, if you're thinking, what can I do here, I would recommend that you think about what exactly you need, and the working group will be open to having a discussion on that. So there are several implementation efforts.

G

How much time do I have left.

G

Four minutes. I actually wanted to go into the packet header, but I'm not going to go into the details here; I'm just going to show you, and I wanted to show you some tooling here. The packet header basically has two flavors: there's a long header and then the short header format.
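As a rough illustration of the two header flavors, the sketch below tells them apart from the first byte of a packet. It assumes the layout later published as RFC 9000, where the most significant bit of the first byte is the header form bit; the draft shown at this meeting may have differed in detail, and the function name is made up for the example.

```python
# Assumption: RFC 9000 layout, where bit 0x80 of the first byte is the
# "header form" bit (1 = long header, 0 = short header).
HEADER_FORM_BIT = 0x80


def header_form(first_byte: int) -> str:
    """Return which header flavor a packet uses, based on its first byte."""
    return "long" if first_byte & HEADER_FORM_BIT else "short"


print(header_form(0xC3))  # high bit set
print(header_form(0x41))  # high bit clear
```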

G

The long header format looks like this. It's only used during the handshake, to bootstrap and set up the connection, and the bits you see there, which are not marked out with x's, are visible in the network. Those are unencrypted bits visible in the network in the long header. When you get to the short header.

G

Actually, that's not a great picture, because, and I want to say this, it doesn't show you everything that's encrypted, because the packet number here is encrypted, but only some bits are actually visible, and I'll actually show that. The short header is the common header. That's what's used during the rest of the connection after the handshake is complete, and this has a bit here called the spin bit that you may have heard of, and if you haven't heard of it, you've again been under a rock for the past few years; climb out of it.

G

But the spin bit is basically a bit that is available in the short header, and it's meant to be used for passive measurement of round-trip time. I won't get into the details of how, but that's visible in the network and available for the network to use to measure round-trip time. Folks at Ericsson and other places are doing a ton of work on figuring out how to use this, how to build filters and tools around this, and so on.
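The talk skips the details of how the spin bit yields a round-trip time, so here is a minimal sketch of the idea from an on-path observer's point of view: the bit flips once per round trip, so the gap between consecutive flips seen in one direction approximates one RTT. The bit position (0x20) is an assumption taken from the layout later published as RFC 9000, and the trace below is invented for illustration.

```python
from typing import Iterable, List, Tuple

# Assumption: RFC 9000 short-header layout, where 0x20 is the spin bit.
SPIN_BIT = 0x20


def spin_value(first_byte: int) -> int:
    """Extract the spin bit (0 or 1) from the first byte of a short header."""
    return 1 if first_byte & SPIN_BIT else 0


def rtt_samples(observations: Iterable[Tuple[float, int]]) -> List[float]:
    """Estimate RTTs from (timestamp_ms, first_byte) pairs.

    The pairs are packets of one flow seen in one direction, in arrival
    order. Each change of the spin bit is an "edge"; the time between two
    consecutive edges is taken as one RTT sample.
    """
    samples: List[float] = []
    last_spin = None
    last_edge_time = None
    for timestamp_ms, first_byte in observations:
        spin = spin_value(first_byte)
        if last_spin is not None and spin != last_spin:
            if last_edge_time is not None:
                samples.append(timestamp_ms - last_edge_time)
            last_edge_time = timestamp_ms
        last_spin = spin
    return samples


# Invented trace: the spin bit flips at 50 ms, 100 ms and 150 ms.
trace = [(0, 0x40), (10, 0x40), (50, 0x60), (60, 0x60), (100, 0x40), (150, 0x60)]
print(rtt_samples(trace))
```

A real observer would also have to cope with loss and reordering, which this sketch ignores.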

G

Again, they had to make a very strong case that this was a super useful and super important bit of information in the network, and that eventually came out of that discussion, a very heated one, I'd say. So that's the packet header. People who are familiar with TCP might say: well, I can see a packet number that looks like a sequence number, but where's the acknowledgment information? Where's the flow control information?

G

Well, as it turns out, none of it is in the packet header; it is all inside the packet. So a packet is basically constituted by frames, and you can have different frames. There are many different frames, among which there's a STREAM frame that carries application data and the ACK frame that carries acknowledgment information.

G

And now I've just thrown a whole bunch of information out at you here, but I'm going to skip these and show you basically what a QUIC packet looks like. There's the packet header; the grayed-out bits are ones that are encrypted. So the packet number is in fact encrypted. You can't see the sequence number on the wire anymore, and pretty much everything else inside the packet, which is the stream data and the acknowledgment information, is all encrypted.

G

So clearly, Wireshark is not going to be enough, because really, you can't see very much without having an endpoint key, and that's how it's going to be with Wireshark. You won't be able to get a lot of information. But there's been a lot of tooling work going on for doing endpoint tracing of QUIC, and that's seen a lot of traction. So if you're a server-side.

G

Network, or if you're a client-side network, then there's a bunch of tooling that work has been going into. And if I may just very quickly show these two bits: quic-trace and quicvis are two tools that have been under serious development, and they are very, very promising tools. The first one, quic-trace, written by Victor at Google, is available, it's open source, and basically it's a packet trace that shows you.

G

You know, if you've seen tcptrace, it's quite similar to that. It's basically a time-sequence diagram of packets and ACKs and losses and things like that. quicvis is a much more elaborate tool being written by Robin Marx at the University of Hasselt, and it shows some seriously interesting information that you can get from doing logging at the endpoints. These are all available to see. They will be discussed in much more detail at the TSV area meeting this week.

G

So there's a half-an-hour talk on quic-trace and a half-an-hour talk on quicvis. So if you're interested in these, please show up, and I think we have maybe a minute or two for questions.

H

Excellent. Hi, Jared Mauch. So I have, I think, an important question, which is: what are you doing in the protocol? You've gone and moved from protocol number 6 to protocol number 17 and encapsulated your data within protocol 17, and a number of global network operators have deployed proto 17 rate limits, and sometimes proto 17 rate limits with ephemeral port numbers in them.

H

How has the QUIC working group consulted with those operators about what the consequences are for the protocol when ephemeral ports are detected and either black-holed or rate-limited? And I can give you some history if you'd like.

G

So I can speak to my operational experience at Google with this, which is that when we deployed QUIC at first, we found about 93% reachability with QUIC. So in terms of black holing, that may have been the 7% that we missed, and that was perfectly fine, because we had a TCP fallback. So if QUIC didn't work, that was perfectly fine. The rate limiting is the more nefarious one, because, well, I'm sorry, how is, like, rate limiting denial.

H

Service attack, I didn't.

G

In Sedona, so it's it's a mode, the more than a fattiest one yeah.

H

Well, that's why the rate limits were deployed, I.

G

Think you're talking past each other yeah.

A

I think you're both talking about the other side of the problem, like he.

D

Says the various.

A

One meaning suddenly, my packets are weirdly not getting there and it's really hard to figure out. Why versus.

D

A

Side, like, somebody's trying to DoS-attack my customer; clearly I'm gonna rate-limit that, right? Yeah, I think, or.

H

I'm gonna rate-limit attacks that traverse or are against my network infrastructure, because I don't want to have to carry attack traffic to another continent, for example, which I think is a legitimate operational use case. I think.

G

That's completely fair and I'd love to talk to those people. If you.

H

Have those people like you could talk to me, I'm gonna.

G

H

To Joe from ITT who's also publicly said that he's done. This there's also a number of other mobile networks that have done this as well that have published this in the operator community. So I'm kind of curious, where this disconnect is between the quick working group and operator community Yepez.

G

H

These things so.

G

The discussion has to happen in the quick working group by the operator community, so.

H

G

Come to the IETF: well, they have to come somewhere, I, don't know exactly what do I.

J

G

Between you and me, what I can say is that operationally our experience at Google was that when we turned on QUIC globally, and right now, to be clear, QUIC is about 40-ish percent of Google's traffic. At last count, it was 40 percent two and a half years ago, and that's about seven percent of Internet traffic, so I'm sure that you're seeing QUIC packets right now flowing over the networks. Rate limiting was something that was observed.

G

It was definitely observed, and what we did was we cold-called operators, and most of them were in enterprise networks, and they basically said: we didn't even know that we had a rate limit on. We got 3% back just from doing that. So yeah, we did get people to turn off the rate limiting, because those limits are on by default in their firewalls, not because they had turned them on.

G

We have mostly got rid of the rate limiting at this point. The most recent data was from last year, which shows that most of the rate limiting that Google sees has gone down; it doesn't exist at this point. It's quite possible that it's still happening in corners that I haven't seen or heard about. But if it's happening in a big way, then it's certainly missing Google traffic. Okay.

H

Yeah, I mean, I've tried to bring this to people in the QUIC working group and they have not been listening, and so that's why, when you've come here, to a room that has more operators in it, I'm trying to reinforce this message that has not been heard. Right, very.

D

H

What I would say is that, but so my second question is actually more about why. Why is it that other protocols are moving away from UDP, such as DNS, and they're.

G

Moving away from UDP yeah, yes, hang on.

G

H

Thank you you, you have confirmed my point.

G

All right, I'm, clear you I, am sure.

H

Has the disconnect, based on this conversation,.

G

So, to be very clear: I'm not the working group. I'm here doing a presentation about QUIC and tooling. I think the reason the conversation is breaking down is because, if you want to have a conversation with the QUIC working group, you need to ask the chairs and get a slot to have that discussion. I'm not representing the working group here; I'm presenting. I don't think this is a productive conversation at this point. Listen.

E

Hi, Jeff Ayres. Who was that, Jared? Since this is operational considerations, and I haven't been following QUIC in the specs for a while: what's in the specs right now for the fact that we're using UDP and this impacts ECMP forwarding?

G

You could still do ECMP with the 4-tuple, but, I mean, it's sitting on UDP. So.

E

Part of the point is, your slides sort of approach this a different way: leave my packets alone, the contents are secret. Routers have to actually, you know, find some level of entropy within the packet so they know how to hash stuff, and the trouble with UDP is that most boxes that are actually doing ECMP on UDP have a limited amount of entropy that they'll use. They don't tend to get to the contents of things; they try to stick mostly to the headers. So one of the considerations therefore becomes: the specs.

E

Should talk about how ECMP only goes this deep, and say to be very careful how you actually do your flows on this, because otherwise you're gonna find that, despite the fact you have massive fan-out for ECMP, especially in datacenter situations, if you're not careful about how you play with your port numbers, your traffic all smashes down the same port channel.
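The concern above can be sketched as follows, assuming a router that picks an ECMP member link purely from a hash of the 5-tuple (real hardware uses vendor-specific hash functions; SHA-256 and the function name are only stand-ins). Flows that share all five fields always land on the same link, so the spread depends entirely on how endpoints vary their port numbers.

```python
import hashlib
from typing import Tuple

# (source IP, destination IP, IP protocol, source port, destination port)
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_link(flow: FiveTuple, num_links: int) -> int:
    """Pick a member link from header fields only (illustrative hash)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


# Same 5-tuple: every packet takes the same link, whatever the payload holds.
fixed = ("198.51.100.10", "203.0.113.20", 17, 443, 443)
print(ecmp_link(fixed, 8) == ecmp_link(fixed, 8))

# Varying the source port gives the hash something to spread on.
links_used = {
    ecmp_link(("198.51.100.10", "203.0.113.20", 17, port, 443), 8)
    for port in range(50000, 50064)
}
print(len(links_used))
```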

E

You know, you two guys up front obviously know this, because we've had long conversations about this sort of thing, but for your presentation here, this is an important thing for the rest of this room. Sure.

G

So I would say that ECMP is still usable, and a lot of deployments actually do ECMP and are planning to use ECMP for sharding QUIC traffic coming in. But that is if you are willing to build in the support for it: there is a connection ID that is in fact visible on the wire, at a fixed point in the packet header, that can be used for sharding, and that can be as long as seventeen bytes. So you can certainly use that for sharding as well. Right.
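A minimal sketch of sharding on the connection ID, assuming the device has already extracted the connection ID bytes from the header (where they sit depends on the QUIC version and header form, which is not shown here). CRC32 and the function name are arbitrary choices for the example.

```python
import zlib


def shard_for_connection(connection_id: bytes, num_backends: int) -> int:
    """Map a connection ID to a backend index, independent of the 5-tuple."""
    return zlib.crc32(connection_id) % num_backends


# Hypothetical 8-byte connection ID; the value is made up.
cid = bytes.fromhex("0123456789abcdef")
print(shard_for_connection(cid, 16))
```

Because the key is the connection ID rather than addresses and ports, the same connection keeps mapping to the same backend even if the client's source port changes.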

E

And again, the point becomes: at some point you have to figure out, how do I look at the packet in a safe fashion? And this comes back to, you know, is the spec stable enough so that ASIC implementers can rely on it? That's.

G

That's a fair question, and I think the drafts are approaching it; it's basically more or less about stability right now, and yeah.

E

So this is not in the specs it's written even a bit, let's figure out the things that have that conversation.

K

That's question.

E

Eric one. So, given that everything is now hidden, of which I am a fan: one performance-enhancing proxy that does strike me as having been useful is the kind of thing that used to manage TCP for satellite links. Are you aware of any measurements or experience of how QUIC behaves over satellite links? No.

G

But I don't expect it to be any different than TCP end to end over satellite links. It just won't have any in-network assistance. It will have to, that's.

B

G

Just whatever it sees itself, yeah. So at the moment, and that's a conversation that, I mean, we can and we need to have going forward. But at the moment, this is again why we expect that, I mean, HTTP, for example: every single implementation of it will probably have a TCP fallback. So ultimately, if you need those devices, that's the way to get those performance benefits. Well, I.

E

Mean if you were operating a satellite network, you can want to provide all protocols that the end user.

B

Might want to use.

E

With whatever the best experience could be, do.

B

You think in protocol mechanisms.

E

For thinking for learning, this are satisfactory. Do.

G

I think the protocol mechanisms that are in QUIC right now are not satisfactory for this purpose. They're not gonna be adequate. No, they're not.

B

G

Adequate things you need more support service me. Oh really,.

E

Okay, well I thought I was eating the mic. Even the question was.

I

About design like.

E

Links and quick performance of her satellite links, but yeah.

G

I mean, QUIC, so, not a lot of work has been done there. There's probably, you know, congestion control and other work. The protocol doesn't offer additional hooks for things like that to happen at the moment, but it's certainly possible to imagine that a next version of the protocol might have more things to support other network types. Thank you.

I

Aaron Falk, just a point of information on that last topic: there's a mailing list called etosat that's discussing performance.

K

Over satellite.

I

Links, including with quick for.

K

Folks, who are interested, thank you for that.

A

Okay, is there a Steven Farrell in the room, nope Jobe you're up.

L

Good morning, IEPG. Can you hear me properly in the back of the room? I will talk louder. This is really awkward.

L

My name is Job Snijders. I work for NTT Communications, an IP transit provider, and in this presentation I would like to discuss some ideas or considerations related to BGP black holing. Quick recap of what black holing is: black holing in general is that you signal to an adjacent network, usually your transit provider.

L

Please do not send me packets destined to this IP address. The common use case is that you receive, say, a DDoS attack targeted towards one of the IP addresses originating from your network, and you sort of disable this IP address in order to allow the other IP addresses to remain reachable. In other words, you're sacrificing the victim of the DDoS attack so that the rest of your services remain online. Fine. Usually black holing is implemented through one of two methods.

L

One is what I call in-band signaling, where, on the BGP session with your transit provider, you announce a host route with a special BGP community, and that special BGP community triggers the black holing behavior. A second approach is out-of-band signaling, where you have an eBGP multihop session towards a special route server in the provider's network, and that route server will inject a black hole into the ISP's network.

L

When you receive a black hole request, the big trick is that you rewrite the next hop to an IP address that is null-routed, or you rewrite it directly to the null interface.

L

Now, the downside of this approach is that your request to not receive traffic for a given IP address also becomes the best path if it passes through the filters, because your request is conflated with also being the most specific route to a given destination.

L

Another downside is that in most BGP implementations you will need two filters: one filter to catch the black holes and figure out which ones you would want to allow, and the other filter for normal BGP routing. To put this in perspective, today's largest config in NTT's network is 57 megabytes, and roughly 50% of that is because of black-hole-related prefix lists.

L

Now, most providers will use IRR or RPKI or WHOIS data to construct these filters, and we've noticed that some of our customers, either maliciously or accidentally, have requested black holing for IP addresses that are not theirs. We've seen customers that would add ASNs of their competitors to their own AS-SETs, so they make it into the filters, and then they signal black holes into our network. Of course, this upsets some customers.

L

We've also seen accidents where a DDoS mitigation system misidentifies a DDoS flow and starts black holing IP addresses that are outside that customer's network. In any regard, all of these situations should be prevented proactively, rather than reactively, where we disable black hole capabilities for such customers.

L

This is an example routing policy. You can see we have two lists, one for black holes and one for normal routing. If the destination that we receive over the eBGP session is in the black hole list and the community is our black hole community, then set a specific next hop and jump out of the policy. If that condition is not met, it may be a normal routing signal, and then we would accept it. Now.
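The slide itself is not in the transcript, so the following is only a sketch of the two-list logic as described, written in Python rather than router configuration. The community value (the well-known BLACKHOLE community from RFC 7999), the discard next hop, the prefix-length limits, and all names are assumptions for illustration; IPv4 only.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple

BLACKHOLE_COMMUNITY = "65535:666"  # assumption: RFC 7999 well-known value
DISCARD_NEXT_HOP = "192.0.2.1"     # hypothetical null-routed next hop


def permitted(prefix: str, entries: List[str], max_len: int) -> bool:
    """True if prefix falls inside a list entry and is not too specific."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen <= max_len and any(
        net.subnet_of(ipaddress.ip_network(entry)) for entry in entries
    )


def traditional_policy(
    prefix: str,
    next_hop: str,
    communities: Iterable[str],
    blackhole_list: List[str],
    routing_list: List[str],
) -> Optional[Tuple[str, str]]:
    """Return (prefix, next hop to install), or None if the route is rejected."""
    # First filter: black hole requests, allowed down to host routes.
    if BLACKHOLE_COMMUNITY in communities and permitted(prefix, blackhole_list, 32):
        return (prefix, DISCARD_NEXT_HOP)
    # Second filter: normal routing, assumed here to stop at /24.
    if permitted(prefix, routing_list, 24):
        return (prefix, next_hop)
    return None


customer = ["203.0.113.0/24"]  # documentation prefix standing in for a customer
print(traditional_policy("203.0.113.5/32", "192.0.2.10", {BLACKHOLE_COMMUNITY}, customer, customer))
print(traditional_policy("203.0.113.0/24", "192.0.2.10", set(), customer, customer))
```

Note that both lists have to be maintained per customer, which is where the configuration size mentioned earlier comes from.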

L

My proposal to this community is that we should only honor black holes if they align with the active path. So, if you're receiving traffic for a destination, then you should be allowed to signal black holes. If you're not receiving the traffic anyway, for instance because you're not the best path or your AS path is longer, then your request for black holing should, at that point in time, not be honored.

L

So what I propose is a black hole validation procedure. For instance, if I receive a black hole on a certain edge router, and I also have a copy of the RIB.

L

The local RIB of that edge router, I can walk the RIB and see if the next less specific route is also pointing to the same next hop, or if the next less specific route has the same ASN in the leftmost position of the AS path. There are a number of ways you can validate whether the black hole request is aligned with the best path. Whichever method you pick, I think this will bring significant benefits to your other customers.
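The validation idea can be sketched as below, assuming the validator holds a copy of the edge router's best paths keyed by prefix. The data model, function names, and the choice to accept a match on either the next hop or the leftmost ASN are assumptions for illustration, not the pmacct implementation; IPv4 only.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BestPath:
    next_hop: str
    as_path: Tuple[int, ...]


def next_less_specific(rib: Dict[str, BestPath], prefix: str) -> Optional[BestPath]:
    """Find the longest route in the RIB that strictly covers the prefix."""
    target = ipaddress.ip_network(prefix)
    best = None
    for candidate_str, path in rib.items():
        candidate = ipaddress.ip_network(candidate_str)
        if candidate.prefixlen < target.prefixlen and target.subnet_of(candidate):
            if best is None or candidate.prefixlen > best[0].prefixlen:
                best = (candidate, path)
    return best[1] if best else None


def blackhole_is_valid(
    rib: Dict[str, BestPath], prefix: str, peer_next_hop: str, peer_asn: int
) -> bool:
    """Honor the request only if the requester is on the active path."""
    covering = next_less_specific(rib, prefix)
    if covering is None:
        return False
    same_next_hop = covering.next_hop == peer_next_hop
    same_leftmost_as = bool(covering.as_path) and covering.as_path[0] == peer_asn
    return same_next_hop or same_leftmost_as


# Invented RIB: the customer's /24 is the active route via 192.0.2.10.
rib = {
    "0.0.0.0/0": BestPath("192.0.2.99", (64999,)),
    "203.0.113.0/24": BestPath("192.0.2.10", (64500, 64510)),
}
print(blackhole_is_valid(rib, "203.0.113.5/32", "192.0.2.10", 64500))
print(blackhole_is_valid(rib, "203.0.113.5/32", "192.0.2.20", 64501))
```

In the second call the request comes from a neighbor that is not the active path for the covering route, so it is ignored.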

L

Now, the implications of only honoring black holes if you are the best path are numerous. For instance, it makes it much harder for networks to black hole traffic of their competitors, because their competitors may be directly connected to your backbone and have a shorter AS path or higher local preference.

L

This trick, where we make black hole validation depend on the next less specific route, also causes us to enjoy the benefits of any type of filtering we do at our edges, like bogon filtering or RPKI origin validation or IRR-based filtering. In other words, you have to put in effort to become the best path.

L

For instance, you'll be announcing a /24, which is the longest that will propagate in the DFZ. In other words, if you want to black hole with this method, you will have to combine a black hole request and a BGP hijack in order for the traffic to disappear, and that is a significantly higher barrier than what we see in today's black holing implementations.

L

So in black holing reconsidered, I think there are multiple input channels or listener channels that we can consider. You could give an API key to each customer and let them post the requests to a certain HTTP endpoint.

L

We can use in-band BGP signaling, where we either use BMP Adj-RIB-In pre-policy to catch these requests for black holes, or use a trick where you don't install the received black hole routes into your FIB and only export those routes to a special BGP validator, not to any other iBGP neighbors. Or we can use out-of-band signaling through eBGP multihop sessions, where, through an MD5 key, you can ensure that you are talking to your customer. In this new world.

L

We can remove the black hole policy related to the /32, so we only need to install prefix list filters for normal routing. If we see the community and a specific prefix length, the actions that would follow would be to advertise this route to the black hole validation server, not install it in the FIB, and exit the policy. So I really expect up to a 50% reduction in our router config size, which is really nice.
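A sketch of the simplified edge policy described above, again in Python rather than router configuration: only one prefix list remains, and anything tagged with the black hole community at /32 is handed to the validator instead of being installed. The community value, the /24 limit for normal routes, and the returned field names are assumptions for illustration; IPv4 only.

```python
import ipaddress
from typing import Dict, Iterable, List

BLACKHOLE_COMMUNITY = "65535:666"  # assumption: RFC 7999 well-known value


def new_world_policy(
    prefix: str, communities: Iterable[str], routing_list: List[str]
) -> Dict[str, bool]:
    """Decide what the edge router does with a received route."""
    net = ipaddress.ip_network(prefix)
    if BLACKHOLE_COMMUNITY in communities and net.prefixlen == 32:
        # No per-customer black hole list: the validator decides later.
        return {"install_in_fib": False, "send_to_validator": True}
    allowed = net.prefixlen <= 24 and any(
        net.subnet_of(ipaddress.ip_network(entry)) for entry in routing_list
    )
    return {"install_in_fib": allowed, "send_to_validator": False}


customer = ["203.0.113.0/24"]
print(new_world_policy("203.0.113.5/32", {BLACKHOLE_COMMUNITY}, customer))
print(new_world_policy("203.0.113.0/24", set(), customer))
```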

L

Our plans: at this point in time, we, within the pmacct project, are working on a black hole validator implementation. This will be open source software, free to use for everybody.

L

pmacct is a very powerful tool that already has a lot of the moving parts, such as BMP or BGP capabilities. It can already walk the RIB in various directions and perform comparisons between more specific and less specific routes, and if the request passes the validation step in pmacct, then pmacct will emit an action into a stateful database such as Redis, and that data store can be used to generate injections into your network.

L

Because with all these components you want to be able to reboot the router, reboot the validator, reboot the injectors, and all the while we need to be able to maintain the state of these black holes. A.

L

Cute trick is that this can be implemented on virtually every BGP implementation, as long as it supports best-external path advertising, or you do your injection through an ACL or some other means. I think there are many ways to skin this avocado.

L

But the good news is that everybody has an opportunity to implement this in their network. We don't need to wait for vendors to support this.

L

What I also like about this approach is that it goes very well with RPKI origin validation. In this community, RPKI origin validation and black holing have been considered a problematic combination, because we don't want people to create ROAs that allow up to /32, because that opens you up to certain attack vectors.

L

This means that black hole announcements are almost always invalid announcements and should be rejected. But by using this validation step and depending on the less specific route, we can use RPKI origin validation on the normal routes, and those will pave the way for the black hole requests to be validated or not. In other words, RPKI origin validation and validated black holing work very well together.
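To illustrate why a /32 black hole announcement normally fails origin validation while its covering route passes, here is a minimal sketch of the RFC 6811 outcome rules. The ROA contents and AS numbers are invented, and the data model is simplified for the example; IPv4 only.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Roa:
    prefix: str
    max_length: int
    asn: int


def origin_validation(prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Return 'valid', 'invalid' or 'not-found' following RFC 6811."""
    net = ipaddress.ip_network(prefix)
    covering = [r for r in roas if net.subnet_of(ipaddress.ip_network(r.prefix))]
    if not covering:
        return "not-found"
    for roa in covering:
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


# Invented ROA that deliberately does not allow anything longer than /24.
roas = [Roa("203.0.113.0/24", 24, 64500)]
print(origin_validation("203.0.113.0/24", 64500, roas))  # normal route
print(origin_validation("203.0.113.5/32", 64500, roas))  # black hole host route
```

With the validation step described above, the /32 itself never needs a matching ROA: the decision rests on the covering /24, which has already passed origin validation as a normal route.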

L

This is a community effort. At this moment in time, NTT Communications and Telia Carrier are working towards implementing this method in their networks, and pmacct is the validation software, and we hope that in May we'll have something to publish and for others to work with.

L

So the summary of this presentation is: unlike the traditional way of implementing black holes, I posit that black holes should only be honored if they align with the best active path. If you're not the active path, we will ignore your black hole request. And with that, I would like to open up the microphone for questions, comments and considerations.

M

The video for telecom, you I, have been looking for something like.

M

Making sure that I only honor the black hole when it goes the same path as an active route, and I'm certainly going to look into how the mechanism actually applies.

M

M

One nasty pea-counter comment on the presentation: I really hate to see the opening sentence, "black holing is signaling", which it obviously is not. In the signaling you are carrying requests, and in parts of the presentation you were talking about requests. I personally like to speak precisely and really distinguish between what happens to the packets and what's happening in the signaling.

M

The other thing is my requirements for security and, following that, authorization for the black holing probably are a little bit higher than can be achieved by the old set up that you are discussing, but well, okay, what you are discussing quite clearly is a big improvement over what is used by most people today. If.

L

I may address your second comment. By making the black hole requests dependent on the normal routes (and with normal I mean what passes through our filters), what I'm suggesting is we can make the filters as strict as we can, and maybe you have, you know, a different policy than I have. But the point is that anything we do for normal routing will positively impact the black hole feature.

L

So if we deploy origin validation, the black hole signals or requests become of higher quality. If we apply bogon filters or filter out ASNs that we should never see behind that eBGP session, what I'm saying is: the better the quality of our normal prefix filters, the better the quality of our black hole requests. Completely.

M

Agreed; that was essentially my first comment, that yes, this improves stuff. And the last comment is, well, actually looking at the nasty details: the authorization requirements that I'm using today are in certain ways more strict than what you are doing and what most people are doing, and even with you pointing to using origin validation, I think, if I start to look at the nasty details.

M

In my opinion, I will end up opening up black holing to be started by parties where I do not have the authorization by the address holder that I should be accepting the signal from that third party that is somewhere in the path. Yep.

L

If you are an IP transit provider like NTT or DT, and you want to talk in more detail about how to implement this in your network or how to plan for this, please talk to me or email me. If you're a vendor of BGP software, please recycle some of your flowspec validation code so that we can do this validation trick on the box rather than off the box, and I fully understand that this will take two or three years to hit the mainstream customer releases. That's fine!

L

Until that time we can use the pmacct black hole validator, but ideally I would like this to be on-box rather than off-box. Any other questions, comments?

L

Thank you for your time.

A

Is there a Steven in the room, okay, Fernando with you.

B

D

B

N

So I'm Fernando, and I'll be presenting essentially a document I co-authored with Jan Žorž. Sorry, is it better? Okay. That was discussed mostly on the 6man and v6ops mailing lists. The reason for presenting it in this meeting is obviously because we'd like your feedback and your ideas. Sorry, so there we go. So if you have been following the 6man and v6ops mailing lists, well, this document got quite a lot of discussion.

N

There were essentially almost three hundred, or over three hundred, messages that were triggered by this document. This document discusses an operational problem that arises in some renumbering scenarios.

N

Actually, the problem had been pointed out before in an I-D by Jen Linkova, and it was actually, to some extent, discussed in a RIPE document, five hundred something, the document about recommendations about prefix sizes. Okay, so there are multiple scenarios that can actually lead to this problem. We are just going to focus on the most simple one.

N

This is obviously a very common deployment scenario for IPv6, where you have the CPE router that is doing DHCP prefix delegation with the ISP, and you get a prefix leased, and then you advertise on the local network a sub-prefix from the prefix that you were leased, right? So that's quite common. Now, before we get to the problem, something to keep in mind, which is quite important for the problem being discussed, is some of the associated parameters.

N

You know, with this deployment scenario, on one hand, typically the router lifetime that is advertised on the local network is on the order of 30 minutes. If I remember correctly, the default value is 30 minutes, and the lease time is normally on the order of several days to months. Okay. So this is essentially the problem that you might experience. Let's say, for example, that the CPE is hard-rebooted, or it crashes and reboots.

N

It could also be the case that the user, I don't know if that's common in, let's say, your environments, but we are quite used to this: whenever you have a problem with your internet connectivity, even if you call the ISP support, usually the first way to solve the problem is like, have you tried resetting the home router? So those are among the reasons that might lead to this scenario. So the question is what happens when, for example, you reboot the CPE router, or it just crashes and reboots?

N

Well, quite normally these devices don't keep any state regarding the prefix that had previously been leased, so normally they will do DHCP prefix delegation again, and depending on, you know, the policy of the ISP, they might actually get leased a new prefix. Okay. Now what happens is that, of course, when they get leased a new prefix, they will advertise a sub-prefix on the local network. But the question is what happens with the previous prefix?

N

Ok, so you got a new prefix, you advertise the new prefix, but nobody told the network that the prefix that had been announced before has actually become stale. Okay, so essentially the old addresses are maintained; quite frequently the old addresses are preferred, and obviously also the old routes are preferred, meaning the old prefixes are considered on-link, and the addresses that you had configured before are still preferred. So you keep the old addresses and you configure additional addresses. Now, what's the problem with this? Of course, the previous addresses have become stale.

N

So, if you try to use them, they will fail. This means that, for example, v6 connectivity might fail or, for example, if you are using something that implements Happy Eyeballs, well, the v6 connectivity would fail and you might end up preferring IPv4. Okay.

N

Let's talk about a couple of deployments that might avoid this problem. So the approach that was recommended in the RIPE document, I don't remember the name of the document off the top of my head, is to use stable prefixes for networks. Okay.

N

Well, there are pros and cons. For example, it's nice for law enforcement, because if the user always gets, you know, the same prefix, it's easier to track the user. On the other hand, it's not nice for privacy, because even if you do, for example, temporary addresses, well, that's kind of in vain, because you change the interface ID but the prefix is always the same. So you can be tracked with the prefix. Some ISPs report that their provisioning systems don't allow them to do this.

N

And also there's the case, whether you agree or not, that some ISPs want to charge extra for stable prefixes. I'm not saying that should be the case, but that's something that happens. So that means that by default they give you a dynamic prefix, because if you want a stable prefix, you have to pay extra. So the bottom line of this is that it's quite normal to actually get dynamic prefixes. There was some kind of survey that had been performed on this topic.

N

I don't remember the numbers, but I think it was something between 25 and 35 percent of the surveyed ISPs that were leasing dynamic prefixes. So in the case of those ISPs that have been surveyed, around, you know, 25% were using dynamic prefixes, meaning that you might potentially hit this problem.

N

That's a personal point of view: I don't think that the solution to this problem should be to use stable prefixes. Or, to put it another way, even if you don't, the network should be resilient and still not break if the ISP is leasing dynamic prefixes. Then there is another deployment scenario, if you want, that might help avoid this problem, which is the case in which the CPE router actually records the prefix that has been leased on stable storage. Okay.

N

So the idea is that when you do DHCP prefix delegation, you store the prefix that you have been leased on stable storage.

N

So, in the event of a reboot, you might do DHCP prefix delegation again, and if you learn a different prefix, what you might do is advertise the old prefix with a lifetime of 0, like a preferred lifetime and valid lifetime of 0. Okay, so you advertise the new prefix with normal lifetimes and at the same time you advertise the stale prefix with, let's say, 0 lifetime, so you can somehow disable it or deprecate the prefix.
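The router-side behavior described here could be sketched roughly as follows. This is only an illustration: `PrefixInfo` and `prefixes_to_advertise` are made-up names, the stored prefix is assumed to have been read back from stable storage by the caller, and the 30-day and 7-day values are the RFC 4861 router defaults for the valid and preferred lifetimes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default lifetimes a router advertises per RFC 4861, in seconds.
DEFAULT_VALID = 2592000      # 30 days
DEFAULT_PREFERRED = 604800   # 7 days


@dataclass
class PrefixInfo:
    """Hypothetical stand-in for one Prefix Information Option (PIO)."""
    prefix: str
    valid_lifetime: int
    preferred_lifetime: int


def prefixes_to_advertise(stored: Optional[str], leased: str) -> List[PrefixInfo]:
    """PIOs to send after a reboot.

    stored: prefix recorded on stable storage before the reboot, if any.
    leased: prefix just obtained via DHCPv6 prefix delegation.
    """
    # The newly leased prefix goes out with normal lifetimes.
    pios = [PrefixInfo(leased, DEFAULT_VALID, DEFAULT_PREFERRED)]
    # A different stored prefix is stale: advertise it with zero lifetimes.
    if stored is not None and stored != leased:
        pios.append(PrefixInfo(stored, 0, 0))
    return pios
```

A router that kept no record (`stored` is `None`) can only advertise the new prefix, which is the case the rest of the talk is concerned with.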

N

Now, there are a number of problems, or this is kind of tricky. First of all, if you look at RFC 4861, it says that if a prefix information option has a valid lifetime shorter than two hours, the valid lifetime should be ignored. So that means that even if the CPE router tried to deprecate the prefix, the old addresses, you know, couldn't actually be deprecated.

N

You can unprefer the prefix, so the preferred lifetime will be honored, but not the valid lifetime. And in any case, there are many of these boxes that just don't record their leased prefix on stable storage. So that means that's not something that happens in practice.
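The two-hour rule can be written down in a few lines. The sketch below follows my reading of the host-side rule as it appears in the SLAAC specification (RFC 4862, section 5.5.3) for unauthenticated router advertisements; the function names are made up for illustration.

```python
TWO_HOURS = 2 * 60 * 60  # seconds


def new_valid_lifetime(remaining: int, advertised: int) -> int:
    """Valid lifetime a host keeps for an address after an unauthenticated RA.

    remaining:  seconds left on the address's current valid lifetime.
    advertised: valid lifetime carried in the received PIO.
    """
    # Lifetimes above two hours, or ones that extend the address, are accepted.
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised
    # Less than two hours left already: the advertised value is ignored.
    if remaining <= TWO_HOURS:
        return remaining
    # Otherwise the lifetime can be cut down, but only to two hours.
    return TWO_HOURS


def new_preferred_lifetime(advertised: int) -> int:
    """The preferred lifetime is simply taken from the PIO."""
    return advertised
```

Under this reading, a PIO with both lifetimes set to zero deprecates the address right away (`new_preferred_lifetime(0)` is 0), while an address with a month left stays valid for two more hours (`new_valid_lifetime(2592000, 0)` is 7200).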

K

Which is better, that's right. Now, you kind of corrected yourself there, so we can use the preferred lifetime of zero to deprecate it. It's deprecation versus invalidation, so that does work. So, yeah, problem, no?

N

So in that case, if the CPE was recording the prefix on stable storage, which doesn't need to be the case, but in those cases in which that happens, then you can advertise the prefix with a preferred lifetime of zero, but you wouldn't be able to communicate with a new owner of the prefix. Why? Because you still have the addresses configured, so, for example, that prefix will be considered on-link. So if you want to, let's say, communicate with a new owner of the prefix, it won't be possible.

N

Now, there is another problem that is not mentioned here, which is that some implementations enforce limits on the number of addresses that they can configure. Okay, I don't remember if it was Linux that, I think, enforces a limit of 16 addresses, okay, from different prefixes.

N

So if you consider that the default valid lifetime for a prefix information option is about one month, and you assume that the implementation limits the number of configured addresses to 16, then if there is, like, a reboot, let's say, every day or every other day, you will actually hit the limit. So even if you were to do this, when you advertise the new prefix, since the old addresses are still maintained, eventually you will hit the, you know,

N

the system limit on the number of prefixes for which you configure addresses. So keeping the old addresses doesn't come for free, okay.
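To put a number on this, here is a small simulation of the daily-reboot case. It is a sketch under stated assumptions: one new prefix (and hence one new address) per day, every address kept for its full valid lifetime, and a hard limit on configured addresses. The function name and the one-year horizon are made up for illustration.

```python
from typing import List, Optional


def day_limit_blocks_new_prefix(limit: int, valid_days: int,
                                horizon_days: int = 365) -> Optional[int]:
    """First day on which a newly advertised prefix cannot be configured.

    Assumes one renumbering event per day. Returns None if the limit is
    never reached within the simulated horizon.
    """
    expiry_days: List[int] = []  # expiry day of each address still configured
    for day in range(horizon_days):
        # Drop the addresses whose valid lifetime has run out.
        expiry_days = [e for e in expiry_days if e > day]
        if len(expiry_days) >= limit:
            return day
        # Configure an address for today's new prefix.
        expiry_days.append(day + valid_days)
    return None
```

With a limit of 16 and a 30-day valid lifetime, `day_limit_blocks_new_prefix(16, 30)` gives 16: the first sixteen prefixes fill the table and the seventeenth is ignored. With a 10-day valid lifetime the limit is never reached under these assumptions.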

N

Now, besides the fact that, you know, CPE routers could do something along these lines, unfortunately this is something that you cannot rely on, right, because, well, there are deployed boxes that just don't record the leased prefix on stable storage. So this is our take. Of course, this is something that has been proposed on the mailing list and is something that we have considered. One of the proposals was to actually try to somehow affect the source address selection algorithm, for example.

N

One of the things that was proposed is that you should always make the address with the largest preferred lifetime the preferred one. Okay, so that's one option, which doesn't actually make sense, because, you know, it's up to each prefix which preferred lifetime is employed. And another option, which is better than option one, which doesn't make sense, is to essentially associate a timer with each prefix that is advertised on the network, like, you know, a timer, or a timestamp, actually, is a better word.

N

A timestamp that essentially indicates when was the last time that the prefix was advertised. Now, if you were to use, for example, as the preferred address the one that was last advertised, then you could say that in the scenario in which your CPE router crashes and reboots, well, the stale prefix will not be advertised anymore, so a new prefix will be advertised and then you would use the, you know, correct address, if you want. Now, there are a couple of problems related to this.

N

The first one is what I mentioned on the previous slide, meaning that you wouldn't be able to communicate with a new owner of the prefix. That's one of the problems. The second problem is that, since you keep the old addresses configured, you might hit the system limit on the number of configured addresses. So, meaning, if you get, for example, one reboot every day, you know, after a couple of weeks you will have configured 16 addresses and the new prefixes advertised will be ignored. So that's the problem.

N

Now, there is another problem, probably not as bad as this one, which is if you are in a network segment where, like, it's a multi-prefix network. So two prefixes are being advertised, let's say, by two different routers. Then, as these two routers advertise their corresponding prefixes, the source address of your packets will flap. Okay. This router sends a router advertisement, so this prefix was the last one being advertised, so you start preferring this address. Then the other router advertises the different prefix.

N

Now you start sourcing packets from that other address, and, you know, it's guaranteed that the source address of the packets will flap. Okay, that's kind of like not nice for troubleshooting.
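The flapping is easy to reproduce on paper. The sketch below applies a "prefer the most recently advertised prefix" rule to a sequence of router advertisements from two routers; the function name and the event format (timestamp, prefix) are assumptions made for illustration, not part of any proposal text.

```python
from typing import Dict, List, Tuple


def preferred_prefix_over_time(ra_events: List[Tuple[int, str]]) -> List[str]:
    """Prefix a host would source from after each RA, under the
    'most recently advertised wins' rule.

    ra_events: (timestamp, prefix) pairs in arrival order.
    """
    last_seen: Dict[str, int] = {}
    chosen: List[str] = []
    for timestamp, prefix in ra_events:
        last_seen[prefix] = timestamp
        # Pick the prefix with the newest advertisement timestamp.
        chosen.append(max(last_seen, key=lambda p: last_seen[p]))
    return chosen
```

Two routers alternating their advertisements, for example `[(0, "2001:db8:a::/64"), (10, "2001:db8:b::/64"), (20, "2001:db8:a::/64")]`, make the preferred prefix switch on every single RA.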

N

There are a number of things that, you know, we think could be done to actually, let's say, solve or improve the situation with respect to this problem.

N

Let's say the most important mitigation, from our perspective, is essentially to do something along these lines. I mean, there are multiple ways in which you could possibly implement this, but essentially it goes as follows: if you are receiving RA packets from the same router that had advertised your previous prefix, and the RA contains prefix information options, but not the previous prefix for which you had configured addresses, well, that should be taken as an indication that the prefix is stale. Okay.

N

So what we do, essentially, is, for example, we wait for two RAs from the same router that contain prefix information options, and after the second one we unprefer the address, and after two additional ones we actually completely remove the address and the on-link route.
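The host-side heuristic described here can be sketched as a small state machine. This is an illustration only: the class name, the state names, and the choice to reset the counter when the prefix shows up again are assumptions, while the thresholds of two and four RAs come from the description above.

```python
from typing import Iterable


class StalePrefixTracker:
    """Tracks one configured prefix against RAs from the router that advertised it."""

    def __init__(self, router: str, prefix: str,
                 unprefer_after: int = 2, remove_after: int = 4) -> None:
        self.router = router
        self.prefix = prefix
        self.unprefer_after = unprefer_after
        self.remove_after = remove_after
        self.missing = 0            # consecutive RAs with PIOs that omit the prefix
        self.state = "preferred"    # "preferred", "deprecated" or "removed"

    def on_ra(self, router: str, pio_prefixes: Iterable[str]) -> str:
        prefixes = set(pio_prefixes)
        # Only RAs from the same router that carry at least one PIO count.
        if router != self.router or not prefixes:
            return self.state
        if self.prefix in prefixes:
            # Assumption: seeing the prefix again clears the suspicion.
            self.missing = 0
            self.state = "preferred"
            return self.state
        self.missing += 1
        if self.missing >= self.remove_after:
            self.state = "removed"      # drop the address and the on-link route
        elif self.missing >= self.unprefer_after:
            self.state = "deprecated"   # stop preferring the address
        return self.state
```

Feeding four RAs from the same router that only carry a new prefix moves the old one through `preferred`, `deprecated`, `deprecated`, `removed`.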

N

Okay, it's kind of like tricky and hacky, in the way that, in theory, the prefix information options don't need to be contained all in the same RA packet. So yeah, in theory you could say, you know, there's a router that is advertising, let's say, three different prefixes and it sends a different RA for each of the prefixes. Yes, I haven't seen this in practice. Okay, let's say the good part of this is that you can essentially solve the problem without actually requiring any changes on the network devices themselves.

N

So that means that as soon as, you know, you implement this mechanism, then no matter whether the ISP is leasing stable prefixes or not, or whether the CPE router records the prefix on stable storage or not, you kind of like overcome the problem.

N

Now, really, when we were analyzing this problem, we started to, let's say, rethink or reconsider some things related to SLAAC and neighbor discovery. So first one, for example: normally the prefix lifetime in a prefix information option by default is about one month. Okay, now the router lifetime is 30 minutes. Now, our question is, you know, to what extent does it make sense for the prefix lifetime to be larger than the router lifetime?

N

If you think about this in the context of RFC 8028, which is about, you know, multi-prefix networks, it essentially says that when, on any given network, you have multiple prefixes, well, the prefix should kind of like be tied to the router that advertised the prefix. Why? Because, if not, if you start sourcing packets from one of the prefixes that was advertised on the local network and you send those packets to the wrong router, well, it's quite likely that they are going to be ingress filtered.

N

So normally what you want is to source packets from a given prefix only towards the router that actually advertised that prefix, in which case I don't know to what extent it makes sense to have the prefix lifetime larger than the router lifetime.

N

When we were discussing this topic with a number of folks, the only reason that, let's say, we found for this is that in the event you have an outage with the ISP, then you might continue using this prefix. Now, I differ a little bit with that. So what I would say is, of course, I'm not saying that this is probably acceptable to most people, but probably if what you want is a stable prefix, a prefix that you can actually use irrespective of what happens with the ISP,

N

that might be an indication that what you want is ULAs. That's one thing. The second thing is that what you could have is that the local router never advertises the prefix with a lifetime larger than the router lifetime. So even if communication with the ISP is lost, the local router can still continue advertising the prefix, okay, for as long as the prefix lifetime.

N

So in that sense, for example, if one were to cap, you know, the maximum value that one honors for the prefix lifetime, then, let's say, one could limit the amount of time that this breakage lasts on the network. I don't know if there are comments or questions.
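Capping the lifetimes is a one-liner on either side. The sketch below shows one way a host (or a router, before advertising) might bound the prefix lifetimes by the router lifetime; the function name is made up, and treating the router lifetime as the cap is just the idea floated above, not an agreed rule.

```python
from typing import Tuple

DEFAULT_ROUTER_LIFETIME = 1800  # seconds; the 30-minute default mentioned above


def capped_lifetimes(valid: int, preferred: int,
                     router_lifetime: int = DEFAULT_ROUTER_LIFETIME) -> Tuple[int, int]:
    """Return (valid, preferred) lifetimes bounded by the router lifetime."""
    valid = min(valid, router_lifetime)
    # The preferred lifetime should never exceed the valid lifetime.
    preferred = min(preferred, valid)
    return valid, preferred
```

With the one-month and one-week defaults, `capped_lifetimes(2592000, 604800)` gives `(1800, 1800)`, so a stale prefix would linger for at most half an hour instead of a month. A real implementation would need to decide what to do when the router lifetime is zero, which this sketch does not handle specially.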

E

Erik Kline. You and I spoke about this before. One thing that just came to mind was, I think actually the easiest thing to do would be for the host to do some kind of bidirectional forwarding testing, by basically trying to send to an address in the PIO: send a packet to the router that the router should route back to it.

E

Bidirectional Forwarding Detection, BFD; there are several RFCs about BFD. If there was a host-based BFD, where you would just basically send the packet to an address that you yourself already have, but, you know, send it to the router's MAC address, the router should bounce it back to you. If it doesn't, then your PIO might be gone. After a couple of those BFD probes, then you can say, yeah, the PIO is really gone.

N

This station, for that, when.

O

You have already configured let's.

N

Say you have an address configured for the prefix, okay, and you want to check that it's working. What's the destination address that you use for the check?

E

The same, or configure a second one. It's SLAAC: configure 17 addresses and do BFD probes between them. So as long as you're listening to the ND messages and you've subscribed to the solicited-node multicast group, you know, the router should be able to find you and send the packet back to you or issue

N

A redirect so this.

N

E

A lot about how BFD works, so I think this would be actually the best solution. It would solve a number of problems as well.

D

First of all, I think you need to clarify on your slides, for people who might not know the whole story, that all this router behavior is a clear violation of the RFC when it does not record the previously assigned prefix, right? So we're talking about definitely broken router implementations, not something which is expected as per our standards, right? So what we are trying to fix is really broken routers, misconfigured routers, packet loss and so on, right? We're not talking about situations when everything works as per the RFC.

D

Secondly, I do not feel that a flapping source address is a bad thing. If you have multiple prefixes on the interface, it's already happening now, if you use your longest prefix match, right? So if my host has two addresses on the interface from two prefixes, you don't know which one will be chosen for the next connection. So it's already happening. So it's okay if your host selects a random prefix for the source address, and actually it might be desirable: if I'm advertising two prefixes from two routers, I might actually see a kind of load balancing.

N

That's fine. Why do you say that? So this would be the same: if you apply the source address selection algorithm, why would it lead to a different outcome now?

D

What I'm saying is, every time a host opens a new connection, right, it will apply the default source address selection algorithm, and even now, without the proposal to use the most recently updated address, you don't know which one will be used for the next connection. It will be random, depending upon your destination.

N

Because of the longest prefix match, no, I agree with that, but given the same destination, nowadays you always get the same source address. Now, if you modify the source address selection algorithm, given the same destination, the source address would change. So now you try it and it works, and you try it later, you just get a different source address and it doesn't. So for troubleshooting, that's not nice. I would

D

Say if you have privacy extensions, make sure you still get different source addresses, but outside.

N

D

Well, like I said, actually, I personally, in my network, would love to see hosts doing a kind of load balancing that way. It would be nice, so it might actually be desirable behavior. But again, back to the proposed solution.

D

I think, first of all, we definitely need to tweak the timers, because the current standard default timers do not make any sense, but I'm still not convinced that we need to significantly change the logic of the protocol to work around broken routers. I think the proposed changes are too heavy and drastically change the whole logic. Absence of evidence is not evidence of absence, and you should not make any assumptions based on information which is not present in an RA.

N

One thing that, I think, Eric there had proposed is, for example, if the previous prefix is missing, one thing that you might want to do is send an RS to the router, like, poll the router and see if, you know, the prefix is there. That's one option.

N

The thing is, you know, somehow you need to get rid of these stale addresses. So what we, we give you some

D

The prefix lifetime will do what you want, right? Basically, reduce the preferred lifetime and make it more in line with the router lifetime. That will do most of the trick for you. But still, you keep the old addresses.

N

And if you get multiple reboots during the month, you will hit that system limit for something I. Don't remember. Yeah.

D

You could also make the valid lifetime, like, slightly bigger than the preferred lifetime. It doesn't have to be months, right? It can be a day, for example, right? So you do not keep those addresses for a really long time.

D

So basically, your preferred lifetime should be X and your valid lifetime should be, I don't know, ten times that, but it shouldn't be a month. That's what I'm saying.

N

You're saying that when the prefixes are advertised, the preferred and valid lifetimes should be smaller.

N

That's a side comment. When I was trying to track, you know, what was the reason for which we have the timers that we have, I ended up talking, because, you know, normally you get these values over time and people get used to them, and there's supposed to be, like, a reason for them.

N

But nobody knows what the reason is, and I talked to one of the original authors of neighbor discovery, and what he mentioned was that, for example, the router lifetime has a default value of 30 minutes because at the time there was the idea of using IPv6 for communicating with the moon, so the router lifetime had to do with that. Now, we got used to that value, but when you think about the reason for which that value is there, it's

E

Jeff Haas. So BFD is probably the wrong tool here. It's very good at telling you something's gone down very quickly. It's less intended for, you know, "are you there, are you still alive?" So, coming-up test: not so good. Going-down test: very good.

E

There is something called Seamless BFD that's a little bit better than that for unsolicited cases, but then the router advertisements would have to carry a piece of BFD info in there. What I'd suggest is that ping is probably actually a better check for a lot of these things. Just simply, you know, check to see if stuff is live and there. It means whom the ICMP by.

N

Which address would you be pinging?

E

The exact same thing that the BFD sessions would do.

E

And the only reason I even got to the microphone is we've had people in IEEE land and other places who wanted to use the BFD procedures, didn't like actually using the BFD machinery, and then still called that BFD anyway, and we've seen this happen inside IETF as well. Ping works perfectly. Basically, you're looking for some sort of loopback test. That's really all you're looking for: just being able to source-route packets from a host through the router and make sure that you can actually get the answer back.

N

What would happen if, for some reason, they were filtered? Well,

E

That's if it's filtered that you have it to go while.

N

Windows does filter ICMP pings, as far as I remember.

E

Yeah, well, to some extent, the conversation about the lifetimes of your networks: part of the thing I'm not seeing in this conversation is network partitions. Now, if you're in a real organization, you know, like an enterprise or something like that, you expect that network partitions are going to be small for the most part. If they're happening, you want things to fall apart as fast as possible.

E

That's what routing's for. But when you're looking at the stuff being used, like, in a home environment, people walking in and out of Wi-Fi range, buggy equipment and all that stuff, somebody accidentally plugged in, you know, some Apple equipment that knocked over things, the spanning tree is broken.

E

Pick your favorite bug class of things, but partitions happen in network environments for houses all the time. When you start looking at other cases where you might want to use IPv6, you know, extremely low-power networks, packet-loss-heavy environments, when you have those types of partitions, a stable address becomes, you know, very important.

E

So the ULA-type scenario is actually a good solution for those things, but I'm not a v6 expert, I get my toes wet occasionally, and ULA doesn't seem to be working well in a lot of environments, or be deployed in a way that it seems to solve these problems. You know, in a lot of cases, link-locals in a network that is actually properly put together would solve exactly the same problem, but, you know, too many devices need to be in the middle in these cases. So you do want long lifetimes

E

just to have stable addresses. Trouble is, you know, is it ULA, is it the global, do you assign one for your router? It doesn't matter. It has to actually have a long lifetime.

O

Erik Nygren. This is certainly, anecdotally, something I've seen being a real problem in the world, and where it seems to happen is kind of the combination of not-great implementations across a bunch of areas.

O

So, if you have a not-great implementation of a client application that doesn't do Happy Eyeballs, and a not-great OS, it might be running a ten-year-old Linux or some very old Linux kernel that has basic IPv6 support, and a home router that doesn't have the best implementation, that combination, together with the ISP's practices, can result in some fraction of users in a network being broken like this. And what I've seen is that, if that ends up being a few percent of users, a few percent of users running some smart TV with an IPv6 app that doesn't do Happy Eyeballs

O

on an old prefix is enough to cause major content providers, like national streaming services, to get enough user complaints coming in of "oh hey, my users can't stream stuff". They end up switching that content back to IPv4 only. So it's a real problem, and ideally, because of the nature of how this happens, that a bunch of things are broken, looking at what are the simple fixes that people are unlikely to get wrong, in a way that makes it worse, on each level of those steps

O

is probably one of the better ways. It's like encouraging apps to have better Happy Eyeballs-style fallback behavior, doing something on the OS side, and doing something on the router side, that are each simple, because that may make it such that, if you fix at least one of those things in the chain, you're less likely to have these sorts of problems.

D

I have another crazy idea for you. So I assume your solution relies on the host recording the prefix-to-next-hop mapping, so basically it should be 8028-compliant, right? So, yes, we basically do not care about old hosts in this case. What if, again as an additional mechanism, you suggest the router use a new link-local for every new prefix? In this case, your host will always prefer the active link-local, which corresponds to the actual active prefix.

N

How would a prefix advertised by a new link-local become preferred over

D

Because you have two next hops now, right, and one of them is not reachable anymore, because the prefix is gone.

N

I mean, it is reachable, it just, I mean, there

D

If your router doesn't have the prefix anymore, it doesn't have that address anymore, which means that particular link-local will not be reachable. Okay, and neighbor unreachability detection will tell the host that it should not be used, so it will be selecting the right next hop. It's actually, I'm just thinking about it, it might be something to consider. Yeah, I think there is also the PvD stuff, right? Yeah.

N

I think like there are two things to consider here. I mean one is, for example, if you could take as much time as needed to solve the problem, how you do things. On the other hand, the question is well. What could you do?

N

You know, to actually solve the problem without having to wait for that long, because, for example, in that case, in order for the problem to be avoided, you need each CPE router to be updated, whereas with the mechanism that we are proposing, as soon as the host, you know, implements the behavior, the problem is solved. So I think

D

We need to implement a number of things which improve the whole system a lot. It doesn't have to be one thing. Yeah, yeah, I agree with that. So we can consider the different things which make up the solution, but it isn't necessarily one. Yeah, right.

D

So maybe we can talk about which one, as it's been mentioned, could be easier to implement in terms of what people can get wrong, so yeah, and what are the things we can deploy without actually, again, introducing too much complexity and changing the logic of the protocol too much. Yeah.

N

I agree, I think that's it. So thank you.

A

Okay, unless there's a Stephen Farrell in the room, we're all done. Stephen? Stephen? We're all done. That's it. Thank you very much, see you next time.