From YouTube: IETF109-ICCRG-20201120-0500
Description: ICCRG meeting session at IETF 109, 2020/11/20 05:00
https://datatracker.ietf.org/meeting/109/proceedings/
A: Yes, thank you, Brian. Thank you. I think you guys can sort it out, but presumably Wes can do it. I'm very grateful; thank you so much for doing that.
A: Well, welcome to ICCRG. This is the first meeting in six months, and it's been a while, so we had a large number of people with things that we unfortunately did not have enough time to present here. But I want to start off this meeting, before I do the agenda bash, by very quickly noting a couple of things. First, the Note Well, as you all know, applies.
A: I am probably going to slowly start insisting that those who want a presentation slot actually initiate a discussion on the list first, so that it doesn't seem like a one-shot thing. We always have this problem with ICCRG, where people come, give a presentation, and walk away; it's ten minutes of engagement every three months, which is not particularly exciting or interesting, and there's no continuity.
A: I want people to bring topics up on the list, and if the topics are interesting, I would like to give them time on the agenda. I will say that topics that are getting discussion on the list will have priority when it comes to agenda time at a meeting. So keep that in mind, and I'll move along.
A: One more thing before I get on to the agenda: later today, during the IRTF Open, there's a talk by Ranysha Ware of CMU.
A: Oh, I don't remember the name of the talk, but it's about the Jain fairness index, and moving past the Jain fairness index to measure fairness amongst flows. It's a very interesting piece of work; I encourage you to show up and give feedback. I'm sure you will be interested in this talk. With that, let's get started. We have a packed agenda today, so we're going to try and keep this on time.
A: We have Praveen doing an update on rLEDBAT, and then we have a special guest, Ayush Mishra, who will be talking about a really interesting paper where they've done some fascinating work on measuring TCP congestion control deployment on the internet. Then we have Neal from Google giving us an update on BBR v2, followed by Sylvester, who will be talking, I think, about BBR unfairness, and then Bob talking about congestion control, and if we have time we will have a chance to talk about MPDCCP as well.
A: So it's a packed agenda. Let's keep this to the speakers; let's keep this within time. I'm going to try and move you along if I need to. I would like to get started, so I'm going to take charge of the slides here and I'll run them. I know, Sylvester, you were going to try and do your own slides, and that's fine. I'm going to cue Praveen now; I'm going to switch this to your slides, Praveen.
A: I can see you. You might want to move your mic down. Better now, slightly better. You can go a little bit... oh, I'm sorry, it's me, not you.
B: Waiting for the slides to show up. Okay, hello, everyone. Everybody is probably in a different time zone. After a long time we have ICCRG, so happy to see everybody here. Today I'm going to talk about rLEDBAT.
B: We have an update on rLEDBAT. This is joint work with others at Microsoft and also Marcelo Bagnulo from UC3M. I missed Gabriel's name, sorry about that. Jana, can you move to the next slide?
B: So what is rLEDBAT? A quick recap: what we want to do is bring the benefit of LEDBAT++ to the receive side of the transport connection. For those who don't know what LEDBAT++ is, it's an improvement over the original LEDBAT RFC to solve a bunch of shortcomings of that RFC. LEDBAT++ is a sender-side congestion control algorithm; what we want to do is bring the same benefits of that algorithm to the receive side of the transport connection.
B: How do we do this? We use the flow control mechanism. As you know, each TCP packet contains the window field, which advertises to the peer how much data it can buffer, and a typical TCP implementation would tune that buffer over time to make performance good: it would increase the window as long as the sender is able to keep up and the application is draining data. But in this case, what we want to do is use that as a throttle, based on the LEDBAT algorithm.
B: We want to actually control how much data the sender is able to send, effectively implementing a less-than-best-effort congestion control for the end-to-end connection on the receive side. One of the key points there is that we don't want to shrink the advertised window, but based on the computed window we can reduce it over time, over the period of an RTT.
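The receive-side throttle described above can be sketched roughly as follows. This is a minimal illustration of a LEDBAT-style window computation driven by queuing-delay estimates, not the Windows implementation; the gain, the 40 ms target (mentioned later in the talk), and the multiplicative-decrease step are assumptions.

```python
# Illustrative sketch of an rLEDBAT-style receive-side throttle: the
# receiver estimates queuing delay from RTT samples and computes the
# window it is willing to advertise, throttling the sender via TCP
# flow control. Constants and structure are assumptions, not the
# actual Windows implementation.

TARGET_DELAY = 0.040   # seconds; the talk mentions 40 ms on the receive side
GAIN = 1.0             # additive-increase gain (assumed)
MSS = 1460             # bytes

class RLedbatReceiver:
    def __init__(self):
        self.base_delay = float("inf")  # minimum RTT observed so far
        self.window = 2 * MSS           # start from the minimum window

    def on_rtt_sample(self, rtt):
        self.base_delay = min(self.base_delay, rtt)
        queuing_delay = rtt - self.base_delay
        off_target = (TARGET_DELAY - queuing_delay) / TARGET_DELAY
        if off_target > 0:
            # below target: grow (slower than Reno in the real algorithm)
            self.window += GAIN * off_target * MSS
        else:
            # above target: multiplicative decrease toward the floor
            self.window = max(2 * MSS, self.window / 2)
        return self.window
```

As the talk stresses, the advertised window would only be reduced gradually, over an RTT, since shrinking it abruptly violates TCP conventions; this sketch only computes the target value.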
Why do we want to do this? Why is it important to do this on the receive side? Three major reasons.
B: One of the challenges is that a lot of software updates, which are one of the primary use cases for background transfers, use CDNs, and having control over the servers is difficult; most of them don't have LEDBAT++ support.
B: The second reason we want to do this on the receiver is that there can be proxies on the path. Corporate networks have a lot of proxies, and that can prevent the less-than-best-effort behavior from happening end to end.
Third, the network on the client side might be overloaded, and just doing it on one path of the network is not sufficient. And of course there are cases where the receiver application has more information about exactly which connections need to be lower priority and might not be able to communicate that to the server side. Doing this just using the client-side application has a lot of advantages, including enforcing any preference that the local application or operating system wants to apply as less than best effort.
B: The update I have is that we now have an implementation in the Windows operating system. It's based on the draft and implemented for TCP. We already had an API and configuration in the OS to turn on LEDBAT++; the same one also enables rLEDBAT, so when you enable this, you get less than best effort in both the send and receive directions.
B: It includes all the additional mechanisms of LEDBAT++. The rLEDBAT draft, at least the current version, leaves it open to the implementation to either use LEDBAT or LEDBAT++; we have chosen to implement all of the goodness of LEDBAT++, which includes RTT measurement and a slower-than-Reno increase for the window.
B: That's with the adaptive gain factor. We also do the multiplicative decrease to solve the inter-LEDBAT fairness and latecomer advantage problems.
B: One of the key things here is that when you turn this on, we automatically negotiate TCP timestamps. If TCP timestamp negotiation fails, the algorithm is effectively disabled; at that point it generally just falls back to a traditional TCP connection.
B: We do expose that information to the application, so the application could apply something like a static throttle for these kinds of workloads. Next slide, please.
B: So what are the deviations? There are some deviations from the draft that I wanted to call out. The periodic slowdown algorithm in LEDBAT++ is complicated; for this rLEDBAT implementation, we have chosen to make it simpler.
B: So, instead of targeting only a 90% reduction, we are basically doing one slowdown per measurement interval, and that measurement interval is also different from the one for LEDBAT++, which was 30 seconds; this is a 60-second period for measuring the base delay. It is basically a periodic slowdown.
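The simplified schedule described here, one slowdown per base-delay measurement interval, can be sketched as below. The 60-second interval comes from the talk; the slowdown duration and the two-packet floor during a slowdown are assumptions for illustration.

```python
# Sketch of the simplified periodic-slowdown schedule described in the
# talk: once per 60 s base-delay measurement interval, drop the window
# to the minimum so the bottleneck queue drains and the base RTT can be
# re-measured. Duration of the slowdown is an assumed value.

MEASUREMENT_INTERVAL = 60.0  # seconds, per the talk (LEDBAT++ used 30 s)
SLOWDOWN_DURATION = 2.0      # seconds; assumed, roughly RTT-scale
MIN_WINDOW = 2               # packets: the recommended floor

def window_during(t, cruise_window):
    """Return the window (in packets) to use at time t since start."""
    phase = t % MEASUREMENT_INTERVAL
    if phase < SLOWDOWN_DURATION:
        return MIN_WINDOW      # slowdown: let the queue drain
    return cruise_window       # otherwise run normally
```

The point of the slowdown is that while the window sits at the floor, RTT samples approach the true base delay, correcting any drift in the base-delay estimate.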
B: The target delay of 60 milliseconds was used in LEDBAT++, but we found that on the receiver side we have to use a lower value. We are still experimenting and tuning these constants, and once we have more results with a real-world workload, we will update the draft with the correct recommendations.
B: Next slide, please.
B: So I have some initial lab results. I don't have data from a wide deployment yet, but this is just to show the effectiveness of the algorithm as implemented on the receive side. This is just a CUBIC short flow that kicks in during the middle of an rLEDBAT connection. As you can see, the connection ramps up, the sender doing slow start, then we enter the slowdown period (that's the initial slowdown) and we throttle the sender completely.
B: This is what would have happened if the sender was using LEDBAT++, but in this case it's the receiver throttling the sender. Then we exit the slowdown and continue our growth, but then the CUBIC flow kicks in and the reaction is pretty immediate: we can sense the delay building up in the bottleneck and we back off to the minimum rate, which is the two-packet window that is recommended.
B: This demonstrates the latecomer... this basically shows that the latecomer advantage problem does not exist with rLEDBAT. What happens here is that when the latecomer flow starts, the periodic slowdown allows the flows to re-measure the base delay, and that causes both flows to roughly fair-share; we don't see the latecomer get an undue advantage and completely throttle the first flow.
B: This shows the inter-LEDBAT fairness for rLEDBAT flows. These are four different rLEDBAT flows with staggered starts. As you can see, there are periodic slowdowns happening for all of these flows; as a result, they all measure the base delay accurately and they're able to fair-share the link amongst themselves.
B: This does not show what would happen if you put in a CUBIC flow, but as soon as you have a CUBIC flow, all of these would back off and then ramp back up when there's no competing traffic. Next slide, please.
B: This is the low-latency competition, effectively the problem where the queue is small enough. Actually, there's a mistake on the slide: the queue size was actually 250 packets, so this is actually a shallow queue, and because we can't build the queue, we can't exceed the target delay, so rLEDBAT would not back off.
B: We had run the same experiment with LEDBAT++ when it was the sender, and we would see that it was actually taking a much smaller share compared to CUBIC. But with rLEDBAT we are seeing that there is actually a little bit more competition; it's almost as if they're fair-sharing. This is a problem we will continue to investigate; we haven't root-caused it yet, but it's something that needs to be investigated. So it's certainly different behavior from what we saw with LEDBAT++ on this same setup. Next slide.
B: That's the summary of where we are. The next step for us is to take this implementation out for a spin with a real-world software update workload and measure its effectiveness. Measuring this is hard: metrics for less than best effort in general are a really, really hard problem, particularly because the goal is to actually improve other traffic. We've had cases where people had to drop off a call and go tell their family members to stop doing things on the network.
B: It's basically a user experience measurement. We have ways of doing this, so we're still working on creative ways to measure the effectiveness of this algorithm in the real world. We want to do constant tuning; there's a bunch of magic constants. I think this applies to both LEDBAT++ and rLEDBAT.
B: The other thing we want to explore is making the target value dynamic. Currently it's 60 milliseconds for LEDBAT++ and 40 milliseconds for rLEDBAT. We would like to figure out a way to tune this based on the bottleneck link. One of the challenges here is that, because it's a less-than-best-effort algorithm, we can't really send at a very high rate to figure out the capacity of the link. So this is a challenging problem; we're still figuring out how best to do this.
B: There is a problem with BBR v2. We took the latest alpha release of the Linux implementation of BBRv2 and took it for a spin with LEDBAT++ in the lab; the same thing should apply to rLEDBAT as well, since effectively they're the same algorithm implemented on either side.
B: The problem here is that we don't really see the queue build up with BBR, which is kind of by design, and as a result we sort of enter the low-latency competition mode, and rLEDBAT and LEDBAT++ are not backing off.
B: So we have to do more work here to figure out how best to do less than best effort in the presence of an algorithm like BBR v2. This is an interesting avenue for research; if there are folks in this group who want to contribute here, we would be really happy to hear about any ideas.
B: On the rLEDBAT side, we may want to think about just referencing LEDBAT++; I don't really see the point of going back to LEDBAT with its known problems. We also want to update the draft based on the data and the tuning. On the LEDBAT++ side, we want to add pseudocode (that's been an ask from a lot of people) so that the draft is easier to read and implement.
B: There has been a suggestion to also make it standalone, instead of having to refer to the original LEDBAT RFC, and sort of replace the original one. And there's a third point I missed here, which is to also make it transport-agnostic: right now both of these drafts are very much specific to TCP, whereas they could also be applied to QUIC. So that's the third sort of work that needs to happen for these drafts. With that, I think I'm done with my talk and I'll be happy to take questions.
C: Thank you, thank you for this; certainly an interesting idea. I have to ask a question, though: if you had your way, would you prefer a server-side-only approach? Because assuming all servers implemented something of the shape of LEDBAT++, and we agreed in the HTTP priorities draft that the lowest priority actually indicated a congestion control change, maybe we could get similar results. Just an idea; it's kind of cross-layer, but worth thinking about.
B: If you look at the original problem: yes, you could do this on the server side, which is how we started out, as a congestion controller. The challenge is the proxies, and the challenge is that we can't modify all applications to inform the sender about which connection is lower priority. So there are some challenges doing it just on the server side. One of the good parts of rLEDBAT is that it can easily coexist with LEDBAT++ on the sender, so there's no interop problem here.
B: So let's say the server decided to do LEDBAT++ and at the same time the client is doing rLEDBAT; those would just coexist very peacefully.
E: Thank you for this talk. Why does it critically depend on TCP timestamps? I saw that you disable rLEDBAT if the timestamp option is not negotiated.
B: Yeah, good question. I think this is covered in the draft. Basically, we want to be able to take RTT measurements, and that's the basis of the algorithm; we use latency measurements. The challenge is that if you're a pure receiver, you're not sending data throughout the life of the connection.
F: Thank you. So I have a lot of questions, because I've been actively looking into this. I did send some emails to the mailing list about the things that I have found; there are a couple of things that I have not yet updated on the mailing list.
F: One thing is: when you do an internal rLEDBAT experiment, don't you sometimes see a delayed-ACK problem in the RTT estimate? Because that's what I was seeing, and some of the flows were really struggling to get any bandwidth at all.
B: So we have a windowed filter that we apply to all these RTT samples, which allows us to weed out the delayed-ACK samples. One of the things I would point out is that the workloads we are looking at are mostly continuously transferring data, because these are update workloads which always have data to send, even if it is in chunks. So applying the windowed filter to the most recently received RTT samples is really important; that will allow you to overcome the delayed ACK inflating the estimate.
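The windowed filter mentioned here can be sketched as a minimum over the most recent RTT samples: delayed ACKs inflate individual samples, but as long as some ACKs are sent promptly within the window, the minimum discards the inflated ones. The window length below is an assumption for illustration, not the value used in the Windows implementation.

```python
from collections import deque

# Sketch of a windowed minimum filter over recent RTT samples, as
# described in the answer: delayed-ACK-inflated samples are weeded
# out by taking the minimum over the most recent few samples.
# The window length (8) is an assumed value.

class MinRttFilter:
    def __init__(self, window=8):
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def update(self, rtt):
        self.samples.append(rtt)
        return min(self.samples)  # filtered RTT estimate
```

Because the window only covers recent samples, the estimate can still rise when the path delay genuinely increases, which a global minimum could not do.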
F: Did you do that for both the receive-side rLEDBAT and LEDBAT++?
B: On the rLEDBAT side, because if you're not sending data, then you're completely dependent on using the timestamp technique to measure RTT. But yeah, if you're sending data, then...
B: But let's take this offline. Okay, cool. I have your other emails asking questions as well, so I'll respond to all of them later.
G: I gathered from slide eight that it seems like you're assuming some kind of delay-bandwidth-product queue size in the network; it doesn't work well on small queues. Let me ask you explicitly: what assumptions is rLEDBAT making about the network queue sizes, in relation to the sort of recent work about trying to reduce the amount of buffering in the network?
B: The same came up in LEDBAT++. Basically, what we do is we are effectively slower than Reno, so there are no assumptions here about what the buffer size is.
B: There are cases where the buffer is BDP or higher, in which case there is enough buffer for us to build a queue and be able to detect that the target delay has been exceeded, but there are no other assumptions here. If it's a shallow buffer, the way we solved that problem in LEDBAT++ was that our window growth was effectively lower than Reno's, so that would basically make it grow much slower than competing traffic.
B: That's the best solution we could come up with. As I explained earlier, detecting exactly what the bottleneck capacity is is a hard problem for a less-than-best-effort congestion control, so all of these are good areas for research. I also think that, as I mentioned, the target value that we have is fixed right now, and making it dynamic is also an important problem that should be solved.
H: I'll very quickly say thank you, and I'm really interested to see some of the further results. We've noticed similar issues: many of the flows that you would really like to be background flows are not flows where you're doing a lot of the sending, and especially around intermediaries, you may not be able to go find all the corners of the internet. So I think that has a real, practical use, in a way that a lot of the other things we do are helpful in some cases but not quite as critical.
H: ...opening back up, and I'd be curious to know if you've seen anything like that. Is that one of those cases where we say, well, it's fine because you're trying to be a background flow anyway, and so if it's going to take you, you know, two and a half minutes before you get back up to actually transferring real data, that's fine because it's in the background? Or is that going to be a problem with this kind of strategy?
B: That's an interesting problem. We haven't done at-scale measurements, so I can't tell you if we have seen that problem; that's the sort of work that's upcoming. I'm very happy to keep an eye out for that, so thanks for the heads up. If that becomes a problem... yeah, I mean, we don't want to go artificially slow either. Yes, we are trying to do these things in the background, but if there is enough capacity, we want to be able to saturate it.
B: One of the other problems with going slower than Reno has been that, you know, if you're on a really big WAN link and you artificially slow down, it takes a long time to come back up. So yes, that is also an avenue for more improvements, possibly, but I'll keep an eye out and keep you posted; hopefully another update at one of the upcoming ICCRGs. So thank you.
A: I want to take a quick moment to thank Praveen for this. There's clearly a lot of interest in this work, and I think the results are super interesting. I have questions about them as well, but I'm going to hold them off for the list, which is the plug I want to make here: please take these questions to the list. I think we want to see continued engagement on the list. Praveen's already said that they are interested in getting feedback on how to make things better here.
A: Please, please, please take those to the list. Make suggestions, engage in conversation there. I'd love to see more of this happen on the list, instead of just during the Q&A session here at the ICCRG meeting. I'll just say one comment I want to make, which is that people are talking about doing this per stream in QUIC or in HTTP, and that is very tricky.
I: Okay, thanks for the introduction, Jana. Hello, everyone, I'm Ayush. I'm a second-year PhD student at NUS, and today I'll be talking about the Great Internet TCP Congestion Control Census.
I: This was basically a measurement study that we conducted in mid-2019 to figure out who's running what congestion control algorithms on the internet. Next slide.
I: Okay, so 30 years of congestion control research have produced numerous congestion control algorithms, and as a result, for most of the internet's lifetime we have seen a heterogeneous mix of congestion control algorithms. I'm not just saying this; it has also been verified by previous similar studies, as marked in brown on the timeline on the slide.
I: But what's happened since the last such study in 2011 is that we've had a new kid on the block. BBR, which was proposed in 2016, may arguably be the most momentous development in the congestion control landscape yet, and the main reason we feel this way is because, for the first time in the internet's history, you're going to have a significant part of internet traffic that's not going to back off when it sees a random packet loss.
I: So, with this in mind, what we wanted to do was uncover the exact extent of BBR's deployment on the internet and maybe refresh our view of what the current internet congestion control landscape looks like. To do this, we set out to do a congestion control census of sorts: to measure the 20,000 most popular websites on the internet and figure out what congestion control algorithm they run. Next slide, please.
I: Firstly, while making such a measurement, we need to isolate the internet's network dynamics, so that whatever we see on the receiver end, we can make sure is a function of what the congestion controller is doing and not just what the network looks like at that point in time.
I: Second, we also want to extract a common feature from a variety of congestion control algorithms, since we don't know up front what the remote congestion control algorithm is. And finally, we need to identify these congestion control algorithms within short HTTP page downloads. This was a design decision that we took very early in this measurement study, and the reason was that most of the websites we were aiming to measure serve HTTP pages, so that would be the best candidate for conducting such measurements.
I: Our solution for this measurement study was a tool called Gordon, and Gordon deals with each of these concerns through a variety of strategies and design decisions.
I: The first issue, isolating the network's dynamics, is handled by Gordon by localizing the connection bottleneck. Gordon does this by rate-limiting the connection right before the client, and the reason we do this is because it provides us an opportunity to directly control the bandwidth that the sender sees, and it also minimizes the risk of random packet losses on the internet that can potentially be hard to account for when we are doing our measurement. Next slide.
I: The second issue was selecting a common feature to extract from all the congestion control algorithms. We dealt with this by choosing the cwnd of the remote congestion controller as the common feature in our measurement. The reason we did this is that whether a congestion control algorithm is window-based or rate-based, it's always going to have a cap on how many packets it has in flight, and this essentially becomes its effective cwnd, which is something that we can measure. How we measure the cwnd is actually through a very simple iterative process.
I: We note that the cwnd is the maximum number of unacknowledged packets a sender allows itself to have during the connection. So, to get this number, what we do is start a connection with the remote server and then drop all the packets till we see a retransmit.
I: In this case, all the packets that we received before we see a retransmit give the value of the first congestion window, or c1. Next, we start a new connection after some time, and this time we accept c1 packets before we start dropping packets again, till we see a retransmit; the number of new packets that we've dropped becomes c2, the second congestion window.
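The iterative probe just described can be sketched as a loop: each round accepts the packets accounted for by earlier rounds, drops everything after that until the first retransmission, and records the number of new packets as that round's congestion-window value. This is an illustration of the idea only; `open_connection`, `ack`, and `drop` stand in for Gordon's real packet capture and manipulation machinery.

```python
# Sketch of Gordon's iterative cwnd probe as described in the talk.
# open_connection() must return an iterable of packet objects with an
# is_retransmit flag plus ack()/drop() methods; these are hypothetical
# stand-ins for real packet handling.

def measure_cwnd_evolution(open_connection, rounds):
    """Return [c1, c2, ...], the per-round congestion window estimates."""
    cwnds = []
    accepted_budget = 0                  # packets acknowledged per round so far
    for _ in range(rounds):
        conn = open_connection()         # fresh connection to the server
        seen = 0
        for pkt in conn:                 # packets in arrival order
            if pkt.is_retransmit:
                break                    # window exhausted: round over
            if seen < accepted_budget:
                conn.ack(pkt)            # accept packets from earlier rounds
            else:
                conn.drop(pkt)           # drop to force an eventual retransmit
            seen += 1
        cwnds.append(seen - accepted_budget)  # new packets = this round's cwnd
        accepted_budget = seen           # accept this many next round
    return cwnds
```

Each round advances the sender one cwnd further into its evolution, so the returned list traces the cwnd growth pattern that the classifier later matches against known algorithms.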
I: We have found that this cwnd evolution graph is not only effective enough to differentiate between the known congestion control variants, but it's also quite handy in making useful observations about any unknown congestion control variants that Gordon might encounter. Next slide.
I: The last issue we had to deal with was short HTTP page downloads, and how we deal with this is really simple: we can look for larger pages, which is exactly what we did. We crawled the target domains for the largest pages we could find, and since our measurements are made on a per-packet basis, we used the smallest MTU that was allowed by the network path during the connection. This basically allowed us to extract as many packets as we could from a given website.
I: While making these measurements, Gordon actually simulates two key network stimuli in a way that elicits characteristic responses from a remote congestion controller. We encompass the stimuli in something we call a network profile, and this network profile is applied to each measurement that Gordon makes. So in this network profile, what Gordon does is it emulates:
I: A packet drop the first time the cwnd exceeds 80 packets, and then a bandwidth change immediately after receiving 1500 packets, and it does these changes while emulating an RTT of 100 ms. The exact details of why we use these numbers and why we chose these two network stimuli can be found in the paper. Next slide, please.
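The network profile described here can be encoded as a small set of rules. The class below is just an illustration of the three stimuli named in the talk (one drop when cwnd first exceeds 80 packets, a bandwidth change after 1500 packets, an emulated 100 ms RTT); it is not Gordon's actual implementation.

```python
# Sketch of the network profile Gordon applies to each measurement,
# per the talk: emulate one packet drop the first time the estimated
# cwnd exceeds 80 packets, switch the bottleneck bandwidth after 1500
# packets in total, and emulate a 100 ms RTT throughout.

class NetworkProfile:
    RTT = 0.100            # seconds of emulated round-trip delay
    DROP_CWND = 80         # packets: drop once when cwnd first exceeds this
    BW_CHANGE_AT = 1500    # packets: then switch the emulated bandwidth

    def __init__(self):
        self.dropped_once = False
        self.total_packets = 0

    def on_packet(self, current_cwnd):
        """Return the list of actions to apply for this packet."""
        self.total_packets += 1
        actions = []
        if not self.dropped_once and current_cwnd > self.DROP_CWND:
            self.dropped_once = True
            actions.append("drop")
        if self.total_packets == self.BW_CHANGE_AT:
            actions.append("change_bandwidth")
        return actions
```

The two stimuli probe the two behaviors the census cares about: reaction to a single loss (loss-based vs. loss-agnostic algorithms) and reaction to a change in available bandwidth (rate-based algorithms track it, loss-based ones largely do not).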
I: As we can see, all these graphs have reasonably distinct shapes, and in fact our classifier is a decision-tree-based classifier that uses these distinct shapes to identify different congestion control algorithms; again, how exactly this classifier works is covered in the paper.
I: So now I'll cover the results of our measurement study, and let's start with accuracy. Next slide, please. To measure the accuracy of Gordon, what we did was set up control servers in various locations on the internet.
I: And what we've seen is that our shape-based, decision-tree-based classifier works reasonably well to identify the bulk of the algorithms, and any misidentifications that we see are basically between algorithms that have very similar congestion window evolution shapes, which is something that we expected. But even given that, we can see that for most of our identifications the accuracy is more than 90%.
I: The measurements of the websites themselves were made from servers in Singapore, Mumbai, Paris, Sao Paulo and Ohio. For the websites, given our network profile, we found that many of the pages were less than the optimal page size of 165 KB. Basically, based on our network profile, we calculated, in the worst-case scenario, the minimum page size that we need to get a reasonably long cwnd graph that we can identify.
I: It turned out this number was 165 KB, but 68 of the pages we measured were lower than this number. In case they were lower, what we did was a classification on a best-effort basis: if we could make an identification, we went ahead with it, but if we couldn't, then it was just classified as a short flow.
I: Next slide, please. In terms of the distribution of congestion control algorithms by website count, what we found is that CUBIC is still the most dominant congestion control algorithm on the internet; we measured it being deployed by 30.7% of the measured websites.
I: However, it looks like BBR has been adopted at an unprecedented rate since its introduction in 2016, and it now accounts for almost 18% of the top 20,000 Alexa websites. We also identified a slightly modified version of BBR being deployed by 167 Google-owned domains, and we will be referring to this slightly different variant separately.
I: Given our numbers from the distribution based on just the website counts themselves, I don't think that really gives us a complete picture, because not all websites are made equal, and it's likely that more popular websites are contributing more traffic to the internet.
I: We also noticed that a significant number of the websites that deployed BBR served video content. However, I should note here that it's not necessarily the case that these websites are deploying BBR for delivering video as well, since our measurements were made on static HTTP web pages.
I: Another thing I would like to note, circling back to there being a difference between the video congestion control algorithm and the HTTP webpage congestion control algorithm: Gordon actually identified netflix.com to be using CUBIC to serve its web pages, but when we actually reached out to Netflix, it turns out they actually use New Reno to deliver video. Next slide, please.
I: Coming back to the unclassified variants: we found a significant number of websites that Gordon was not able to identify, so to investigate further, what we did was re-run experiments on these websites using a variety of different network profiles to see how differently they react.
I: Of the measured websites, about 14 were either the short flows that we discussed earlier or did not respond to our measurement methodology. But of the remaining websites that did respond and gave us long enough cwnd graphs, we found that most of them react to packet losses, but a significant number of them do not react to losses.
I: Akamai's congestion control on its own turned out to be quite an interesting congestion control variant. From its reaction to various network profiles, what we found was that it did not react to packet loss, but it closely followed whatever BDP was emulated by Gordon.
I: We feel it's likely that this is a variant of FAST TCP. There were some other interesting cwnd evolution graphs that we found as well.
I: For example, on your right you can see amazon.com, which ran a variant that did not really respond to our emulated packet loss and showed HTCP-like behavior in the congestion avoidance phase. yahoo.co.jp was quite conservative and seemed to exit slow start even before it saw packet loss or saturated the BDP. On the other hand, zero.com was on the other end of the spectrum, which is to say that it did not respond to packet losses or changes in bandwidth.
I
Please. In summary, what I think we are seeing is essentially a paradigm shift in the internet congestion control landscape, similar to the transition away from AIMD that we have seen earlier on. Mind you, a sizeable chunk of the internet traffic today is being controlled by these rate-based algorithms like BBR, and we feel this really further underlines the importance of understanding the interactions between these two different schools of doing congestion control and mitigating any unfairness and coexistence issues.
I
So, given that we are seeing such fast-paced changes in the internet congestion control landscape, we would want to make some changes to Gordon to keep up with them. Primary among these is identifying newer congestion control algorithms: for example, since BBR we have seen proposals for variants like Copa and PCC Vivace, which are also rate-based, and ideally we would like to identify them as well.
So, since the measurement study, we have extended Gordon to measure the receive rate along with the cwnd, and it turns out the receive rate is quite handy to identify Copa and PCC Vivace in controlled experiments.
I
Lastly, we would also want Gordon to emulate a larger variety of network stimuli. For example, there might be slightly modified versions of CUBIC or Reno that don't respond to one packet loss but to two or three packet losses, which we are not able to emulate and therefore not able to identify. And we would also want Gordon to identify sub-RTT behaviors, since right now we are constrained to measuring just one cwnd sample per RTT. Next slide.
I
So I would like to end this talk with two high-level research questions that our research group has been dealing with since our measurement study. The first is really understanding how BBR and CUBIC will cope with this evolving congestion control landscape.
I
While there has been plenty of work that indicates BBR can be unfair to CUBIC in some scenarios, this congestion control evolution is unlikely to be a walk in the park for BBR either.
I
So we have done a lot of interesting work on this front, and I will not go into the details of it in the interest of time, but allow me to illustrate one of our key results through a very simple experiment. We ran multiple instances of ten-flow experiments with different shares of them running BBR and CUBIC. First, we had only one BBR flow and nine CUBIC flows; for the second trial we introduced a second BBR flow, and we kept on doing this till all our flows were BBR, and the graph on the right plots the results.
I
At one point it even goes below the fair share for that bottleneck. So the main point that I'm trying to drive home from this graph is that BBR working really well today does not mean BBR will be the obvious choice against CUBIC tomorrow: for both BBR and CUBIC, the performance is likely to be a function of what the congestion control landscape looks like. Next slide.
I
Please. The second research question we're trying to look at is understanding the rate-based congestion control mechanic. BBR and the other new internet congestion control algorithms that have been proposed since have been predominantly rate-based; examples of this would be Copa and PCC Vivace. And it's quite common for these algorithms to work on tight send-rate and receive-rate feedback loops to basically infer
I
what's going on at the bottleneck. We feel that this is a new congestion control mechanic that's still not completely understood, and what we essentially need to do is answer some of the key congestion control questions, like what convergence and fairness will be, in the rate-based setting.
I
So, in this direction, we are working on modeling such send-rate and receive-rate feedback loops and trying to understand how they work. Next slide, please.
I
Yes, and that's all I have for you today. Thank you for your time, and I'll be happy to take any questions.
A
All right, we have time for a few very quick questions; if you could try to keep this brief, that would be very much appreciated. But thank you so much, Ayush. This is excellent work, and I'm really glad to finally see it in ICCRG, despite the fact that we were trying to have it here six months ago.
I
Yeah, so I think that's a really interesting point, and I think that's a little short-sighted on my half. I agree that CUBIC can actually be both MIMD and AIMD, but as far as actually measuring how often it does this on the internet, I don't think it would be possible to do this with our current tool, since we essentially isolate the flow we are measuring at an emulated bottleneck, so it's really not competing with other flows.
I
Okay, so we did actually plot cwnd graphs for BBR and BBRv2 as well, and they have very distinct congestion window responses. But the problem we are having right now is that BBRv2's congestion window response is not consistent: given BBR and BBRv2, we can distinguish between them, but given BBRv2 and some noisy measurement on the internet, we are not able to tell whether it's BBRv2 or not. So we need to do a significant amount of work in that direction to be able to make this distinction.
I
That's actually an interesting idea, but we have not done this so far; that's definitely a direction we would like to look into, and possibly, later on, being able to classify QUIC connections as well. So those are the two key directions that we have not specifically looked into so far but would definitely like to look into in the future.
A
Well, thank you, everyone, for your questions, and thank you, Ayush, again for presenting this. I'm assuming that you're going to be subscribed to the ICCRG mailing list? Yes? Okay, excellent. So if you have questions, take them to the list, please; Ayush will be there on the list, and you can also give him more suggestions for what he could do to continue this work, because I think this is very useful work, and its use is also in being able to find out how the internet is changing as time goes.
A
So, thank you again. Thank you. Moving on, we have Neal, to defend why BBRv2 might not actually be noise as is, but I'll leave that for you to do, Neal. Take it away. Oh, actually, let me get you your slides first.
K
All right, great. Thanks, Jana. So I'd like to give a quick update on some BBR work at Google. This is joint work with my colleagues at Google listed there, including folks on the TCP team, QUIC team and the Swift team. Next slide, please.
K
So, in brief, some aspects that I'd like to cover include a main focus on some work that we're calling BBR.Swift, where we're looking at using delay as a congestion signal inside data centers; and then, briefly, I'd like to touch on a second topic, about scalable loss recovery handling and some of the considerations that we think are kind of interesting.
K
As we've looked at our experiences with BBR and PRR, there's the question of how scalable these various styles of multiplicative decrease are when there are large decreases in the available bandwidth. And then I'll do a quick summary of the status of BBR at Google and a quick wrap-up, just to sort of set the context here about what we're trying to aim for.
K
For this talk, we mainly wanted to share our experience with some of these experiments and algorithms we're trying out, and we wanted to invite the community to share any feedback you have. And, of course, we always encourage you to share any test results or issues you see, or patches, traces or ideas generally. Next slide, please.
K
So, a little background here for folks who haven't run into it: the Swift congestion control algorithm is one that some of our colleagues at Google recently published at SIGCOMM in 2020, and at a high level it is a delay-based congestion control.
K
And, of course, you know, this crowd will notice right away that this use of network RTT means that there are particular scopes where this is an appropriate and feasible algorithm. In particular, this is appropriate where you have traffic that's inside a network with a known topology or known RTT properties, which applies to a lot of today's data centers, which have very regular topologies where the operators know the expected RTTs.
K
Another requirement here, for scopes where this makes sense, is that the network interface cards support hardware timestamps, at least receive timestamps; transmit timestamps can also be useful. And the third requirement is that all the traffic sharing the bottlenecks be Swift-compatible, because the algorithm sort of requires that to behave well. And in terms of that algorithm:
K
I think the two main points that are interesting about Swift are, first, that it's using a fairly typical AIMD (additive increase, multiplicative decrease) approach, where one of the interesting aspects is that the multiplicative decrease is proportional to the excess delay; we'll talk about the details there in a little bit. And then a second really interesting aspect of the algorithm is that the congestion window can go below 1: it supports fractional congestion window values, and it accomplishes that by using pacing, so that the average number of packets in the network is fractional, and that allows it to handle large-scale incast.
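The fractional-cwnd idea described above can be sketched as follows. This is only an illustration of the pacing trick, not Swift's actual implementation; the numbers in the example are made up.

```python
def pacing_interval_s(cwnd_packets: float, rtt_s: float) -> float:
    """Pacing interval so the average number of packets in flight equals cwnd.

    With cwnd packets spread evenly over one RTT, one packet is sent every
    rtt / cwnd seconds. A fractional cwnd simply stretches this interval
    beyond one RTT, so that on average less than one packet is in the
    network, which is what lets Swift survive large-scale incast.
    """
    return rtt_s / cwnd_packets

# cwnd = 0.25 packets, RTT = 100 us: send one packet every 400 us,
# so on average 0.25 packets are in flight.
interval = pacing_interval_s(0.25, 100e-6)
```

So the congestion window never has to be clamped at one packet; the pacer enforces arbitrarily small average occupancy instead.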
K
So, in terms of where Swift has been used so far: it has been used in production inside Google data centers by a user-space networking stack called Snap (there is an SOSP 2019 publication about that system), and it is used for a significant amount of traffic within Google data centers, where this is an appropriate environment, since (a) we know the target network RTT we're aiming for, and (b)
K
we know that all the other traffic sharing the queue is Swift-compatible; in this case, we use per-quality-of-service queues to accomplish that segregation of different algorithms into different queues. Next slide, please.
K
So why would we want to use delay as a congestion signal? There are a couple of different advantages. The first class of advantages is that it provides a richer source of information about how much queuing there is at the bottleneck, and this is quite interesting because it actually allows you to get a quantitative notion of the current degree or magnitude of queuing, which is something you can't really get from ECN or loss signals.
K
And this is useful because it allows you to react more quickly in cases where there is a long queue, to get rid of that queue and dissipate that congestion more quickly; but also, correspondingly, it allows you to avoid overreaction and potential underutilization if the queue is actually short.
K
You can think about that ambiguity if you consider, for example, a DCTCP-style shallow-threshold ECN signal, where you might have a sustained ECN signal that lasts for quite a while, and an EWMA filter of that might turn it into a very high alpha, for example. But it's still quite possible that that queue, even though it has lasted a long time, is quite shallow, and so it's quite easy for an algorithm to sort of overreact to that, whereas a delay signal allows you to avoid that issue.
K
The second class of advantage for delay as a signal is that it gives you a known target latency for engineering your systems, and this applies to several different pieces of the puzzle here.
K
A loss rate, by contrast, is actually quite difficult to translate into application performance: if you tell someone to expect a 0.1% loss rate, what are applications supposed to do with that? They don't really know how to translate that into latency expectations, and it's a tricky thing to do. And finally, at a high level, a key piece of the puzzle here is that to make this work we need accurate delay measurements for network and host delays, so next we'll talk about that. Next slide, please.
K
So, for BBR.Swift, the primary signal it's using is what we call network RTT, and that's something that the data sender computes by basically taking the total round-trip time for a data segment minus the receiver ack delay. We've drawn a diagram here to sort of illustrate that: we've shown the total RTT in this sort of teal color and the receiver ack delay in this sort of orange color, and then the network RTT component
K
you can sort of visualize as the purple path of the packet there, where the vertical distance represents the network RTT. We can consider the specific example depicted here: if we look at the sender, the data sender's TCP, it schedules a packet to be released at a particular time from the pacing layer, and that packet travels across the network as data P1.
K
It's received at the receiving NIC, but then there are all sorts of interesting delays that can happen on the receiver side for various reasons. One big delay source that we've noticed is power-saving C-states: often, servers that are not running at 100% CPU utilization on all the CPUs will take the opportunity to go into a power-saving state, and the packet can sit there if it arrives while the CPU that's handling the receive interrupt is actually in a power-saving state.
K
Other delays happen because the TCP stack might be processing a whole queue of packets, not just one packet; and then, of course, in TCP and other protocols there's often an intentional delayed-ack mechanism that comes into play, as the receiver tries to piggyback the ack on, hopefully, some outgoing data segment later on. So if you think about all of these delays, you could have various combinations, and in this protocol what happens is that the receiver is able to convey that receiver ack delay back to the sender. And to do that,
K
we use basically a new timestamp option that we've described earlier in the week, in the linked internet draft here, that we are calling extensible timestamps, or ETS. You can check out the TCPM slides and presentation and also the linked internet draft that describes the details, but basically we'll talk about some of it.
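The network-RTT computation described above amounts to a simple subtraction once the receiver reports its ack delay. This is a sketch of the arithmetic only, with hypothetical variable names; it is not the ETS wire format or any stack's code.

```python
def network_rtt(send_time_s: float, ack_receive_time_s: float,
                receiver_ack_delay_s: float) -> float:
    """Network RTT = total measured RTT minus the delay the receiver
    reports between receiving the data segment and emitting the ack
    (C-state wakeups, batched stack processing, delayed acks, ...).
    The receiver_ack_delay would be carried back in a timestamp option."""
    total_rtt = ack_receive_time_s - send_time_s
    return total_rtt - receiver_ack_delay_s

# A total RTT of 250 us with 150 us of reported receiver ack delay
# leaves a network RTT of 100 us.
rtt = network_rtt(0.0, 250e-6, 150e-6)
```

Stripping the host-side delay is what makes the remaining signal usable as a measure of queuing in the fabric.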
K
So how is the signal used in the algorithm? BBR.Swift is an extension of BBRv2 where the core aspects of BBRv2 are unchanged; in particular, if a connection does not have delay available as a signal, it is going to behave exactly as the algorithm that we've documented at the IETF and open-sourced, with respect to its response to ECN, loss, bandwidth, min RTT and so forth.
K
But what we have here is an extension to BBRv2 that's based on the Swift algorithm, and a key piece of this is a new configuration parameter, the target RTT: the RTT value that the algorithm is trying to seek, in some sense, trying to maintain RTT values near that target. Inside a data center, you can think of this as being in the ballpark of, or on the order of, 100 microseconds.
K
Basically, the algorithm at its core says that if the network RTT that's been measured is greater than the target, then we do a multiplicative decrease where the multiplicative decrease factor is essentially proportional to that excess delay. Here, the excess delay is quantified as network RTT minus the target RTT, and that's turned into a fraction by dividing it by the network RTT. So you can think of this intuitively as asking what fraction of the delay that we're seeing is excess.
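A minimal sketch of the decrease rule just described. This is illustrative only: the real BBR.Swift code has more machinery (bounds on the decrease per round trip, the additive increase, interaction with the other BBRv2 signals), and the `MD_FACTOR` constant here is made up.

```python
MD_FACTOR = 0.8  # hypothetical cap on the multiplicative decrease

def on_rtt_sample(cwnd: float, network_rtt: float, target_rtt: float) -> float:
    """If the measured network RTT exceeds the target, cut cwnd by a
    factor proportional to the excess-delay fraction
    (network_rtt - target_rtt) / network_rtt, as in Swift-style AIMD."""
    if network_rtt <= target_rtt:
        return cwnd  # at or below target: no decrease (increase not shown)
    excess_fraction = (network_rtt - target_rtt) / network_rtt
    return cwnd * (1.0 - MD_FACTOR * excess_fraction)

# Target 100 us, measured 200 us: the excess fraction is 0.5,
# so cwnd is multiplied by 1 - 0.8 * 0.5 = 0.6.
new_cwnd = on_rtt_sample(10.0, 200e-6, 100e-6)
```

The appeal is that a deep queue (large excess fraction) produces a large cut, while a barely-over-target queue produces only a gentle one.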
K
That's something an administrator can tune if they want to use it in their site. And one interesting issue that we are still working on nailing down the details of is the question of how exactly WAN flows using ECN as a signal should interact with BBR.Swift flows using delay as a signal; there's an interesting set of issues there, and we have some ideas that we're exploring. One kind of approach would be to say that WAN flows can dynamically set the target RTT based on where they see the network RTT around the boundary between ECN-marked packets and non-ECN-marked packets, which gives you a sort of sense of the target RTT that you'd like: where the ECN mechanism thinks the delay is at a good level. Next slide, please.
A
So, okay, yeah; I mean, you decide what you want to do.
K
Great, yeah, thanks. So, just a quick sketch of the kinds of results we see with this class of algorithm. Here we have a very simple, basic incast scenario with two machines, each machine sending a thousand bulk TCP flows, so 2000 flows in total, and we're comparing DCTCP, BBRv2 with ECN, and BBR.Swift.
The thing to notice here is that, because of the large number of flows, DCTCP is sort of operating cwnd-bound and ack-clocked.
K
It is basically trying to maintain at least one packet in flight for each flow, which leads to a very large standing queue of all of those excess packets, which leads to the large loss rate that you can see here, six percent for one machine and 6.6 for the other machine, and it also has some fairness issues, whereas BBRv2 with ECN does a little better; it's a little bit more fair.
K
The retransmit rate is considerably lower, around 1.6 to 1.7 percent, and the fairness is a little better, or actually comparable, I guess, to DCTCP. And then, if we look at BBR.Swift, you can see that the algorithm, because it's able to use the pacing rate to match its sending to the aggregate delivery rate, is able to keep that queue nice and small and correspondingly achieve a very low loss rate here.
K
The loss rate is about 0.05 percent, and you can see there that the network RTT, on average, is around 93 microseconds, corresponding to the 50-microsecond target that was used in this particular experiment, and you can see the Jain's fairness index is fairly good. So that's just a quick comparison to give you a sense of the properties. Next slide, please.
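For reference, the Jain's fairness index quoted in these results is the standard formula; this is a small helper for readers, not the presenters' measurement code.

```python
def jains_index(throughputs: list) -> float:
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    Equals 1.0 when every flow gets the same throughput and approaches
    1/n when a single flow takes everything."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

# Perfectly fair allocation scores 1.0; one flow hogging the link
# scores 1/n.
fair = jains_index([5.0, 5.0, 5.0, 5.0])
unfair = jains_index([20.0, 0.0, 0.0, 0.0])
```

The same index appears again later in the session, in the fairness test-bed results.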
K
So where are we? We're preparing for production testing: we're basically rolling this out in preparation for doing large-scale production workload testing, and we're also planning to release this code as open source and document the algorithm, and this includes the timestamp implementation as well. Basically, the goal here is that we want transports to be able to use this algorithm as their CC on connections where a target network RTT can be known and we know that the coexisting traffic is also running a compatible algorithm, and in the long run we'd also like this to be usable on both physical machines and inside virtual machine guests.
K
So perhaps the slide title here is a little provocative, but I thought it was interesting to sort of raise this issue that we've seen, because our experience is showing that, both on data center traffic and on the public internet, this is an interesting issue. As this audience well knows, traditional TCP congestion control uses a multiplicative decrease upon round trips that have packet loss: Reno will cut to 0.5 of the old congestion window, CUBIC will cut to 0.7, per round trip.
K
So, in theory, what happens in these kinds of scenarios is that, with something like Reno, you expect a number of round trips of very high packet loss until the flow reacts fully and adapts to the new congestion window; in particular, you expect a number of round trips that is basically the log base two of the old bandwidth divided by the new bandwidth. That tells you how long you expect to see these high losses.
K
So if there's a thousand-x cut in the fair-share bandwidth, you can see 10 rounds of high loss. That's the theory; in reality, it's actually a little bit different. With traditional TCP loss recovery, before RACK, it actually couldn't handle consecutive rounds of loss; what tends to happen instead is that you get a retransmission timeout, you cut your congestion window to one, and you slow-start back up. With TCP RACK, but no proportional rate reduction,
K
you actually see a reality that matches the theory: multiple rounds of high loss, and this can be quite painful. We've definitely seen this in experiments where you use RACK but no PRR; in the public internet, when you run into a policer, it can get quite ugly. But finally, if you're using RACK and PRR, you get a nice kind of behavior, where the sending rate is bounded to be quite near the delivery rate, and thus this keeps the loss rate at a reasonable level while still robustly probing for bandwidth.
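The back-of-the-envelope calculation above can be written out explicitly; this is a sketch of the reasoning only, not any stack's code.

```python
import math

def rounds_of_high_loss(old_bw: float, new_bw: float, md: float = 0.5) -> float:
    """Round trips of heavy loss expected while a classic
    multiplicative-decrease CC (factor `md` per lossy round, 0.5 for
    Reno) shrinks its window from the old bandwidth to the new one:
    solve md**n = new_bw / old_bw, i.e. n = log(old/new) / log(1/md)."""
    return math.log(old_bw / new_bw) / math.log(1.0 / md)

# A 1000x cut in fair-share bandwidth with Reno's 0.5 factor:
# log2(1000) is roughly 10 rounds of high loss.
rounds = rounds_of_high_loss(1000.0, 1.0)
```

With CUBIC's gentler 0.7 factor the same cut takes even more rounds, which is part of why bounding the sending rate to the delivery rate (as PRR does) matters so much here.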
K
This is what you get if you, say, run a default Linux stack: you're going to get CUBIC plus PRR, and you'll get that kind of behavior today. Next slide, please.
K
Yeah, I just want to zoom through a couple of these, yeah.
K
Slide, please. So, wrapping up: next slide.
K
And just a quick status update. For YouTube and google.com public internet traffic, we've deployed BBRv2 for a small percentage of users as an ongoing experiment as we refine the algorithm and code; we see reduced queuing delays and reduced losses versus BBRv1, getting closer to CUBIC levels. For Google internal traffic, we're deploying BBRv2 as the default, and we're in transition there.
K
Currently, it is used as the congestion control for most of the internal traffic within Google. This is using the algorithm as previously described, with bandwidth, min RTT, ECN and loss as signals; as I mentioned before, we're still in the process of rolling out the code for this network RTT signal inspired by Swift. Next slide.
K
And in conclusion: we are actively working on BBRv2 and this variant we're calling BBR.Swift, continuing to iterate, and we are open; we'd love to hear feedback on these approaches, test results and so forth, and we definitely appreciate the survey results from the previous presentation, for example.
K
So thank you very much, and hopefully we have a few moments for a Q&A; if not, we can take questions on the mailing list.
F
Thank you. I have a question: the network RTT is obviously used for BBR.Swift, but how does the delay signal work in BBRv2? Is network RTT used there, or is it not used?
K
So, in the WAN case, we are not using the network RTT signal. The basic practical issue there is that, usually, for WAN paths you don't know the target round-trip time ahead of time, and so, in our deployment so far, we're definitely just using the target RTT within a data center; the WAN flows are just using ECN and loss signals.
K
You know, I guess I briefly alluded to perhaps using ECN signals to find the target delay that we'd like to match, based on the transition between RTTs above the point where we see ECN marks and RTTs below the point where we see no ECN marks, and using that as a sort of way to find a target RTT dynamically for the WAN case; but that's future work.
K
Sure, but to have that base delay you sort of need, you know, in general, to be able to distinguish a standing queue from a longer wire. That can be quite tricky unless you either have knowledge ahead of time or you have an ECN signal or something that allows you to disambiguate those.
A
All right. Well, thank you so much, Neal, and thank you for that question. Please continue this conversation on the mailing list; again, I'm sure a lot of people are interested in the relationship between BBR.Swift and all of that, plus questions on BBRv2, so please continue that on the mailing list. Thanks, Neal. Sylvester, you're up; I'm going to bring your presentation up, and there we are. Take it away.
L
Can you see it? Can you hear me well? Can you hear me? Yes? Okay, and can you also see my screen? Yes? Okay, thank you. So hello, everyone, I'm Sylvester, and with my co-authors we are interested in internet resource sharing. We ran some test-bed studies on fairness, and we chose BBRv2 congestion control, as a new wave of congestion control, to compare to existing ones, because it's designed to be friendly to CUBIC flows as opposed to v1, it has a scalable ECN response, and it already has some deployments.
L
So we used three machines connected in a chain topology in our test bed: a traffic generator, a receiver, and the bottleneck in the middle, and on the sender and receiver we installed the BBRv2 alpha kernel and used default Linux settings. We implemented several AQMs in DPDK: tail drop, PIE, GSP, STEP, PI2, DualPI2, and the virtual dual-queue core-stateless AQM.
L
So this is an example of the measurement results. We have two connection classes; connection classes are identified by congestion control and RTT, so here it was CUBIC with 10-millisecond RTT and BBR with 10-millisecond RTT, over a 1 Gbit/s bottleneck. We change the number of connections; half is from one connection class, the other half is from the other.
L
The buffer size is set as a factor of the RTT, so 0.5 means a five-millisecond buffer in this case, and we plot the relative goodput, where one is the ideal. The relative goodput of a connection class is the average goodput within the connection class divided by the ideal per-connection fair share. And we also studied several AQMs.
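The relative-goodput metric described above amounts to the following; this is a sketch of the metric as explained in the talk, not the authors' evaluation scripts.

```python
def relative_goodput(class_goodputs: list, capacity: float,
                     total_connections: int) -> float:
    """Average goodput of one connection class divided by the ideal
    per-connection fair share of the bottleneck capacity; 1.0 means
    the class gets exactly its fair share."""
    fair_share = capacity / total_connections
    avg = sum(class_goodputs) / len(class_goodputs)
    return avg / fair_share

# 1 Gbit/s bottleneck, 10 connections in total; one class of 5
# connections averaging 80 Mbit/s each gives relative goodput
# 80 / 100 = 0.8.
rel = relative_goodput([80e6] * 5, 1e9, 10)
```

Values above 1.0 for one class therefore imply values below 1.0 for the other, which is how the unfairness shows up in the plots.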
L
So what happens with the different AQMs? We plotted the tail-drop results, the grey shadow, for reference. With PIE, the fairness is very similar to tail drop, while with GSP we have seen huge degradation compared to tail drop for a smaller number of users, and it was similar to tail drop for a larger number of users.
L
So what is the CSAQM mentioned in the results? In addition to existing AQMs, we also have, of course, CSAQM, which is a core-stateless resource-sharing framework. It can apply a wide variety of policies, not only fair sharing, and it can enforce these policies for heterogeneous traffic mixes. It also scales well with a very large number of flows, because the algorithm itself is stateless, and it's also congestion-control independent: it puts no assumption on how the congestion control behaves.
L
It relies on packet marking with different values: larger values mean more important packets in congestion situations, and packets with smaller values can be dropped or marked with the congestion-experienced ECN flag, and the bottleneck behavior is purely based on the packet values. So we don't have to do any flow identification, we don't have to use separate queues or decode the policy information anywhere; therefore, the implementation can be very simple and fast.
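The per-packet bottleneck behavior described above can be sketched as follows. This is a toy illustration of the packet-value idea, not the CSAQM algorithm itself: the logic that updates the congestion threshold from the queue state is omitted, and the names are hypothetical.

```python
def handle_packet(packet_value: int, congestion_threshold: int,
                  ecn_capable: bool) -> str:
    """Core-stateless forwarding decision: the bottleneck keeps no
    per-flow state and only compares the value stamped on the packet
    (by the endpoint or the network edge) against a congestion
    threshold it maintains; low-value packets are dropped or CE-marked."""
    if packet_value >= congestion_threshold:
        return "forward"
    return "ce-mark" if ecn_capable else "drop"

# A packet marked with value 7 passes a threshold of 5; a value-3
# packet is CE-marked if ECN-capable and dropped otherwise.
a = handle_packet(7, 5, ecn_capable=False)
b = handle_packet(3, 5, ecn_capable=True)
```

All the policy (who gets which values) lives in the marker, which is why the router-side code can stay this small.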
L
We also used the DCTCP-like congestion controls, so we compared the fairness of DCTCP and BBRv2 in scalable mode. In this case, instead of changing the buffer size, we change the target delay as a factor of the RTT. Our key finding here is that the STEP AQM prefers DCTCP, while the PI2 AQM prefers BBRv2, and it's actually hard to choose which one is better; for PI2 the fairness is improving as the number of flows grows, while with STEP it is the opposite.
L
It is clear that with STEP and PI2 both connection classes are marked the same, and because the two congestion controls use the ECN feedback differently, this results in the fairness shown in the previous figure. With CSAQM, there is seemingly no connection between the marking ratios of the connection classes, so there is no clear formula for how to mark the packets, but this is exactly the right marking ratio to achieve good fairness.
L
We also compared the fairness between BBRv2 scalable mode and CUBIC flows over a DualPI2 bottleneck. This time I present a time-series plot, and the number of flows from the different classes changes between 0 and 50; you can see the number of flows at the top.
L
When the number of BBRv2 flows is large, the classic flows experience very small goodput, so please be aware that this is a logarithmic scale; you can see the total goodput at the top.
L
We believe that this is because BBRv2 applies model-based congestion control; but what happens if the network works differently from the model? This kind of unfairness can happen then. Compare that to our dual-queue core-stateless AQM, which can provide pretty good fairness by not assuming anything about the congestion control used.
L
BBRv2-versus-CUBIC fairness is very dependent on settings; I actually showed some good results, but sometimes it can become quite bad, while DCTCP versus BBRv2 scalable mode in general provides good fairness. And an interesting finding we have seen is that AQMs tuned for a specific congestion control actually have the potential to hurt coexistence even more, and they rarely help, even though they do help in, for example, multi-RTT scenarios when the specific congestion control they were tuned for is used.
L
So, in summary, the congestion control evolution has accelerated; it's now also possible to use user-space congestion control or congestion control in eBPF. But it's very hard for a new congestion controller to be both innovative and fair to existing congestion controls, so we don't want to say in any way that BBRv2 is a bad congestion control.
L
So what can be done? How can we provide fairness? What are the ways forward? Today, fairness is dominated by end-to-end congestion control and over-provisioning, and we question whether this is still the way, or whether TCP friendliness to Reno and/or DCTCP is actually a point of ossification. A similar point of ossification is AQMs tuned for a specific congestion control behavior.
L
So we believe that cooperative approaches like CSAQM have good potential for controlling resource sharing: flow identification and policy decisions are done at the endpoint or at the network edge. In this case, the implementation in the routers is very simple and invariant to the number of flows and to the policies used, though it requires a header field. But, to be fair, headers or some kind of solution are needed for many other proposals as well: we have ECN, we have the L4S bit.
L
We are also proposing one, and there is also SCP, but that one requires a new header field. So, to compare these solutions, we created a table with the three methods: end-to-end congestion control, in-network scheduling, and cooperative sharing. End-to-end congestion control provides fairness by each congestion control being friendly to the existing ones, but it has fairness issues, and the RTT unfairness is hard to solve. In-network scheduling provides very good fairness and actually solves RTT unfairness. Cooperative resource sharing provides fairness by marking: packet marking plus AQM.
L
There is limited control for endpoints in the in-network scheduling scenario, while with the cooperative approach, if the marking is done at the endpoint, there can be a high amount of control, and it can be limited if there is edge marking, if the endpoint is not communicating with the edge. Congestion control evolution is constrained by end-to-end congestion-control-based fairness, because of the harm to existing congestion controls, and it's less constrained in the two other cases. And the bottleneck complexity is low for end-to-end congestion control.
L
That is basically the bottlenecks we have today; we don't have to change anything. For the in-network case, we believe that in most cases some kind of CPU-based solution is needed, especially for a high number of flows or if you want to control richer resource-sharing policies, while it's medium for the cooperative approach: we were able to successfully implement the AQM in P4.
L
There is no need for signaling for end-to-end congestion control; there is a high need for signaling, to every potential bottleneck, in the in-network case; and, depending on how we do the marking, some kind of signaling might be needed in the cooperative case, though actually the packet marking is itself a kind of in-band signaling. Also, end-to-end congestion control is a kind of standalone effort, so it doesn't require standardization.
L
First
question
is
what
more
to
include
in
these
type
of
evaluation:
congestion
controls,
aqms
rtts
also,
what
is
what
are
the
typical
implementations
when
it
comes
to
operating
systems
and
meaningful
defaults
of
the
of
the
congestion
controls,
also
very
important
question
to
to
discuss
and
and
is
that?
What
are
the
typical
battery
lacks?
What
is
the
speed
of
them?
How
many
flows
do
we
have
over
them
and
how
many
button
actually
consider
in
the
path?
L
And the third interesting question is the effect of the sub-millisecond Internet on fairness. Some caches are very close to the edge — do longer flows still have a chance when sharing a bottleneck with these sub-millisecond flows? So, you can find our results at these web pages, and I'm looking forward to your questions and comments. Thank you.
M
Hi — can you hear me? Yes? Yes. Thank you for this work, it's very interesting, and I think it also opens a very interesting discussion. Because indeed, from the L4S point of view, the strategy here is to try to line up long-term congestion controls to be fair, so that the AQM, like you showed, doesn't need to differentiate between the different types of congestion controls. Well, for L4S we did make a split between the classic and the L4S traffic.
M
Definitely, it's also a good discussion whether we should do this differentiation from within the network or not, and whether the network is really responsible for doing this — or whether we should also focus on the cases where it's not possible to do something in the network, to make sure that congestion controls really have a common protocol and a common behavior related to ECN marking, and definitely drops. But also, I think it's more important to have this behavior on the longer term. So, yeah.
M
I think it's a good point for discussion, and as you know, L4S is relying on the end system to at least do the best it can to line up.
A
I would again encourage the conversation to continue on the mailing list. This is for the presenters as well as for the other folks in the group: please go ahead and kickstart conversations on the mailing list. I think you can have a much deeper and higher-bandwidth engagement there. All right, moving along — Bob, or which one of you is going to do this?
D
I'm probably going to start, and then we'll switch in the middle.
A
Okay. Just a quick note, Bob, that we have 16 minutes before the end of the session, so I want to give you a heads-up on that. Okay.
D
Take it away. All right, so let's talk about TCP Prague — the authors are here. Let's move straight on. Next slide, Jana.
D
Yep. So, as I mentioned in tsvwg, if you were there, this is going to be a bit of an invitation to collaborate. When we first started on L4S, back five years ago I think, the DCTCP in the version 3.19 Linux kernel that we were using just happened to work really well, and we carried on using it for maybe three years, because we were really focusing on AQM products — we were mainly network companies.
D
We were dealing with some safety aspects, but we were largely just sticking with what worked, and when we tried to use later versions of the kernel, it didn't work that well — but we were mostly just sticking with what we had. Then we started getting criticisms that it didn't work with later kernels, and we finally started looking at it and found there was a real rat's nest of tangled bugs that seemed to have come into the Linux kernel since 3.19, and it's taken us —
D
It took us months to work it all out, anyway. So Koen's going to talk a bit about that in the middle of this talk. I mean, we fixed it probably about a year ago now, but we haven't really talked about it since. So what we really think is that it got a bit of a bad reputation, because it wasn't possible to reproduce any of our results — no one could use it on the latest kernel — and so we want to do a bit of a relaunch.
D
Now the code base is usable for others — you know, it's been kept up against the latest version of the kernel for a good year or so now. And also it seems likely that we're going to start seeing deployments in the network. Probably not in the next few months, because they're going to depend on the codepoint assignment, but once that does come, I think you'll start to see it in production networks. So I'll come back to the invitation to collaborate at the end.
D
Right. So I guess the main thing I believe is important about Data Center TCP — and it's not often seen this way — is that the smoothing of the congestion signals shifts out of the network, where AQMs traditionally have filtered out variations in the queue.
D
Obviously the queue still varies: they've filtered out measurements of variations in the queue before signaling drop, because they didn't want to signal drop too early. And because DCTCP uses ECN, it can shift that responsibility to the end system. And then the important difference is the delay that the smoothing has to add when it's in the network —
D
Whereas once you move to the end system, it can smooth based on its own round-trip time, and the end systems that are using a particular bottleneck can all be smoothing on their own round-trip times. And also, very importantly — the zero there — they actually get the signal with no smoothing delay.
D
If they want to react to it straight away. So, if you can do the next slide — it builds, Jana — they can choose not to smooth at all, for instance when the flow is starting up. And of course you've got zero delay on the network side as well: you've got an instantaneous-queue AQM, sorry.
D
So, next.
D
I said I'd go quick, and I'm not, am I? Through this: the way that the end system smooths in DCTCP — this is really the only design difference from Reno, other than a load of different implementation details — is that it just takes the fraction of the marks every round-trip time, does an EWMA of it, and then uses that EWMA to scale down the reductions, which is why it does a reduction by that extent. Next.
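To make the mechanism concrete, here is a minimal sketch of that end-system smoothing — an assumption-laden simplification of the DCTCP behavior described in RFC 8257, not the Linux implementation, which works in scaled integer arithmetic:

```python
# Sketch of DCTCP's smoothed reduction (simplified; the gain G and the
# byte-counting here are illustrative, not copied from any kernel).
G = 1.0 / 16  # EWMA gain

class DctcpState:
    def __init__(self):
        self.alpha = 1.0  # start conservative: full Reno-like backoff

    def on_round_trip(self, acked_bytes, marked_bytes):
        # F = fraction of bytes that carried an ECN mark this round trip
        f = marked_bytes / max(acked_bytes, 1)
        # The EWMA in the end system replaces the smoothing an AQM would do
        self.alpha = (1 - G) * self.alpha + G * f

    def reduce(self, cwnd):
        # Scale the Reno halving down by the smoothed mark fraction
        return max(int(cwnd * (1 - self.alpha / 2)), 2)
```

With no marks, the smoothed fraction decays geometrically, so reductions shrink over time; a fully marked round trip keeps the Reno-like halving.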
D
Yep. So the effect of that is the memory in the end systems: it deals with the fact that short flows and bursts are effectively unresponsive. And what it does — in the classic approach —
D
Sorry, not what it does yet. In the classic approach, including BBR, short flows burst into the buffer — the buffer is held slightly full even with BBR, and fuller with tail drop and the rest of it, and then short flows burst into the buffer. Whereas with Data Center TCP and L4S and so on, the long-running flows leave headroom for the recent level of short flows — they've learned that headroom from maintaining that EWMA of the feedback — and so the short flows burst into that headroom and stay below the threshold, unless they're just occasional surprise short flows that the memory isn't used to. Next.
D
So that's really — next, Jana. So this is now the core of the presentation. Next, please, Jana. I just wanted to start with a traffic-light slide that I've used many times before, but you'll see it's got some extra bits added on the end.
D
These are all the bits of a Prague congestion control that have to be there to be safe on the internet — the requirements, in the first block — and to perform well, which is the second block. That used to have just two items in it, the top two in the performance area, but we've added a number of others as we've found all these problems. Koen's going to talk more about some of those areas, and I'm not going to even read out all the titles.
D
And you know, you can have a look at this slide in your own time, because the point of it is merely to show that there's more stuff added at the bottom, including bug fixes and, obviously, performance improvements. And when I say bug fixes, these aren't sort of code bugs, they're performance bugs, where the effect of the bug is to reduce the performance. So, Koen, I don't know whether you want to pick up on this slide — I'll move straight to the next one.
M
Yeah, next slide, I guess, for the time. So the first slide I want to show is the improvements we did in Prague — had to do in Prague — to solve the quite badly degraded behavior of Data Center TCP in the recent kernels. If you look, the queuing latency spikes are very high, and we found out it's mainly due to less responsiveness: on one hand because of rounding down, and also not enough bits being used in the integers.
M
These were minor for, let's say, classic congestion control, but they had a huge impact on Data Center TCP. And also, to even further improve smoothing, we found that there was a need for partial additive increase — there is a slide for it — instead of being non-responsive for a certain time, which makes it also vary a lot.
M
So if you really look at it under the same conditions, with the one-millisecond thresholds that people tried over a long year to reproduce — and I also saw that the presentations from Szilveszter also used Data Center TCP on the latest kernel version — you will see it really underutilizes the link.
M
So in our Prague version we fixed all those, and it was really difficult, because we had to remove all of them before we got the good result again. So we spent quite some time — a year ago, or more than a year ago by now — to get back our initial 3.19 results. So I think that's very important to know: if you want to do experiments, please use Prague instead of Data Center TCP on the latest kernel. Next slide.
M
If you are aware: if Data Center TCP is driven by an AQM which is smoothing — like in the coupled AQM, or in PI2, or whatever other AQM you need — you expect that there are marks every round-trip time. But classic congestion controls — and Data Center TCP took this over — suppress the additive increase for a round-trip time after a multiplicative decrease, when they are in the congestion-window-reduced state.
M
So that means that if you get marks every round-trip time, you don't have any opportunity to increase. Because of that, the interaction with your AQM will start to oscillate: it's putting out the right marking probability, and then the flow becomes non-responsive.
M
So it goes down — because that allows the flow to increase again — and then suddenly the probability is too low. So all of these interactions were creating extra periods of going up and going down, which are not good, of course. And then there is also the round-trip-time dependence: if you have a very big round-trip time and you compete with a small round-trip time, you will definitely get marks every round-trip time. So that means that a flow with a bigger round-trip time will get pushed down completely.
M
So this is an important difference between Prague and Data Center TCP that we have in Prague. What we do in Prague is increase on every ACK, except on the ones that echo an ECN mark. So it's a kind of proportional additive increase as well: we only do half a packet of increase per round trip if 50% of the packets are marked. Okay, next slide.
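As a rough illustration of that per-ACK rule — a simplified sketch, not the actual TCP Prague code, which also smooths the marks into an EWMA and works in scaled integer arithmetic:

```python
# Sketch of a Prague-style proportional additive increase: grow the
# congestion window on every ACK except those echoing a CE mark.
def additive_increase(cwnd, ce_echoed):
    """Grow cwnd by ~1 segment per RTT, skipping ACKs that echo a mark."""
    if ce_echoed:
        return cwnd  # no growth on marked ACKs
    return cwnd + 1.0 / cwnd  # classic 1/cwnd growth per unmarked ACK

cwnd = 10.0
# One round trip of 10 ACKs, with half of them echoing an ECN mark:
for i in range(10):
    cwnd = additive_increase(cwnd, ce_echoed=(i % 2 == 0))
# cwnd grew by roughly half a segment instead of a full one
```

With 50% of ACKs marked, only half the per-ACK increments happen, giving the "half a packet per round trip" behavior described above.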
M
So, to compare a little bit: what are the real benefits of using Prague, or L4S, or these Data Center TCP kinds of flows? Obviously, we have a very smooth throughput and a very low latency. You see here, in blue at the right, we are below one millisecond while still having full link utilization, in a wide range of cases.
M
It was all a little bit pushed back because of the safety issues and the discussions on the mailing list, but anyway, there are clearly potentials to optimize that. So, comparing it to, okay, the best case — CoDel with five milliseconds on a bottleneck — there is still a significant improvement, let's say. Next slide.
M
So one of the things we have also worked on is a better round-trip-time independence, and we can play with it and do whatever we want with it — that's the main message. There is a lot of discussion on the mailing list about what it should do, but I mean, TCP Prague itself can be made completely round-trip-time independent, like is shown at the right side. So here we have different flows, starting from half-a-millisecond round-trip times.
M
If you look at Prague, which only has one millisecond of buffer: initially, in the beginning of the flow, you get an impression of what the rates would be, and what we did in Prague is, after 500 milliseconds —
M
We enable this convergence towards the fair share. So you see, after a while all the rates go and share the link evenly. And why do we not do that immediately? Because we think it is a good strategy, from a dynamics point of view, to still get the benefits of your lower latency at the start; fairness and convergence is a process over a longer time, so we only need it over a longer time. So we don't want to disadvantage smaller round-trip times when it's about dynamics, but we —
M
We don't want to give these advantages to the longer round-trip times if it's a matter of downloads. And if the round-trip time is very long — well, if the base round-trip time is already very long, there is not much possibility to do interactive applications, and the shorter-term interactive mechanisms will not make a big difference.
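One way to picture that two-phase policy is the sketch below. The 500 ms switch point and the 25 ms reference RTT are illustrative assumptions, and the quadratic scaling is just the textbook way to equalize rate growth across RTTs — not necessarily what the actual Prague code does:

```python
# Illustrative two-phase growth policy (hypothetical parameters; not the
# actual TCP Prague algorithm).
REF_RTT = 0.025       # assumed common reference round-trip time, seconds
SWITCH_AFTER = 0.5    # stay RTT-dependent for the first 500 ms

def growth_per_rtt(rtt, flow_age):
    """Segments of cwnd growth to apply over one round trip."""
    if flow_age < SWITCH_AFTER:
        # Early phase: classic +1 segment per RTT, which favors short
        # RTTs but keeps the startup dynamics fast.
        return 1.0
    # Later phase: grow the sending *rate* as fast as a REF_RTT flow
    # would. Matching d(rate)/dt = 1/REF_RTT^2 requires
    # (rtt/REF_RTT)^2 segments per round trip.
    return (rtt / REF_RTT) ** 2
```

A long-RTT flow then adds more window per round trip, compensating for having fewer round trips per second, so long-term throughputs converge towards the fair share.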
M
So that's a little bit the strategy that we follow, but that's of course also up for discussion, or it can be adapted based on the applications that use this mechanism. But the good point — and I think it can be done in every congestion control — is that after a while, if you are in a very steady state, you can all converge to the same fair throughput. For the rest, it's still a matter of not being non-responsive and, well, use whatever.
M
Yes, next slide, quickly.
D
Okay — Bob. Yeah, just, I think, look, there's just one more — two more slides, actually — and I'll go very quickly, because I've said a lot of this before. So we think there's a lot of potential in exploiting high-fidelity ECN markings, particularly as there are signs that there are going to be operators deploying that — network operators, you know.
D
So there's a list here of possible topics to work on. I was hoping we could go through this, but I guess it's a bit late, so you'll just have to quickly look at them. But, you know, if there are any people in the room that are looking for a research project — coming to the end of a masters wanting to do a PhD or whatever, or PhD students looking a bit lost, or anyone who's —
D
You know, a postdoc or whatever — there are all sorts of things there to look at, and we're sort of wanting to try and be a bit more open and a bit more helpful. And the next slide was really just —
D
We want to start thinking about a better way to be able to visualize comparisons, to be able to come up with common metrics, because at the moment a lot of the metrics aren't common.
D
There are drafts on reference test cases, and RFCs, but not many that really focus on low latency — RFC 7928 is probably the closest. And possibly — well, certainly — everyone's got to use reusable tools.
D
I don't think common tools is an aspiration that is realistic, but certainly make sure that other people can use your tools. And so I think we're going to end there — just, if you can switch to the last slide and leave it up, that's just some pointers as to how you get involved. Thanks. Any questions?
A
Yep, thank you so much, Bob and Koen. We have two people in line, and I'm going to cut off the line after that. Well, he's in line too, so you just go.
E
So how do you solve or improve the RTT fairness by the proportional increase on the ACKs? I can't quite get the insight there.
M
So what we do is that we adapt the additive increase — we slow down the additive increase, and we —
J
The standard definition is that it requests a multiplicative decrease, but Prague expects it to mean an additive decrease. So I think — no, it's multiplicative.
J
So I think a lot of the discussion is going to have to talk about resolving that discrepancy.
M
I think it even doesn't matter. It's just that in the long term, when we are in a kind of steady situation where we want fairness — in download situations or whatever, if you really can measure what the share of each flow is — we want to converge to, or obey, let's say, a kind of marking-probability-to-rate equation, and that's all of it. It doesn't mean that you have to be AIMD.
M
It is important to converge, because that's where it's measured, but on a short term you can do whatever you want. Well, that's maybe another point of discussion — what are we going to allow? But in L4S you still need to keep the latency low, for instance. But, well, that's the main difference.
D
Clearly, here, Jonathan's just got a misconception that, because the reduction is equivalent to reducing by half a packet at every mark, he thinks that's additive. But it's repetitive additive, therefore it's multiplicative. That's —
A
All right, with that — yeah, I will call this. Thank you so much, Bob and Koen, and Jonathan for asking the questions. I'm sorry that we had to rush a little bit at the end, but thank you so much, and thank you everybody for staying eight minutes past time. This has been an excellent session.
A
So let's leave it there. Enjoy the rest of the IETF, and hopefully we'll see you again soon. Thank you.