From YouTube: IETF110-ICCRG-20210309-1200
Description
ICCRG meeting session at IETF110
2021/03/09 1200
https://datatracker.ietf.org/meeting/110/proceedings/
A: All right, we'll just give it a minute more, but in the meantime, if anyone would like to volunteer to take down the minutes, that would save us some time going through the presentations and give us more time for more interesting discussions, let me put it that way.
A: I will take that as Richard volunteering? Aha, there you go, well done. Thank you, sir. So I'll call you out in a...
A: Better. All right, let's get started; it's a few minutes past the hour, and welcome to ICCRG, everyone. It's ridiculously early for me, it's four in the morning for me. Hopefully it's a more sane time for many of you, but I'm looking forward to this session. Hopefully this will wake you up and hopefully you'll be engaged as we go through this.
A: So let's just start off with a couple of notes at a high level. Just note... well, oh, hang on, let me fix this so you can actually see it. There you go.
A
A
A
All
right
we'll
go
through
that
in
a
second.
So
that's
the
note.
Well,
if
you
haven't
taken
a
look
at
it,
please
do
please
read
it
before
you.
You
participate
at
the
mic,
I'm
not
gonna
walk
through
all
of
it
in
in
in
detail,
but
that
and
then
you
note
on
the
code
of
conduct
that
we
are
bound
by
the
iit
of
code
of
conduct.
A: This is important; please make sure you've seen this and read it as well before you participate. And here are the high-level goals for the IRTF, and a reminder to the folks here that the IRTF is not a standards development organization, it is a research organization. And with that I will jump into the agenda for today.
A: So we have several presentations today, and we have an excellent agenda here. I'll talk about the first one in a moment, but we have a number of presentations and I hope that everybody stays engaged through the entire meeting.
A: I'll thank Richard for taking down the minutes, and, as Chris pointed out in the chat, if somebody wants any comments to be voiced at the mic, please prefix them with MIC in the chat; Chris has kindly offered to deliver them at the mic, and I'll take him up on that.
A: That is about all from me. I am now going to kick off the first presentation, from Rui. Now, this is a slightly unusual presentation in that we are actually going to be talking about a congestion control that is more geared towards data centers. We will kick it off with that presentation. Rui, do you want to... I'm looking here to see, do you have to come up and ask for camera and mic?
C: Can you hear me now?
A: Yes, we can hear you. Okay, let me get you going.
C: So this is our work, which we published at SIGCOMM 2019, and after that we actively deployed this technique in Alibaba Cloud, together with many other vendors. By that time we thought that, since we are working with many vendors, we want to standardize this algorithm and this design, so we can all be aligned on the same setting.
C: Okay, so the motivation for our design is that today in the cloud we build very high performance networking, and there are a couple of motivations for our design. The first is that applications in the cloud right now are very sensitive to latency inflation; for example, in EBS, Elastic Block Storage, the SLA guarantee on latency is 200 microseconds, and some other key-value and in-memory applications also require millisecond-level latency, so we think that makes the network more and more important.
C: The second is that many applications, such as high performance storage and distributed deep learning, use dedicated hardware that can generate data orders of magnitude faster and also require ultra-low latency, so the network becomes the new bottleneck in our system. And the last part is that new architectures are emerging in the cloud, for example resource disaggregation, which separates compute, memory and disk into separate resource pools, so even memory access will go through the network in the future.
C: So to support those applications, the silicon manufacturers produce fast ASICs, because traditional software cannot keep up with the speed, so hardware offloading is inevitable. However, with faster hardware generating traffic more aggressively, network congestion becomes a severe problem for our clouds. For example, running RDMA in our network, we found that there were lots of PFC storms and also deadlock events, so we found it necessary to solve the congestion control problem in this case.
C: So we identified a couple of key issues with congestion control in a high speed network. The first issue is that we observed lots of PFC storms and deadlock events in our RDMA network, and we found that this causes stability issues in our network. However, we cannot disable PFC, because that would affect application performance as well. The key insight behind this limitation is that the current congestion control algorithm has a very slow convergence speed, so that's the essential part we want to address.
C: The second issue is that we want to run a mix of applications in the same cluster; however, we cannot achieve both high throughput and low latency for the different applications. The essential reason for that is that traditional congestion control algorithms rely on the standing queue: when the queue builds up, the congestion control starts to react to the congestion. However, by the time the standing queue has built up, the application latency has already been affected.
C: The last part is that we run DCQCN, which is the state-of-the-art congestion control algorithm for RDMA networks, and we found that those algorithms rely on heuristics for their configuration; for example, DCQCN uses at least 15 parameters for the congestion control. In our case, it's really time consuming to tune those parameters to work for different workloads.
C: So, to allow us to calculate precisely the extent of the congestion, the packet is forwarded along the path and each switch attaches its own telemetry into the packet. When the packet arrives at the destination, the receiver generates an acknowledgement packet back to the sender, putting the telemetry information into the ACK packet, to allow the sender to adjust the sending rate based on the telemetry information.
C: So, as I said, the most important telemetry information we need is the queue length and the TX rate, which tell us precisely whether the buffer is building up for this particular egress port, and, more importantly, if there is no queue buildup on the egress port, we can still use the TX rate to quantify the occupancy of the link. For example, we can control the link to run at 95 percent utilization.
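A minimal sketch of how a sender might turn that per-hop telemetry into a utilization estimate and a window update, roughly following the window-update rule described in the HPCC paper (SIGCOMM 2019). The field names, the 95 percent target, and the additive-increase constant are illustrative assumptions, not the standardized wire format or algorithm.

```python
# Hedged sketch of HPCC-style window adjustment from INT telemetry.
# Field names and constants are illustrative, not the wire format.

ETA = 0.95          # target link utilization mentioned in the talk
W_AI = 1500         # additive-increase step in bytes (assumed value)

def hop_utilization(qlen_bytes, tx_rate_bps, capacity_bps, base_rtt_s):
    """Estimate utilization of one egress port from its INT record.

    qlen/(B*T) captures the standing queue; tx_rate/B captures how busy
    the link is even when no queue has built up.
    """
    queue_term = (qlen_bytes * 8.0) / (capacity_bps * base_rtt_s)
    rate_term = tx_rate_bps / capacity_bps
    return queue_term + rate_term

def update_cwnd(cwnd_bytes, int_records, base_rtt_s):
    """React to the most congested hop reported in the ACK's telemetry."""
    u_max = max(hop_utilization(r["qlen"], r["tx_rate"], r["capacity"],
                                base_rtt_s) for r in int_records)
    # Multiplicative move toward the target utilization, plus a small
    # additive increase for fairness (as in the published HPCC update rule).
    return cwnd_bytes * (ETA / u_max) + W_AI
```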
C: So we want to compare our in-band telemetry mechanism with traditional ECN marking. ECN, as we know, is a single-bit notification of congestion; it is simple and efficient and is supported by many vendors.
C: We view in-band telemetry as an advanced version of ECN that provides fine-grained network load information, for example the precise queue length, in terms of bytes or in terms of cells, and also the transmitted bytes, which allow us to calculate the TX rate on the egress port based on the TX bytes and the timestamp information.
C: The link capacity we actually use to differentiate different links, whether it is a 100G link or a 25G link. The benefit, or the key insight, we get is this: first, we can converge to the proper rate in just one round trip, and we can adapt to the correct rate to avoid congestion, while traditional congestion control...
C: That's the ideal case, but in reality we prefer to carry in-band telemetry information once per round-trip time. In this case, we use the standard INT 1.0 format, and with that standard we have about 200 bytes of metadata for the INT, which accounts for only 2.5 percent of the bandwidth.
C: So in this case we see that, because bandwidth is generally abundant in the data center, we can spend that small amount of bandwidth for low latency, and we think it's a good trade-off.
C: Oh, L4S: I'm not quite sure about the details, but it seems like another standard in TSVWG, right? So, yeah, they proposed to use that, and I'm not sure how to compare, because I remember L4S is targeting the wide area network, and it focuses on switch support to improve traditional ECN.
C: We're focusing more on the data center environment, where the INT features can be supported in the switches; that can give us more information, and especially in the data center the bandwidth is abundant, so we can use that bandwidth to send the telemetry information.
C: As for whether the protocol is intended to be used on the Internet: right now we are focused more on the data center environment, because the telemetry information is more directed to the data center or cloud provider; that low-level congestion or traffic load information is very sensitive to the network operator.
A: Just a high-level point: the chat will keep going. You may want to finish your presentation and then see if people want to bring questions to the mic, or you can continue the conversation in the chat after your presentation as well.
C: Okay.
C: First, fast convergence: the sender knows exactly how to react to the congestion, and it can adapt to the precise rate in just one round-trip time.
C: So this is a preliminary result from the SmartNIC implementation we deployed. As we see in the figure on the left, the x-axis is the different flow sizes and the y-axis is the 99th-percentile normalized flow completion time; in this figure, lower is better, lower means lower latency. We compare our design with DCQCN, the purple curve, which is the default congestion control in RDMA.
C: We also show the overall distribution of the queue length in the switch for this experiment, and we see that with HPCC++, even at the 99th percentile, the queue length is only 23 kilobytes, which translates to seven microseconds of queuing delay on a 25G link, which demonstrates that we can achieve ultra-low latency while maintaining a near-zero queue.
D: Thank you, Rui, for the great presentation. I just have a question about the graph that you were showing.
C: In this case, we have a five percent bandwidth headroom to absorb bursts, so the short flows, the numbers on the left, have very low latency because we don't build up the queue. However, if you look at the very tail of this figure, we actually perform worse than DCQCN for those long flows, because they can use a hundred percent of the throughput and we only use 95 percent.
F: Hi, we just commented in the chat that this idea is similar to a protocol discussed in the IPPM group, the IOAM protocol. It's different, because here you are measuring the queues, which is directly related to traffic and congestion, but there are several issues that you can find when you try to do this at every hop or for every flow. The question is: did you already know about this protocol that we are discussing in IPPM, and how do you think this compares?
C: So, to answer the question: I think for HPCC++ we need a separate queue for our protocol, just like ECN, where you need a separate queue for ECN and for TCP traffic as well, right? So we need to have a separate queue for HPCC++.
A: Let's continue this one there, if you would like. Ignacio, Stuart, you're up next.
H: Hi, hey, just a quick comment. I think I understand the confusion that led to Vidhi's question, and this happens a lot when people use the terms latency and delay interchangeably.
H: This graph is not showing latency in the sense of the per-packet round-trip time; it's showing the total completion time for the transfer. So when you have a really large transfer, many, many round trips, the actual per-packet latency becomes less important.
C: Yeah, the left figure shows the flow completion time, which means the transmission delay for this flow, how much time the network takes to transmit this particular flow; that certainly needs a couple of round-trip times. The figure on the right actually shows the delay of the network, because it shows the CDF of queue length, and if you account for the transmission rate, this is the queuing delay, right? So this is the queuing delay distribution.
C: There are a couple of differences. The first one is that we use per-packet INT, but for production, once we published at SIGCOMM, we worked with the vendors to deploy this in our production network, and during this procedure we accumulated a number of insights and experience from deploying this protocol. That's what comes with HPCC++, so it's more production-ready and it's a standardizable protocol.
C: Now, some of the design in the paper is too artificial; in HPCC++ it's a more practical design, and it's good to align with the different vendors, because when we deployed this protocol we actually talked to different vendors to implement it: we talked to the NIC vendors and also talked to the switch vendors individually. So the common way is that we want to standardize, so that we don't have to talk to each one individually but are speaking a common language.
C: There are a couple of detailed designs in the draft that are quite different from the paper version.
C: In the paper version, we consider the multiple-bottleneck design: we carry INT information for each hop, so that we actually know where the congestion is, and we have a complete design that considers all the bottlenecks, and we can actually adjust the sending rate based on different policies. So, under multiple bottlenecks, we can achieve max-min fairness or proportional fairness or something in between.
C: That's the fairness we can achieve, but later, in the production deployment, we found that the simplest way is just to do max-min fairness. We don't store per-hop state, because in the multiple-bottleneck case we would need to store all the information for each hop, which actually costs more resources, while the benefit is very limited; the benefit is just what fairness we can achieve and how quickly we can adapt to the fairness point.
C: But actually we care more about utilization convergence, which can avoid congestion; we don't care as much about fairness convergence, because the flows are actually very short in our case. That's why we made a very simple design that still works in the multiple-bottleneck case. Yeah, thanks.
A: All right, thank you so much, Rui, for that. I'll encourage people to continue; this presentation was in part prompted by some discussion on the ICCRG mailing list, and people seem to be interested in discussing and engaging on data center congestion controllers.
C: Yeah, thanks. I have already copied these comments; I will reply over email. Thank you, everyone.
A: Thank you so much, Rui.
A: With that, we will move on to the next presentation of the day. Let me see if I can...
E: Yeah, the mic is on. Yep. All right, so for those people who've seen this yesterday, don't worry, it contains a lot of new information, so you won't get too bored. Next slide, please.
E: So qlog stands for QUIC logging, and the project started about two years ago, when we identified that the new protocols would probably get quite complex to actually analyze and debug in practice, and we're going to need some advanced tooling to do that. What you would typically do for something like TCP, for example (next slide), is take a packet capture somewhere in the network and then analyze that using something like Wireshark.
E: This is still possible for QUIC, but quite a bit more difficult, because (next slide) QUIC of course encrypts most of its transport-level metadata as well. So to do this you would have to store the entire packet capture, including the very large payloads, and also the TLS decryption secrets, leading to obvious privacy and scalability issues. There is, however, a second, more long-standing problem with the typical way that we do this (next slide), which is, of course, that a lot of protocol information isn't always reflected on the wire.
E: That's, of course, always the case for congestion control information like the congestion window, which I don't have to tell all of you. So to solve these problems, or to try and solve them for QUIC, we proposed a different approach (next slide), where the idea is, instead of doing a network-based capture, to log this information at the endpoints, directly from the implementations, which obviously have all the salient data right there available and can easily leave out the privacy-sensitive parts.
E: On top of that (next slide), qlog really is not rocket science; it's relatively simple. Currently we just map this into JSON, and we define how, for example, a received packet containing an acknowledgement frame should look, or indeed, on the right side, what you should call the variables related to, for example, congestion control updates. Using this type of log as input, we were then able to create quite a few reusable tools (next slide), which are available in the qvis tool suite. One of those...
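To make the JSON mapping concrete, here is a small hand-written sketch of what a congestion-control-related qlog event could look like; the exact event and field names in the qlog drafts may differ, so treat everything here as illustrative rather than the normative schema.

```python
# Illustrative sketch of emitting a qlog-style congestion control event.
# Event and field names approximate the qlog drafts but are not authoritative.
import json
import time

def metrics_updated_event(cwnd, bytes_in_flight, smoothed_rtt_ms):
    return {
        "time": time.time() * 1000.0,          # ms since some reference point
        "name": "recovery:metrics_updated",    # assumed event name
        "data": {
            "congestion_window": cwnd,
            "bytes_in_flight": bytes_in_flight,
            "smoothed_rtt": smoothed_rtt_ms,
        },
    }

# A qlog trace is then essentially a stream of such events serialized as JSON.
print(json.dumps(metrics_updated_event(cwnd=14720,
                                       bytes_in_flight=4380,
                                       smoothed_rtt_ms=32.5), indent=2))
```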
E: The second major tool that we have in qvis is a kind of, you could call it, tcptrace for QUIC, where, because of this approach, we don't just show the data and the acknowledgements and the flow control; we can also show the congestion window, bytes in flight, and the various round-trip time measurements that are used internally. And so for me, as definitely a non-expert in congestion control...
E: But it's also going beyond initial deployment and initial implementation debugging. For example, what Facebook has done is deploy qlog at scale in their data centers as well, and they've been able to find quite a few relatively QUIC-specific bugs in their setup that were previously missed in lab testing; they only really surfaced during their deployment.
E: For example, one of the things they had was that they were underestimating bandwidth during the zero-RTT phase, because they were relying on previously acknowledged packets, and during zero-RTT you do not have previously acknowledged packets, at least not ones containing application-layer data. Another one, which is in a very interesting paper that I'll link at the bottom there, was that in QUIC you change encryption levels, and that is accompanied by implicitly acknowledging all the packets from the previous encryption level, and the way they did that was...
E: Okay, so the thing is that... no, no, previous slide, please. So the thing is that this has been used, not just by the experts or the people that knew what they were doing, to great effect, but I think in QUIC...
E: We have this very special situation where you have a lot of QUIC implementers that are really not congestion control experts themselves, and they have to try to make something work based on the drafts and by looking at other people's code, and I think this kind of tool and approach has really helped them to get their things at least somewhat working (next slide), and I think that's one of the main reasons that qlog has found quite a bit of support within the QUIC community.
E: It's because of this, because the format has shown promise and the tools seem useful, that we are now moving towards adoption of this work by the QUIC working group (next slide), which is intended to happen somewhere over the next months as part of the QUIC rechartering. One of the goals there is to flesh all of this out for QUIC; we have some basic congestion control stuff in there, but, for example, Facebook have added a lot of custom events as well to help them better debug their custom setups.
E: We also have a few proof-of-concept projects around that. For example, for TCP we've been playing around with eBPF and using kprobes to get this kind of information bubbled up from the kernel. We then combine this with the raw packet captures from Wireshark to get a full view of what the TCP stack, for example in Linux, is doing, which has led to some interesting observations.
E: Next slide. We've also been playing around with this for multipath QUIC, and we have an ongoing project looking at it for multipath TCP as well, where we tag each event with a specific path ID, so we can split them out later and compare the different paths. So it shows definite promise; it looks like we will be able to do this. However, we are also running up against a few challenges or bottlenecks. Next slide.
E: For example, we've recently started a major push towards actual performance testing of QUIC stacks, and qlog currently, as I said, is JSON based, which is flexible, but it's also not super optimized, for example in terms of file size, and so testing performance on gigabit networks has turned out to be quite difficult to scale.
E: There are, however, other people, for example Nick Banks from Microsoft, using a custom, similar approach, but with a more optimized format and custom tools, where they are able to ingest much, much larger log files and much, much more information to help debug this kind of high-performance scenario. So I think the general approach is good; we just need to look at how we implement it.
E: Specifically, that's one of the discussions that we will definitely have for qlog. One of the problems that I have there, as kind of the main driver of this thing (next slide), is that, again, I am not an expert; I don't implement these things myself, I don't test them, I don't do research, so I don't know what the typical way of working for this is, either for research or actual deployment. I've seen things like this come along here in this research group.
E: The fact that I need more feedback became obvious in the early days, when I talked to Jana, asking, you know, how can we improve this tcptrace for QUIC, and he said the simplest thing you could do is add a ruler type of thing, where I can just drag and drop across the data line here and it shows me roughly what the data rate was and how long it took, which is really, really simple to implement (he did it in about one hour), and I would never have thought of that myself.
E: Most of this work, for practical reasons, will happen in the QUIC working group, even though we're also looking at other protocols as well, and we will of course have some very good input already, because there are a lot of knowledgeable people involved in QUIC with this as well. But extra feedback and opinions are always welcome.
A: Thank you so much for the presentation, Robin. We have time for just a couple of quick questions, but I want to quickly say that I'm grateful to Robin for presenting this here and, of course, for the work that he's doing. This is super, super useful for QUIC, but, as he points out, it's super helpful for other transports as well; people who want to use tooling that's more modern, so to speak, will find this to be extremely useful. Okay, John, you're up.
I: Thanks, I just wanted to ask what sort of compression algorithm you're using to get the 18 megabytes?
E: Oh, that's just normal zip.
I: Okay, so presumably a more advanced compression would get it down smaller?
E: Sure.
E: And for storage and transfer that's fine for now, but it's mainly if you want to load this. For example, the qvis tools are all web-based, and your browser really doesn't like you uploading a gigabyte JSON file, and downloading the compressed version and then having to unzip it in the browser is also not something it likes to do.
L: Yeah, hi, I do think this looks very interesting. Has anybody started doing anything with this for TCP? This seems like it has obvious...
E: Not concretely; we have the proof of concepts, but those are more like research projects. I do know that people at Facebook and Google have shown interest in applying this to TCP as well, and I think even Apple as well, but I don't know of any concrete efforts yet. I'm very interested in hearing from people.
M: Thanks. I just wanted to basically chime in on the same point here: I think it's very interesting to have a standardized way to log these congestion control internal events, especially useful for TCP, obviously, because right now, at least, I am faced with at least three different approaches for how to diagnose and troubleshoot TCP, and none of them are standardized, so they all need their specialized tool sets; they may need recompilation of the stack and so forth.
A: Thanks for that comment, Richard. I'll say, at a high level, just two quick points before I let Robin go. One is that the qlog schema is different from the visualization tool, and that's actually quite an important distinction here. To your point there, Richard: if you want to extend the schema to include more events, in TCP and so on and so forth, that would be very useful work.
A
All
right,
I
will
thank
robin
for
his
presentation
and
martin.
You
get
to
jump
into
the
queue
and
say
something.
A: Thank you, Martin, and thank you, Robin. On to the next one: Anna, I think you're up.
O: So you have one DCCP tunnel per path, and the two main use cases that we have in mind for this are, in the context of 3GPP, the Access Traffic Steering, Switching and Splitting architecture that is being defined there for combining cellular and Wi-Fi networks, and also the hybrid access use case in home networks, where you combine a fixed and a cellular link for better performance. Next slide, please.
O: So then, let's look at the problem that we encounter if we have this type of multipath framework, and this is generic for the protocol, as I said. This tunneling solution, of course, results in nested congestion controls, and this can also be a problem for single-path tunneling, but when you move into the multipath context this of course adds complexity, and in our work we are using uncoupled congestion control over the two paths.
O: In all the use cases here, we don't see the need for coupling the congestion control, as the idea is to use the two paths and to be able to aggregate them and use them for better performance, and we don't expect fairness issues to come into play here. So you see the general setup in the picture: you have a UE and then you have a proxy, or tunneling endpoint, and in this figure you have a downlink transfer.
O: Next slide. So the scheduling and reordering components will have a very large impact on the performance in this scenario. In the results that I will talk about today, and for a lot of the work we have done here, we have looked particularly at the cheapest-path-first scheduler, or strict priority scheduler.
O: So you have a preferred path, and you send data on this path whenever you have space available in your congestion window, and it's only if this preferred path is not available that you start to use your other paths. This is also, of course, we think, a quite reasonable scheduler for the scenario that we are targeting, because you may, for instance, want to use your Wi-Fi network and then only use your cellular quota if the Wi-Fi network does not provide sufficient performance, or you use your fixed network in the hybrid access scenario and then use your cellular network for improved performance when needed.
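As a rough illustration of the strict-priority ("cheapest path first") behavior just described, here is a small sketch that always prefers one path while its congestion window has room and only spills over to the second path otherwise. The class layout is an assumption for illustration, not the implementation used in these experiments.

```python
# Hedged sketch of a strict-priority / cheapest-path-first scheduler:
# use the preferred path while its congestion window has room,
# and only fall back to the secondary path otherwise.

class Path:
    def __init__(self, name, cwnd_bytes):
        self.name = name
        self.cwnd_bytes = cwnd_bytes
        self.bytes_in_flight = 0

    def has_room(self, packet_len):
        return self.bytes_in_flight + packet_len <= self.cwnd_bytes

    def send(self, packet_len):
        self.bytes_in_flight += packet_len

def schedule(packet_len, preferred: Path, secondary: Path):
    """Return the path chosen for this packet, or None if both are blocked."""
    for path in (preferred, secondary):
        if path.has_room(packet_len):
            path.send(packet_len)
            return path
    return None  # congestion-window limited on both paths; wait for ACKs
```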
O: The reordering component can also have different functions, and I think Markus will talk particularly about reordering in the next talk, but in the work I'm presenting here we're using an adaptive time limit for the reordering, to determine whether to pass the packets on or not, as we may not have a reliable transfer over the tunnel paths.
O: So I mentioned that if we move to the multipath domain, we have some additional challenges for the congestion control in congestion control, and particularly the challenge in this scenario is to be able to aggregate the capacity over the two paths. You're using one path as your preferred path, and the challenge then is to actually be able to also use the second path when needed; in particular, the end-to-end congestion control may react and slow down before you are able to actually start to utilize that path.
O: And here we have an example of this. What you see in this graph is a time sequence of four different transmissions, stacked on top of each other, and for each one the green throughput is for the preferred path and the red throughput is for the second path. At the top, in the topmost scenario, you see that things work as you would like: you have capacity, using both the first path and the second path.
O: The second scenario from the top, on the other hand, does not work well at all; you're not able to use any of the second path, so only the capacity on the first path is actually utilized. And then we have two examples where you kind of manage to use some of the capacity of the second path and, in general, the use here is going up and down a bit. These are experiments with four different configurations that show how different the outcome can actually be here, depending on a number of parameters that impact this performance. Next slide.
O: So the results that I'm showing here use a user-space implementation of this multipath framework. There is also a kernel-level implementation, developed by colleagues at Deutsche Telekom, and Markus presented some measurements from that at the last IETF in the TSVWG session. But here we use a user-space implementation, which of course offers quite a lot of flexibility in trying out different protocols and different configurations, and this user-space program...
O: Okay, something seems to have happened there. With good luck, I'm back. Okay, when did you lose me? So I was explaining that we are using the user-space framework for these experiments, and this offers a lot of flexibility in how to try out different scheduling methods and different protocols. In this framework the packets are captured through the Linux tun device, and then the framework encapsulates the packets with information like the path sequence numbers and the timestamps of the packets, and then packets are scheduled over two single-path sockets.
O: So this means that the framework as such is not tied to DCCP; it also allows us to use other protocols as tunnels, so in the experimental results we have both.
O: Okay, so, as I said, for the tunnel we are using both TCP New Reno and TCP BBR, and then we are using DCCP with CCID 2, which corresponds to New Reno, and also a new CCID, CCID 5, which is an implementation of BBR-style congestion control for DCCP. And we have some base delays for the two different paths: 20 milliseconds of added delay on the preferred path, plus 20 milliseconds to the server as a baseline, and 40 milliseconds over the second path, with the same bandwidth and the same queue.
O: So now some example results. "Short" here stands for the basic configuration, and what we see here is the combination of different congestion controls on the end-to-end path and in the tunnel, and the results are relative to the performance over a single path; you see a percentage of the flow completion time for downloading a large file.
O: So if you're below 100, you have gained performance compared to the single-path case, and you can see here that the performance varies a lot depending on which congestion control combination you use. In general, BBR performs better as a tunnel protocol, as BBR reacts faster when you start to experience congestion, and there is also less loss over the tunnel with BBR. You can also see that BBR at the endpoints, with Reno in the tunnel, performs very poorly here, because BBR reacts before the second path is used.
O: Next slide, please. Okay, so to summarize, the congestion control in congestion control aspect has a lot of impact on the performance of the multipath tunneling problem, and there are a lot of factors that interact here. The results that we saw here were particular to the cheapest-path-first scheduling mechanism, and the reordering mechanism you use also has a large impact; the different congestion controls have quite different interactions.
O: The placement of the proxy, of course, is also quite important, as well as the path characteristics. If we should draw some first general conclusions from the work we are doing, we can see that, overall, BBR performs better than Reno as the congestion control for the tunnel, and having the proxy close to the user, as expected, is typically beneficial, and we are actively working on this, playing with the different parameters and analyzing the various interactions.
A: Thank you, Anna. Let's keep this quick, because we have questions after Markus's presentation next as well. I'll close the mic line here after Gorry, but Martin, you're up.
N: Thank you, Anna, this is very interesting. How applicable do you think these results are to non-multipath scenarios that have congestion control within congestion control, like MASQUE?
O: So, I mean, the particular problem of using the second path, which is the main challenge here, will not happen unless you have multiple paths. Of course, some of the aspects that we see here, for instance the relation between the different control loops in terms of RTT, and the impact of which congestion control you use in the tunnel versus what you use end to end, will also come into play in the single-path context. But the results are not directly transferable, of course.
P: On slide 11, I was curious about your heat map and how this would play out if there was a much larger RTT in place. Can you just talk me through a little bit more of what's going on?
O: That, I think, is just a random effect. What we have done here is that we have sampled this space with a number of measurements, so each measurement point here is, you know, not repeated many times; it's something like 400 samples over the space to create this heat map.
O: There is some noise in this graph, of course, but you can see quite clearly that the blue space is quite separated from the red, so you can see quite clearly the impact of the two control loops and the difference between the RTTs in the two cases.
A: Thank you so much, Anna. We'll move on to the next presentation; this will be in the same space, but let me bring this up. Marcus.
Q: Yes, starting with how multipath typically works and which components are usually employed. Let's first have a look at the picture: on the left there's a sender, on the right you have a receiver, and in between, and this is what is very much related to multipath, you have at least two communication paths. On the sender side you have a scheduler, which is responsible for distributing traffic across the multiple paths; you can have different logics for this, and you will find a number of logics already described in the ICCRG scheduler draft.
Q: A multipath network protocol takes responsibility for allowing the communication between sender and receiver, and typically in such scenarios, when it comes to heterogeneous environments, you have a latency delta between the multiple paths employed in the communication, and that is exactly the issue I want to talk about: this latency delta.
Q: You can imagine, when we simultaneously use the two paths, maybe sending the packets in a round-robin fashion, that will cause out-of-order delivery on the receiver side, and with that it completely differs from the characteristics of a single-path communication, because in a single-path communication you're only dependent on the latency of that single path.
Q: If you look at the IETF, we see there are multiple multipath protocols defined or available as drafts: multipath TCP, multipath DCCP, multipath QUIC, CMT-SCTP and so on, and it would be interesting to see, on the next slide, how they behave when it comes to out-of-order delivery and which mechanisms they have implemented to overcome this.
Q: Nevertheless, typically scheduling and reordering are not part of the protocol specification, and it's left to the implementers to take care of it, and that might cause trouble if an implementation is dependent on protocol mechanisms.
Q: So from today's perspective, I think they should consider sequence maintenance in their design, but my feeling is that this is not completely given right now. For the three protocols I mentioned on this slide, DCCP, TCP and QUIC, when it comes to being combined with unreliable transmission, then strict reliability is obviously not an option, and the solution, if we talk about sequence maintenance, will be a trade-off between maintaining the data sequence and not interrupting the data flow.
Q: A special challenge in this scenario is packet loss, and I think it's likely, when we talk about multipath transmission, that we will see packet loss to some extent, and this is always combined with the question: how long do we have to wait in a potential reordering mechanism on the receiver side, and at which point in time can we assume that the packet is really lost and we do not have to wait for it any longer?
Q: So, during multipath protocol design, this has to be considered, and possible measures like different sequencing schemes, for example a sequencing scheme per path and separately for the multipath connection, or sender-receiver signaling, have to be taken into account. I will come to this in the next slides. With the draft we have available at ICCRG, the multipath reordering draft, now in version two, we claim to cover all these aspects and provide guidelines for designers and implementers of multipath protocols. Next slide, please.
Q: Yeah, okay, let's have a deeper look into what we have specified so far in the draft. We discuss several mechanisms to support smooth and adjustable in-order delivery for multipath communication, and in this draft scheduling is out of scope. Scheduling may also provide some measures to overcome out-of-order delivery on the receiver side, but we see that rather as part of the scheduler draft available at ICCRG.
Q: So we discuss, for example, a resequencing mechanism with which we want to keep the generated sequence of data at the receiver side, and there we discuss multiple functionalities, or multiple logics. The first is the passive one: we just forward packets as they arrive, which for sure will not reorder at all.
Q: Adaptive expiration is similar to static expiration, but we do not have a fixed time threshold; we dynamically adjust how long we wait for a missing packet, depending, for example, on the RTT or the latency on the path.
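A minimal sketch of the adaptive-expiration idea: hold back out-of-order packets only up to a timeout derived from a recent RTT estimate, then release whatever can be delivered. The structure and the timeout rule (a multiple of the current RTT estimate) are illustrative assumptions, not the draft's normative behavior.

```python
# Hedged sketch of an adaptive-expiration reordering buffer for a
# multipath receiver. The timeout rule is an illustrative assumption.
import time

class ReorderBuffer:
    def __init__(self, rtt_estimate_s=0.05, rtt_multiplier=1.5):
        self.expected_seq = 0
        self.held = {}                 # seq -> (packet, arrival_time)
        self.rtt_estimate_s = rtt_estimate_s
        self.rtt_multiplier = rtt_multiplier

    def update_rtt(self, rtt_s):
        """Feed in fresh RTT samples (e.g. signaled by the sender)."""
        self.rtt_estimate_s = rtt_s

    def receive(self, seq, packet):
        """Buffer the packet and return whatever can be delivered in order."""
        self.held[seq] = (packet, time.monotonic())
        return self._release()

    def _release(self):
        out, now = [], time.monotonic()
        deadline = self.rtt_multiplier * self.rtt_estimate_s
        while self.held:
            if self.expected_seq in self.held:
                out.append(self.held.pop(self.expected_seq)[0])
                self.expected_seq += 1
            elif any(now - t > deadline for _, t in self.held.values()):
                # Waited long enough: assume the missing packet is lost and
                # skip ahead to the oldest buffered sequence number.
                self.expected_seq = min(self.held)
            else:
                break
        return out
```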
Q
Then
we
have
delay
equalization
strictly
spoken.
That
is,
is
not
a
reordering
mechanism
that
just
delay
the
faster
packets
to
match
the
latency
of
the
slower
part.
And
last
but
not
least,
we
discuss
fast
packet
loss
detection.
That
is
probably
something
which
can
be
combined
with
the
other
logics
and
that
level
the
part
and
connection
sequencing
to
to
have
a
very
early
idea
about
if
a
packet
is
lost
or
delayed
next
slide.
Q: We see that we can overcome packet loss when we spend redundancy, and in that area we see forward error correction but also network coding. And when it comes to retransmission, so overcoming packet loss by retransmission, we also see three different mechanisms which can help here. On the one hand, we have signaling, which is, for example, used in TCP or DCCP to signal outstanding packets to the sender. Anticipated means...
Q: ...we predict a beneficial early retransmission for the reordering purpose. And last but not least, we have flow selection, the ability to retransmit packets on a path different from the original one, if this supports the reordering. As I said, combinations of mechanisms are in principle possible and most probably useful. Next slide.
Q: Please, yeah. First of all, I want to invite you to contribute to this draft, as I think it is applicable to many of the multipath protocols discussed or standardized at the IETF. The draft itself is still under development and some content is not finalized.
Q: However, you have got an idea today about which mechanisms we are thinking of, and my question today is: is there any mechanism which is missing so far? And my second question is how to proceed with this draft: do you see any value in it? Yeah, and with that I'm done for today and I would like to see some feedback.
A: I don't see anybody at the queue, and I don't want to spend a lot of time waiting, but I'll give it just 30 seconds if somebody wants to come up to the mic and offer feedback. In the meantime, I'll offer Marcus this: on your question about how you want to proceed with this draft, I think the one thing I would recommend is engaging the group on the mailing list and seeing if we can generate discussion.
A: That's always a good way to get people interested. Andre is in the queue; you're up.
R: Okay, sorry. Andre Bondi, software performance and scalability consulting in New Jersey. My question for Marcus is: how do you go about setting the adaptive expiration time? What are the variables that go into that? Is this something that you're going to do at connection setup, where you can establish what a floor on the latency would be? And how do you go about increasing the expiration time? Because if you make it too large, there's going to be an issue, and then there's a problem of degraded throughput.
Q: Yeah, very good question, thank you for that. Short answer here: we have an implementation in multipath DCCP available for that, and there we continuously update the timer with RTT information we get from the sender during transmission, so we have a signaling mechanism implemented for that. And second point: you are totally right, we have to set some boundaries; it doesn't make sense to let the reordering queue grow to an infinite value, which would slow down the total throughput. So one measure here could be to set a manual limit.
Q: Yeah, yeah. That is exactly why I think we need this draft, and that is a question which should be mentioned there, and then it's on the particular protocol to think about it: should this be something which is in-band or out-of-band? We would not give a final recommendation; I don't see that we have to give a final recommendation to a particular protocol. We just give some guidelines, and such questions as you have put now, whether this should be something protocol-specific, or out of band, or whatever...
Q: ...that then has to be decided within the particular working groups.
Q: Oh, that is now very specific. SS7, to my knowledge, uses SCTP, right?
G: Okay, great, all right. So thanks, Jana. We just wanted to give a quick update on the BBR-related work that's going on in our team at Google, and this is joint work with my colleagues listed here.
G: So I think this will be a shorter talk than many of our recent talks at the IETF. I just wanted to give a quick update on the deployment status at Google, where we're nearing completion for internal TCP traffic, give a quick update on the alpha open source release on GitHub, talk about some plans with respect to Internet drafts, and then also talk about our continued work on what we're calling BBR.Swift. And, as always, you know, we're just offering this in the spirit of sharing our experience with these kinds of experiments in deployment.
G: And of course we are always looking for feedback or test results, issues people run into, and any ideas or patches folks want to contribute would be great. Next slide, please.
G: So, in terms of the ongoing deployment of the BBR v2 algorithm within Google for Google-internal traffic, this has come pretty far along. Right now we're in the process of deploying BBR v2 as the default TCP congestion control for internal traffic, and we've gotten to a point recently where it's used as the congestion control algorithm for over 98 percent of the internal TCP traffic, as measured by traffic rate or traffic volume. Just to be clear here, for this internal traffic we're using a number of different congestion signals.
G: You know, we're using the core BBR approach of modeling the bandwidth and round-trip time, but we're also using ECN and loss as signals as well, and as we deploy this we're seeing some latency reductions at the tail for RPC traffic, and this is as compared to the previous congestion control, which for internal traffic was based on a shallow-threshold ECN algorithm. And then we also have ongoing work looking at BBR v2 for Google external traffic, so basically YouTube and google.com traffic over the public Internet to end users, and that work continues.
G: We're seeing some reduced queuing delays and reduced packet loss versus BBR v1, but we're still not quite where we'd like to be, and so we're continuing to work on that and, of course, we're continuing to iterate in internal lab tests and experiments as well. Next slide, please.
G: So, in terms of the status of the algorithm and code: as we've mentioned at the IETF a few times before, we've got a release of both the QUIC and the Linux TCP code available. The Linux TCP code is on GitHub, and we've made a couple of recent minor updates to that code, rebased it onto a more recent version of Linux for those who are interested, and posted a few minor bug fixes, and we think the BBR v2 alpha release is ready for experiments.
G: And there have been reports over the past year or two from other folks in industry and academia who have taken a look at it, both in production settings and in lab settings, and there are some links in the slides to previous talks where we've given more details about the algorithm and the code and how it behaves. Next slide, please.
G: So there have been a number of requests for updating the BBR Internet drafts; the ones that are out there currently document version one of the algorithm, and we are now planning to go ahead and update those to reflect BBR v2, and our goal is to get those out there by the July IETF.
G: So we can present those and discuss those, and, you know, the idea, I think, is to just replace the drafts that are up there with drafts that are targeting BBR v2, hoping that that will make it more clear that these are sort of replacing the earlier drafts. Next slide.
G: So we did want to mention briefly that we are also continuing another thread of research work, which we discussed briefly at the November IETF, on an approach we're calling, for now, BBR.Swift, which basically leverages approaches from the Swift congestion control algorithm, presented at SIGCOMM 2020, where the approach basically uses the network round-trip time as the primary congestion signal. The main motivation there is that, for environments where it's available...
G: ...it can give you a richer signal, with more information about the current degree of queuing along the path, which has sort of two advantages. One is that it allows faster reaction if there really are long queues right now, and the other is that it helps you avoid overreaction when the queues may be persistent but might be short, for example. And there's ongoing work there.
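As a rough illustration of using RTT as the primary congestion signal, here is a small sketch of a Swift-style AIMD update driven by a delay target, in the spirit of the Swift paper (SIGCOMM 2020) mentioned above. The constants and the per-ACK structure are illustrative assumptions, not the production implementation.

```python
# Hedged sketch of a Swift-style, delay-driven cwnd update: additive
# increase while measured RTT is below a target delay, multiplicative
# decrease proportional to how far above the target we are. Constants
# are illustrative, not the production values.

AI = 1.0        # additive increase, packets per RTT
BETA = 0.8      # multiplicative-decrease scaling
MD_FLOOR = 0.5  # never cut cwnd by more than half in one step
MIN_CWND = 1.0

def on_ack(cwnd_pkts, rtt_s, target_delay_s, acked_pkts=1):
    if rtt_s <= target_delay_s:
        # Below target: grow roughly AI packets per round trip.
        cwnd_pkts += AI * acked_pkts / cwnd_pkts
    else:
        # Above target: back off in proportion to the excess delay.
        excess = (rtt_s - target_delay_s) / rtt_s
        cwnd_pkts *= max(1.0 - BETA * excess, MD_FLOOR)
    return max(cwnd_pkts, MIN_CWND)
```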
G: Preparing for production testing and doing some lab testing right now is the main focus. As we mentioned at TCPM, part of this research and development effort includes work to provide timestamp information in the TCP options, to provide the sort of detailed, fine-grained and more accurate round-trip time measurements that you need for a scheme like this. There's a link to the extensible timestamp draft that we put out last fall, and ultimately the goal is to allow this as an optional approach for contexts where...
G: ...for example, you have separate quality-of-service queues and you can isolate this traffic to its own queue, to avoid interaction with other classes of algorithm. And we do ultimately want this to be usable for physical machines and virtual machines, which will take some work to plumb the timestamp information up and down the stack to make these timestamps available, but that is kind of the long-term goal. Next slide, please.
G: So, just in conclusion, wrapping up: we're continuing to work on both BBR v2 and this newer approach, BBR.Swift. We're finishing the rollout for internal TCP traffic for BBR v2, continuing to iterate on external traffic, or public Internet, performance, and hoping to release an Internet draft in July, and, as always, we invite feedback, test results, issues, patches, anything like that. And next slide, please. I think that wraps it up, so let me see if there are any questions.
A: Questions. I am going to try a new experiment this time, which is that I will cut off the Q&amp;A in just under five minutes, no matter who's in the line. So I'm not going to cut the mic line, but I'm going to cut off the Q&amp;A. All right, Omer, you are up.
S: Can you hear me? Yes? Thanks, Dan. Can you share, do you have any data on what kind of end devices BBR v2 is deployed on for the external users, are they Chromebooks, Androids or something else? And what interests me the most is how BBR v2 interacts with modern cellular networks.
G: Sure, yeah, so to add some details there: where we're deploying BBR v2, for the types of traffic you're talking about, is the google.com and YouTube servers that are sending traffic out over the public Internet to end users, and since this is basically talking about all of the users using YouTube and google.com, or currently just a small percentage of them for testing...
G: ...this basically should be a cross-section of every kind of device that connects to Google and YouTube, and in our experience, of course, that's a pretty diverse set. Largely, these days, I think the dominant bottleneck technology is Wi-Fi, but obviously we do have a lot of cellular users as well.
G: On the cellular side, I think it's still mostly 3G and 4G, although we're starting to see, obviously, some 5G trickling in now. I don't have any numbers to share with you today, mainly because the public Internet aspect is still a work in progress, but I can share that.
A: All right, well, I will thank Neal for his presentation, and I won't speak for everybody here, but I speak for many people who are very excited about the update to the draft; looking forward to reading the updates to the BBR draft.
P: Excellent, okay. So this short talk is going to look at zero-RTT parameters for QUIC, basically exchanging transport parameters to let you do something different in congestion control, and there's a draft, at revision seven, with these people on the side: Nicolas, Emile, Tom and me. Next slide, please.
P: So this is a draft that tries to deal with paths which are not typical; we're talking about paths that have something different about them. Maybe they're somewhat higher in delay, which could be many tens of milliseconds or many seconds; maybe they have a very large bandwidth-delay product.
P: Our focus was primarily on satellite paths, and that now covers a very wide range of paths; we focused just on GEO in the initial work here, but you might see other paths that have similar needs, and I think that's one of the important things I'd like to bring up here: if other people are seeing paths that have other needs, maybe slightly different characteristics, we'd love to talk to you, because we'd love to make sure whatever we propose here actually works across a variety of different paths.
P: The context is to define some transport parameters as extensions to QUIC, and these are shared during the zero-RTT phase, basically allowing resumption using additional transport and connection properties discovered from a previous successful connection. This is a lot like TCP control block sharing, but it's also different, because it's designed for QUIC.
P: The core thing we're going to talk about is using the information to make a jump in the cwnd, the congestion window, so that, instead of starting a session with a configured large initial window or with a normal small initial window, you choose something which is based on previous history and use that to initialize a safe cwnd, and, like any other method that's used with TCP, for instance, we'd like this information to be shared across multiple connections.
P: And why is this important? If you have the sort of BDP and delay that you might see in a satellite GEO environment, then it might take you many seconds to download something which you could actually send, using TCP, in maybe a small number of seconds.
P: So maybe we can do much better, and this slide shows two methods which could be used to improve performance, that we've tried in a spreadsheet analysis, using a little tool we have, to look at different combinations of parameters. The orange one is a jump to 25 percent of the last window and then two rounds of RTT to get to the full-size window, and, in green, a high-jump method where we delay it by one RTT, so we make the jump more conservatively.
P: This gives a range of options, in the orange and green area, all the way across the blue area, where you can trade performance against conservative congestion control behavior, and that's the good reason for presenting this in ICCRG, because a lot of the issues are concerned with how best to adapt. But before you adapt, you need a method of having the data about the previous connection.
P: The most dangerous case is a change of the path, but the method we propose will validate the RTT against the previous RTT, and we're suggesting that we include some form of pacing when we initially start with a new, higher congestion window. In this way, if there is a big RTT change on the path, then the damage that's created is very much limited, and we believe it could be made safe for wide-scale deployment and therefore something that might be interesting to standardize. Next slide.
P: Please. This is the set of metadata we expect to put in the BDP metadata, three parameters: the bytes in flight; the minimum RTT encountered, which is partly there as a safeguard, but also to configure a pacing interval (one of the problems is initializing the RTT of the pacer, so this information is quite helpful in getting a good response); and the maximum packet number encountered.
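To make those three parameters concrete, here is a hedged sketch of how a client-stored BDP record might be represented and sanity-checked against the current path before it is used; the names, the validation ratio, and the pacing helper are illustrative assumptions, not the draft's normative rules.

```python
# Hedged sketch of the saved BDP metadata and a simple validation step
# before reusing it on a resumed connection. Field names and the RTT
# tolerance are illustrative assumptions, not the draft text.
from dataclasses import dataclass

@dataclass
class SavedBdpInfo:
    bytes_in_flight: int     # bytes in flight observed previously
    min_rtt_s: float         # minimum RTT encountered previously
    max_packet_number: int   # highest packet number seen previously

def usable_for_resumption(saved: SavedBdpInfo, current_rtt_s: float,
                          tolerance: float = 1.2) -> bool:
    """Only reuse the saved state if the fresh RTT sample roughly matches
    the previously observed minimum RTT (suggesting the same path)."""
    if current_rtt_s <= 0 or saved.min_rtt_s <= 0:
        return False
    ratio = current_rtt_s / saved.min_rtt_s
    return (1.0 / tolerance) <= ratio <= tolerance

def initial_pacing_rate(saved: SavedBdpInfo) -> float:
    """Pace the early packets based on the previously observed BDP."""
    return saved.bytes_in_flight / saved.min_rtt_s   # bytes per second
```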
P: Well, we've approached this in various ways. One way was to perform some implementation work in picoquic, and there's a GitHub repository that you can use to access this. This is primarily focused on the exchange of the cryptographic information at the start of a QUIC connection, so that you can actually get the bandwidth parameters exchanged, and it focused on a very simple, easy-to-implement change to the congestion controller. We have a more advanced version of how we expect that congestion control update to occur in the draft.
P: So please look there to find more details about what we actually suggest, but it's clear even from these three simple results that, without the option, it took 4.3 seconds to exchange the two-megabyte chunk of data on average, as the median value; with the zero-RTT enhancement, 3.4; and with the zero-RTT BDP, 2.9 seconds. So it saves what we, as people evaluating satellite links, would see as a significant proportion of the download time, and these are for modern satellite links running at 50 megabits per second. Next slide.
P: Please. We're also looking at how the client might use this information, because the server can always optimize, and it can be nicely optimized in a predictable way, so the user doesn't have to be concerned about it. But there's also a possibility to optimize the client, if it knew about the likely BDP aspects of the path it's using, and we did some work in 2018, published at the NetSat Days conference.
P: We could take this information as one of the inputs, to adapt the results to produce much more predictable performance, in this case trying to avoid the strange behaviors that happen when your predictor gets the capacity wrong and therefore vastly underestimates the amount of capacity you've got, because you've got a larger RTT. There were talks in the IRTF Open meeting about the various ways in which dynamic adaptive streaming players can play out, and this is kind of like one of the input parameters, and a good example of how knowing something at the client can help.
P: That's 25 percent of the previously used capacity. That means that in the following RTT there is a step up to use half the capacity, and then another step up, etc., until, after several RTTs, we've used the whole capacity. Now, it would be possible to jump immediately to use the full capacity, but if you're a congestion control person that might frighten you, because it would cause severe congestion against any other flows that happened to be present at the time when your new flow started, rather than when you previously measured your available capacity.
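The step-up just described (start at a quarter of the previously used window, then roughly double each RTT until the old value is reached, backing out on loss) can be sketched as follows; the fractions and the loss reaction mirror the talk, but the default initial window and the generator structure are illustrative assumptions, not normative values.

```python
# Hedged sketch of the conservative cwnd ramp described in the talk:
# jump to 25% of the previously observed window, then double per RTT
# until the saved value is reached, and abandon the ramp on any loss.

def resumption_ramp(saved_cwnd_bytes, default_initial_window=14720):
    cwnd = max(saved_cwnd_bytes // 4, default_initial_window)
    while True:
        loss_seen = yield cwnd                 # caller reports loss per RTT
        if loss_seen:
            return                             # fall back to normal behavior
        cwnd = min(cwnd * 2, saved_cwnd_bytes) # step up each round trip

# Example use: advance the generator once per RTT, reporting whether loss
# was observed in the round that just finished.
ramp = resumption_ramp(saved_cwnd_bytes=1_000_000)
cwnd = next(ramp)            # first RTT: roughly 25% of the previous window
cwnd = ramp.send(False)      # no loss: window doubles
```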
P: There's some interaction with TLS; Emile is our TLS person, and there are probably things that we should discuss between QUIC and TLS to see how this initial exchange of data should best be handled, and I think some form of synchronization between the two working groups is important.
P: Congestion control safety is the thing which is probably most important for this group, and if we can standardize or adopt a way of exchanging this information, we also need to adopt a way of safely using it. We're assuming that any method we use here will have a way of backing out quickly and efficiently as soon as loss is detected and the cwnd has been artificially increased, so we will expect to quickly back out of any problem. But do we need a draft on congestion control safety that updates RFC 6928?
P: We could use a new BDP extension specified in QUIC, and we are wanting to do that part of the work in QUIC, but there's obviously also a congestion control piece, which is why we're trying to bubble this up here in this group and attract some comments. So, trying to stick roughly to time, I'd like to take comments now, if possible.
A: I want people to engage on this question, and I'll take my moment to basically say that this is something we expect; maybe not this specific mechanism, but the idea of recording and reusing congestion control information is something that we expect will happen in QUIC, because there are places to store this information at the client. This is something that is different from TCP: a server can actually encode this information and ship it off to a client, and then use it on the next connection.
A: When the connection is established, that makes it much more likely that something like this will get deployed by QUIC implementations, so it is much more important now that we actually engage on this topic; ICCRG is the right forum. Perhaps we can continue this discussion on the list and at the next meeting as well, but I'll thank Gorry also for putting this together very quickly, at short notice. Do you want to say something, Gorry? Yeah.
P: I'd just like to say that we are super interested in not doing this as a group of satellite engineers who've worked on PEPs and enhancement and modelling of really long delay paths, but doing it within the IETF, where we can get other people involved in this. This is a super interesting place where we can actually collaborate between different people, and we'd really love to get feedback on this.
A: Thank you so much, Gorry, again, for the presentation. One quick announcement: Bob just announced that a new draft describing Prague congestion control has been posted; please take a look. We might end up discussing that at the next ICCRG meeting. Thank you again, everybody, and enjoy the rest of the IETF. We'll see you next time.