From YouTube: IETF108-ANRW-20200730-1410
Description
ANRW meeting session at IETF108
2020/07/30 1410
https://datatracker.ietf.org/meeting/108/proceedings/
A
Okay, we're ready to start the next session. We also have a lot of participants. This is the third session of the Applied Networking Research Workshop; we have this one today and there's another one tomorrow. For those people who haven't been in the first two sessions, we'll quickly announce a couple of things again, so, on to the next slide.
A
We definitely want to thank our sponsors again. This virtual meeting has lower costs than usual, but without the sponsors it would not be possible. Next slide.
A
On this slide: you can find all the slides in the proceedings in the Datatracker, so you can also go there, download the slides and look through them, because here you find the link to join the Slack channel.
A
We have a Slack channel for this workshop; if you haven't joined it yet, it is particularly useful when you want to have discussions with the authors after the session, because right now we have the in-session chat, but that goes away after the session. All papers and the whole program are available not only in the Datatracker but also on the workshop web page, and the proceedings are freely available from the ACM Digital Library.
A
One quick note: as with all IETF sessions, this session will be recorded and put on YouTube afterwards. So please be aware of that if you want to say something or appear on video, whatever. Next slide.
A
For the question-and-answer session we will use the queueing function provided in Meetecho. If you want to ask a question, please join the queue by pressing the little button showing a microphone and a hand, which you find below your name in the upper left corner. There is also more information on how to use Meetecho somewhere on the IETF web page, or you can use Google to find it. Next slide.
A
Okay, and that's where we are right now: we are in the third session. We have four talks today, so if everything goes well we have a little bit of slack time at the end and we can take some more questions, or, you know, finish the day earlier. We have two long papers and two short papers; actually, we call them short papers and position papers. The first talk will be held by Korian Edeline.
B
In this scenario, a client tries to establish a TCP connection by sending a SYN packet that contains a given TCP option, but when the middlebox receives it, it removes the option, either by replacing it with zeros or by shrinking the packet, and forwards it. So the feature has been removed. And of course you can also have a changed feature: the client sends a SYN packet with feature value A, a middlebox rewrites it from A to B, and the server believes the client sent feature value B.
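The option-removal behavior just described can be sketched as a toy function. This is an illustrative reconstruction, not code from the talk; it walks the TCP options list using the standard kind/length encoding and overwrites one option with NOP padding so the packet size is preserved.

```python
def strip_option(options: bytes, kind_to_remove: int) -> bytes:
    """Walk a TCP options byte string and remove one option kind,
    replacing it with NOP bytes (kind 1) so the packet length is kept."""
    out = bytearray()
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:               # End-of-option-list: copy the rest
            out += options[i:]
            break
        if kind == 1:               # NOP is a single byte
            out.append(1)
            i += 1
            continue
        length = options[i + 1]     # all other options carry a length
        if kind == kind_to_remove:
            out += b"\x01" * length  # overwrite with NOPs
        else:
            out += options[i : i + length]
        i += length
    return bytes(out)

# SACK-permitted (kind 4, length 2) between two NOPs gets blanked out
opts = bytes([1, 4, 2, 1])
print(strip_option(opts, 4))  # b'\x01\x01\x01\x01'
```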
B
This is a changed feature. So why does this happen? Well, it can be the action of a lot of different policies, which can have security or performance purposes, or even packet marking, which is not, strictly speaking, a middlebox behavior, but which we chose to consider as well because it has very similar consequences.
B
The problem is that parsing a SACK block means parsing a linked list, potentially on every packet, and this is something that you want to avoid, especially on highly loaded devices, because it takes a lot of CPU time. So most of those middleboxes just ignore the SACK blocks; in consequence, when the sender receives the ACK, it has an invalid SACK block in it.
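To illustrate why SACK handling is per-packet work, here is a minimal sketch (my illustration, not the speaker's code) of iterating the blocks inside a TCP SACK option (kind 5, per RFC 2018): each block is an 8-byte left-edge/right-edge pair, and a device must walk them on every ACK that carries the option.

```python
import struct

def parse_sack_blocks(option: bytes):
    """Parse a TCP SACK option (kind 5): a kind byte and a length byte,
    followed by up to four (left_edge, right_edge) 32-bit pairs."""
    kind, length = option[0], option[1]
    assert kind == 5, "not a SACK option"
    blocks = []
    for off in range(2, length, 8):
        left, right = struct.unpack("!II", option[off : off + 8])
        blocks.append((left, right))
    return blocks

# one SACK block covering sequence numbers 1000..2000
opt = bytes([5, 10]) + struct.pack("!II", 1000, 2000)
print(parse_sack_blocks(opt))  # [(1000, 2000)]
```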
B
VPP stands for Vector Packet Processing and is a kernel-bypass framework developed by Cisco. We developed MMB on top of it in order for it to be flexible, intuitive and fast. Of course we want it to be fast, because we don't want it to add extra overhead when analyzing the impact of path impairments on quality of service. We chose to study the impact of middlebox impairments on TCP by focusing on three TCP features (ECN, SACK and the window scaling parameter), because they are widely used and because they are so widely impaired.
B
To this end, we built a testbed with two different setups, a direct and an indirect setup. The direct setup is simply two traffic generators communicating together, exchanging data, in order to compute baselines. In the indirect setup we include a network emulator, which emulates various network conditions as well as various path conditions.
B
We recreated three path-impairment scenarios for ECN: first, without congestion, a disabled-ECN scenario, with a router systematically setting the IP ECN bits to one-one (congestion experienced), but with the fallback mechanism of ECN enabled; a blocked-ECN scenario; and a broken-ECN scenario, which is similar to the disabled ECN but done in a way that the fallback mechanism cannot detect.
B
What we observe here is that disabling SACK leads to a higher bandwidth for packet loss rates lower than 0.9 percent, while enabling SACK leads to higher bandwidth for packet loss rates higher than this value (which is specific to our testbed). Why is that? Because having SACK enabled means parsing SACK blocks, which means parsing a linked list, which consumes a lot of CPU time; on the other hand, disabling SACK leads to spurious retransmissions.
B
You have to retransmit packets that were already received, because you are not able to selectively acknowledge them. We see that for packet loss rates lower than 0.9 percent, the CPU time of parsing SACK blocks is more expensive than the spurious retransmissions, and beyond 0.9 percent packet loss it is more expensive to retransmit all those packets than to parse the SACK blocks.
B
And finally, the broken-SACK curve is at the bottom of the figure because it barely transmits any packets at all, with the exception of the zero-percent artificial packet loss point. Basically, as soon as you experience your first loss event, the connection stalls completely. This is because the sender discards the entire ACK packet, even the ACK number, not only the SACK block.
B
What we can see here is that clipping down or removing the window scaling parameter has a direct impact on the maximum achievable throughput of a TCP connection, but it can also be very problematic. For instance, let's take the most widespread clipped window scale value, which would be seven.
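The throughput ceiling implied by a clipped window scale can be computed with the standard bound of throughput ≤ window / RTT, where the maximum receive window is 65535 × 2^wscale bytes (RFC 7323). The numbers below are my back-of-the-envelope illustration, not figures from the talk.

```python
def max_throughput_mbps(wscale: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput given a window scale factor and a
    round-trip time: at most (65535 << wscale) bytes in flight per RTT."""
    window_bytes = 65535 << wscale
    return window_bytes * 8 / rtt_s / 1e6

# window scale clipped to 7, assuming a 50 ms RTT
print(round(max_throughput_mbps(7, 0.05)))  # 1342 (Mbit/s)
# window scale option removed entirely (effective scale 0)
print(round(max_throughput_mbps(0, 0.05)))  # 10 (Mbit/s)
```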
B
On the other hand, you can choose to ensure tamper-proofness through encryption, as QUIC is doing. Thank you.
A
Okay, then we go directly into the question-and-answer session. Let's see: we have Korian here on audio only, not video, but we can take questions.
A
Maybe let me start with one question, and this is maybe not quite at the heart of the paper, but you have one slide in there which shows some measurements from a previous paper, I think one of yours. There was, I don't know, six percent of middleboxes that break TCP, but there was also a large percentage of middleboxes which seem not to break TCP. Can you say a little bit more about what these middleboxes are and how you define a middlebox?
B
It depends. They might implement some kind of whitelisting, so in that case they might still be harmful for evolution, but we cannot tell from the data that we have.
A
Okay, thank you. We have a little bit of time for more questions.
C
Some specific middlebox pathologies, I guess it wasn't clear to me: are those specific impairments ones that you observed in real networks, or are these hypothetical scenarios that you're investigating?
B
The most widespread one is the sequence number shuffling box, and the others are pretty rare, but all of them actually exist in the wild. Yes, okay.
A
... University in Bucharest. I completely screwed this up, I'm very sorry! Maybe you can tell us later how it's pronounced. He also has a PhD from the same university, which he completed in 2015, and he is very much working on active and passive measurements, but also programmable data planes, SDNs and quality of service in general. And we start with this talk on congestion-control-independent L4S.
E
There are traditional applications like gaming, voice transmission and SSH that require low latency for a good user experience. However, these applications generate low-throughput traffic that does not build up queues along the forwarding paths. In this case, strict priority scheduling seems a feasible solution for ensuring low delay for the selected applications.
E
However, with technological evolution, applications requiring high throughput and low latency at the same time have emerged; just think of HD/4K video conferencing, augmented reality, virtual reality, remote control of robots and so on. In this case, simple priority-based scheduling is not enough. It is not a good solution, since it leads to starvation of normal traffic. So how to ensure low latency and high throughput at the same time? This is a complex problem, and it is affected by both the endpoints and the network. At the endpoints we can use different congestion controls.
E
In the network, there are buffers of various sizes and AQMs. Most applications use TCP with loss-based classic congestion control, which needs large buffers to achieve full link utilization. These loss-based approaches fill the intermediate buffers by design, and this results in large queuing delays. Of course, AQMs can reduce the queuing delay significantly, but for full utilization we cannot go below a certain limit.
E
On the other hand, scalable congestion control enables much finer rate control. It can react not only to the fact of congestion; its reaction is proportional to the congestion level. DCTCP and TCP Prague are two well-known examples of scalable congestion control, but the recent BBR version 2 also implements a DCTCP-like scalable mechanism. In general, they can reduce the queuing latency significantly, but in turn they require ECN support and are too aggressive to coexist with classic congestion controls. To solve these incompatibility issues between classic and scalable sources, the L4S internet service has recently been proposed.
E
It promises ultra-low latency, low loss and scalable throughput for L4S traffic. L4S flows apply scalable congestion control. Its design goals include isolation of L4S and classic traffic, and it also aims at providing window fairness between L4S and classic flows, enabling their coexistence in the same system. The current state-of-the-art L4S AQM is DualPI2; we will use this method as the reference AQM in the evaluation section.
E
The main reason behind the incompatibility of L4S and classic traffic is that they require different congestion signal intensities. DualPI2 solves this problem by applying different signal intensities (ECN marking or drop probabilities) for L4S and classic traffic. In a nutshell, DualPI2 maintains two queues: one for L4S and another one for classic packets. The L4S queue is controlled by a native AQM, a DCTCP-like step (or RED) AQM, which marks the packets with ECN congestion experienced; for the classic queue, a PI2 AQM is applied, dropping or ECN-marking packets.
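The coupling between the two queues can be illustrated numerically. In DualPI2 as specified in RFC 9332, a base probability p' from the PI controller yields a classic drop probability of roughly p'² and an L4S marking probability of k·p' with a coupling factor k (default 2); the sketch below assumes exactly those published relations.

```python
def coupled_probabilities(p_prime: float, k: float = 2.0):
    """DualPI2-style coupling: classic traffic sees the squared base
    probability, L4S traffic sees a linearly scaled (and capped) one."""
    p_classic = p_prime ** 2
    p_l4s = min(1.0, k * p_prime)
    return p_classic, p_l4s

p_c, p_l = coupled_probabilities(0.1)
print(round(p_c, 4), round(p_l, 4))  # 0.01 0.2
```

The squaring is what gives classic flows the much gentler signal they need, while L4S flows get a frequent, fine-grained one.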
E
As a result, this coupling mechanism leads to a higher signal probability for L4S and a lower one for classic packets. DualPI2 works very well if you consider a single classic and a single scalable congestion control behavior: it ensures different signal intensities for the two classes, but it cannot differentiate between the flows inside the same congestion control family. In recent years, several congestion control proposals have emerged, both scalable and classic ones, and, for example, BBR version 2 can also work in both classic and scalable modes.
E
Let's see an example of the incompatibility of two scalable congestion controls, DCTCP and BBR version 2. The left figure illustrates the case when a step AQM is applied, which is similar to the native L4S AQM of DualPI2. Without going into details, we can clearly see that with the step AQM the DCTCP flows get a much higher share than the BBRv2 ones.
E
The right figure shows when our in-network resource sharing method is used; we can see reasonable fairness. The key problem is that DCTCP and BBR version 2 require different signal intensities; however, the step AQM applies the same ECN marking probability to the two flows, and this leads to unfairness.
E
On the other hand, our earlier core-stateless AQM proposal can provide different signal probabilities for DCTCP and BBR flows. It does not require flow identification or per-flow queues, but this early version of our AQM cannot satisfy the requirements of L4S and classic traffic at the same time, and it also requires additional packet marking before the bottleneck.
E
The L4S AQM we propose is based on our core-stateless resource sharing framework called Per-Packet Value (PPV). In the per-packet value concept, the resource sharing policies are implemented by a packet marking mechanism that assigns a packet value, actually a simple number, to each packet. The packet value is not a traffic class, but an incentive expressing the importance of the given packet in the traffic mix. Packet marking can be done for the different traffic aggregates independently, and thus can be implemented in a distributed way for different traffic classes.
E
The nodes solely use the packet value carried by the packet in the decision of which packets to drop or ECN-mark and which packets to forward. Accordingly, in case of congestion, packets with the smallest packet values are dropped or marked with ECN congestion experienced. In the per-packet value framework, if we have a flow and we apply policy A, this policy determines the packet value distribution of the packets belonging to the flow. The role of packet marking is essential. In the following example, we assume two constant-bit-rate flows, called flow number one and flow number two.
The marking is based on a function that we call the throughput-value function. In this example, the two flows represent two separate traffic aggregates having independent packet markers. The throughput-value functions used for packet value calculation are the same; it means that in case of congestion, we expect a fair share between the two flows.
E
In this example, we only distinguish ten packet value levels, from 1 to 10, as seen in the figure. For each throughput value, the associated packet value is seen on the axis. The throughput-value function defines the expected contribution of packets with given packet values to the total traffic of the flow.
E
Since the same throughput-value function is used and the sending rates are high, we expect a fair share of the bottleneck capacity. Dropping packets with the smallest packet values leads to an observable packet value threshold below which all packets are dropped; in this case, this threshold value is 8, and it results in 30 megabits per second of allowed throughput for both flows.
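The slide's numbers can be reproduced with a small sketch. This is my reconstruction of the example under an assumed flat throughput-value function (each flow's rate spread evenly over value levels 1..10) and an assumed per-flow rate of 100 Mbit/s; neither assumption is stated in the talk.

```python
def surviving_rate(total_rate_mbps: float, threshold: int,
                   levels: int = 10) -> float:
    """Flat throughput-value function: the flow's rate is spread evenly
    over packet-value levels 1..levels. Under congestion, every packet
    whose value is below `threshold` is dropped."""
    kept = sum(1 for v in range(1, levels + 1) if v >= threshold)
    return total_rate_mbps * kept / levels

# two flows marked with the same function, congestion threshold 8:
# both keep only the packets at levels 8, 9 and 10
for flow in ("flow 1", "flow 2"):
    print(flow, surviving_rate(100.0, 8))  # 30.0 Mbit/s each
```

Because both markers use the same function, the same threshold yields the same surviving rate for both flows, which is exactly the fairness argument on the slide.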
E
Our AQM also maintains two virtual queues, for two reasons. First, virtual queues can be used for reducing queuing latency in the physical buffers, and second, they can keep histories of the packet values used for calculating stable congestion threshold values. Each L4S packet updates both virtual queues with its size and its packet value.
E
Similarly to the traditional virtual queue concept, VQ0 and VQ1 in our system have a maximum size and a serving rate that is less than the outgoing capacity. Packets are served from the two physical queues with a simple strict-priority scheduler: L4S first and then the classic queue. You can recognize that virtual queue 0 represents the value distribution of the L4S traffic, while virtual queue 1 stores the coupled distribution of both L4S and classic packets. From these distributions and the predefined serving rates and delay targets of the virtual queues, two congestion threshold values can be calculated.
E
CTV0 is applied to the L4S traffic only, while CTV1 applies to both traffic families. They act as simple filters: for an L4S packet, if its packet value is above both thresholds, the packet is forwarded; otherwise we mark the packet with ECN congestion experienced. For a classic packet, we only check CTV1. In practice, congestion threshold values can be translated into congestion signal intensities. This coupling mechanism is similar to DualPI2, and it ensures fairness between L4S and classic flows.
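The per-packet decision just described is only a pair of comparisons. A minimal sketch of that filter logic, with assumed names and return labels (the authors' actual data-plane code is not shown in the talk):

```python
def handle_packet(packet_value: int, is_l4s: bool,
                  ctv0: int, ctv1: int) -> str:
    """Core-stateless AQM filter: an L4S packet must exceed both
    congestion threshold values to pass unmarked; a classic packet
    is only checked against CTV1."""
    if is_l4s:
        if packet_value > ctv0 and packet_value > ctv1:
            return "forward"
        return "mark-ce"            # ECN congestion experienced
    if packet_value > ctv1:
        return "forward"
    return "drop-or-mark"           # classic signal

print(handle_packet(50, True, ctv0=40, ctv1=30))   # forward
print(handle_packet(35, True, ctv0=40, ctv1=30))   # mark-ce
print(handle_packet(35, False, ctv0=40, ctv1=30))  # forward
```

Raising or lowering the two thresholds is how the AQM translates queue state into different signal intensities for the two traffic families.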
E
In the first scenario, we compare the performance of the two AQMs under changing traffic intensities, using DCTCP as the L4S and Cubic as the classic congestion control. In the case of our AQM, all flows use the same throughput-value function, meaning equal desired resource sharing. The experiment consists of nine phases, varying the number of L4S and classic flows. Each phase lasts 20 seconds, and the number of L4S and classic flows can be seen at the top of the figures.
E
We believe that this is mainly due to the difference between the strict-priority and time-shifted-FIFO scheduling of the two approaches. The 99th-percentile and average queuing delays are similar for both methods. DualPI2 results in slightly smaller delays for L4S, but our approach also provides average delays in the sub-millisecond range, except for some temporal peaks. A significant maximum delay can only be observed if the number of flows is limited; as more flows arrive in the system, the maximum delay also goes below one to two milliseconds.
E
We can clearly see that DualPI2 cannot control BBR traffic well, leading to increased queuing delays and significant unfairness between L4S and classic traffic in general. If the number of BBR flows is large, classic traffic experiences a very low throughput share. BBR applies a complex, model-based congestion control; we assume that the observed chaotic behavior is caused by BBR's model not being prepared for this complex network behavior. With our core-stateless AQM, almost perfect fairness can be seen in most phases.
E
Our method provides better, but not perfect, fairness, while DualPI2 cannot handle the heterogeneity in RTTs well. With our method, L4S traffic with a 5 millisecond RTT occupies more bandwidth than its fair share, which is more visible when flows start leaving the system. It seems that DCTCP flows with a small RTT can adapt to the new conditions much faster, occupying the freed resources.
E
We also repeated the experiment with BBRv2 as the L4S congestion control. With DualPI2, the unfairness between the L4S and classic classes is more significant than with DCTCP: the BBR traffic almost fully suppresses the Cubic flows. On the other hand, our core-stateless AQM provides similarly good performance as with DCTCP.
E
In the next scenario, there are four different congestion controls in the same system: DCTCP and BBRv2 are used as L4S, and Cubic and BBR as classic congestion controls. Similarly to the previous experiments, we varied the number of flows in the traffic mix, as shown at the top of the figures. For our AQM, fairness is reasonable during the whole measurement; in the marked areas, the number of flows is the same. However, the classic BBR flows cannot restore their throughput when the load decreases.
E
In the last phase, we assume that BBR's model gets stuck in a wrong state and cannot adapt to the changed network conditions. For DualPI2, the experienced unfairness is much worse, since the L4S BBR flows take most of the throughput, ten times more than their fair share in some cases. The behavior of the classic BBR flows is again very interesting: at the start they take a high share, while in the end they have the lowest share.
E
We have demonstrated that our proposal provides different signal probabilities for different congestion controls by design, without the need for flow identification and per-flow queues. On the other hand, it requires packet value marking, and thus its deployment may only be feasible in closed networking domains. At the moment, the proposed algorithm is lightweight, and we are working on its implementation in P4. Finally, all the measurement results, including the scenarios at 10 gigs, are available on our website.
A
Okay, thank you very much. It turns out I just broke my video, so you have me without the video, but at least we have Sándor here on video, and Roland, so you can see them. So, the very first question from the chat: how do you pronounce the university?
A
Okay, then we have time for some questions. Maybe I can go with the first question first: as mentioned on the very last slide, you're working on an implementation. Can you say something about implementation complexity, maybe also compared to the DualPI2 algorithm?
E
I think the complexity is very similar to DualPI2. Basically, our CTV-based method has these two packet value filters; implementing this is very, very simple, because the packet carries the packet value and you only have to check whether it is greater than the actual threshold value. And you can update these filters periodically, similarly to DualPI2, where you update the drop probability periodically; here you do the same with the threshold values.
A
Okay, then we go to Greg, who's in the queue again.
C
Thank you, very interesting work. A couple of comments. One, as noted on your last slide, congestion control evolution is ongoing; you know, BBRv2 is a moving target. Also, DCTCP within L4S has migrated to TCP Prague, so it would be nice to see your work evolve to track those congestion controls as they evolve, maybe running with the current version of Prague, that sort of thing. The second comment: it would be interesting to see a comparison of your algorithm to an FQ implementation that supports L4S ECN signaling as well. That would be really interesting, both on the performance side and on the complexity side.
E
Yeah, actually, that was in our mind. First of all, our approach has the benefit that we don't need per-flow queues, and, for example, if you want to implement it in a high-speed router, like a Barefoot switch or something like that, you cannot work with lots of individual queues; you have very limited capabilities, and in this way you cannot do per-flow queueing. In our approach, I mean, at the moment we are working on a P4 implementation.
A
Okay, do we have more questions?
A
And we go on with the next talk. That's a shorter talk, because it's a position paper, and the talk will be held by Danny Lachos.
F
We also have the ALTO client sending ALTO queries to get guiding information from the ALTO server, and the ALTO client protocol is used for sending ALTO queries between an ALTO client and an ALTO server. The ALTO working group in the IETF started in 2008 and is currently also discussing proposals on rechartering. The ALTO working group already provides a generic framework to expose network information to applications.
F
In particular, ALTO introduces generic mechanisms such as the information resource directory, information consistency, and an information update model. ALTO also introduces abstraction modules such as network and cost maps, to provide network location grouping and the cost between locations, the path vector abstraction, and capability maps such as the unified property maps.
F
Now I would like to make clear what we mean by multi-domain. A domain is considered to be a network region in the global internet, and each domain has a network view from the perspective of that network region. In this context, a network region can be an autonomous system, a set of autonomous systems, or ISPs, transport and access networks, etc.
F
Nowadays, many multi-domain use cases are emerging where the traffic from a source to a destination traverses multiple domains; data science applications and flexible inter-domain routing are some examples of such use cases. However, the current ALTO base protocol is not designed for a multi-domain setting of exposing network information. For example, consider this peer-to-peer deployment using ALTO: with the current ALTO client protocol, the ALTO server in each domain will provide only local information to ALTO clients.
F
It means that the ALTO client, the tracker in domain A, will receive only partial network information from domain B or domain C. An ALTO server-to-server protocol is therefore necessary to allow ALTO servers to exchange information, so that the ALTO client may receive the entire information.
F
On the other hand, before the application can run a resource allocation algorithm to execute the submitted flows, it needs to gather some information from the network. First, the end-to-end costs across multiple domains, in terms of resource availability and sharing, for example the available bandwidth. Second, the application needs to find a sequence of domains and candidate paths; this means which domains are involved for the different traffic flows, and one or more potential paths connecting those domains.
F
Now I will summarize a set of key issues in the current ALTO design for gathering multi-domain network information. Regarding the server-to-client communication: in multi-domain scenarios, it is not possible to optimize the traffic with only locally available information. Therefore, communication among multiple ALTO servers is necessary to exchange the network information of multiple domains.
F
The first piece of connectivity information is reachability between the source nodes and the destination nodes. In order to find the resource sharing, an application needs to know which domains are involved in the data movement of each node pair. Besides, a set of candidate paths needs to be computed in order to know how to reach a remote destination node once the multi-domain connectivity discovery is performed. An application, as an ALTO client, needs to be aware of the presence and the location of the ALTO servers to get appropriate guidance.
F
ALTO servers will be located in different domains, so multi-domain ALTO server discovery mechanisms are also needed in the current ALTO framework. Each domain may have its own representation of the same network inventory. For example, in this figure, suppose that the path cost for domain B is a utilization share instead of available bandwidth; in this case, the two values are not comparable, even if all the member domains have the same utilization-share property.
F
Applications also need to express their requirements in a query: for example, find the bandwidth the network can provide for flow f1, subject to reachability requirements, a blacklist of devices, quality-of-service metrics, etc. The current query interface in ALTO cannot express such flexible queries. Regarding scalability: optimization problems specified by the application requirements can be computationally expensive and time-consuming.
F
So how to design such an ALTO framework? In this table, we identified the relationship between the ALTO design issues and their corresponding envisioned mechanisms to allow ALTO to expose information across multiple domains. Regarding the server-to-server ALTO communication, ALTO servers may consider a hierarchical or a mesh architectural deployment. For example, when a hierarchical architecture is used, ALTO servers in the domain partitions gather locally available information and send it to a central server; in mesh deployments, an ALTO server may be set up in each domain independently, connected to the others and gathering the network information from the other domains. Multi-domain mechanisms combining domain-sequence computation and path computation need to be defined, and standardized computation protocols such as BGP or PCE can be leveraged to address these design requirements. Here we have a couple of examples following the PCE-based architecture for computing optimal multi-domain end-to-end paths. For cross-domain ALTO server discovery, RFC 8686 specifies a procedure for identifying ALTO servers outside of the ALTO client's domain; other PCE- or BGP-based mechanisms could also be used.
F
Multi-domain composition mechanisms are also required, so that the network information from ALTO servers in multiple domains can fit into a single, consistent virtual-domain abstraction. Here we have three proposals using mathematical programming constraints for multi-domain composition. Let's see our collaborative network scenario again. To give an illustrative example, consider that each domain provides the bandwidth property using a set of linear inequalities, where x1 and x2 represent the available bandwidth that can be reserved for flow 1 and flow 2, respectively.
F
Each linear inequality represents a constraint on the reservable bandwidths over a resource shared by the two flows. For example, this linear inequality indicates that both flows share a common resource and that the sum of their bandwidths cannot exceed a hundred gigabits per second. The involved domains may also exchange such properties and apply multi-domain redundancy optimization to remove cross-domain redundancy.
F
Taking a look at the set of inequalities, one can conclude that the constraints in domain B and domain C can eliminate the one in domain A, and finally a unified representation can be created, representing the multi-domain network resource information. With a flexible and generic query language, the network can filter out a large number of unqualified domains; the language and specification could be inspired by a standard or pre-standard mechanism, implemented with a user-friendly grammar.
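The redundancy argument above can be checked mechanically. The numbers here are my illustration (domain A advertising x1 + x2 ≤ 100, domains B and C advertising tighter per-flow bounds x1 ≤ 40 and x2 ≤ 50, not the slide's actual values): a constraint is redundant if maximizing its left-hand side over the region defined by the other constraints never violates it. For two variables, that maximum is attained at a vertex, so brute-force vertex enumeration suffices.

```python
from itertools import combinations

def is_redundant(target, others):
    """2-variable check: constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
    It is redundant if the max of a1*x1 + a2*x2 over the region defined
    by `others` (plus x1, x2 >= 0) stays <= b."""
    cons = list(others) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # x >= 0
    best = float("-inf")
    for (a1, a2, b1), (c1, c2, b2) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue                       # parallel constraints
        x1 = (b1 * c2 - b2 * a2) / det     # intersection point (Cramer)
        x2 = (a1 * b2 - c1 * b1) / det
        if all(p * x1 + q * x2 <= r + 1e-9 for p, q, r in cons):
            best = max(best, target[0] * x1 + target[1] * x2)
    return best <= target[2] + 1e-9

# domain A: x1 + x2 <= 100; domains B and C: x1 <= 40, x2 <= 50
print(is_redundant((1.0, 1.0, 100.0),
                   [(1.0, 0.0, 40.0), (0.0, 1.0, 50.0)]))  # True
```

Since 40 + 50 = 90 never exceeds 100, domain A's constraint adds nothing and can be dropped from the unified representation.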
F
ALTO servers also need to support mechanisms to improve scalability and performance, such as pre-computation and projection. For example, the ALTO routing state abstraction extension describes equivalent-transformation algorithms to reduce the redundancy in the network view as much as possible, while still providing the same information. Regarding security and privacy, ALTO needs mechanisms that provide accuracy in sharing network information and, at the same time, protect each member domain.
A
Okay
thanks
a
lot.
A
Now we have Danny on audio, so he is available for questions.
A
Hello. Okay, so, yeah, we can hear you, very softly, but somehow you don't show up on my screen, at least. Can you say something again? Hello? Can you hear me?
A
Yes, yes, we can hear you. Okay, that seems to be fine. Okay, we have one question. No, somehow somebody requested screen sharing; I don't think that was the intention.
A
Okay, let me actually ask one question, because your talk was maybe a little bit abstract about metrics. Can you briefly describe which kinds of metrics could be shared and how that would look more concretely? The one example you had there was on bandwidth, but bandwidth might change quickly. So what are other examples? Maybe you can go into this a little bit.
F
We identified two types of transport metrics. One relates to universal metrics, for example bandwidth, latency, and so on; with those, we think it is easier to generate a unified representation, using the sum or the minimum, something like that. But for other types of metrics that are not universal, ones that are numerical but ordinal, for example, it is more tricky. Maybe it is not easy to provide a unified representation; maybe we can try to provide a vector representation instead.
F
It's a current discussion; we are trying to get more and more ideas on that.
A
Otherwise, thank you, and we move on with the next talk.
A
For the next talk we have Stewart. Stewart is a researcher at Futurewei Future Networks in Santa Clara and also a visiting professor at the 5G Innovation Centre at the University of Surrey. He is working mainly on the forwarding layer and the deployment of technologies there; he used to be a routing area director and is the chair of the PALS working group. So welcome, Stewart, and we start.
D
The way we do this is: in the data plane we are only going to put the PPR-ID (call it little d) in the packet, but in the control plane we provide the PPR-ID d plus the set of identifiers it must pass through, a set of node names it must pass through. We provide this in the control plane, and this allows us to build the mapping that we need for the forwarding plane to work.
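[Editor's note] A minimal sketch of that split, with hypothetical data structures (the real PPR encodings are defined in the drafts, not here): the control plane distributes the full path description keyed by the PPR-ID, each node derives its local next hop from that description, and the data plane then only needs the short ID carried in the packet.

```python
# Control plane: PPR-ID "d" plus the ordered node names the path
# must traverse. The ID and node names are illustrative.
ppr_advertisement = {"d": ["A", "B", "C", "D"]}

def build_forwarding_entry(node, advertisements):
    """Each node builds its ppr_id -> next_hop mapping from the
    control-plane path description, so the packet itself only
    needs to carry the PPR-ID."""
    table = {}
    for ppr_id, path in advertisements.items():
        if node in path[:-1]:              # the egress has no next hop
            table[ppr_id] = path[path.index(node) + 1]
    return table

print(build_forwarding_entry("B", ppr_advertisement))  # {'d': 'C'}
```

The point of the split is exactly what the speaker says: the per-packet state stays small (one ID), while the path description lives only in the control plane.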
D
So let's look at a little example case. We have a traffic-engineered repair: the primary path is A-B-C-D.
D
The
repair
path
is
a
e
f
g
d
and
we
have
some
subsidiary
connector
paths,
bf
and
cg,
to
deal
with
the
case
of
failures
of
the
bc
and
the
cd
link.
D
So
if
any
node
fails
any
path
fails,
we
can
repair
it
through
our
traffic
engineer,
repair
path.
Why
do
we
need
it
to
be
a
traffic
engineered
repair
path?
Well,
if
we
have
a
critical
sla
for
the
traffic,
that's
using
the
primary,
we
must
also
provide
the
same
sla
in
the
backup,
and
this
is
particularly
important
for
5g,
ultra
reliable
low
loss
communications
or
for
massive
iot
slices.
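[Editor's note] The example topology can be written down as data. This is a toy illustration of the path selection described above, not the actual PPR repair machinery, and the reading of the connector paths (leave the primary at the node just before the failed link and rejoin the repair path) is an interpretation of the talk.

```python
# Paths from the example: primary A-B-C-D, TE repair A-E-F-G-D,
# plus connector paths for failures further along the primary.
PRIMARY = ["A", "B", "C", "D"]
REPAIR = ["A", "E", "F", "G", "D"]
CONNECTORS = {
    ("B", "C"): ["B", "F", "G", "D"],  # B-C fails: leave at B via B-F
    ("C", "D"): ["C", "G", "D"],       # C-D fails: leave at C via C-G
}

def route(failed_link=None):
    """Pick the path to use for a given failed link (None = no failure)."""
    if failed_link is None:
        return PRIMARY
    link = tuple(sorted(failed_link))
    # A failure at the head end (or any link without a connector)
    # falls back to the full traffic-engineered repair path.
    return CONNECTORS.get(link, REPAIR)

print(route(("B", "C")))  # ['B', 'F', 'G', 'D']
```

Because the repair and connector paths are themselves traffic engineered, traffic keeps its SLA through the failure, which is the point the speaker makes about URLLC and IoT slices.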
D
So what might we do? What are we thinking of doing in future with this? Well, every path can have its own individual policy installed by the control plane for each specific PPR path. For example, we can specify the queue behavior at that hop, or we can specify any monitoring or OAM behavior we want on the path.
D
So the question is: how do we define a resilient flooding-reduction system? Because we have to do this without compromising one of the central tenets of link-state protocols, which is that the flooding system provides a lot of the resilience.
D
So the question is: can we expand the PPR graph structures to provide traffic engineering between DetNet nodes, and also add the packet replication, elimination, and reordering functions? And can we use this to provide these facilities for new data planes such as IP? Another aspect of robustness that we think is worth further research is to consider the case of Byzantine robustness. A Byzantine system is one that can withstand active lying by its components.
D
We know how to make link-state protocols Byzantine-robust; Radia Perlman showed how to do this many years ago at MIT. We are dealing here with high-value traffic engineering and strategic 5G services, and these are a prime target for attack, and we are proposing using a link-state protocol to set up these TE paths. So the research question is: can we make traffic-engineered paths that are robust against Byzantine attacks, or against accidents that have the same characteristics?
A
Okay, thank you for the presentation. We also have Stewart already on audio and video; that's great. So we would be ready for questions at this point.
A
So my impression is that you still have a lot of research questions that you need to tackle, so probably there are many open issues still left. But the one thing I was wondering about while listening to the talk is: how do you actually evaluate this? How can you make sure that what you propose is better or more secure?
D
How
do
we
make
sure
it's
it's
better
and
more
secure?
Well,
so
one
of
the
things
we
pick
up
is
the
natural
distribution
of
the
well
proven
distribution
of
information
through
a
link
state
routing
protocol,
which
means
that
we
can
distribute
the
the
information
without
having
to
set
up
a
per
node
connection
from
the
sdn
controller
to
every
node
in
the
in
the
network,
and
we
know
that
we
can,
for
example,
run
that
system
such
that
you
can
stop
that
information
being
corrupted
by
one
of
the
nodes
on
the
path.
D
That
was
the
the
byzantine
work
that
was
done
that
never
really
got
deployed,
because
nobody
ever
really
thought
it
was
a
high
enough
value,
but-
and
they
were
also
worried
about
the
cpu
overheads,
but
that
work
was
done
probably
30
years
ago
and
laying
in
the
bot
lane
in
the
toolbox
and
the
requirements
on
networks
have
significantly
changed.
Since
then,.
D
So
so,
yes,
we
we've
got
some
ideas.
We've
got,
we've
got
plenty
of
things
we
can
do
with
this.
We're
really
trying
to
find
out
whether
people
are
interested
in
working
with
us
on
this,
and
whether
people
are
interested
in
deploying
techniques
of
this
sort.
A
Yeah,
I
mean
that's
actually
a
good
point.
That
was
also
what
my
question
was
hinting
for,
because
I
think
you
need
to
actually
somehow
prove
that
there
is.
You
know
a
real
benefit
for
somebody
to
do
the
investment
and
to
do
the
employment
right
and
sometimes
that's
hard,
sometimes
easy
to
argue
that
something
is
different,
but
not
necessary
if
it's
too
much
better.
D
The other kind of alternative is that you construct large numbers of paths and purposely put them in the network. I've done quite a lot of work on fast reroute in the IETF, and fast reroute is a useful technology that people want, but I don't think it's all there yet, in particular around the need to maintain the traffic engineering of fast-reroute repair paths. And in particular, we're moving to a world where we're being much more picky about what we mean by the assured quality of the path, and the new work.
D
For
example,
that's
being
done
in
the
itu,
fg
focus
group
network
2030.
We
should
look
at
a
whole
new
class,
the
new
classes
of
services,
just
taking
a
service
that
you
really
really
engineered
or
a
network
slice,
for
example
you
really
engineered
and
then
just
throwing
it
in
the
best
effort
bucket,
doesn't
seem
a
good
idea.
So
you
know
one
of
the
things
we're
interested
in
is:
how
do
we
build
a
resilient
network
that
preserves
the
original
traffic
engineering
qualities
despite
the
fact
that
we've
gone
into
the
failure
mode?
D
And how do we do this quickly enough? You know, the failover times for these things are some 50 milliseconds.
D
And if anyone's interested in working with us, we'd be delighted to talk to other people.
A
Okay,
let's
see
if
we
have
any
more
questions
in
the
queue.
Currently,
it
doesn't
look
like
it
going
one
going
two
going
three.
No,
that
means
we're
done
with
the
session
for
today,
as
you
can
already
see
on
the
on
your
screen,
probably
is
that
we
have
another
session
tomorrow.
It's
only
two
more
papers,
but
two
very
interesting
papers
around
monitoring
and
locking.