From YouTube: IETF92-IRTFOPEN-20150324-1730
Description
IRTFOPEN meeting session at IETF92
2015/03/24 1730
C
If you're here for the IRTF open meeting, you're in the right room; if you're here for something else, or you want to read your email, you're welcome to do that. If you're here for something else, you're probably in the wrong room. The observant amongst you will notice that I am not Lars Eggert. I am Matt Ford, I'm with the Internet Society, and I'm mostly here to introduce our speaker for this session.
C
Our Applied Networking Research Prize winner at this IETF is Aaron Gember-Jacobson, who won the award for designing and evaluating an NF control plane. He's going to tell you a lot more about that, but maybe we could just have a round of applause to congratulate Aaron on winning his ANRP award.
C
I think, Aaron, you're going to present and then we'll take some time afterwards for Q&A. If you have clarifying questions you can dive in with those, but otherwise we'll save questions for after Aaron's talk and I'll moderate the discussion. Thanks.
B
If something is not clear, certainly feel free to step up to the mic and interrupt me. Thanks so much for that introduction; I hope you'll find what I'm talking about today interesting. What we've done is some research to take the principles that we have in software-defined networking and extend those principles to network functions, or middleboxes, that are running in our network, in order to allow network operators to better satisfy a number of different goals.
B
For those of you who aren't familiar with network functions or middleboxes, the basic idea behind them is that they perform some sort of sophisticated analysis of traffic or flows as it passes through the device in the network, and typically they take some stateful actions on that traffic. Good examples that commonly exist are things like WAN optimizers, caching proxies, and intrusion prevention systems. We're seeing two shifts in the way these network functions are being deployed today.
B
The first of these is network functions virtualization (NFV). The basic idea behind this is that we want to take the dedicated hardware appliances that are deployed today and replace them with virtual machines providing the same functionality, which allows us to run the network functions on top of generic compute resources, so we no longer need customized hardware. The benefit of this is that we can dynamically allocate instances of network functions as we need more capacity in our network, or as we need to introduce new functionality.
B
The other trend that's reshaping the way network functions are deployed is software-defined networking (SDN). SDN gives us the ability to flexibly reroute traffic between these network functions as we create them, or as the needs of our network evolve. Together, these two trends give us a way to dynamically reallocate where in our network we're processing certain traffic and what processing is happening to that traffic, and as a result they can enable a variety of interesting service abstractions and capabilities for our middleboxes.
B
One such example is that we could build a system that elastically scales network functions as the demand in our network changes over time. We start off here with a single instance of an intrusion detection system (IDS), and we want to make sure that this IDS is always satisfying some sort of performance SLA. Perhaps we have an SLA that says the packet loss we experience has to be less than some percentage. As the load in our network increases, we'll start to overload this initial instance.
B
That's going to start to create SLA problems, so we need to add another instance, which NFV makes easy to do, and with SDN we can then reroute some of the traffic from our original instance to this second instance. That gives us the ability to shed load from the original instance and satisfy our SLA. Okay, so now.
B
The problem here is that, while we're doing this scaling in and scaling out, it's important that we accurately monitor the traffic and have our IDS function as we expect it to, to actually detect malicious attacks on our network. It turns out that in order to achieve all three of these goals together, we actually need more than what we can get with just this concept of NFV and this concept of SDN, and so with only these two abstractions.
B
Today we can't quite realize scenarios like elastic NF scaling or some sort of high-availability situation. To understand a bit more exactly what we're missing and what else we need, let's take a look at this scenario in a bit more depth. Again, we're going to assume that we start off with a single instance of the IDS, and here I'm going to look at traffic at a little bit finer granularity: I'm going to assume that we know about specific flows.
B
These could be TCP flows, or a set of traffic from a group of hosts, but some notion of a flow through this network. As we see traffic from these flows, this intrusion detection system is going to establish some state related to them: things about connection endpoints, potentially information about what we've seen in the payloads so far, a variety of different pieces of information.
B
One option is that we could reroute only new flows that are coming into our network, such that if we have some green flow that comes in, we'll send it to this second IDS instance that we just created; it'll establish some state and properly analyze this traffic. This is great from a cost perspective, since we clearly needed this extra instance, but it isn't going to help us satisfy our SLA.
B
We still have all that extra traffic from the red and the blue flows going through our first instance; we're still starting to experience packet loss, so this isn't going to work. The other challenge we face is that there could be information at each of these IDSes that we need to collectively combine in some way. Maybe we're trying to do port-scan detection: all these flows are going to a particular host, and if we don't aggregate information about connection counts between both instances, it's going to take us longer to detect that scan.
B
So it's unclear how accuracy will be affected in this situation as well. Okay, so we need to get some traffic off of this original instance. We'll pick one of the flows, let's say the blue flow, and go ahead and reroute it. The problem is that, while we've rerouted this flow, we've run into a situation where we left its state behind, and so now the state that we need to continue to analyze this traffic and detect any attacks that might be in it.
B
It is now only available at our old instance and not available at the new place where this traffic is going, so we're not going to reach our accuracy goal. At some point, eventually, this blue flow will die out of the network, the load in our network will go back down, and so from a cost perspective we ideally want to be able to destroy the second instance. The problem is: when do we go about doing that? If we destroy it immediately, we run into the same problem where we get rid of state that we need.
B
We'll no longer be able to properly analyze the green flow. If, instead, we wait for this green flow to die off, we run into a situation where we need to wait a potentially unbounded amount of time before we can destroy this instance. In traffic traces we've looked at from our campus network, this may mean that for 25 minutes, maybe longer, we're going to continue to run this instance. That means we'll satisfy our SLAs and accuracy, but from a cost perspective we're spending a lot of extra money.
B
We don't need to. So what exactly do we need if we want to achieve these three goals; what's missing from just NFV and SDN? Well, one thing is that we need some way to manage the internal state that these network functions are maintaining: we need to be able to move it, copy it, and in some cases share it between different instances of a network function. Second of all, as we're transferring this state around, we want to make sure that we're not compromising the accuracy of our network function.
B
So there are certain guarantees we need on how the state transfer happens, such that we don't lose updates to this state, we don't potentially have packets that go unprocessed, and in some cases we need to make sure we process the packets in a particular order. These same requirements apply not only to the elastic scaling scenario that I talked about, but to other interesting scenarios like transparent failover, or potentially.
B
If we want to do something like in-place upgrades. So I hope I've convinced you that we need something new here. For the rest of the talk, I'll talk about the challenges in doing this and meeting those requirements I just talked about; then I'll talk about the architecture we've developed in order to meet those requirements and address those challenges; and, lastly, I'll close with some preliminary evaluation results.
B
The first of these challenges is that there are a lot of different network functions out there, everything from WAN optimizers to caching proxies to, when you start to talk about cellular networks, things in the evolved packet core. We want to make sure that we're minimizing the number of changes we need to make to these, and that we can accommodate a lot of different network function architectures within this broader system that we're proposing to develop. The second issue is that there are lots of things going on in the network while we're thinking about moving state.
B
There are updates happening to that state, there are packets still flowing through our network, and we want to be making forwarding updates. So how do we avoid problematic race conditions between all of these different things that are going on? Lastly, it's important that whatever we're doing to move state around doesn't have a lot of memory or CPU overhead and doesn't take a lot of time, especially if we're talking about moving state in scenarios where we're trying to do scaling and we're already in an overloaded situation.
B
We don't want to impose a lot more load onto what's already overloaded. Okay, so what could we use? Well, one thing we could say is: why not use virtual machine snapshots? We already have virtual machines that our network functions are running on, and we know really well how to snapshot virtual machines and clone them efficiently. We can use this to do scale-up: it will give us a copy of the state we need for both of these red and blue flows, and we can move the blue flow and we'll have its state. The problem is when we run into that scale-down scenario: we have no way to recombine two VM images into one, so that's not going to work out. Another solution that exists out there is a system that came out of IBM Research called Split/Merge. The basic idea of Split/Merge is that you use a shared library in order to access and create state internally; you basically replace all memory allocation calls with calls to their library functions.
B
The problem is that they're targeting a very specific scenario, which is elastic scaling, so it's not clear their solution will work in other scenarios. Also, their system doesn't provide any of these safety guarantees that ensure we don't lose important updates and that packets aren't reordered in cases where that can affect the accuracy of a network function. So this brings me to our solution, OpenNF. OpenNF's architecture is very similar to what you'll see in SDN.
B
We have a logically centralized OpenNF controller, and on top of this we run scenario-specific control applications. One control application may be implementing the elastic NF scaling example that I talked about, and it'll issue operations to move, copy, or share state as it needs to. Underneath this controller we have the network functions themselves, and they conform to a southbound API that we've developed, such that we can accurately export and import state from these different instances.
B
When a control application issues an operation, a module within the controller translates that into a series of southbound API calls to do the state transfer, and once state has been successfully transferred, we can then communicate with an existing forwarding module to tell it to update the forwarding state in our switch and reroute our traffic. I'm going to talk a little bit about the southbound part first, and then I'll go into how we implement these higher-level functions.
B
To give you an example, let's take a look at the state for an intrusion detection system, specifically the Bro intrusion detection system, an open-source IDS that's existed for many years. Here we have, for every single TCP connection, a couple of different objects: a connection object and protocol-specific analyzer objects, and we organize these in some sort of hash table.
B
Likewise, we have state that is maintained per host: for every host, we maintain a count of how many different connections have been established, or attempted to be established, with that host. We may also have some state that's updated for every single packet we process, and something like statistics applies to all the different flows that this network function is responsible for. We can use this taxonomy to develop a relatively simple API that allows us to get, put, and delete state from these network functions on a per-flow basis.
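The get/put/delete taxonomy above can be sketched as follows. This is an illustrative toy in Python, not the actual OpenNF southbound API: the scope names, method signatures, and filter representation are assumptions made for the example.

```python
class NFStateStore:
    """Toy NF state store with scoped get/put/delete, as described in
    the talk. Scope and method names are illustrative, not OpenNF's."""

    def __init__(self):
        # scope -> list of (flow_key, state) entries
        self.entries = {"per-flow": [], "per-host": [], "multi-flow": []}

    @staticmethod
    def _matches(flow_key, flt):
        # A filter names some header fields; unnamed fields are wildcards.
        return all(flow_key.get(k) == v for k, v in flt.items())

    def get_state(self, scope, flt):
        """Export every piece of state in `scope` matching the filter."""
        return [(k, v) for k, v in self.entries[scope] if self._matches(k, flt)]

    def put_state(self, scope, chunks):
        """Import state chunks (e.g. from the controller) into local structures."""
        self.entries[scope].extend(chunks)

    def delete_state(self, scope, flt):
        """Flush matching state; return how many entries were removed."""
        kept = [(k, v) for k, v in self.entries[scope]
                if not self._matches(k, flt)]
        removed = len(self.entries[scope]) - len(kept)
        self.entries[scope] = kept
        return removed

nf = NFStateStore()
nf.put_state("per-flow", [({"proto": "tcp", "dport": 80}, {"bytes": 10}),
                          ({"proto": "tcp", "dport": 22}, {"bytes": 5})])
http = nf.get_state("per-flow", {"dport": 80})  # only the port-80 entry
```

The point of the filter is exactly what the talk describes next: the network function applies it to its internal state and ships anything that matches, without exposing how that state is organized internally.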
B
These functions first of all accept the scope of state we're interested in, and a filter that defines a flowspace for what set of flows we're interested in. We then modify the network functions to accommodate this operation: a network function can take its internal state, apply this filter to it, and any state that matches will be sent to the controller. Likewise, if the controller wants to provide some state to be integrated into the middlebox, the middlebox can take this state and integrate it into its existing structures.
B
This relatively simple API means that we don't have to expose or change how the network function organizes its state internally, and it provides an intuitive way for us to reason about what state we're interested in. Now that we have these capabilities from network functions, we can go about using them to realize the operations that our control applications issue.
B
The first thing the controller is going to do is ask the middlebox for any state that it has related to HTTP flows, and that state is going to be provided to our controller. Next, we'll go ahead and flush this state from our first instance, because we don't need it there anymore, and we'll put this state to our second instance. Now that the state's been moved, we can finally go ahead and update our forwarding such that we can resume analyzing our HTTP traffic at the second instance. We have similar capabilities to be able to copy and share state.
B
I won't go into the details of that here, but I'm happy to answer questions about it later on. Okay, so we've addressed this first challenge. Now, how do we deal with all these race conditions and provide important safety guarantees? One problem that can occur in the move operation I just showed is that we can lose packets, or lose updates to state, as a result of packets arriving.
B
While we're trying to do this state transfer. I'm going to assume here that we're running the Bro intrusion detection system, and it's running a script that computes a hash of the payloads of all the packets for a given connection and compares that hash against a database of known malware. This is a standard script that comes with this IDS. Again we have two different flows, a red flow and a blue flow. When packets come in, the IDS is going to say: okay.
B
What's the hash of this packet? And it adds it to a rolling hash that it's computing. Now at some point I say: well, I want to move the red flow, so I'm going to go ahead and do my state transfer like I did before. But before I have a chance to update my forwarding state, another packet comes in for this red flow. Now this packet comes in, and the intrusion detection system says: I don't have any state for the red flow.
B
This must be a new flow, so it's going to go ahead and establish some new state. Now at some point our forwarding update takes effect, and our third packet comes in, and when we try to compute a hash over this third packet we only have the first and the third packets. So the hash that we compute is going to be incorrect, and we're not actually going to detect that there's some malware in this flow.
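The failure mode just described is easy to see with a toy rolling payload hash, loosely modeled on the Bro script in the example (the function below is an illustration, not Bro's actual implementation):

```python
import hashlib

def rolling_hash(payloads):
    """Fold every packet payload of a connection into one digest,
    the way a per-connection malware-hash script would."""
    h = hashlib.sha256()
    for p in payloads:
        h.update(p)  # each payload extends the rolling hash
    return h.hexdigest()

# All three packets seen by one instance: the "correct" stream hash.
full = rolling_hash([b"pkt1", b"pkt2", b"pkt3"])

# The second packet was processed by the wrong instance during the move,
# so its bytes never reached this state.
missing = rolling_hash([b"pkt1", b"pkt3"])
```

Because `full != missing`, a malware signature keyed on the complete stream's hash is never matched, which is the accuracy loss the talk is warning about.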
B
What we want is a guarantee that these state operations are loss-free: we want to make sure that we're not losing any packets, and that all packets that have passed through our network at this point in time are being processed. Split/Merge also provides a limited form of this loss-freeness, but it turns out there's a key thing they don't deal with.
B
That's the fact that packets may already be in transit to a network function at the time we start the state transfer. While they can buffer packets at the switch, they're ignoring the fact that packets may have already passed through this switch, so this doesn't quite give us the loss-freeness that we want. So how do we go about doing this?
B
Well, we're going to enhance the capabilities that the network functions provide just a little bit: we're going to add an event mechanism such that when some set of packets comes into this network function, we can ask: do any of these packets match a filter? If they do, the network function can send an event to the controller that says: hey, I was about to process this packet; it was going to update, or may have been about to update, some state that you're trying to move.
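The event hook just described amounts to one small check in the middlebox's packet-receive path. A hedged sketch (class and method names are invented for the illustration; the controller is modeled as a simple queue):

```python
class EventingNF:
    """NF receive path with the event mechanism described in the talk."""

    def __init__(self, controller_queue):
        self.event_filters = []              # filters the controller enabled
        self.controller_queue = controller_queue
        self.processed = []

    def enable_events(self, flt):
        """Controller asks: raise an event for packets matching `flt`."""
        self.event_filters.append(flt)

    def receive(self, pkt):
        # The only change to the NF's main receive function: check
        # whether this packet should raise an event before processing.
        for flt in self.event_filters:
            if flt(pkt):
                # Hand the packet to the controller, which decides to
                # process it, buffer it for later, or drop it.
                self.controller_queue.append(pkt)
                return
        self.process(pkt)                    # normal processing path

    def process(self, pkt):
        self.processed.append(pkt)

q = []
nf = EventingNF(q)
nf.enable_events(lambda p: p["flow"] == "red")
nf.receive({"flow": "red", "seq": 2})   # raised as an event, not processed
nf.receive({"flow": "blue", "seq": 1})  # processed normally
```

This matches the talk's claim that the change is fairly simple: only the receive function gains a filter check, and everything downstream is untouched.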
B
We
can
then
tell
the
network
function
to
either
go
ahead
and
process
packet
buffer
for
processing
later
on,
or
simply
throw
it
away
and
not
process
it
any
further
and
to
add
this
capability,
we
just
need
to
modify
the
main
packet
receive
function
within
the
middle
box
out
of
code
that
add
a
little
bit
of
code.
That
checks
should
I
be
raising
an
event
or
not
fairly
simple
change.
Okay.
So
how
do
we
use
this
now?
To
get
this
loss?
Free
property?
Well?
Well,
first
thing
we'll
do
before
we
start
transferring
any
state.
B
So now we make our forwarding update, and when the third packet comes in, it turns out that we've seen all packets for the flow; they're all reflected in the state, we can compute our correct hash, and now we can detect the malware. Now, there's another potential problem we run into, which is reordering, and in fact adding this loss-free mechanism can actually introduce reordering that may not be possible otherwise. This could be problematic in the case of a script that comes with Bro.
B
That script looks for weird activity, things like: did you get a SYN packet after you've already gotten a data packet? So let's go back to the fifth step from the last slide, where we were flushing the packets that were buffered at the controller. We'll flush these and then we'll go ahead and make our forwarding update. Now we make our forwarding update, but before that update takes effect, another packet comes in, and it goes to our first instance.
B
Our first instance says: I have events enabled, so I'm going to send this third packet to the controller. The controller will say: I've already flushed the buffer of events, so I'll just go ahead and pass this directly through the switch to my second instance. But before this packet reaches that second instance, our forwarding update has already taken effect, so it's possible another packet comes to the switch, gets forwarded to the second instance, and arrives before we've gone through this whole sequence of forwarding along this third packet.
B
So how do we go about realizing this? How am I doing on time? Okay, let me actually skip through this, because it's kind of complex, and we can come back to it later if people have questions. Okay, so the third challenge: the issue of overhead. How do we make sure that we're not introducing a lot of memory, CPU, and other overhead in actually providing these operations?
B
Well, the thing is that we're giving applications some choices. The first choice we're giving them is: what sort of state do you want to move? If you're only moving HTTP flows, you only need to move state relating to those HTTP flows. If you're trying to create a middlebox that's highly available, so you're snapshotting state, you may say: I only care that, if something fails, a certain set of flows continues to be processed correctly. So now you only need to grab that state.
B
The other option is that you can decide whether or not you need these guarantees. The example I was going through, this intrusion detection system, was off-path; that's what makes it an IDS versus an intrusion prevention system. Because this IDS is off-path, if packets get dropped on their way to the IDS, there's no way to get them retransmitted; the IDS is getting a copy of the traffic.
B
However, in the case of an IPS, if a packet gets dropped on its way to the IPS, that IPS is in the middle of a connection, which means normal TCP mechanisms will recover from that loss and the IPS will have another opportunity to see that packet. So in that case we don't need this loss-free property, and so by giving control applications the flexibility to choose what they want, they have some control over how much overhead they experience.
B
Okay, so going back to our three goals: we wanted to satisfy SLAs, we wanted to make sure that we could do it at low cost, and we wanted to make sure that our network functions are operating accurately and analyzing traffic. We've addressed the issue of diversity by making sure that the changes we make to import and export state are simple and we have a simple events mechanism. We deal with race conditions by adding this events mechanism and by having lockstep forwarding updates.
B
The controller itself is implemented as a module running atop the Floodlight SDN controller, and we've also implemented a communication library that can be linked into network functions in order to communicate between the controller and the network functions themselves. We've modified four different network functions so far to conform to our southbound API and provide events and export state: the Bro intrusion detection system, iptables, the Squid caching proxy, and also PRADS, which is an asset detection and monitoring system that's used in our university network.
B
So how well does OpenNF perform, and does it actually give us the benefits we wanted? We're going to take a situation here where we have a trace of traffic from our campus network that we're replaying at a rate of 10,000 packets per second, and we're going to start with one instance of the Bro intrusion detection system. 180 seconds into the experiment, we say: move all HTTP flows to be processed by a new instance. 180 seconds later, we move any HTTP flows active at that time back to the original instance.
B
Actually doing the transfer of state that we need takes 260 milliseconds, so that's quick; it doesn't take very long. We also looked at: is this accurate? Have we maintained the accuracy of the network function? So we compared what happened if we let all of the traffic be analyzed by one IDS and didn't do these moving-back-and-forth operations, versus what the output of the IDS is if we do this scale-out and scale-back-in. It turns out the log entries are equivalent.
B
If we had used the VM replication that I talked about earlier, there would be entries missing from our logs, because when we do this scale-in operation we have no way to combine two VM snapshots together. Lastly, there's this issue of cost: how quickly were we able to scale in? We were able to scale in as fast as it took us to move the state back, which again was about 260 milliseconds.
B
If we had instead waited for flows to die off, the flows in this particular trace lasted more than 25 minutes, and so we would have needed to unnecessarily continue to run the second instance of the IDS until those flows had finished; that would have been a lot of extra cost we would have been paying. So I said this move takes 260 milliseconds. How does what we're doing at the network functions contribute to that?
B
We can look at how long these get and put operations take on our network functions, and we did this for three of the network functions that we modified. It turns out that the cost to serialize and deserialize state is most of the time that we spend in these network functions. So there are definite improvement opportunities there: if we can do a better job with how we go about serializing and deserializing, we may be able to improve the efficiency there.
B
So we have these low-level operations, but how about the high-level operations, and how do the guarantees impact the time that it takes us to do these move operations? Here we're going to assume that we're running the PRADS asset detection system, we're again using the same trace of traffic at a slightly lower rate, 5,000 packets per second, and we're going to move the state for 500 flows that are active at a given point in time.
B
If we look at how long it takes for this move operation to complete, we can look first of all at what happens if we don't provide any guarantees. In that case, we're talking about 190 milliseconds to do this operation. We can do some parallelization of the gets and puts that we're issuing in order to speed things up a little bit, so now we can cut that down, not quite in half, to about 130 milliseconds. Great.
B
The problem here is that we're losing packets as a result of this: without any guarantees on loss-freeness or order preservation, even in the best case we're losing 462 packets. So we add in our loss-freeness guarantee. Now our move operation takes longer, about twice as long, but we're not going to lose any packets.
B
If we add in the order-preserving requirement, we again see another increase in the amount of time it takes, but we don't see a significant increase in the amount of overhead that we're imposing on packets, although there are more packets that we're imposing this overhead on. Here, with this order-preserving operation, we end up buffering 883 packets at the controller, and also (I didn't talk about this) there's another approximately a thousand packets that we buffer at our second network function before they're processed.
B
The overall takeaway here is that these operations are reasonably efficient, but the guarantees that we want to offer do in some cases come at a cost, and so it's important for control applications to have the flexibility to decide whether or not they need these guarantees. So where are we going from here; what are the next steps for OpenNF? Well, the first thing is that there's a lot of buffering that was happening in the loss-freeness case.
B
There's even more buffering happening in the order-preserving case, and so the question is: how can we reduce this amount of buffering, in an effort to reduce the number of packets that receive extra overhead and to reduce the memory usage of our system? One thing that we can do is, rather than pausing traffic and immediately saying, before the state transfer starts, "I want you to start raising events," we can allow the network function to continue to process packets, and then replay any packets that are processed during the transfer.
B
The second thing that we can do is improve the scalability of this system. Right now, all these packets and all the state go through the controller, which means there's a limit to how many operations we can handle simultaneously at the controller. But it turns out the controller doesn't have to be involved: we can actually use a peer-to-peer mechanism to transfer state directly between instances of a network function and still get all the same safety guarantees that we want.
B
Lastly, I said we need to modify the network functions, and obviously there are a lot of network functions out there. So how do we make this task easier to do? Well, we can use some techniques from program analysis in order to analyze the network function code and automatically figure out what state it is maintaining and what state we need to actually export from these network functions; we have some ongoing work in that area.
B
So, in conclusion, I hope I've convinced you that we need something more than just NFV and SDN in order to be able to realize rich scenarios where we want to dynamically reallocate packet processing. In particular, we need the ability to quickly move, copy, or share network function state, and to do it in a way that's also safe, and we've achieved this with OpenNF. If you want to learn more, or if you want to try out the code for OpenNF, I encourage you to visit our website, opennf.cs.wisc.edu. With that, I.
C
So, for example: is this a chunk of malware that I've seen before, as opposed to: does this packet have these bits set? I don't think you had the graph that actually shows the size on the x-axis and the impact on the y-axis; you had examples. So what does that look like? Do you have that?
B
So I guess I don't have an exact graph. The best I can put up here is sort of this, which gives an idea of how much state there is. In the case of iptables, the state for a single flow is less than a kilobyte; in the case of Bro, we're talking about one hundred or two hundred kilobytes of state per flow.
B
So it is reasonably small, that's true, and so one thing you can do is start to proactively copy some of the state, and our replay of events (that's future work) would enable that. The other thing I want to touch on that you mentioned is this idea that everything I was assuming here was per flow.
C
I guess there's another kind of question related to that, which is probably bigger than just your work. Suppose I have a cascade of three or four of these functions, and one of them modifies the packets in some way such that reclassification by the prior upstream function needs to be done, but now you've migrated one to some other place. What kind of situations could I get myself into with respect to that? Is there scheduling or something you do to deal with that?
B
So we've thought a little bit about the chaining scenario, where you have many of these network functions that you're passing through. We think that in many cases, if you have a chain, you can migrate one middlebox in the chain at a time, and you'd be doing some temporary redirection. In that case, you can certainly do better scheduling if you look at the entire chain at a time.
A
You have a problem if you have a subscriber which was registered in some network element: you can't just move him, because he must be aware, as you know, that he made a registration to a different element; that's the requirement. So in some cases only the application itself can move the state, with information coming from other elements, since the subscriber may move the state from that unit on his own. So for some applications you can change the state with the controller, but for some cases you can do it only at the application level.
B
True. So I agree that there's certainly some information you need to know about the network functions in order to know how you're going to go about writing these applications, and that's something we haven't yet done a good job of capturing. We're hoping that, ideally, some of our program analysis could give you a simplified model of how a network function works, or potentially give you recommendations on which operations your control application should perform: if you have it do this, you'll get this level of output equivalence.
B
It's really up to the control applications how they want to do it. So your control application in the scaling scenario could be: maybe it's monitoring CPU, and it says, I'm going to monitor CPU, and then I'm going to do some sort of measurement of what my elephant flows are, to figure out exactly what set of flows I want to move from one box to another. So that's completely flexible, and you could implement whatever you want there, right.
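The control application just described (monitor CPU, pick elephant flows, move only those) can be sketched as a simple control loop. The thresholds, the `move_flows()` hook, and the data shapes are all illustrative assumptions, not a real controller API:

```python
# Sketch of the scaling control application described above: watch
# per-instance CPU; when an instance is overloaded, pick its elephant
# flows and move just those to the least-loaded instance.

CPU_HIGH = 0.8           # overload threshold (assumed)
ELEPHANT_BYTES = 10**6   # flows above this byte count are "elephants" (assumed)

def pick_elephants(flow_stats):
    """flow_stats: {flow_id: bytes_seen}. Return elephants, biggest first."""
    return sorted((f for f, b in flow_stats.items() if b >= ELEPHANT_BYTES),
                  key=lambda f: -flow_stats[f])

def rebalance(cpu, flow_stats, move_flows):
    """cpu: {instance: utilization}. move_flows(flows, src, dst) performs
    the actual state-plus-traffic move (left abstract here)."""
    hot = [i for i, u in cpu.items() if u >= CPU_HIGH]
    for src in hot:
        dst = min(cpu, key=cpu.get)  # least-loaded instance
        if dst == src:
            continue
        flows = pick_elephants(flow_stats[src])
        if flows:
            move_flows(flows, src, dst)

moved = []
rebalance(
    cpu={"nf1": 0.95, "nf2": 0.30},
    flow_stats={"nf1": {"f1": 5_000_000, "f2": 900, "f3": 2_000_000}, "nf2": {}},
    move_flows=lambda flows, s, d: moved.append((tuple(flows), s, d)),
)
print(moved)  # elephants f1 and f3 moved from nf1 to nf2
```

As the answer says, this policy is entirely up to the application; the platform only has to provide the move primitive.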
B
There are some interesting questions there, and that's one of the reasons we also want to look at how we can reduce the amount of state we're transferring. Some of our program analysis is trying to understand: rather than exporting all of the state that the network function is maintaining, can we figure out what state was updated since the last time, maybe, that we created a snapshot, in a failover situation? Or can we figure out…
B
Maybe some state affects the packets that are output by our network function, and other state affects only the log. And maybe we say, you know, in something like a caching proxy we're not really concerned about the accuracy of the log, so we're not going to move that state. So you may be able to limit what state you move in exchange for a relaxed notion of the behavior of your network function, and how closely it compares to what you would have gotten if you hadn't moved at all.
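The two state-reduction ideas above (send only entries changed since the last snapshot, and skip state that affects only the log) can be sketched together. The tagging scheme and class shape are illustrative assumptions:

```python
# Sketch of incremental, relevance-filtered state transfer: track which
# entries changed since the last snapshot, tag each entry by whether it
# affects output packets or only the log, and optionally drop log-only
# state for a cheaper, relaxed-equivalence transfer.

class TrackedState:
    def __init__(self):
        self._data = {}      # key -> (value, affects_output)
        self._dirty = set()  # keys updated since the last snapshot

    def put(self, key, value, affects_output=True):
        self._data[key] = (value, affects_output)
        self._dirty.add(key)

    def delta(self, include_log_only=False):
        """Return state changed since the last snapshot; by default,
        log-only entries are dropped (the relaxed transfer)."""
        out = {k: v for k, (v, rel) in self._data.items()
               if k in self._dirty and (rel or include_log_only)}
        self._dirty.clear()  # next delta is relative to this snapshot
        return out

s = TrackedState()
s.put("conn:10.0.0.1", "ESTABLISHED", affects_output=True)
s.put("log:10.0.0.1", "GET /index.html", affects_output=False)
print(s.delta())   # only the connection entry is transferred
s.put("conn:10.0.0.1", "CLOSED")
print(s.delta())   # only the entry updated since the last snapshot
```

In a real system the `affects_output` tag would come from the kind of program analysis the answer describes, rather than from the network function author.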
D
B
So, it's an excellent question. I haven't really thought about it in terms of control-plane devices; I've only really thought about it in terms of data-plane devices. I think there's probably a different problem there, and potentially a simpler solution. When you start to talk about things in the control plane, the thing that comes most to mind is work that's being done on distributed SDN controllers, where your SDN controller is your control plane, and there you're concerned about moving state.
B
I think you could. I think one challenge that you certainly face is: where is this going to go? Which is sort of the standard NFV challenge. You know, to migrate across the entire continental United States versus to migrate between two points in a metro area is going to be a really different situation, and one is probably feasible; the other is…
D
No, this is some kind of state you have to preserve; that's one thing, right. Second, if I've understood well, I'd like to share my view and see whether you share it as well: I see a similarity between this and something from some time ago. I remember when, in object-oriented programming, there were object persistence frameworks. I think there is a clear connection there, right? So this is very much connected with that. Yes.
B
We haven't necessarily looked specifically at that body of research, although we have started to look at it, actually, as we're doing some of this program analysis, because there are all sorts of things to figure out: what objects exist beyond the processing of a single packet, and what objects are only used during the processing of that one packet at this middlebox. So I think there is definitely a broader body of work there that's worth considering, because…
D
There are some researchers who have been looking at that. They are starting to think precisely about a network programming paradigm that is object-oriented, and persistence is precisely one of the properties they were thinking about: persistence and this kind of movability. This is something I was taking note of, because it will probably help. And finally, about what you were mentioning here, a control application in the control plane: is this something you are starting to think about? Well, did you take into account the SDN architecture? You have there the NFV orchestration and, well…
B
But it's unclear how tightly you can integrate those, because each of them is solving a slightly different problem, and so I think there are just going to need to be some interfaces there. For the same reason, when you're talking about NFV orchestration, you may have an interface into your system that's going to worry about launching the VMs themselves and figuring out where they're going to go, and then a system that's going to worry about, okay…
B
…now which NF am I actually putting on this? So even there, that could potentially be split into multiple controllers. So at what point do we end up with too many controllers running around the network? I expect we are rapidly approaching that, and it's a real problem, but…
C
Are there any other questions for Aaron? I did have one. I'm wondering, is it trivial to bound the amount of buffer space you need in the controller, or is that hard? So do you kind of bound the number of flows you can migrate, to stop that? Yes?
B
There are a couple of different things you can do. In theory it's reasonably predictable: you know how big the state is on average, and we can predict how long it's going to take to transfer it. But you're right, there's this trade-off: the more state you're transferring, the longer it takes, and the more buffering you need to do.
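The trade-off just described, and the earlier answer about bounding the number of flows migrated at once, can be put into back-of-envelope form. Note the quadratic effect: every migrating flow buffers for the whole transfer, and the transfer length itself grows with the number of flows. All input numbers here are illustrative assumptions:

```python
# Back-of-envelope buffering bound for flow migration. If n flows move
# together, the transfer takes n * per_flow_state time, and every one of
# the n flows buffers its traffic for that whole interval, so the buffer
# requirement grows with n squared.

import math

def buffer_bytes_needed(n_flows, per_flow_state, arrival_bps, link_bps):
    """Bytes buffered while n_flows migrate together."""
    transfer_s = n_flows * per_flow_state * 8 / link_bps
    return n_flows * (arrival_bps / 8) * transfer_s

def max_flows_for_buffer(buffer_limit, per_flow_state, arrival_bps, link_bps):
    """Largest batch of flows whose buffering fits in buffer_limit bytes
    (the 'bound the number of flows you can migrate' answer above)."""
    return int(math.sqrt(buffer_limit * link_bps /
                         (arrival_bps * per_flow_state)))

# 100 KB of state per flow, 10 Mb/s of traffic per flow, 1 Gb/s link,
# 64 MiB of controller buffer:
print(max_flows_for_buffer(64 * 1024**2, 100 * 1024, 10e6, 1e9))
```

This is why bounding the migration batch size, as mentioned in the question, directly caps the controller's buffer requirement.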