From YouTube: IETF111-ANRW-20210727-1900
Description
ANRW meeting session at IETF111
2021/07/27 1900
https://datatracker.ietf.org/meeting/111/proceedings/
A: So yeah.
C: I'm pleased to welcome you to this third session, in which we will talk about interconnection and routing.
C: So we have two talks: one from Romain from IIJ, talking about hunting BGP zombies in the wild, and the next talk is going to be on MetaPeering, that is, automated ISP selection, by Mustafa. My name is Amreesh Phokeer. I work for the Internet Society as an internet measurement and data expert, and I guess we can start. First of all, let me introduce Romain. Romain is a senior researcher at IIJ. His current research interests include traffic modeling, network data analytics and anomaly detection. Please.
E: Hello, everyone. This is Romain Fontugne from the IIJ Research Lab, and today I will present our latest results on BGP zombies.
So, first let me explain what a BGP zombie is. This figure represents how one prefix is seen by RIS routers: on the y-axis you have all the RIS routers, and on the x-axis you have time. The prefix we are looking at is one of the RIS BGP beacon prefixes, and the green circles here show that the prefix is announced by one of the routers.
E: Then the green line shows that the prefix is active in the router's routing table, and the red cross shows that the prefix is withdrawn by the router. So here it means the prefix is active for two hours, then it's withdrawn for two hours, then it's announced again for two hours, withdrawn again, and announced again, and this is what we expect from BGP beacons.
E: But you can see that there are three lines here that represent three routers that think this prefix is active during this time, even though we know that the prefix was withdrawn by RIPE, and this is what we call BGP zombies. So here we have three zombies. In summary, a BGP zombie is an active entry in a routing table that corresponds to a prefix that is in fact withdrawn by its origin AS. We've looked at these BGP zombies in the past, and we used BGP beacons to do that.
E
That
was
what
we
published
in
palm
2019,
but
it
didn't
really
tell
us
anything
about
the
regular
prefixes
we're
using
on
the
internet,
and
this
is
the
goal
we
set
for
this
work
here.
We
want
to
see
we
want
to
monitor
big
zombies
for
regular
prefixes
and
see
if
it
was,
if
it's
as
bad
as
what
we've
seen
for
our
b
cards,
so
for
beacons.
E: Defining a zombie was very easy there, because we knew already when the prefix is withdrawn and when it's announced again. But here, because we are looking at any prefix on the internet, we have to find out when an origin AS is going to withdraw a prefix, and to do that we are looking at a metric, which is the number of active routers for a prefix, shown here in that figure. That metric ranges between zero and one; one means that all the routers we are using see that prefix as active.
E: But what is interesting for us is to see when there's a significant change, a significant drop, after which this metric, the number of active routers, is stable but at a low value. Here it means that only a few routers didn't withdraw that prefix, and if that lasts for a certain time, then we're going to say that this is a BGP zombie.
E: So, for the BGP zombies: when we see the majority of the routers withdraw the prefix, we're going to wait 90 minutes, and if after 90 minutes we see that the prefix was not completely withdrawn and wasn't re-announced, then we're going to say this is a zombie. You can check the paper for more details on why we use 90 minutes.
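Stated as code, the detection rule described here could look like the following minimal sketch. Only the 90-minute wait and the "majority withdrew, a few routers stuck" idea come from the talk; the majority threshold, the data layout and all names are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

ZOMBIE_WAIT_MINUTES = 90  # waiting period mentioned in the talk

@dataclass
class Sample:
    minute: int             # time of the observation
    active_fraction: float  # active routers / all routers, in [0, 1]

def find_zombies(samples: list[Sample]) -> list[int]:
    """Flag times at which a prefix looks like a BGP zombie.

    Heuristic per the talk: when the majority of routers withdraw the
    prefix (big drop in the active fraction), wait 90 minutes; if the
    prefix is then neither fully withdrawn nor re-announced, a few
    routers are stuck with it -> zombie.
    """
    zombies, drop_at = [], None
    for s in samples:
        if drop_at is None:
            if s.active_fraction < 0.5:          # majority withdrew
                drop_at = s.minute
        elif s.active_fraction == 0.0 or s.active_fraction > 0.5:
            drop_at = None                       # fully withdrawn or re-announced
        elif s.minute - drop_at >= ZOMBIE_WAIT_MINUTES:
            zombies.append(s.minute)             # still stuck after 90 min
            drop_at = None
    return zombies

# Toy example: 3 of 10 routers keep the prefix after the withdrawal.
series = [Sample(t, 1.0) for t in range(0, 60, 5)]
series += [Sample(t, 0.3) for t in range(60, 240, 5)]
print(find_zombies(series))  # -> [150]
```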
E: Using this very simple zombie detector, we analyzed six years of BGP data and found 6.5 million BGP zombies, and we looked at different things in these zombies. First we ran some sanity checks, which I will explain on the next slide, and we also looked at some of the characteristics of zombies in the wild. One of the first sanity checks we've done is to look at what we call the state variance between RIS peers.
E
So
s3
tells
us
that
it
can
reach
that
prefix
through
this
ss,
but
we
have
access
to
one
of
dcs
through
this
and,
and
that
is
tells
us
that
it
has
withdrawn
this
prefix.
E: So this really tells us that there is in fact a zombie and it's not a misclassification. We use that to validate our results, but to do that, we have to have zombie paths with at least two RIS peers in the AS path.
E: Only a few prefixes have created a lot of zombies, and one of the questions we had then was: okay, maybe noisier prefixes like BGP beacons are more prone to zombies, and this is what we looked at in this figure.
E
We
did
this
only
for
two
years
in
2018
and
2019,
and
we
looked
a
bit
more
at
this
asm.
We
look
at
a
different
characteristic,
and
what
we
found
was
that
the
top
s
and
popular
content
networks
usually
have
also
either
like
they
announce
a
lot
of
prefixes,
or
they
are
very
long,
especially.
E
And
if
we
think
about
it,
it
kind
of
makes
sense
if
we
assume
that
bgp
zombies
are
due
to
bugs
in
routers,
then
a
longer
space
lines
mean
they're,
gonna
imply
more
routers
and
thus
more
chance
to
hit
one
of
this
back.
E: Here we also found that some zombies have an origin that is different from that of their covering prefix; because that route is stuck, we might have wrong origin information.
E: Zombies can also create routing loops, and we found over 400 potential routing loops in our results. I also advise you to look at the presentation from Kellnag last year, where they give a concrete example of routing loops and also use traceroutes to show it. Okay, so that concludes my presentation. In this work, we looked at BGP zombies for regular prefixes.
C: If not, I have a question for you. My first question would be: what do you think could be the cause of these BGP zombies? Is it faulty routers? What could it be?
E: Thank you, that's a very good question, and that's definitely something that is missing in our study. It's simply missing because it's very hard to check all the causes of zombies.
E: The main cause, we think, is indeed faulty routers and bugs in routers. But it's just very hard to check all the different software versions, and it's probably a lot of corner cases where things don't work; it's hard to check. But that's why we think it's a problem.
E: What we've heard from operators is that they sometimes see those zombies appearing, and the common practice is to re-announce the prefixes and then really withdraw them. When you withdraw a prefix, you have a very small chance that a zombie appears, and just re-announcing and withdrawing again usually seems to remove the zombies that appear. We've also heard from some operators that they just reset the whole BGP session, which is a bit brutal.
E: Yeah, but I think the main problem for operators is to monitor this, to know that there is in fact a problem. For the paper we do monitor zombies, but in real time it's sometimes a bit hard to do, because if you use the RIS collectors, that's a lot of data to process; it can be a pain ingesting all this data. But still, RIPE provides some tools, like BGPlay for example, where people can look at how their prefixes are seen by RIS.
C: Okay, I see we have one person in the queue. Colin, you can proceed.
H: Hi, can you hear me? Yes? Hi, nice talk. So this is not even close to my area, so this question may make very little sense. I was wondering if you were seeing any difference in behavior between IPv4 and IPv6 prefixes, given that the interpretation is that it may be router bugs, and they may be exercising different code paths.
E: In this paper I don't think we've done much comparison between IPv4 and IPv6, but we had a previous paper on that, and the bad news is: we've seen many more zombies in IPv6.
E: And for one reason: when we really dug into the results, we saw that one of the networks was creating a lot of zombies, and we contacted them, and they said that, yeah, they had some problems with their IPv6 and they were restarting their BGP sessions when a customer complained.
A: Hi, can you hear me now? Yeah? Hi Romain, thank you so much for the talk, very interesting. I'm just wondering, because you processed so much data, and I know that going through all the information from RIS and from RouteViews is, you know, a complete work in itself. So thank you so much for taking the time to do that. So maybe I missed this.
A: I'm just wondering how much of what you're seeing as zombies are actual zombies, and how much is just, you know, some hiccups or accidents that may occur. Have you been able to separate these two reliably? And maybe as a follow-up: have you seen some big offenders, in the sense of, you know, somebody who may generate too many zombies, or do you see a very skewed distribution in the origin of these zombies? So again, thank you so much.
E: In this paper we didn't really look at the source of the zombies, but the previous one did a bit more work on that. We had a technique to find the source, that is, where the zombie was created, and there was not really one big offender; it was changing quite a lot, and that also gave us some more evidence that there are those bugs. It's a bit random how the zombies are created.
E: So even though we had this very controlled environment, you know, things were changing all the time. And for the first question, how we checked that they are really zombies:
E: In this paper we did some of these sanity checks. The difficulty with this work was that we're working with past data; we used six years of BGP data, so it's very hard to go back in the past and check, you know, what really happened. But in the previous studies we did run traceroutes.
E: Every time we found a zombie we ran traceroutes, and we could confirm it: every time, we would see a few routers that forward the packets, and then we receive an ICMP message saying that the network is unreachable.
E: So we could confirm that, and I don't remember exactly the number, but over 90 percent were really zombies.
I: Yeah, Jared Mauch, Akamai. I mention it because in your paper you highlight a number of our prefixes that apparently regularly get, you know, get stuck and such. Yeah, I think there are a lot of things that contribute to that. On our side, we've definitely been trying to improve some of our prefix stability efforts as well, you know, because we have a large set of distributed deployments, and a lot of these, like the AS16625-originated prefixes:
I: Those are all coming from BGP speakers that run either on routers or on servers specifically, and that may be more likely to actually have some negative operational impacts when those things are put into service or taken out. And we've been trying to improve the prefixes, so I'd be interested to know, or see, whether you're still seeing this from us and whether it's improved recently, because we've undertaken a number of efforts to improve this.
E: Oh, okay, that's very interesting! Then I guess, yeah, we could check again and see if there were some improvements. But one thing I'd like to mention: I remember that for Akamai, because you announce a lot of very small prefixes, I think at different places in the network, sometimes we see very long AS paths for these prefixes. I'm guessing that zombies will appear for those very long AS paths, and that this has little to no impact on your traffic, because that's probably not the place where you're going to direct your clients.
I: Yeah, we have a lot of distributed unique deployments, and we now actually have three different backbones that we operate that interconnect all of our sites together. So depending upon where the regional interconnection is for the distribution of the content into a customer network, you know, in Japan it might be different than in India or than somewhere else. You'll obviously see different AS paths for those, based upon the relationships we have with the service provider.
I: So it's quite possible that some of the prefixes might be negatively impacted depending upon that upstream provider, you know, network property. And we actually have teams who are working full-time, you know, going and chasing things like that around, as well as systems that kind of monitor and detect it; but they tend to just take things out of service, and then a human has to go and chase it down and figure out what happened.
I: You know, we quite often see, and I think it should be no surprise to anybody who's looked at BGP research, a lot of interesting events all the time. So when the systems just take stuff out of service automatically, it's really, you know, doing that to improve the customer experience, and that happens all day, every day.
C: Thank you, Jared. So this brings us to the end of your presentation, Romain. Thank you so much for coming here so early for this. Yeah, thank you very much, so we can move on. Thank you very much, bye-bye.
C: We can move to our next speaker, who is Shahzeb, a PhD student at the University of Central Florida, currently working as a research assistant at the Networks and Wireless Systems Lab. His key areas of research are network architecture, internet peering and data analytics.
J: Why is that? Because ISP admins often attend events sponsored by PeeringDB, NANOG, etc., where they network with each other, and using these events they identify ISPs for potential peering. After that, they negotiate traffic exchange terms and conditions, which may include the max traffic volume they are willing to exchange, the specific points that they are willing to peer at, whether or not it will be a public peering, etc.
J: And then, after this step, if and only if both of the ISPs agree to the terms and conditions, the BGP forwarding rules are written, so the deployment actually takes place. So, overall, since the whole process requires a ton of manual work, it is extremely slow, and it oftentimes takes a couple of weeks to months. And even with such an elaborate and lengthy process, finding the right peer is hard; say you put two months into selecting your peer:
J: It's not guaranteed that it's going to be an optimal option, because, see, the internet is far more dynamic than these interconnection deals, which means that during the negotiation and finding process plenty of great peering opportunities are discarded, and under- or over-estimation of various metrics can lead to future disagreements.
J: Now, bad selection can also mean that your resources are not optimally utilized, so your load-balancing factor is suboptimal, and in the bigger picture and in the longer run, such suboptimal relations, disagreements or missed opportunities can hurt both ISPs financially.
J: So it's clear why it's so important for these deals to be optimal and, along with that, to be dynamic, so that if you've identified an issue, you can fix it quickly.
J: Now we present MetaPeering, a tool that helps identify optimal peering ISP pairs and also gives you the best peering contracts. So, for two given ISPs, we need to decide whether or not they should be peering, and if yes, at which particular locations they should be peering. We first calculate the traffic matrices of both ISPs, which is sort of their internal traffic flow.
J: We also identify the locations where both these ISPs have a presence, because these are the points where peering is possible. Then we gather the gridded population data for the United States, which basically divides the whole country into small segments, so we know the population of each of these segments. And we take all of this data, all of these computations, and feed them into this policy generator machine, which extracts all the useful information.
J: The first part basically uses the PoP locations and the population data to construct an overlap map between the two ISPs; what it represents is the number of people each ISP presumably covers, and how many more people can become accessible with a peering deal. This is summed up in the affinity score.
J: The next part uses the same PoP locations and also the traffic matrices to give out peering willingness scores for each of the common locations, from the perspective of both ISPs. The overall willingness for a particular peering deal is just the average of these scores at the particular points.
J: We then take the geometric mean of the willingness score, which is representative of the willingness to peer, and the affinity score, which is representative of the non-overlapping areas and population, to get the felicity score, which tells us whether or not these ISPs should peer. Now, please note that these scores are novel; they're not an industry standard. We came up with these scores, and we have discussed how we came up with them and what they represent. So, with the felicity scores:
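For concreteness, the score combination described above reduces to a short sketch. The geometric-mean formula and the averaging of per-location willingness scores follow the talk; all input values and names below are invented for illustration.

```python
import math

def felicity(willingness: float, affinity: float) -> float:
    """Geometric mean of the two scores, as described in the talk.

    Both inputs are assumed to be normalized to [0, 1], so the
    felicity score also lands in [0, 1]."""
    return math.sqrt(willingness * affinity)

# Per the talk, the overall willingness is the average of the two
# ISPs' willingness scores at each common location (values made up).
per_location = {"Los Angeles": (0.8, 0.7), "Chicago": (0.6, 0.9)}
willingness = sum(sum(pair) / 2 for pair in per_location.values()) / len(per_location)
print(round(felicity(willingness, affinity=0.64), 3))  # -> 0.693
```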
J: ISP admins can set a threshold: okay, if this pair has a felicity score of more than 0.6 or 0.7, we will be peering; otherwise we will not. So this is the deciding factor for whether or not they should be peering. And along with that, we give them the acceptable peering contracts: okay, if you decide to peer, these are the locations that you should be peering at. Both of these results can be used by ISP admins to decide whether a peering deal will be worth it.
J: We can see the overlap map, which is calculated using the PoP locations, as mentioned earlier. We can also see the willingness scores for each of the contracts that are possible; not all of them are listed in the screenshot, but they are possible. And at the bottom we can see a sample contract recommendation: given that Sprint and eBay decide that they are peering, the MetaPeering tool recommends that they should be peering in Los Angeles and Chicago.
J: The website lists the top three such contracts: this is the best one, and then there's the second-best option and the third-best option. And just for reference, here's another example, for Columbus and eBay. In this case the same overlap map is given, the willingness graph is given, and a sample contract is given, but in this particular case our model does not recommend peering; in case they do end up deciding that they should peer, it should be at San Jose and Ashburn.
J: We tested this model on 23 different ISPs, which basically means 506 pairs, using two heuristics. On the x-axis we can see the ISP pair type, where A is access, C is content and T is transit; so AT, for example, means that it is an access-transit ISP pair. For the first heuristic, the ISP view, we recommend peering if any one of the two ISPs has a felicity score greater than a certain threshold.
J: In this case we used 0.55. For the second heuristic, the holistic view, we recommend peering if and only if both of the ISPs have a felicity score greater than 0.55.
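The two evaluation heuristics are easy to restate as code; a small sketch using the 0.55 threshold from the talk (the function names are ours, and the per-direction felicity scores are assumed given):

```python
THRESHOLD = 0.55  # value used in the evaluation

def isp_view(score_a: float, score_b: float) -> bool:
    # Recommend peering if either ISP's felicity score clears the bar.
    return score_a > THRESHOLD or score_b > THRESHOLD

def holistic_view(score_a: float, score_b: float) -> bool:
    # Recommend peering only if both ISPs' scores clear the bar.
    return score_a > THRESHOLD and score_b > THRESHOLD

print(isp_view(0.6, 0.4), holistic_view(0.6, 0.4))  # -> True False
```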
J: We believe that MetaPeering is a step in the direction of more dynamic and automated ISP relation management, and we are working on an extended, more complex MetaPeering model, which uses machine learning techniques to learn from previous data what's important in a peering deal before giving out its recommendations.
J: So this concludes my presentation. I hope you liked it. If you have any related questions, I would be happy to address them now.
B: Okay, just to clear something up in my mind from the presentation: you say that one of the scoring weights you use is population.
J: Yeah, so the metrics that we use assume that traffic is directly related to these eyeballs, as you mentioned, and that it's directly proportional to the population around that area. So, assuming that there are eyeballs there, the traffic originating from an area is directly proportional to the area that all of the PoPs cover. It's just a heuristic measure which we use in approximating the peering suggestions.
J: For this particular case, we're assuming an even distribution of eyeballs.
J: For this particular project, we have not focused on the economic side, but another project that we are currently working on focuses just on the economic side: optimizing peer selection based on how much you can save. That is something we are working on; maybe in the future these two projects will combine, but right now, in this project, we haven't considered that.
C: I don't think so. If not, I might have another one. I see that you base your model on data coming from the US. Do you intend to expand it to other regions of the world, where perhaps interconnection is a little bit different than in the US?
J: Yeah, definitely. We've seen that the peering and interconnection trends are different, especially for European IXPs. The reason we have focused on the US is that the model needs to train using peering trends, so if we combine the two, it will be a problem. But yes, the same model can be trained for European ISPs, where different metrics have different weightage.
A: So hi everyone, yeah, thanks so much. We are just closing session number three, so I'll invite Edmundo to help us chair session four. I don't know if you could also share the slides; if not, just let me know and I can share them, just to remind people how we're going to run the session.
A: Okay, because we don't see the slides, I don't know, should I share?
D: Okay, can you see the session slides? I'd like to share my screen. Okay, I was going to share my slides here, but if you have them, perhaps it's easier if you share yours.
D: Okay, so hello, everyone. My name is Edmundo de Souza e Silva. It's a pleasure to be here, and we have a very nice session on monitoring internet traffic. Today we have four papers. The first paper, Towards Cross-Layer Telemetry, will be presented by Justin Iurman. The second paper is about the spin bit and loss measurements. Then the detection of consumer IoT devices: how you detect IoT devices in the wild through the lens of an ISP. And the last paper is on the evolution of internet flows.
D: So the way we're going to run the session is to allow a very short question after each presentation, and then we have a panel at the end, a 15-minute panel. That's correct, Andre. So, without further ado, not to take time from the speakers, let's go to the first presentation: Justin Iurman, from the University of Liège; he's in a research unit for networking. So please, can you show the video?
A: Sorry, Edmundo, there's no sound for me. I'm not sure if other people have the same issue; if you can help us with that, thanks.
M: But first let me remind you of some basics. We moved from a kind of monolithic architecture to microservices, so you can see microservices everywhere now, and there are a lot of reasons for that, mainly that it's easier and faster to deploy and to maintain, but there are also other reasons.
M: So if you look at this kind of architecture, and let's assume there is a problem somewhere, and I ask you to debug it and to find the problem here, you would tell me: okay, easy game, right? But what about this one? It gets a little bit more complicated, right?
M: So hopefully, for that kind of spaghetti microservice architecture, you have APM, which is application performance management, and in this case, more specifically, you have distributed tracing tools. In this talk I will take Jaeger as an example, but you have to know that there are a lot of other alternatives.
M: Jaeger is just a famous one among them. Such a tool is very useful when we are dealing with microservices, and they all have something in common: they all have the same notion, the same concept, of traces and spans.
M: So a span is just the part of your code that you want to monitor, all right? And here at the bottom of the slide you can see a screenshot of the Jaeger visualization: you basically have a main trace which contains two sub-traces, and each trace has some child spans; so here a span, and then a child span, etc.
M: Well, that's when you're facing a problem. Just to cite an example: let's assume that my database lookup is slow. I can see in Jaeger that my database lookup is slow, because I traced it. In that case, should I just blame the app, or should I put the blame on the server, or even the database? Or maybe this is a network issue, or actually it could be anything else. So it's hard to know exactly what it is.
M: So let's look at a basic, simple topology here: you have the app, the database, and this guy in the middle of the path, with congestion on one of its interfaces. Jaeger will just report a slow execution time, right? So when you see this, you will investigate the app, you will investigate the database, and actually you won't find anything. You would be left scratching your head and wondering why it takes so long, and that's a big problem for root cause analysis.
M: The first question is: how do we find a way to correlate the traces from the APM with the corresponding network traffic? Back to my example: if you trace your database lookup in your code, you want to match the trace generated by the APM with the corresponding network traffic, which is the DB lookup.
M: Okay, on the link. And for that we will use IOAM. IOAM is In situ Operations, Administration and Maintenance; it's actually used to carry some useful data in packets. We have developed it in the kernel and it should be available soon. Why do we use IOAM? Well, we want to kill two birds with one stone. As I said, it can carry a lot of useful low-level information.
M: So, for instance, you have the queue size, and the IDs of the nodes and interfaces a packet is coming from and going to, and so on. And on the other side, we just enhanced the IOAM header to carry both the trace ID and the span ID. Remember what I told you about the common point of those tools: they all have the same notion of traces and spans, so a trace ID and a span ID together represent a unique ID of a span.
M: So the first important question is answered; let's face the second one: when and how should we inject these IDs? When? Well, we have two possibilities: either at socket creation or when sending data. If you think a little bit about it, at socket creation wouldn't be enough. Why? Well, because an operator could use the same socket for all connections all along, or you could have multiple traces for the same connection: if you want to monitor different parts of your code, you would have multiple traces on the same socket, okay?
M: So it's not an option to inject those IDs at socket creation. Moreover, you don't want to modify the C library, because you would also have to modify high-level languages, and that's not an option. We want to provide an improvement to those tools which is not a burden; you want to integrate it easily, without changing everything. Okay, so we select the option of injecting those IDs when we send the data. But now, how? Again, we have several possibilities.
M: So we are left with two other possibilities, which are to add a new syscall or to use a netlink call. Again, if you add a new syscall: syscalls are not always portable, and the preferred way of doing this is usually to use a netlink call; from a kernel perspective, it's always the best option. So we selected the netlink call.
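To make the send-time injection concrete, here is a rough user-space sketch of the idea, assuming the kernel side exists: right before data leaves, the current (trace ID, span ID) pair is associated with the flow, so that the IOAM layer can stamp it into the outgoing packets. The real CLT implementation does this with a netlink call from the instrumented application; everything below, names included, is an illustration of the concept, not the actual CLT API.

```python
import socket

# flow 4-tuple -> (trace_id, span_id) currently being sent on that flow
FLOW_TO_SPAN: dict[tuple, tuple[int, int]] = {}

def set_ioam_ids(sock: socket.socket, trace_id: int, span_id: int) -> None:
    """Stand-in for the real mechanism: CLT issues a netlink call here so
    the kernel's IOAM code tags this flow's packets with the
    (trace_id, span_id) pair. This sketch only records the association."""
    flow = sock.getsockname() + sock.getpeername()
    FLOW_TO_SPAN[flow] = (trace_id, span_id)

def traced_send(sock: socket.socket, payload: bytes,
                trace_id: int, span_id: int) -> None:
    # IDs are injected at send time, not at socket creation, because the
    # same socket can carry many different traces over its lifetime
    # (the reason given in the talk for rejecting socket-creation time).
    set_ioam_ids(sock, trace_id, span_id)
    sock.sendall(payload)
```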
M: So let me explain a bit this architecture of cross-layer telemetry.
M: First of all, we have a client, which is a Jaeger client in this case, which is used to add tracing code to your application. When a trace is available, it will be sent to the agent, which will forward it to the collector, which will apply some action on it and then store it in the database. Okay, now we added a CLT library, which is also a client library.
M: It comes with a lot of other challenges, but I think we are pretty close to a perfect solution. Right now this one is working pretty well, except in the corner cases I mentioned. But again, the main goal of this is to have some useful info to debug with, from layer 3 and layer 4, as long as it's low level.
M: So this is a correlation request, and the Jaeger collector will be responsible for just storing the correlation inside the database. As a result, you can see IOAM data, or layer 3/4 data if you want, directly in the Jaeger visualization. As I said, back to our example: you have the application and the database; we introduce and simulate congestion here, on this guy, on its interface. And as you can see, this is the first node, so this is the app, the second node, and the third node.
M: The second node is this one, and you can see that its egress queue is increasing. So if you are the operator, now that you see this, you directly find the root cause. Okay, you can see: there is a problem in the queue; maybe I could rebalance it. You are now capable of applying some actions to solve the problem. But again, without cross-layer telemetry you wouldn't have this data, so the only thing you would know is that it's slow, once again.
M: So let me conclude this talk. I definitely think that this is a hot topic in the industry; I've heard that there is a lot of interest in this, and I do believe that CLT solves a lot of challenges in the tracing world.
M: We are still working on some parts to improve it. I mentioned earlier another version, which will be per-packet, to have a perfect correlation and match solution, and there are also some other things to improve, but nothing that important. So I insist on the fact that this solution is working; this is something that you could use right now if you want, and there is a link to the GitHub repo, so feel free to have a look at it. There is also a video demonstrating how it works.
D: So thank you very much for your presentation, very nice. Let's see if there is anyone in the queue right now to ask questions.
D: I guess people are shy at the beginning, but I have one question. In your implementation, if I recall, you have to do some stuff by hand once you get all the information. So do you foresee, instead of manual intervention, potentially implementing some automated analysis that could be attached to your tool? Or have you thought about it?
M: Yeah, well, we could, but you have to know that below CLT there is IOAM, and the configuration of IOAM is actually the biggest part. Operators using IOAM usually configure it by hand, but again, we could provide some tools to automate it. We could also provide a way to merge everything into the APM tool, but maybe that would be a lot of burden for each tool.
M: So I think it would be better to keep it kind of decentralized, and maybe provide some tools to configure IOAM, yeah.
M: No, of course, you're right. It was a pretty limited testbed, just for the sake of the paper, but we are definitely planning on expanding the testbed to have some more realistic test cases.
D: Let's go to the next talk, which is going to be presented by Ike Kunze from RWTH Aachen University, where he's a PhD student and researcher. So please.
N: I think it is safe to say that network measurements have always been important to get a better understanding of what is going on inside the network. However, measurement techniques have typically been developed independently from protocols, and thus they oftentimes depend on externally visible protocol semantics.
N: A prominent example are TCP sequence numbers and acknowledgements, which can be used to compute the round-trip time of a connection. Let me quickly illustrate that with a short example. What we have here are two hosts interconnected by a network probe in the middle. If the host on the left-hand side now sends a packet with a certain sequence number, the network probe in the middle can store that sequence number and then basically start a timer.
N: As soon as the acknowledgement then arrives at the network probe, the network probe can basically stop the timer and compute the right-hand-side half of the round-trip time. Unfortunately, such techniques are no longer possible in times of encrypted transport protocols such as QUIC, because the protocol semantics are no longer visible to an observer. To still allow for meaningful measurements, the QUIC standard features a special-purpose bit, and that is the spin bit.
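The seq/ack trick described here fits in a few lines; a minimal sketch of a passive probe, ignoring retransmissions, SACK and sequence-number wraparound, which a real probe must handle (all names are ours):

```python
from typing import Optional

# (flow, expected ack number) -> timestamp when the data segment passed
pending: dict[tuple, float] = {}

def on_data_segment(flow: tuple, seq: int, payload_len: int, now: float) -> None:
    # The ACK covering this segment will carry seq + payload_len.
    pending[(flow, seq + payload_len)] = now

def on_ack(flow: tuple, ack: int, now: float) -> Optional[float]:
    # Probe-to-receiver-and-back half of the round-trip time, if we
    # previously saw the matching data segment.
    sent = pending.pop((flow, ack), None)
    return None if sent is None else now - sent

flow = ("10.0.0.1", 12345, "10.0.0.2", 80)
on_data_segment(flow, seq=1000, payload_len=100, now=0.000)
print(on_ack(flow, ack=1100, now=0.042))  # -> 0.042
```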
N: It is a dedicated bit in the QUIC short header and visible to on-path observers. While the spin bit allows for round-trip-time measurements, there are also other important network properties that one might want to measure. In this context, there's an ongoing discussion in the IPPM working group focusing on four different proposals that are similar to the spin bit but enable packet loss measurements.
N: As the name implies, it generates a constant square-wave signal; in other words, it first transmits a certain number of packets with a set Q bit and then a certain number of packets with an unset Q bit. The network probe can then simply count how many packets have arrived in which phase, and can thus derive the packet loss that has occurred here on the downstream WAN link. The third approach is then called the R bit, or reflection square bit, and builds upon the Q bit.
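A sketch of the observer side of the Q bit as just described: the sender flips the bit every fixed number of packets, so any completed run of identical bits shorter than that period indicates loss in that phase. The period value below is an assumption for illustration.

```python
SQUARE_PERIOD = 64  # packets per half-wave; an assumed value

def loss_from_q_bits(q_bits: list[int]) -> float:
    """Estimate loss from the Q-bit sequence seen by an on-path observer.

    Every completed run of identical bits should contain exactly
    SQUARE_PERIOD packets; shorter runs mean packets were lost."""
    runs, current, count = [], q_bits[0], 0
    for bit in q_bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)       # a phase just completed
            current, count = bit, 1
    if not runs:                     # the trailing, possibly partial,
        return 0.0                   # run is deliberately ignored
    return 1 - sum(runs) / (SQUARE_PERIOD * len(runs))

# Three full phases of 64 packets, with 4 packets lost in the second.
wire = [0] * 64 + [1] * 60 + [0] * 64 + [1] * 10
print(round(loss_from_q_bits(wire), 3))  # 4 lost of 192 -> ~0.021
```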
N: Finally, we then have the so-called T bit, where we basically have one train of packets which is reflected several times between the server and the client. Mapped to our setting, our observer will now only be able to compute the packet loss that has occurred on the overall loop: from the time that the train has left the observer in one direction until it has entered the observer again from the other direction.
N: We then investigated three different scenarios. In the first setting, we induced random packet loss on the downstream WAN link. In the second setting, we induced burst packet loss, and for that we used the simple Gilbert model. And then, finally, we also considered the impact of different flow sizes on the measurement accuracy.
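For reference, the simple Gilbert model used for the burst setting is a two-state Markov chain in which packets are only lost in the "bad" state; a sketch with made-up parameter values:

```python
import random

def gilbert_losses(n: int, p: float, r: float, seed: int = 1) -> list[bool]:
    """Simple Gilbert model: GOOD -> BAD with probability p, BAD -> GOOD
    with probability r, and every packet sent in BAD is lost. This gives
    a mean burst size of 1/r and a loss rate of p / (p + r)."""
    rng = random.Random(seed)
    bad, lost = False, []
    for _ in range(n):
        bad = rng.random() < ((1 - r) if bad else p)
        lost.append(bad)
    return lost

losses = gilbert_losses(n=1_000_000, p=0.01, r=0.25)  # bursts of ~4 packets
print(sum(losses) / len(losses))                      # ~0.04 loss rate
```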
N: For this, we use symmetric traffic and disable congestion control so that there's a constant flow of packets. We then transmit roughly 1 million packets in each of our experiments and perform 30 iterations for each of our settings. We then report the cumulative loss rates that the different approaches have derived at the end of the experiments.
N: If we now look at the results of all four approaches, we can see that they are mostly very close to the ground truth, especially for higher loss rates. The only thing that stands out here is that the T bit is not that accurate for low loss percentages, as evidenced by the larger confidence intervals here. The main reason is that the T bit includes two pause phases, and thus it doesn't actually cover the whole traffic.
N: Additionally, it also has the highest fluctuations in the measurements and takes a really long time until it gets close to the ground truth. So, summarizing, it can be said that the L bit is the closest representation of the ground truth, while the Q and R bits are not far behind; only the T bit struggles a bit and takes a long time to get close to the ground truth.
N: Let us finally get back to the question stated in our title: which spin bit cousin is here to stay? Well, solely based on the measurement accuracy, the L bit seems to be the best choice, as it closely follows the ground truth. However, it depends on the end hosts' loss detection, and there's always this slight temporal delay between the actual packet loss and its reporting.
N: First, there's a decrease in accuracy when they are subject to burst loss, as evidenced in our second experimental setting. What we did is configure increasing average burst sizes, and as our results indicate, the Q, R and T bits struggle if the burst sizes increase. The second disadvantage of those longer algorithmic intervals is that they prolong the time until the measurement stabilizes.
N: These results now stem from our third setting, where we investigated different flow lengths. As evidenced in the plot, the different algorithms start their measurements at different times: the L bit starts first, then the Q bit joins, afterwards the T bit joins and, finally, the R bit joins.
N: However, in the long run, all the measurements start to stabilize at the same point, at roughly one to two megabytes. So overall, the measurement accuracy seems to be suitable in all four cases, although there are, of course, differences between them. But which of those approaches should one choose?
N: These eventually decide how closely network operators will be able to localize loss, and thus it actually depends on the needs of the operators, on how fine-grained they want to localize the loss in their networks, because from a measurement-accuracy perspective, all of the approaches should provide reasonable results for that.
D: We have time for one question for Ike. Thank you for the presentation. Let's see if there's anyone in the queue right now... not yet, so while people are thinking, let me try one question: have you thought about trying to measure the duration of packet loss bursts, and do you think that would be feasible with the scenario that you have?
N: So basically, you mean how long the individual bursts that occur are? Yes? So that is not directly the intention of the different techniques that we have there, and I actually didn't come up with them myself. So, yeah, I don't think that those techniques are that feasible for that. There are ways of determining how long certain bursts take, but I think that is mainly only possible for the Q and R bits in that case, because the others are not that feasible for that.
D: Okay, thank you very much. And Cedric had a question; I guess it disappeared from my screen. Okay.
O: Yeah, I can speak. I was just wondering if that instrumentation has an impact on the packet loss rate. So if you add all these bits to measure packet loss, do you change the actual packet loss rate that you observe in the network?
N: Basically, yeah, that's what I was about to say something about. The people in the IPPM working group are thinking about adding the loss bits to the two reserved bits that are still available in the QUIC short header; in that case we would only use currently unused space for that. But obviously, if we add additional bits to the overall transmissions, then that might have an impact.
N
Although
I
don't-
or
I
am
not
able
to
quantify
that
right
now,
but
that's
yeah,
I
think
a
valid
thought
about
those
those
additional
bits.
In
that
case
yeah.
D: Okay, I guess we defer further questions to the panel. Thank you very much, very nice work, and let's go to the next presentation. The presenter is going to be Said Jawad Saidi from the Max Planck Institute; he's a PhD student in computer science. Can we start the presentation, please?
L: These devices provide a wide range of services, from smart speakers to smart appliances, TVs and surveillance cameras. However, it has been shown that these devices can be exploited, and one notable example is the Mirai attack, where millions of exploited devices participated in launching one of the largest DDoS attacks, which crippled parts of the internet and service providers.
L: However, detecting IoT devices at the provider level is not an easy task. The reason is that traffic patterns across IoT devices are diverse. There has been some recent work suggesting that we can deploy an agent inside the premises of a customer; however, that is not scalable, and it is privacy-intrusive as well. And active measurement approaches won't work if the devices are located behind a NAT.
L: Moreover, if we want to do deep packet inspection, we will face serious privacy concerns among the customers of the ISP. One readily available data source are flow-capture utilities such as NetFlow and IPFIX; these data sources are already collected by service providers for other operational purposes.
L: The key insight of our work is that the devices we studied show repeating patterns of communication that appear even in sparsely sampled data. We generated detection rules using the extremely limited packet fields available, and we were able to generate detection rules for devices from seventy-seven percent of the studied manufacturers, and we detected devices in a dataset from an ISP within minutes to hours.
L: We leveraged the fact that IoT devices, in order to provide their services, have to communicate with certain backend infrastructures; if we focus on the destinations contacted by these devices, we can find out which of the subscribers of this ISP have which type of IoT device.
L: Second, we checked whether we can see traffic from a single device from a single vantage point, using the dataset from an ISP. Third, we identified which domains, IPs and port numbers can be used to generate detection rules for different devices, and then, of course, we generated our detection rules. And finally, we applied our methodology to a dataset from a large European ISP.
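A toy sketch of what applying such rules to sampled flow data could look like: each rule ties a device label to a set of backend destinations, and a subscriber matches when any of its sampled flows hits one of them. The rules, IP addresses and labels below are invented; the real rules are derived from the testbed captures described next.

```python
from collections import defaultdict

# Rule granularities per the talk: product level, manufacturer level.
RULES = {
    ("product", "amazon-echo"): {("52.119.196.1", 443, "tcp"),
                                 ("52.119.196.2", 443, "tcp")},
    ("manufacturer", "samsung"): {("54.81.0.10", 8883, "tcp")},
}

def classify(flows_by_subscriber):
    """flows_by_subscriber: {subscriber: set of (dst_ip, dst_port, proto)}."""
    hits = defaultdict(set)
    for sub, flows in flows_by_subscriber.items():
        for label, dests in RULES.items():
            if flows & dests:  # any sampled flow to a rule destination
                hits[sub].add(label)
    return dict(hits)

sampled = {"line-42": {("52.119.196.1", 443, "tcp"), ("8.8.8.8", 53, "udp")}}
print(classify(sampled))  # {'line-42': {('product', 'amazon-echo')}}
```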
L: We trigger devices to generate IoT traffic; next, we connect our testbeds to a home vantage point inside the premises of the ISP, push the IoT traffic to the internet, and capture the IoT traffic at these different locations. As shown in the figure, we observed IoT activity from more than 64 percent of the devices we tested in the ISP dataset.
L: As it's an involved process, I'm not going into the details, but as an overview, we generated detection rules at three different levels of granularity. The finest one is the product level, where we can say what type of product it is; for example, whether it's an Amazon Echo or not. The next level of granularity is the manufacturer level, where we could only say that it's a Samsung device.
L: In this figure we see the duration of the dataset that we observed, and on the y-axis we see the number of unique subscribers per hour that had the inferred IoT device.
L: The next question is what happens if we increase our observation period. Here we see the same plot as the previous one, but with a 24-hour observation period. We see that increasing the observation period helps detect even more IoT devices.
L: The next takeaway is that the number of detected devices is stable; there are not a lot of fluctuations in the number of devices.
L: Now, if we zoom in on these 32 different IoT device types, we have this plot here. On the y-axis we see each individual device; on the x-axis, the number of devices per day, per 24 hours. They are also categorized according to their Amazon ranking in the country of the ISP; the ones for which we didn't find a ranking are put into the "other" category.
L: For the rules, we use ports, protocols and destination IP addresses. Protocols as well, yes. And for finding which IP addresses to use for generating the rules, we used the domain names first, because we had captured the domain names in the lab; we cannot use all the IP addresses that are contacted by the devices.
D: Okay, and in your presentation you said how many different IoT devices you were able to identify.
D: Right, yeah, okay. I have one question from David Oren; could you please speak, please? David, how do I...
L: The first one is that if a device is known to be infected or participating in a large-scale attack, ISPs can notify the users, the owners of that device, and say: okay, you have a device that's infected. It has been shown, in the case of the Mirai attack, that ISPs were actively engaging in notifying customers with infected devices, and even taking extreme measures, for example by...
L: Not without a customer's consent. You will know that this customer has this type of device, and you don't need to repeatedly detect the same device for the customer, unless the device is moving from one customer to another or the device is only active for a few minutes there. In our setting, the subscribers were home users, fixed-line subscribers, not mobile ones.
D: Thank you. So thank you very much; I guess we defer the questions to the panel. We have to move on because of timing, so thank you very much, very interesting. And so, please: the next paper will be presented by Simon Bauer, who is a research associate at the Technical University of Munich. So please, the video.
G: At the same time, previous studies present methodologies to survey flow characteristics like flow duration, flow size or flow rates, but recent insights into flow characteristics in the internet are rare, and therefore our paper poses the question: how have flow characteristics changed during the last few years?
G
Well
before
we
start
talking
about
our
methodology
and
our
measurement
results,
let
me
briefly
introduce
a
scalable
flow
analysis
tool
implemented
in
go
that
provides
large
scalability
due
to
parallelized
packet,
parsing
and
flow
aggregation.
The
tool
is
published
as
free
and
open
source
with
our
paper.
G: For our study, we identified the start of TCP flows by the three-way handshake, of course, and we terminated a TCP flow when we observed a connection teardown, when there was an idle period in a flow for a certain timeout period, or when we observed a freshly established three-way handshake for a 5-tuple that is already tracked. For identified flows, we calculate the flow size as the sum of the layer-4 payload sizes.
G: We calculate the flow duration as the time interval between the first and the last packet we observe, and we calculate the flow rate as the average data rate, based on flow size and flow duration. For our study, we composed a dataset consisting of 28 traces provided by CAIDA.
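Condensed into code, the flow accounting just defined looks roughly like the sketch below: 5-tuple keyed flows, started by a handshake, ended by a teardown, an idle timeout, or a fresh handshake on a tracked 5-tuple; size as summed layer-4 payload, duration as last minus first packet, rate as size over duration. The real tool is written in Go and parallelized; the timeout value here is an assumption.

```python
IDLE_TIMEOUT = 60.0  # seconds; illustrative value, not the paper's

class Flow:
    def __init__(self, ts):
        self.first = self.last = ts
        self.size = 0  # sum of layer-4 payload bytes

    def update(self, ts, payload_len):
        self.last = ts
        self.size += payload_len

    @property
    def duration(self):
        return self.last - self.first

    @property
    def rate(self):  # average bytes/second over the flow's lifetime
        return self.size / self.duration if self.duration > 0 else 0.0

flows = {}

def on_packet(five_tuple, ts, payload_len, syn=False):
    f = flows.get(five_tuple)
    # Start a new flow on the first packet, after an idle timeout, or on
    # a fresh handshake for an already-tracked 5-tuple (the three rules
    # from the talk).
    if f is None or ts - f.last > IDLE_TIMEOUT or syn:
        f = flows[five_tuple] = Flow(ts)
    f.update(ts, payload_len)
```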
G: Such traces have anonymized IP addresses and no layer-4 payloads, and each trace provides one hour of traffic captured at a 10-gigabit-per-second ISP backbone link. As illustrated below on the timeline, we selected 23 traces taken in Chicago between 2008 and 2016, and five traces taken in New York between 2018 and 2020.
G: As you see, we have three periods without traces for several months. On average, we selected traces at an interval of three months, but there are three larger intervals without traces, simply because there are no traces available. Regarding pre-processing of the traffic: we only consider TCP flows that are longer than or equal to 200 milliseconds.
G: This is also done by related work and is proposed to filter out quite short flows, because calculated flow rates may be falsified for short flows, in the case of single-packet flows, or if all packets are sent back to back.
G: Let me point out two major findings. Regarding the 99th percentile of flow duration, here on top of the plot, we observe only a little increase during the years 2008 until 2013, but afterwards we observe an increase by a factor of 1.5 between June 2013 and March 2016.
G: Next, we were interested in the relevance of such heavy-hitter traffic. Therefore, we calculated the share of bytes transmitted by flows within the 99th percentile for each flow characteristic. We did not find a specific trend over time, so here the table shows the average across all traces taken in Chicago.
G
In
the
second
column,
on
the
in
the
right
column,
we
see
the
share
of
bytes
transmitted
by
different
percentiles
and
well,
especially,
the
flows
within
the
99th
percentile
of
flow
size
represent
a
large
share
of
totally
transmitted
bytes,
with
nearly
90
of
all
tcp
bytes
transmitted
by
such
one
percent
of
flows.
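That heavy-hitter statistic is straightforward to restate as code: take the flows at or above the 99th percentile of flow size and compute their share of all bytes. A sketch with a skewed toy distribution (the numbers are invented, not the paper's data):

```python
def top_percentile_byte_share(sizes: list[int], pct: float = 99.0) -> float:
    ranked = sorted(sizes)
    cut = ranked[int(len(ranked) * pct / 100)]   # 99th-percentile flow size
    top = sum(s for s in sizes if s >= cut)      # bytes carried by the top 1%
    return top / sum(sizes)

# Skewed toy distribution: a handful of elephants, many mice.
sizes = [1_000] * 990 + [10_000_000] * 10
print(round(top_percentile_byte_share(sizes), 3))  # -> ~0.99
```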
G: Further, we had a look at so-called big fast flows. Zhang et al. introduced a two-by-two taxonomy based on two threshold values to group flows regarding their size and their flow rate, and we had a closer look at the relevance of such big fast flows, which are represented by only a small share of flows but, as we will see, have a large relevance regarding the share of bytes that they transmit. So we defined three threshold pairs.
G: The first pair refers to the original threshold values from Zhang et al., i.e., one hundred kilobytes for size and 10 kilobytes per second for flow rate, and then we increased the thresholds by one order of magnitude each for pair two and pair three.
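The taxonomy itself is just a pair of thresholds; a sketch with the three threshold pairs from the talk (the dictionary layout and function name are ours):

```python
THRESHOLD_PAIRS = {        # (size in bytes, rate in bytes/second)
    1: (100e3, 10e3),      # original values from Zhang et al.
    2: (1e6, 100e3),       # +1 order of magnitude
    3: (10e6, 1e6),        # +2 orders of magnitude
}

def classify_flow(size: float, rate: float, pair: int = 1) -> str:
    """Place a flow in the two-by-two size/rate taxonomy."""
    size_thr, rate_thr = THRESHOLD_PAIRS[pair]
    return ("big" if size >= size_thr else "small") + "-" + \
           ("fast" if rate >= rate_thr else "slow")

print(classify_flow(size=5e6, rate=250e3, pair=2))  # -> 'big-fast'
```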
G: Let me highlight the increase in the share of bytes transmitted under the second threshold pair, illustrated in green. Here we observe an increase in the share of bytes transmitted by big fast flows from between 20 and 30 percent for traces from 2008 until 2010, up to between 40 and 50 percent of all bytes transmitted by TCP for the more recent traces taken in Chicago. The values for the New York dataset are smaller, which can be traced back to a larger share of small flows in those traces.
G: To conclude my talk, let me summarize our findings. As we have seen in the talk, we observe a significant increase in the 99th percentiles of flow duration and rate; we find a large significance of heavy hitters regarding the share of transmitted bytes; and we observe an increase in the relevance of big fast flows during the past years. There are further findings not included in the talk.
D: Thank you very much, Simon. We have time for one question before the panel.
D
Okay,
I
guess
people
are
different
for
the.
I
have
one
quick
question:
perhaps
there
there
has
been
a
change
in
flow
since
the
pandemic.
So
are
you
looking
into
that
the
change
of
the
frozen
last
year.
G: Yes, so we did not do that yet, but we definitely plan to. As you have seen, we worked on the CAIDA dataset, and CAIDA provides one trace per year, so we're looking for further datasets that allow a more fine-grained study, especially regarding the pandemic during the last few months. So yeah, this will be a topic for the future.
Q: Yes, hi Simon, thanks for the nice talk. I was wondering, I'm not sure about the dataset: do you know something about how the parallelization of flows changed over the years? Like the change from HTTP/1.1 to, let's say, QUIC or HTTP/2, where things start to get parallelized over the same connection, so you would see less parallelization.
D: Okay, so thank you very much. I think it's better to invite all the speakers to the room now and collect questions for everyone. I'm not sure how to do that, but, yes, okay, it's coming.
D: I guess there is one in the queue: Ali, you can just speak, please.
J: Yeah, so the question is for Simon: do you see anything related to the walking dead? So, is there any correlation between IPv6 and the ports correspondingly being used? Because the microservices architecture is dominating development, so I'm just wondering, have you considered that aspect as...
G: So we're currently working on adding plain IP addresses, and then we would be able to look at IPv4 versus IPv6 too.
D: Anyone else? Questions? If not, I have a question for Justin, for the first talk. Do you foresee, in your environment, any performance issues in implementing the solution, the collection of all the results, the performance overheads? Have you thought about that, or do you see any problem?
M: Yeah, so that's a good question. Can you hear me? Yeah? Perfect, okay. So actually the overhead, again, is all on IOAM; the overhead introduced by cross-layer telemetry is just a netlink call, so that's not that big, and all the overhead is on the IOAM side. We have already measured the impact of IOAM in another paper, and, unsurprisingly, the more you insert, the more it drops.
M
So
you
have
to
find
a
compromise
depending
on
your
hardware
and
and
a
lot
of
things.
M: And by the way, just a small notification to operators: IOAM is now available in the kernel, as of two or three days ago, and it will be available in the 5.14 version. So, okay.
H: I have a fairly general question for perhaps most of the presenters. The talk about spin bits: I mean, the spin bit obviously came from QUIC initially. For the other talks, does the presence of QUIC and the deployment of QUIC affect the types of systems you're building or the behaviors you expect to see?
D: So, is the question for everyone, Colin?
G: So we also considered detecting QUIC traffic in our dataset, but there was not a significant share, so we did not take a closer look.
D: I have one quick question for Ike. Did I pronounce your name correctly? Yes? Okay. Did you contrast the measurements you talk about in your paper, which are passive measurements, which is good, with active measurements for packet loss? What's the difference of doing that, or do you see any reason to compare?
N: The idea was to find out how well the different approaches can actually detect the loss that's happening, and actually, the normal traffic that we were sending was kind of the active-measurement part of that. I think the general idea of those approaches is also to not have to use active measurements, so that we can, yeah, keep the additional overhead on the network low and just measure on the already-passing traffic, without having to actually...
N: Yes, I think the main thing is that if you use active measurements, then you won't be able to capture the loss that is happening in your network under normal conditions, and thus you would always have a somewhat different picture than what you get when you just use those different loss techniques here. So I don't think we did that contrasting at this point.
D: Okay, I have a question for Simon, I guess. In your paper you say that seven percent of flows transmit at a rate between one kilobit per second and a hundred kilobits per second. I kind of wonder: do you expect these transmission rates, or did I get it wrong? It seems a slow transmission rate. Is there any reason for that, or do you expect that, or am I just...
G: So, if I understood you correctly, you mentioned that we observed a large share of flows within quite a small range of the rate distribution?
D: Right, yes.
G: Yeah, so spontaneously I don't have an explanation for this. We plan to have a closer look at ports and even IP prefixes, which may allow us to distinguish between different kinds of flows; then I might be able to answer your question.
D: Folks, well, I guess people are... maybe in Europe, well, it is late in Europe, not for me; people want to sleep. So, very nice talks; I learned a lot from you, and I hope the audience enjoyed them; we had a large crowd watching. I will certainly refer your work to my students. So thank you, everyone, for being here. It was really nice, and I hope the audience enjoyed it. So thank you very much, everyone.
D: Let's close the session for today. Okay, so thank you. Thank everyone.