Internet Engineering Task Force IRTF, 24 Jul 2023

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: IETF117-ANRW-20230724-2230

Description

ANRW meeting session at IETF117
2023/07/24 2230

https://datatracker.ietf.org/meeting/117/proceedings/

A

A

A

There we go hey.

A

A

A

B

He's like hello, everyone, please be seated we're starting the last session of today in two minutes.

C

D

D

Okay, welcome to another session. um It's now time for Alfred Aurora Arona to speak, and um the title of the talk is lowering the buyers to working with public career level. Data.

E

Yep and good morning, so my name is Alfredo, so today I will present our work on lowering the barrier to work it with public area level data, and this is done with my.

D

um Maybe, like uh speak closer to your microphone or like speak a little bit louder. That would be super.

E

Helpful. Okay, sorry thank you. Okay, so yeah I was saying that um I will be presenting our work on lowering the barrier for working with public reliable data, and this has been done with um a supervisor yoana from similar Metropolitan and materials from University of 20.. So the goal actually of this work is basically to of this. Paper is basically to introduce uh our Consolidated data to the community so that you can avoid uh going through all the challenges while working with a level data.

E

So next slide control.

E

Next slide, please.

E

Yeah so um of today, I will start with the little background about the system.

E

Then I will introduce the data, the original data that we have collected from the area and talk a little bit about some inconsistencies that we have seen on the data and I will finish my presentation with our proposal. Our console did data, so um Internet Resources, like such as ESL, number or prefixes, are managed by several organizations such as the um Regional internet registry, on behalf of Ayana and icon. So we have five internet registry and they have creation.

E

They have sorry regional coverage, but they share basically the same core function, but for the purpose of this one we'll focus on, we decided to focus basically on two main functions. uh The first one is related to maintaining a directory service, including who is, and each and I have actually extract uh part of the territory into publicly available, always and also delegation file, also called statistics file. Each area also provide a reverse DNS for the delegation to the customer and those file also provided they also provide sorry, the reverse DNS for this file.

E

So we decided to use those data for a project, so we'll assume that this should be easy. So let's look at the data, so here we have two examples, actually of reverse DNS on the real level. So, on the real level, we are expecting delegation to customers. So we are mostly expecting NS record, but you can see on the example of the first example on the top that we can also. We also have ipv4 and ipvc's uh I would say uh record that is not expected to see on the area level.

E

The example on the bottom is classless delegation, which is um the the best way to delegate a allocation lower than slash 24 using CNM, and in this example, we have two adjacent prefixes, 169, 2, 15, 39, 128, 26 and 192-26, which are just some prefixes that have been delegated to two different name server, so why this data was important for us at the beginning of the project, because we were looking uh basically to track, for example, limb delegation at the area level or map um a prefix to a name server.

E

So once we have the prefixes, we need to collect additional information about the prefix. This is where we go to the who is data set, so the who is provide basically general information about the resource and the way that the public available who is Data is presented is Which. Object is separate by an empty line, so here on the left, you have one example from Irene uh um who is a database where they use the route attribute which is not used by the the other registrative others. They use the inlet number attributes for ipv4 address.

E

On the right hand, side you have two objects from latnik and they use the inert num, which is common across other history, but you can see that you use a custom notation for the inlet object. So if you, for example, have your script running and you're expecting to see well from, for example, prefixes, you will encode a lot of problems trying to address all those inconsistency on the innate num how the inner attribute is used in this region.

E

How, however, the who is still used by natural researcher and operator, as I told you I just show sorry, it's come with some limitations, so we, while we are working with this data, we decided to address some of the challenges related to this data.

E

So, in addition, we've seen that the data that is publicly available on the whole is is not. uh You cannot have access to historical data. So it's just one of uh data. uh Different area I use different URL, where the public they produce the public available. uh Who is data set and I'll, show you there is a consistent term of object and also okay.

E

So on the table, you can see what I just showed you on the previous slide, I didn't using the the hood attribute instead of initlob for the app prefix, and there is no net name. Instead, iron used description attribute in like new region. There is no maintainer, there is no name for example, so we try to fill those missing obviously um attribute. Relying on other data sets on the reverse. Dns part is similar to the who is Data, it's not possible to have access to historical data.

E

We've seen an expected resource record and um in RFC 1035 is a zoom file should at least have an SOA record in addition to NS record, but we've seen on the data that is published by the area that most of them do not provide the SOA record according to the specification.

E

E

We try to address those social Engineers with our Consolidated data. So how do we proceed? We propose actually our consulate data in the common format, which is enter operable and optimized, so we organize the data in a year of the Year. Actually, so it's possible to have access a inventional, it's possible to have long signal analysis and the data is also designed to support large-scale analysis too. So we base our work on longest practice machine and we create what you call in an entire.

E

So we have a start and end address that you use as a key for each record uh that we have from who is. We rely on the delegation flight to complement a misinformation from The Who is for the reverse Zone. We also convert the domain to prefix of both classless and classrooms uh delegation. We also apply the same idea of identifier. We have a start and end address that you use to easily identify um each object so yep. So here we have uh one example of who is and reverse in DNS of the console data.

E

So we use some color code to show a what we did uh the black color. Basically show what was on the original data.

E

The orange color so shows the the key that we introduce in the data. So we have the start address and the end address that you introduced on the data. uh The green shows the data that we that was missing, for example, in this case we have the status and we have the country that we were able to uh complement from the delegation file and for the reverse DNS. We add a flag to show whether the data was from classless or classes delegation.

E

So this kind of easier analysis when we want to compare classless versus uh classes delegation on the IR level.

E

Yeah, so to summarize a little bit, what we basically did is use the publicly available data from the who is and reverse DNS data, and we try to address some of the challenges. So we add what you call it Notifier. So we have the start address and the end address that we that we can easier um or analysis using these two uh limitation.

E

We provide the data in launching manner. We start collecting data since November last year, so the data is publicly available. uh The data is compatible with data engineering tool on the website. We provide more information on the data dictionary and we also propose a basic python notebook that you can use to that. You can customize actually for your own needs, so yeah thanks for orientation, think the result. There's already one question.

F

Max Planck Institute for informatics I'm a little bit confused by the slide you had on the rir formats and maybe I just misunderstood something.

G

F

There it was said that AP Nick, for example, doesn't have MNT objects, which I do actually see, for example, in the apnic entry for Quad one um and I actually did a who is on the V4 prefix there and that and have a route entry. But a net range and The Cider range.

E

Okay, this this, you are talking about this table right, correct, okay, so this information comes from the publicly available data, so we download the the raw who is the data set that are produced by each area and we yeah. This is the Baseline of the analysis.

E

We use the public available data, so maybe, when you run the who is on your client, you are going to another I, don't know which which more enriched data from the from the registry, but this table is basically based on the public available, the one that is extracted and publicly available on the on the registry website so yeah. Maybe this is where the missing information comes from. Yeah.

H

uh Mark Hoster is Aaron, so I'm, one of the regional Registries here, so one of the things that just to clarify you're using irr data, to do your who is work as opposed to actually look at who is on Port 43.. So it's slightly different I understand the confusion between the two. It is what it is. It's been the years that this been this way so um that, hopefully, that helps to clarify things like your question that you had here earlier. Thank you.

D

Thank you all right, uh since we don't have any other questions. Let's thank the speaker again, foreign.

D

So while we are trying to display your slides, we have a sapnel here that will be talking to us, like you, basically have a call for collaboration for DNS integration,.

G

um And and his slates are amazing, this is why we were just building. You know interest here.

I

It holds clicker, yes, all.

J

I

Hello I'm Andrew kaser uh from verisign and today I will be briefly detailing our lightning paper, a call for collaboration, DNS, Integrations.

I

One of the ways the deployment of the global DNS has become more Diversified is through the integration of DNS domain names into new application, environments, telnet FTP email services and then, of course, later web browsing in the past few years. We have also observed blockchain and decentralized. Applications have emerged as a new use case for DNS domain names, which can lead to new application Integrations beyond the traditional use cases such as email and web.

I

One such use case is to use a user-friendly DNS domain name to be associated with a blockchain address or a piece of decentralized content which can make it easier for users to interact with such applications.

I

The way these interactions work is via a DNS integration. A DNS integration is a method that makes an association between a DNS domain name and a resource in an application environment. Today's Integrations can be categorized into two broad types based on how the association is created, utilized and maintained. Dns-Based and server-based a dns-based integration primarily uses DNS records, while the server-based integration primarily manages the integration via a server. We will touch upon examples of both of these to show how they are used in both pre-existent and novel applications. Today.

I

Finally, we will discuss some challenges that these Integrations face, such as accountant for the domain name, lifecycle, and why these challenges should be addressed. We will also suggest principles for a responsible integration between the global DNS and new application environments in the hopes of starting a conversation. Now that can continue at a future ietf buff and culminate in a set of best practices for different types of DNS Integrations. So current and future applications will have a clearer path towards safely and securely, integrating with the global DNS namespace.

I

Now, on this slide, we should see a graphical example of some of these relations that I just mentioned. First, you register a DNS domain name in the global DNS, and then you relate it to an application, and one of the questions we always ask is: could this pattern repeat itself for new use cases now before describing some of these use cases and the Integrations they use? We want to highlight that many of the new applications are not just from the blockchain and decentralized application. Community.

I

There are in fact many many discussions happening throughout a much broader set of communities. Other slide here shows a very partial list that includes ietf participants, irtf participants, icann w3c cab forum, blockchain and even private sector entities all engaged in discussions about DNS Integrations.

I

Now, given this wide range of integration from a number of organizations, and given that each integration makes their own trade-offs, it would be useful to bring these and other communities that we are aware of together at a future ietf buff to start the process of establishing best practices around different types of DNS integration.

I

To begin with, a dns-based integration primarily makes this association between a DNS domain name and another resource using DNS records. This is the type of integration that most of us in the room are probably familiar with because it includes the most common DNS use cases such as using an a record to relate a DNS domain name to a web host or using MX records for email Services. These are the kind of Integrations that you use on a daily basis whenever you open a web browser or use your mail client.

I

Newer examples are coming from the decentralization application Community, including through the use of w3c decentralized identifiers, for example. What Bluesky is doing to link a DNS domain name to a w3cid, did through a txt record for their platform. Another example is the proposed w3c did method did DNS, which stores adid directly in the DNS as a URI record.

I

If we dig a Little Deeper, there are also dns-based Integrations that can be used to initially prove control the DNS domain name, while the rest of the integration occurs somewhere else, and let's look at a couple examples to see what we mean by this. So the classic example is using your DNS domain, Zone to prove control of a domain name to be granted a web certificate, such as.

I

If you were to use the Acme protocols, DNS challenge now, once the user receives their certificate, they will install it on their web server and the integration itself will take place outside of the DNS Zone itself. Visa certificate is installed on that server.

I

A newer example of this comes from the blockchain namespace communities, such as the theory and name service in case those domains which are using dnssec data and txt Records stored in DNS to prove that a given DNS domain name should be imported and integrated into their given name spaces. Now, DNS is, of course, used to prove this initial integration, but once an integration is made, subsequent interactions will occur in that namespaces ecosystem instead of in the DNS.

I

And then the other type of integration that we mentioned in our lighting paper is server-based Integrations and they make the association by managing content on a server now compared to a dns-based integration.

I

You might wonder how is this going to differ and the primary reason it differs is that the knowledge of a server-based integration may not be gleaned from DNS Zone data alone, for example, you may need to interact with an application that tells you that a given DNS domain name supports their application in some capacity, and you have to go to their server or some other endpoint to fetch data. Now this can provide flexibility, especially in cases where storing such data in the DNS may not be feasible or desirable.

I

A classic example of this use case is also the Acme protocol, which you might recall that we mentioned in the dns-based integration and in the Acura protocol. You can use an HTTP challenge to be great in a search that I get that can then be installed on your web server. Now.

I

This tells us something interesting about pre-existing Integrations, in that they have methods to use both dns-based and server-based approaches such as the certificate being granted using either a DNS challenge or an HTTP challenge, and this kind of flexibility indicates, as we consider this topic moving forward. We will also need to consider multiple types of integration to support different types of applications.

I

A newer example comes from the proposed did. Web did method, which stores a did document on a well-known endpoint of a web server and again you'll notice. A similar pattern here, there's a dns-based approach for did. Methods did DNS that we also mentioned so this flexibility appears to be imported.

I

I would also like to note that these are broad categories in that not all Integrations are going to fit neatly into a dns-based or a server-based bin. What's important here for our conversation today is observing that there are many different approaches used by both pre-existent and newer applications today to integrate with DNS domain names. So it is likely that we will need to develop best practices for different flavors of Integrations moving forward to ensure that different applications that Target different use cases can choose an integration that best fits their operational profile and objectives.

I

Now, with all these Integrations in mind, we did want to discuss some concerns, such as interoperability and support, but today I want to highlight the synchronization aspect of a concern. You can check our lightning paper for a discussion of the other topics now synchronization between a DNS domain name and other namespaces and applications are not guaranteed once the integration is performed.

I

For example, the DNS domain name may be important, but there may be no clear process or mechanism or guidance to update the integration when the DNS domain name expires is transferred the zone changes or the content on the server changes now to Grant. Why? This is concerning. Consider the following example scenario: first, a registrant will use a DNS domain name and a DNS integration to integrate that name into some application.

I

Second, the DNS domain name will expire, but because the DNS integration is no longer synchronized, the now X registrant will be perceived as controlling the DNS domain name in this integrated application. Then, if the DNS domain name is re-registered, two separate parties will be perceived as controlling the same Venus domain, name dependent on the application context.

I

Users of the integrated application will see the information and data set by the previous registrant, while users of the traditional DNS applications will see data set by the new registrant, and, of course, this will lead to confusion amongst users amongst the DNS registrant themselves and amongst the integrating application.

I

So, with that in mind, we would like to set forth certain criteria that we believe are important for any responsible integration method.

I

The first circle on the slide is control, and this can be summarized, as can a DNS registrant be competent that only they or their authorized representatives are able to use their DNS domain name in the DNS integration in question, without being concerned that someone else may be able to claim or use the DNS domain name without their consent or knowledge.

I

The second circle is domain life cycle. Does the DNS integration account for the DNS domain lifecycle to avoid such synchronization concerns, as we just mentioned? Additionally? Is an integration aligned with the best practices and policies of the DNS Community? For example, if you support DNS tech-based methods in your integration, do you support the required and recommended algorithms from the DNS sec rfcs and, of course, does the integration expand utility without impacting the ability of the DNS domain name to be used for other purposes, including the pre-existing uses it was possibly being used for?

I

So, with all of this background, we would like to extend an invitation for a collaborative Community level discussion that will be needed to address the issues in this space to come up with responsible, DNS Integrations for diversifying the utility of DNS domain names into new application environments.

I

So please feel free to reach out to us if you are interested in a future bath on this topic, and please spread the word to other communities. You are aware of thanks for your time today and I look forward to interacting with many of you in the future.

D

Questions Peter.

J

Hello, Peter Thomason. um You gave an example about on slide, eight, perhaps of where the problem lies and I think you yeah, and the example you gave is essentially when I let my domain expire. I have a problem now. Is that the main issue we're solving, because it seems to me that that's maybe not best off with integration Concepts, but rather with not let having the domain expire right. um So so I wondered like what's the problem we're solving, because that doesn't seem to be it right.

I

It depends on if you're looking at the synchronization issue, from which perspective, uh let's say, for example, that you might have let the domain name expire, and you still continue to use the domain name in that integration because it just happens to work.

I

So you don't realize, there's been any problem or any issue or if we look at a blockchain example, if the name expires- and you still are perceived as controlling that name in the blockchain namespace and someone tries to send you cryptocurrency to that address, for example, but someone else now controls it in the DNS space they may think they were sending it to the DNS party when they accidentally ended up sending it to whoever the previous part.

I

So that's part of the spirit there is to try and solve both sides of maybe you let it expire accidentally. Maybe you did it maliciously, but that way we can try and account for this or understand if it should be accounted for.

I

Thanks for your question,.

K

Jim Reeds uh interesting ideas here, um but I think the problem I've got is trying to figure out where this kind of discussion and collaboration could take place. You've given a whole shopping list of things that could be looked after the future. Some look interesting, some, maybe not so interesting, but there's a whole bunch of organizations and institutions that could be involved in this. We've got the ITF we've got. Icann we've got various other industry forum and so four are going on. So where would you see this kind of collaboration and cooperation discussion?

K

Taking discussions taking place? How would you think that could be achieved.

I

Yeah an excellent question. um Our first step really is to try and have a boss to try and get more insight from the various communities involved to see who would be interested in tackling this question because you're right that some of these topics seem to be better if it's for the ITF, some seem to be better.

I

If it's for uh more eye can level discussions or w3c discussions, and it really sort of depends on who we can get into the room, to discuss these topics and decide what direction at that bot, for example, that we can take okay.

K

Well, I've got two points to make a question about that, just to be a little bit picky here. If we're talking about above that's a specific meaning in ITF context and I- think you probably don't want to have one of those kinds of Buffs, because those Buffs are supposed to lead to a working group being formed, but certainly having some place where these people could come together. For a group hog would be a good idea.

K

I think one of the challenges you would have trying to fight to to make that happen is finding a forum or a venue for it and I think some of these organizations are like to be very protective about when you're doing this little part of the problem space. Here you don't bother us with things that are going elsewhere and I think that'll be a challenge to get these people to think they could come together and work in a collective manner to look at look at these bugs.

I

Thank you, you have point will taking.

C

Hi uh Daniel Khan Gilmore, so thanks for bringing this up here, I think you've outlined a really a broad class of problems and um I think it can be challenging to get people to collaborate. When you know my use case might be something completely different from someone else's case, and the only thing that we share is that we have some kind of integration with the DNS right. I mean I, see this with the encrypted client, hello, uh uh fronting server up, uh DNS updates, for example.

C

How do you imagine getting people who work across such widely different Scopes to actively collaborate on this? And secondly, um due to this with the synchronization problem? One of the things that I think we see happening with the DNS is that um the DNS is used as a leverage point to create things that then have a different actual time scale than the DNS records themselves so like if I use Acme to get a certificate. The validity window of that certificate is not bound to the validity period in the DNS. So how how do like?

C

How can we think about aligning those time scales or, or is that hopeless like? What's, how do you, how do you see that.

I

uh So to take that last question first, um hopefully it's not hopeless um part of the motivation here to try and broaden this collaboration is to bring sort of diverse communities together, especially ones such as the afman community, who has a much longer history of operational understanding to help us influence and understand what maybe some of these new Integrations might be able to to do. And it might be the case that uh the scope is I. Think as Jim was also alluding to. It might be too much for any one sort of venue.

I

So it might be the case that we figure out as we interact and Associate, and have conversations and dialogues with people that we have to fine-tune the conversation to Target more narrow areas.

C

Okay, one one place that I you might want to look for. Inspiration is the UTA working group The using TLS and applications working group. It's a little bit more focused than the possible places you can integrate DNS, but take a look at that and see how they've dealt with TLS in a range of different options.

I

Excellent thanks for the suggestion.

D

Next speaker is Johannes NAB and he will be talking about um the title of his talk is got aquarium all again: repeatable name resolution with full dependency problems.

L

All right, um I'm going to talk about name resolution, um the stuff that DNS is all about, or maybe about, um let's start with an example to get us all back to speed. um Yes, so, uh for example, if you're resolving up trm.de, we start with the routines or I'm we're going to to talk about authoritative, name servers. I will start with the routines. We have some uh name servers names.

L

We have some blue records, we're going to start and for Simplicity reason after this figure, because we're going to fill it in later and we're going to emit all the IP addresses. We simply assume that they are somewhere in the zone, meaning in the root Dot and for the authoritative server fqbns, meaning the NS record names we are shortening them and simply pointing to in whichever zone They are going to be answered.

L

They are going to be authoritative in so initially we query the one of the roots servers we're going to get an delegation back for the de domain and have the name servers and in your delegation we also are the reverbals. We also get the glue records that we then continue on.

L

So, on the next step, we can simply ask one of the servers where we got the glue record as well. For tom.de we get an additional delegation back um with um three name servers and for one of those name servers because it's in sibling domain in DDE Zone. We luckily also get an A Clue record back and then we can simply ask this one and get um our answer back.

L

So during that resolution we more or less relied on new records. We have a heavy resolution pass, but we found a lot of stuff in ddns all those zones in in Gray, where we don't know the name servers, we don't know how we get there or what they could influence. So, if you're going to resolve them all, um then we're going to end up with a figure, that's going to be a bit more crowded and a bit more complicated.

L

So how do we get to that figure for every name that we get in the in an allegation in a referral, we're going to resolve it and we're going to continue until we have no more names to discover no more names to resolve?

L

um And then we have multiple resolution passes on our path, how we can get to our answer and how how we uh can or also what can influence the the answer.

L

So what's the motivation I mean we want to find and and resolve all because of dependencies? We want to build the Empire dependency tree that we can figure out what can influence the name resolution. We want to identify broken delegations for some definition of flame. We called them previously lame, um so like authoritative, name, servers that do not access or do not seem to exist. So if I try to resolve a name, I might get an NX domain back in the DNS. It could be in the root Zone.

L

If it's simply root crap, it could be in some Tod Zone, where I might be able to add my own records, authoritative, name style servers that do not answer. Timeouts ICP errors could be performance, problems and also performance problems, authoritative, servers that don't answer or don't give any useful answers back, meaning various DNS error indications, non-authoritative answers, um recursors that are entered into DNS errors and, in addition to Simply getting the DED um dependency tree.

L

We also want to query them all, and we want to also query the multiple data query and compare the multiple data copies. What are multiple data copies or what? What do we have in the DNS? We have endless records in the referral itself, but we also have NS records in the origin. Do they match? What can we learn? Additionally, here we have blue records and we also have degree records, hopefully an authoritative data and, last but not least, we also have for each domain. Hopefully multiple authoritative servers, so do they are, are they all synchronized?

L

Do they also have the same data, or do we have some configuration drift? Why is it important or why do we want to investigate that number? One is the security aspect: if there are some hidden dependencies there, some broken dependencies that could influence the resolution and also the performance impact. If we have records or name servers that will not work, then we're going to maybe spend some time there, but not speed up the resolution for the user goes for our goals, so our research questions are. We want to study uh the DNS dependency graph.

L

We want to find potential inconsistencies in this configurations. Try to evaluate the impact, but I left was a problem. If we want to do that, regular resolvers can resolvers do not expose that data and they do not even internally necessarily get all that data, because they might rely on two records. The the primary task. The primary goal is get the an answer to the user as as soon as possible. That's modest Benchmark there, so we we did the only uh thing that we thought to do.

L

We build our own resolver, foolishly in the attempt, because how hard could that aging be um and for our implementation goes more or less guided based on what we want to achieve, we want to discover always reasonable resolution passes. So, if there's an hidden primary server, we are not going to Brute Force the entire IP space to find that server. We want to query all data copies as reasonable.

L

We want to capture all those queries and save them that we can later on, provide provenance on. Why did we get that answer? Why do we have an additional answer? Why do we didn't get an answer? We want to be deterministic repeatable and we also wanted to be fair and efficient, so we don't want to overburden authoritative, name servers, especially if we query all data copies. So we want to hear good net citizen.

L

So implementation of suction resolver on uh with a very rough overview here, because more details in the paper, because that's gets a bit tricky on what we need to consider. So we structured the resolution models. We try to build our Zone tree and how it's observable in any wild, how it's observable in the internet, we find our authoritative server candidates, meaning blue records.

L

Routines result deep name, server, nay or DNS names within the resolver and also have to consider like if we can have a name server, that's authoritative or parent, and you try it as well. We do not get in referral back, but we simply get if we ask for a delegation or try to figure out the delegation simply an authoritative answer back in another code and for all the servers candidates. We are querying the SOA record and the NS record the NS record simply because that could lead to additional information that we can uncover.

L

The SOA record should exist, might not exist for for interesting configurations might give some hints on whether or not they are properly synchronized. Even if we don't see diverging data and we consider the names of- and we are going to use the name server as any of those two queries, it's going to provide us an authoritative answer.

L

If we get conflicting answers, then we are going to use the superset of them, because that's what all, if we want to figure out all resolution passes, that's what we need and we end up with a problem like if we have names in the NS records that we're going to resolve within the resolver, we're going to add new zones, we're going to discover new stuff in the name or they are also going to have Name Records.

L

So is it resolution all the way down? Let's get back to our figure. If we squint hard enough, um I mean it's already oriented that way, we're going to find some sounds that seem to interdepend on each other, like, for example, either zet.eu LS, Dot delhs.in are all zones that point to each other, where the name server records point to the Interlink or interdependent to each other Zone, including um itself.

L

uh If we go a bit back to graph Theory, we are going to find out, or that looks a lot like a strongly connected component, meaning from each node I can read or in in this group. I can reach I I can walk each other node and come back to to the origin, and I can influence myself for the DNS impact. That means that, if I have, um if I have a a name server in such a group, that's going to to provide an answer.

L

If it adds additional information, it might be even able to impact its own Zone.

L

If you fill that in for everything we are going to end up with our more or less structure dependency tree and have also the the additional um completed graph we're going to complete the the resolution process or it's based on postponing our queries until we figured out a strongly connected components by an online Graph Search, and we are going to need to uh that's details in the paper a bit uh dense on. How do we figure out? What are the name servers we get queries.

L

Do I need to resolve that query in order to complete the strongly connected component or, if it's not needed, then I can postpone it.

L

So we in order to to figure out those strongly connected components. We need our zones. We need to detect on what might be a Zone. What what can we externally observe as a cell, so we need to for all all dots within the Zone we need to figure out is that the zone is there a delegation there or um is that simply a subdomain in another Zone?

L

So q9 minimization provides a framework for that. So we'll simply query for each delegation we're going to query: is there uh something that's a dedication for for the specific name and not only the complete name and um compared to the the r receives, or these suggestions we're going to use SRI queries since for eight queries, we have the problem example that FD parent is also authoritative for the child. We're not going to discover this delegation even if it exists um for NS queries and that's what the original proposal was.

L

um I tried that once and I found some interesting game servers that answered in delegations in the answer, sections I don't want to know um and as an additional optimization, because even the the last label could be in a separate Zone.

L

um If you have w it's most likely, not going to be in a separate zone so initially for for all those um labels within our single labels within the second level, effective, seven level domain, we're going to ignore that for now and um only going to do a photo, Zone cut Discovery in NS records or records. If we get an answer back, that indicates an allegation.

L

So if we query all name servers that might be not completely viable, for example, the uh for the the.com Zone has 26, authoritative servers based AP addresses, there might be more servers, there might be less servers, I'm interesting on information there and there's a very large zone.

L

So if we resolve a lot of names, we are going to hit a lot of Verizon name servers and if we employ rate limits on our own, um that's going to be the bottleneck for our resolution and I would assume that very sign is an operator would probably prefer to not answer the same questions 26 times.

L

um If we can avoid it. What's the the solution here we can simply. We extend our assume that TLD servers are somewhat synchronized, consistent, properly managed and simply query a consists, a deterministic subset of the name servers so that we don't have to query 26 but Pick 3, based on the name that we are currently asking based on the IP addresses of the candidates that we can ask and if you find any discrepancies.

L

So if our assumption they're consistent, we observe that it doesn't match we're going to query all um additional optimizations are if we have its own files, for example, from the centralized Zone data service. We can use those delegations that we find in those zones directly and Skip querying the.com name servers completely.

L

um For testing I mean, if you implement in resolver, that's going to be a lot of bugs trial and errors, and a lot of gray, hair um re-running against the internet is not a viable option, because the number one burns, the authoritative sales servers. We don't want to be a bad net Citizen and even if we would do it, the results are not one-to-one comparable.

L

If um there are changes in the DNS, we don't know if it's code changes, that's bugs in our code or if it's a simply DNS changes. So we haven't procedure on where we capture the previous recorded data and run it simulate.

L

The ID name servers that we've seen in a Linux Network namespace and can then record queries that we've unknown queries, meaning queries that we've opted in the original data, but for right now, after and even in the simulation- and we have also queries that we know in the original data, but we skipped in the simulation right now so indicating bugs due to timeout handling unresponsive name servers, that's comparing the results. If you run it multiple times, everybody run it against.

L

Yourselves is a bit more complicated with more details in the paper, so let's conclude, um we haven't resolver that can discover the entire dependency tree um provides a repeatable and deterministic resolution process, um independent of caching ordering Etc um we're saving. All reasonable resolution passes, including all the authoritative servers that we can ask for later analysis and have a process to to test that we have a sample data set on tcpresolve.github.io.

L

With an outdated Alexa list, which is no longer up to date and the Majestic million list for reasonable records that including subdomains and a few name servers a few domains that might be interesting. That's only a simple data set if we have our data access more data, but that needs to be analyzed. Those impacts and misconfigurations need to be evaluated, and then especially here are interested in. Are there new, interesting questions that could be answered by such data sets?

L

Do you have information, or do you have input on what needs to be considered additionally and open for questions.

C

A

Khan Gilmore.

C

Thank you for this work. It's a giant disaster I'm impressed that you've persisted at it.

C

um Were you doing the so I'm trying to understand the relationship between this and tune in minimization, because one of the problems that we found with q, a minimization if I, remember correctly, is that you could get different answers if you were sending the full query as opposed to the um the suffix to.

L

C

Higher level name.

L

Servers, yes, I mean if the data access the possibility that the parent name server is a hidden primary for a child right that out, Simply could be an old configuration, that's still on on the server, but it's now delegated right.

C

Or a malicious or a malicious response.

L

That's malicious and it's um especially if there are multiple levels like a child of a child, because then, if I ask for the child directly I get an delegation, but if I ask for the uh child's child's name, grandchild name um the answer: the the server still has the Zone configured and it's going to answer directly right so that there are some differences that could happen there. Not sure. If that's the question exactly.

C

So I guess my question is: did were you able to analyze this to identify scenarios like that where that happened, because I mean one.

L

C

Of the concerns that I have is, that is that these things are possible and we wouldn't notice them today.

L

Yes, um I did not analyze that specifically um I hope. The data at least could give some indication where that could happen, and since we know the all the name servers that we could ask to verify. If that's the case or not I hope that would be possible to figure out. Okay.

C

The other thing that I think would be very useful and I. Don't know if you've produced this or not would be something that a domain administrator, someone responsible for a given DNS record could run. That would do all of these queries and map everything and give you the kind of diagram that you gave to show. You know here's the range of answers that I got and here's the pass that I got.

L

C

You have such a thing, no.

L

Unfortunately, not yet okay.

C

That would be really nice to.

L

Have yes, thank you, I agree.

D

We will take one more question and that would be from Victor meet.

K

Victor, yes, okay,.

L

Thank me later: okay, I think I'm unmuted.

E

uh I was gonna, ask normally what uses have you put it through? It looks like you're making the data available.

B

For others to poke out a little bit, but what have you done with it.

L

uh Coming out what what we've already done with the data, um if I understand the question correctly? Yes, no.

B

What's the what are the applications so far.

L

So the the application so far more or less the the engineering challenge um I looked a bit into the data. um There are a few scary things like if you find NX domains for for name server records that point into top level domains um where I did not yet figure out if they are completely registerable or not.

L

um That's the the scary part and other than that um that there is data, but I did not completely evaluate it. Yet initially, with was the engineering challenge. That was the interesting part.

D

Great, let's thank Johannes again.

D

Next, we will hear about enabling multi-hope, ISP hyperdine collaboration from Christian monteno.

M

Hello am I Audible, yes, all right, good, hello, everyone, my name is Christian and I would like to present today my work on enabling multi-hope ISP hypergiant collaboration. So, let's start looking at the internet. Nowadays we see that more than 80 percent of all the traffic is coming from hyper giant, namely Google, Netflix, meta and or others of them now who they are sending this traffic. Well, usually those are the isps like we.

N

M

18T Airtel EarthLink. In order to do so, the hyper Giants tend to interconnect as much as possible, so they tend to connect to as many networks as they feel see. Feed now, uh nowadays, large Hardware Giants peel more than 10 000 different.

G

M

And, for example, we see here like Cloud Fair, already reached the 10 000 and Google has more than 12 000 Networks. However, just connecting to the isps is not enough. So in order to the second thing that hypertension has to do once the client wants to access some research.

M

The hyper Jack needs to select the optimal server right, and this problem is not trivial, because there are a lot of things that are changing on the internet all the time, however, previous worked by pre-ordolatile design, a system that actually helps the hyper giant to select the the best server, but just for those isps that are directly connected to the hyper Giants right and here comes the question: how about the networks that do not actually appear with this hypergiant? Since the largest one goes up to 15 000 networks.

M

It means that there are more than 40 000 small Network, guys out there that really don't appear to a hyper Giant. So, during our collaboration with a large European Transit provider, we actually saw that a really large number of these small isbs to do not peer with the majority of the hyper Giants and they actually rely on their Transit provider.

M

Let's have a quick look. uh We have here on the left side, an actual hypergiant that would like to send some traffic to to a small European ISP, and the first thing that it does, it will send the traffic to the transit is, as we can see, the traffic is split in two different locations.

M

In this case it's two different countries and what the transit is will do is we will just simply pass over if it can all this traffic to the small European ISP right, so in this case, I'll just move forward to some other routers, but we'll keep to the same location. Now.

M

uh What will happen in the small European ISP is that some of the traffic actually needs to be rerouted from one location to another in order to reach the end clients right and in this situation we have this small different amount of percentage that actually went to to one location but needed to be in another location. Further investigation, uh we went for the investigation here we looked at all the router at their capacity. There are no congestions there, actually, no problem anywhere.

M

The only issue that this is happening is actually the improper choosing of the server on the hypergiant right and uh now. If we try to look at the entire week, we want to see what's happening over the entire week with the traffic coming from the hyper Giant and what we see from the total traffic coming from this hyper giant.

M

We see a typical Journal pattern, it's very typical for a residential ISP uh with high peaks in the evening, and when we look at the non-optimized traffic, we see that it kind of follows the same pattern, but what's most important for us is that uh there is a large amount, so the non-optimized traffic is very high during the peak times right.

M

So it's almost 30 percent here foreign now we see this behavior in more than 20 European isps uh during our collaboration with the log Transit provider, and we asked ourselves is: is there the possibility to help the hypergines to improve the server selection for non-directly connected isps right? So can we actually reduce this 18 or maybe completely remove it? If it's possible- and in fact the answer is yes, we can do that and we can do that, but by enabling is the ISP to hyper giant collaboration.

M

The idea is that the ISP means to send some additional information to the hypergine in order to improve the server selection. This sort of collaboration can go multiple ways. For example, you can have a multi-hop collaboration with ISP collaborates directly to the hyper Giant, and no other in between Transit is they're. Multiple of only one of them is involved. Another sort of collaboration can be one plus Hub collaboration where there is a chain of collaboration between all the neighbors starting from the ISP ending up in the hyper giant.

M

Multiple other collaboration are possible and we discuss them in our paper. So if you are interested, please go ahead and read, but for our presentation we'll focus on the multi-hop collaboration. The idea here is that the ISP would like to send a set of key value pairs to the hyper giant where the the key is an IP prefix of the ISP, and the value is a list of this similar IP prefixes, give.

K

M

Example, you can integrate it as a letter in sort of the hypergiant. Can you please send data to prefix a as you do, for prefix B or for prefix C in this order of preferences?

M

Right uh now, once this is set, the idea the the question comes up is how does the ISP select this prefixes, what what prefixes should choose and where it is the three different possibilities you can go either with bgp and all prefixes, which is actually not efficient, since the the the hyper jet already knows them uh and takes them into account.

M

The second one is the the ISP DNS resolver working prefixes, and the idea here is that inside the ISP actually uses a a small fine-graded, uh a prefixes that it's working with, especially for the DNS resolvers of the clients, and we we call this in our future in in the future slides. We will call this specific work prefixes that DNS default. A third option is complete. This aggregation like slash 22 desegregation, where, um where we can de-aggregate disaggregate all the prefixes of the ISP up to slash 24..

M

The thing here is that both the DNS resolver and the and the DNS server should have the ECS enabled now going back to our traffic and the unoptimized part. We ran a retrospective simulation on this real traffic and, as we can see, with the DNS default simulation, we managed to reduce from an average of 18 down to 1.3 percent of the amount of non-optimized traffic. Now, if we use the slash 24 prefixes, then we end up with fully optimal traffic in here.

M

This was the case of only one hypergiant and we went forward and we look at all the traffic that's coming to that ISP and we identified 11 different, hyper giant there and also some other networks that are sending traffic.

M

um As you can see, three of them here are marked with a star. The point is that for these three hyper Giants, they only connect in one location with the transit as therefore it doesn't really matter whatever server they will choose that there is only one possible way to uh to to send the data to the to the ISP, so the further optimization can be done only by the transit is itself. It wishes to do something. uh Some changes inside of it, internal routing.

M

The next column, shows you the the amount of traffic coming from his hypergiants and also the following up the the amount of non-optimized uh traffic. uh That is there, as you can see, overall, there's about 14 of the of the traffic that uh incoming uh coming to the ISP and this uh this 14 are potential. Have the potential to be improved on the last column that we have here, we see the amount of non-optimized traffic per own traffic share and we see a large discrepancies in between hypergiants.

M

So, for example, for one hyper giant, the traffic is actually pretty well. Optimized in the server are very well selected and there's only one point: eight percent of non-optimized traffic, while for the other hyper Giants, we see almost a half 46 percent in this case.

M

So looking over the time uh on all the all the traffic that the incoming into the ISP, we see the average of 14 and we see again the the same Journal pattern of the non-optimized traffic. Once we ran with, respectively the our simulation with the DNS default, we managed to reduce the the amount of unoptimized traffic down to four percent.

M

What's more important is during peak times where the resources are really scars for for small isps, we managed to reduce it even more so in this case, as you can see, from 30 down to 10 percent. If we run the simulation with the slash 24, we again ended up with optimal traffic right going to the conclusion. uh During our research, we showed that it's possible to improve server selection, even if there is no direct Connection in between the hypergiant and the isps.

M

We also showed using real ISP data that our system can actually improve the non-optimized traffic up to 10 without any additional implementation or Improvement to the DNS um I'm, trying to say about implementing and adding ECS to the DNS. Also, our results show that there is a discrepancies in between the the traffic coming from different hyper Giants and for some of them there's up to 46 of the traffic. That's coming the non-optimum correction and we argue that the out there are more than 40 000 different networks that can potentially benefit of this system.

M

Thank you, foreign.

D

I can ask something since nobody's waiting right.

G

D

So, um thanks for the presentation, um can you comment a little bit more on How likely? Is it that a hyper Zion will have alternative paths that bgp can expose or like? How do you really change the path that is being used.

M

Well, actually, we we, we don't want to change any paths at all right. So the idea that what bgp did and how the paths are selected is all fine.

M

Our point is that, unfortunately, the hypergiant has not enough information when the request comes in in order to select the proper server and by accident, some distance server coming from a different place may be selected for that client, and then a different route will be used, so it will be the best route in between the server and the client, but it's just the wrong server or the not. The best server was chosen.

D

In the the servers that you that you allowed to use- and you can you, can you call a bit on the set of those servers like which servers are? Are they on the same a yes or like what do they have in common.

M

So don't know they are not in the same air. So during our collaboration with this Transit um Transit is we know for sure that there's no presence of the hypergiant inside of the transit is so therefore, once the hypergenic want to send traffic from its servers, it's from inside the actual hyper giant, it has to send it via the transit is and end up in the ISP right. So.

D

Yeah, my question is more the servers that you actually selecting among right, so, if you're saying that the hypergine does not select the optimal server correct yes, so what do this set of servers that could be selected have in common.

M

Oh, but they they basically similar, they are duplicates, but they are in different locations and the idea is that which of them will respond to your client right.

D

Gotch, okay, thank you.

D

All right since we don't have more questions, let's move to the next and last stop for the day. uh um It's from Alex Fang thing and we talk is entitled: Daisy, practical anomaly, detection in large bdb, mpls, bdp, svx, VPN, Networks.

N

um So hello to everyone, I am Alex sapi, the student at insa Leon and we are presenting today Daisy our framework to detect anomalies in large-scale bgp and PLS and bgps rv6 VPN Networks.

N

So uh I will first present the scope of our project. What do we consider anomaly in our bgp and PLS and bcpsrv6 VPN networks? I will go through the different Daisy architectures we've been. We've been working on the ITF contributions with working to have not only an open source solution, but also a standard ietf solution and at the end of the results and the ongoing works of this project.

N

So um a VPN allows having a connectivity for the customer between two or more sites, and in this project we Define an anomaly as an event that occurs in the network and impacts the customer traffic and therefore makes the customer unhappy. This event can be provider inflicted due to an incident inside our Network, a fiber cut an interface not working properly.

N

It can be also providers self-inflicted when there is a maintenance window and the operator is pushing a new upgrade and in we in this upgrade there is a bug, but it can also be customer inflicted when the router or the customer adds router, but that are managed by the by the customer themselves. They push a wrong configuration to the router and they lead their own traffic Cobra hole so why this is important for isps but because uh at the end, if you match your outages badly and they last you end up in the news.

N

We all know that issues happens to all the network, but what it matters for ISP is how you manage them. This service Interruption. Of course they make you look bad, but cost you a lot of money and that's. That is why we in this project, we are focused on how to detect these anomalies in at early stages and how to provide the necessary information for Network operators to analyze the data, find the root cause and, of course, fix the issue at the end.

N

So this is a project that is financed by swisscom and we do research, but also development in open source throughout all the chain. So from getting the data from the network to get visibility of what is happening in the network, we do research, but also standardize the Telemetry protocol at the IDF and Implement different Publishers and and collectors. To get this information, we propose also new network measurements that could be interesting for anomaly detection and at the end, the final goal of this project is having a scalable solution of, of course, anonymity detection.

N

So first requirement of the project is that it needs to work. We are working with swisscoming production data and we need to take into account the different operational constraints that the goal operators tell. They have.

N

um Let's present the different components within working on uh in Daisy, so first we will understand that we need to know the behavior of the customers to make it work. I will go through the different standards we are using to get the different dimensions we are getting from the network. How do we post process them and at the end, based on that data? How uh do we detect anomalies and, of course, once we have detected the anomaly? How do we report them to the knock so that they can fix the issue.

N

um We did research, as I said, is based on production data from the swisscon VPN network, and we quickly realized that customers differ a lot in their behavior and therefore there is no one features algorithm to detect anomalies.

N

For example, they are customers that are super predictable with half with flat curves of traffic or repeated date, night cycles, but there are also some other customers that, for example, regular drops of packets is for them normal, and therefore we cannot use this drop Matrix to detect anomalies for them. On the other hand, they are managing around 10 to 11 000 VPN customers. We cannot do one recipe for a customer and therefore we need to group the customers into into customer profiles so that we can base our anomaly detection recipes on this profile.

N

um So we are getting uh different dimensions from the network, uh so first for databane we are using the ipfix to get traffic counters and packet drops uh from from Network in control, pane to get the bgps topology and the bgp state. We are using BMP capturing bgp events such as updates, withdraw and pure Downs and in management plane. It's still a bit a working progress at the ITF, but we are already deployed yampus using UDP native to get the interface State changes and.

L

N

Course jumpers is to get all device related information.

N

Of course, once we have got all this information by The Collector, we need to correlate this information to the customer so that we know to which customer we are impacting and we are correlating ipfix to the bgb path so that we know which Contours belongs to which customers and we are doing the same uh for the input interfaces of yampus. To know when there is an interface which goes down to which customers we are impacting.

N

We are doing uh all this with the open source solution of PMC City developed by Paolo lucente, and these allow us to rely not in inventories but of in what is happening in the network.

N

So once we've got the data correlated to the customer uh to the customer identifier, we base our customer um uh our anomaly detection on the customer profile for each customer profile. We apply a set of strategies that are a way to capture the service health.

N

These strategies are organized by a set of pipelines that are a sequence of conditionally checks and the checks are the actual algorithms on the data that detect. uh If there is something wrong on that data, a simple check, for example, would be check if, for that customer at that time, if the traffic is, there is a big difference from the last week. If there is a big difference, we raise the alarm. Of course, all of these is configurable.

N

We also allow plugins so that the network operators can Define their own rules and integrated, integrate them easily.

N

And, of course, once we've detected an anomaly, we need to report them to the nego operator so that they can fix the issue for that. We are interfacing with ciscom through our Kafka topic, so that we are sending a ticket to the NOC and uh so that they can get the data. Within this message we are giving them also to Raw data from where we have executed the algorithms, but also the details and the parameters of the different checks we have executed so that they can have the full view of what we.

N

What did we raise this alarm um since we are in a big data? We cannot save all the data when there is an incident.

N

We are saving this data in the in a permanent, permanent storage to play around with what, if scenarios, to experiment with new strategies and new checks so that we can continue improving anomaly detection and continue improving the accuracy of our platform yeah, uh as I said at the beginning, we are also contributing a lot uh within the ITF to not only have an open source solution, but also a standard solution uh with the different rfcs.

N

For example, we uh have proposed a UDP based transport for young push to allow the streaming or large amount of data from the router directly from the light cut, without stressing the rod processor. Of course, I do ITF. We. We have seen that there are new technologies, same routing over IPv6 that are starting to be deployed.

N

We are proposing also extensions to ipfix so that we can monitor these new technologies through the same uh system, and we are also proposing new metrics in that, in our case, is the on-pass delay, which is the delay between the encapsulating note and the different nodes along the path, and we are exporting these delays using also ip6, so that we can already have the aggregation from the node foreign.

N

We have also other contributions. I will not go into details on each of them, uh but uh please, if you are interested in us, but basically what we are doing is extending the young push header so that we can actually monitor not only the data but also monitor the whole pipeline of young push and in the second instance we are also extending ium direct export so that we can compute in passport mode. The on-pass delay that we proposed early.

N

So, what's the status right now of this project right now, this project has been developed in python, as a proof of concept has been deployed in production for a subset of customers of the swisscom VPN network and so far we have detected uh six outages three in real time and three in replay mode, and currently we are continuing studying uh if there could be New Dimensions.

N

That could be interesting to detect uh these anomalies and, of course, at each IDF, discussing with the different vendors, if our different drafts could be actually be implementable in the future. We also plan uh to study the same framework could be also be used um to uh to detect anomalies in Internet policies since at the end, in isps it's another service, but they use the same system in a way of monitoring.

N

So uh in a way in our season, they could be another profile with different recipes dedicated to internet services and, of course, at each ITF. We are present at each hackathon, but also different working groups to continue progressing with the standardization.

N

So that's it for me. If you have any questions or you are interested actually in any network Elementary uh uh topic, we are a bigger group than the authors of this paper. Please uh ping us reach out. We are currently at the whole week at the IDF at each ITF. We are present working with swisscom, but also different vendors, such as Cisco and hallway.

N

D

uh If there are no questions from the audience, I can sign me um so from all the data that we have seen uh so far and like all the different incidents, can you comment on the more common or the more disrupting one? um What would you like? What's the worst, the more common one.

N

um From the different incidents we have seen, there is a no common uh incident because it it was common. We would just have fixed the issue and not see see it nearly right. So no from all the engineers we have seen so far, which each incident is a new one and with learning new things from that incident and implementing so new checks, new strategies to see if we can improve it.

D

Sure I meant more like um what what was the more influential Telemetry metric that you have done, that catches many anomalies for.

N

Example: ah okay, so yeah um most of them, they impact a lot on the traffic amount so on the forwarding plane. But uh when there is something happening on bgp, for example, you have a lot of specs on the bgp events, so on mainly on right now, at least on control, Pane and forwarding planes.

D

Thank you. Okay. Let's take next picture again.

B

Okay, so it looks like we're ending the workshop right on time and uh I would like to thank everyone for uh joining us today uh and if we're asking those interesting questions, please always feel free to reach out to the speakers if we had to catch our questions due to time constraint and I would like to also thank our speakers once again um for those uh insightful research talks. Do you have anything to add.

D

Yeah I want to also add to thank our panelists that were super insightful and everyone of you for the really nice questions and participation.

B

Yes and it's our pleasure to be part of nrw, and we hope it will continue to bring new ideas to the new community and to this community and we'll continue to be more Dynamic and Lively okay. So let's call it a day and thanks everyone for coming and.

H

Thank you, Maria and Francis for putting this together.

H