From YouTube: IETF 116 IRTF Open
Description
The Internet Research Task Force (IRTF) Open session, including Applied Networking Research Prize (ANRP) presentations, will be held during IETF 116 at 0400 UTC on 27 March 2023.
E: Welcome, everybody: this is the IRTF Open meeting at IETF 116 in Yokohama.

E: Ah, very quiet. Is that better? All right. Okay. So, as I said, my name is Colin Perkins; I'm the IRTF chair. This is the IRTF Open meeting.
E: I want to start with the usual reminders of the IRTF policies, the intellectual property policy, and so on. First of all, a reminder that in the IRTF we follow the same IPR disclosure rules as the IETF does, and that by participating in this meeting you agree to follow those procedures. In particular, if you're aware of any intellectual property rights on your talk or your contribution at the microphone, then you need to disclose that fact; the precise rules for this are listed on the slide.
E: In addition, a reminder that we make audio and video recordings of these sessions available. This session in particular is being streamed live on YouTube, the recording will be on YouTube afterwards, and there's also a photographer here. If you're wearing one of the red "do not photograph" lanyards, then you will avoid the photographs; but if you speak at the microphones or give a presentation, you will be recorded and the recordings will go online.
E: We also have a code of conduct. Please do pay attention to the rules about the code of conduct and the anti-harassment procedures. Please behave professionally and appropriately, and ensure that everybody is welcome in this meeting, in the IRTF, and in the IETF in general.
E: Also, if you're asking questions, please use the Meetecho tool: either the on-site tool if you're in the room, or the full tool if you're remote. We're running a unified queue for questions, so please use the tool to put yourself into the queue rather than just going straight up to the microphone.
E: Also, a reminder that, as a COVID safety measure, in-person participants in this meeting and in the other IETF-controlled rooms are required to wear an FFP2 mask or equivalent. The only exceptions are people actively presenting, and the chair when they're actively speaking; participants asking questions from the floor are expected to remain masked.
E: All right. So, as I said, this is the IRTF Open meeting. The IRTF itself is a parallel organization to the IETF, which focuses on some of the longer-term research issues that affect the Internet.

E: It's very much a research organization: we're not here to conduct standards development, we're not here to produce standards. While the IRTF can publish Informational or Experimental RFCs, the primary outputs of the research groups are expected to be understanding and research papers, rather than protocol specifications and RFCs.
E: The IRTF is organized as a number of research groups; those shown in dark blue on the slide are meeting later this week. The Computing in the Network Research Group (COINRG) met this morning, and we have two groups which are not meeting: the Network Coding Research Group is essentially finished with its work and is expected to close relatively shortly, and the Thing-to-Thing Research Group is having a meeting online in a few weeks. All the other groups are meeting later this week.
E: The goal here is to bring together the protocol standards community in the IETF with the academic research community that is studying formal methods for protocol specification. The idea is to exchange experience and ideas, and to try to understand whether and how formal methods can be employed to improve the way we specify protocols and to improve the correctness of the protocols being specified in the IETF, and to pass back experience on what's useful for professional protocol designers to the academic community developing such tooling. That group will be meeting on Wednesday, I believe.
E: The other new proposed research group is the Research and Analysis of Standard-Setting Processes Proposed Research Group, which will be meeting on Thursday. The chairs for this are Ignacio Castro and Niels ten Oever, and this group is focused on understanding the standard-setting process itself.
E
It's
focused
on
understanding
the
community,
its
diversity,
the
impact
that
the
set
of
participants
in
that
Community
have
on
the
process
by
which
we
develop
standards.
The
impact
of
how
changes
in
that
Community
affects
the
standards
which
are
being
developed
is
focusing
on
understanding
the
decision-making
process
in
Internet
standards
and
understanding
the
interactions
between
the
different
parts
of
the
ITF,
the
participants
in
the
ATF
and
the
ITF
and
other
standard
setting
communities.
E
As
I
say,
we'll
hear
more
about
these
in
in
a
few
minutes,
and
please
do
consider
going
along
to
the
to
their
first
meetings
later
this
week.
E: The first of those was the CCNinfo draft, which came out of the Information-Centric Networking Research Group and was published as an RFC in February this year. And, I think just last week, the Quantum Internet Research Group published its first RFC, on architectural principles for a quantum Internet. The Quantum Internet Research Group is meeting in the slot immediately following this one, so if you're interested in that topic, please do go along to that meeting.
E: In addition to the research groups and the research we do in those groups, we also organize the Applied Networking Research Prize.

E: This is organized in cooperation with the Internet Society, with sponsorship from Comcast and NBCUniversal. It's here to recognize some of the best research results in applied networking: new research that has potential relevance to the Internet standards community, and perhaps to recognize up-and-coming people who are likely to have an impact on Internet standards and technologies going forward.
E
I'm
very
pleased
to
announce
that
we
have
two
a
RP
award
presentations
today.
The
awards
go
to
the
awards
for
this
meeting
go
to
Boris,
because
many
for
his
work
on
novel
offloading
architectures
for
for
NYX
and
so
our
facility
Jacobs
for
his
work
on
evaluating
machine
learning
for
network
security.
E: In addition, the final activity we organize in the IRTF is the Applied Networking Research Workshop. The workshop is a forum for the research community, the vendors, and the operators in the IETF standards community to present and discuss emerging results in applied networking research.
E
It
co-locates
with
the
July
ITF
meeting,
which
will
be
in
San
Francisco
this
year
and
I'm
pleased
to
announce
that
the
the
chairs
of
this
meeting
of
this
Workshop
will
be
Francis
Yan
from
Microsoft
and
Maria
apostolaki
from
Princeton,
both
of
whom
previous
anrp
winners
so
I'm
very
pleased
to
have
them
on
board
and
running
this
Workshop
paper.
Submissions
for
the
workshop
will
be
due
on
the
12th
of
May
and
you
should
look
out
for
the
detailed
call
for
papers
any
day
now.
E
Foreign
ly
I'd
like
to
highlight
that
we
do
offer
travel
grants
to
attend
the
irtf
meetings.
We
offer
diversity
travel
grants
to
support
early
career
academics
and
PhD
students
from
underrepresented
groups,
and
we
also
offer
travel
grants
to
attend
the
applied
networking
research
Workshop,
the
iitf.org
travel
Grant
site
will
include
information
about
those.
You
can
expect
the
call
for
travel
grants
for
the
the
July
meeting
to
go
live
later
this
month.
So
please
do
do
look
out
for
that
in
the
next
couple
of
weeks.
E: And that's essentially all I have to say. Our agenda for the rest of today: we are starting with a couple of short talks.

E: Introducing the two new proposed research groups, Jonathan will talk about the Usable Formal Methods group next, and that'll be followed by Ignacio, who'll be talking about the Research and Analysis of Standard-Setting Processes group. Following that, the majority of the meeting will be devoted to the award talks, starting with Boris talking about autonomous NIC offloads, and then Arthur will be talking about AI and machine learning for network security.
C: Yeah, okay. Good afternoon, everybody. I'd like to introduce the Usable Formal Methods Research Group, and before I do that, you probably want to know what I mean by "formal methods". The standard definition would be: the use of mathematical techniques and formalisms to assist in the specification, design, analysis, and implementation of, in this case, protocols.
C: Yes, there are lots of things missing, but basically there was work in the '50s and '60s that mostly dealt with safety and whether mechanical processes would fail. Over time that was applied more and more to digital processes, and nowadays we use it to analyze protocols. Certainly initially the proof techniques were really quite limited.
C: We had to make these very strange assumptions, or we couldn't even analyze some things; they were beyond our tools, and they required huge amounts of manual work: hundreds and hundreds of pages of hand-written algebra. Over time we eventually invented tools and techniques that helped with this process and allowed us to do more interesting things, and eventually the tools were sufficiently mature and usable that the formal methods community became deeply involved in the development and specification of what became TLS 1.3. And just to give you the way I think about this:
C
There
are
lots
of
things
that
don't
really
fit
into
this
categorization,
but
roughly
you
would
say
formal
analysis
is
the
set
of
techniques
which
we
use
to
say.
Is
this
specification
design
correct
so
ignoring
anything
that
we
care
about
the
implementation?
Is
the
design,
if
implemented
perfectly
correct?
Does
it
do
what
we
think
it
does
and
then
there's
the
second
half,
which
is
formal
verification
which
says?
Does
this
piece
of
code
do
what
we
think
it
does?
C
Does
this
piece
of
code
say
compute,
the
right
value
and
the
form
methods
research
group
will
try
and
look
at
both
of
these.
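The "formal analysis" half of that distinction can be made concrete with a toy example. The sketch below is an illustrative explicit-state model check (not one of the tools actually used for TLS 1.3): it exhaustively explores every reachable state of a tiny two-message handshake and checks a safety property of the design, independent of any implementation. All names and the protocol itself are invented for the illustration.

```python
from collections import deque

# Toy "formal analysis" sketch: explicit-state exploration of a tiny
# two-message handshake.  State = (client_state, server_state, wire),
# where `wire` is the set of messages currently in flight.

def transitions(state):
    client, server, wire = state
    out = []
    if client == "IDLE":                        # client sends HELLO
        out.append(("WAIT", server, wire | {"HELLO"}))
    if server == "LISTEN" and "HELLO" in wire:  # server consumes HELLO, replies
        out.append((client, "DONE", (wire - {"HELLO"}) | {"ACCEPT"}))
    if client == "WAIT" and "ACCEPT" in wire:   # client completes
        out.append(("DONE", server, wire - {"ACCEPT"}))
    return out

def check_deadlock_free(initial):
    """Breadth-first search of the whole state space: fail if any
    reachable state has no outgoing transition yet is not final."""
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        nxt = transitions(s)
        client, server, _ = s
        if not nxt and not (client == "DONE" and server == "DONE"):
            return False, s          # stuck in a non-final state: deadlock
        for t in nxt:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True, None

ok, counterexample = check_deadlock_free(("IDLE", "LISTEN", frozenset()))
print(ok)   # True: every reachable execution of the design completes
```

Real protocol analyses use dedicated tools and model adversaries and lossy networks, but the shape of the question is the same: properties of the design, with the implementation abstracted away.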
C: To go back briefly to the TLS design process: what happened with TLS 1.3 was that, from the beginning, academics were doing formal analyses of the protocol design. They were asking: does this design achieve the effects we want it to? And actually there were three or four major flaws found in the protocol design at various stages of its development. Eventually they were removed, and there were proofs written that say: this version of the protocol is secure; this version of the protocol meets its mathematical security goals.
C: The problem is: formal methods are not very user-friendly.
C: The proofs can be immensely long. We used an automated tool to do a proof of TLS 1.3, and the proof runs to 750,000 lines. You could, in theory, go through line by line and check each one of those, but most people don't want to, so you basically have to use a tool that checks each line and eventually gets to the end.
C: The alternative procedure that people use, rather than trying to use one of these tools, is to write proofs by hand: literally long-form proofs. Quite often I will be asked, "Can you review this paper? Here are 26 pages of algebra; you have two days to review it." Unsurprisingly, it's very, very difficult to actually check whether the proof is actually a proof. So there are a couple of issues, but mostly the proofs are hard to understand, verify, and adjust.
C: Academics have mostly been doing this work, and the problem with this work is that it's very high risk and very low reward, because quite often what happens is: you analyze a protocol and you say, "Great, it's secure, no problems at all." And then you go to get it published, and everyone's like, "Who thought it wasn't secure?"

C: You've just written a very boring paper. So unless you're thinking, "I'm pretty sure this is broken," you're not going to invest the potentially years it would take to check. And so, enter the proposed Usable Formal Methods Research Group.
C: How can we solve these problems? How can we take this technique, which we've finally gotten to a point where we can actually use, and make it so that anyone can use it? I think the initial steps that will be really good for the UFMRG are: it's going to provide a place for experts to gather, and it will be a pool of knowledge that other working groups can come to and say, "We can't make this work; we don't know how to do this.
C: Can you advise?" We can then start building up training materials, because at the moment it's very easy to analyze a very tiny protocol (there are examples), but the moment you want to go slightly more complicated, you're on your own. We can also provide feedback to tool designers.

C: A lot of the tools are not actually used by people who want to analyze IETF protocols; they're used in academia. So we can go to the tool designers and say, "We need this feature." Hopefully, because the IRTF does do some publishing, we'll have a place to publish negative results alongside papers that say "here is a proof of security", and we'll have a place to store all the proofs and the checking tools you need to check the proofs.
C: So, a couple of non-goals. A very important non-goal is to not try to change the IETF process. Maybe the IETF processes do need to change, but that's not what we're interested in; that's not our job. It's an IRTF group. And the way we can check whether we have succeeded in making formal methods usable is if people use them: if we say "you have to use them", we don't know whether we've succeeded. And, very much, we don't want to be an obstacle.
C: We want to provide useful tools. We're meeting on Wednesday at 09:30; please come and join us. Thank you. Any questions?
F: Oh, I guess I won the online lottery there; I've got the really low mic. All right. So maybe this is a question for when you have the session later in the week, but my very, very limited understanding of formal methods is that one of the challenges with them is that any time you're doing a proof, you're always dependent on a certain set of lemmas or a certain set of assumptions. So the big question is whether or not those are valid.
C: Absolutely. There are some very standard assumptions that we make, which we know aren't true.

C: For example, we assume that, say, asymmetric crypto is just a magic, perfect black box and that it always works. And yes, we will definitely need people who do cryptanalysis to continue doing their work. But the goal of these proofs is to say: if we assume that we have a valid, secure asymmetric cryptography algorithm, then the protocol built on it is secure.
B: Hello, Diego Lopez. A question: coming here, I was understanding that we're talking about formal methods in general, for protocols and for verifying or proving whatever properties. But it seems that you are focusing very much on security properties.
C
So
that
security
properties
is
my
background,
but
it's
certainly
not
the
only
scope
of
the
RG.
C
The
proposed
dodgy
you
know
one
where
one
of
the
first
things
we're
planning
to
do
is
try
and
come
up
with
some
examples
that
people
can
look
at
that
aren't
security
related.
You
know,
deadlock
related,
live
lock
related,
but
yeah,
that's
certainly
in
our
scope.
But,
yes,
my
background
is
Securities.
That's
what
I
think
about.
B: And do you have any idea of the background that you plan to build on? Because I remember, a long time ago, I used to work with communicating process algebras and things like that.

B: Yeah, exactly. Is it that, or are we talking about something a little bit ahead of that?

B: But also, for help based on related approaches, so...
E: Great, all right. So next up is Ignacio, who'll be talking about the RASP proposed RG. A reminder, while Ignacio is getting set up: if you're in this room, you need to wear a mask. If you are not willing to wear a mask, you need to leave this room.
G: Thanks, Colin. Hello, everybody. I'll be telling you about our new proposed research group, RASP, which is about research and analysis of standard-setting processes, and which I'm chairing together with Niels. Next slide, please!
G: So what is RASP about? Well, the name says it all: it's about better understanding the things that we do here, not only at the IRTF but mostly at the IETF. The point is not to judge whether the IETF is the best thing since sliced bread and the ITU is not; the point is just to analyze what we do. Other people might make judgments based on that, but that's not really the point of this research group.

G: The point is data rather than judgment. The types of outputs that we expect are joint reports, research papers, and databases, for example.
G
We
are
now
labeling
email
discussions
for
agreement
disagreement,
so
we
can
understand
better
consensus
formation
processes,
tools
and
open
software,
we're,
for
example,
making
a
little
tool
to
make
recommendations
for
cross
area
review
by
comparing
the
content
of
the
emails
of
the
people
with
a
draft
that
might
require
review
the
way
to
do
it
well,
similar
to
many
of
the
research
groups
collaborations
so
there's
many
different
people
from
many
different
areas
that
are
interested
in
this
sort
of
stuff,
organizing
working
sessions
and
prevent
duplication.
G
There
are
different
people
that
has
been
working
in
different
ways
around
this
and
mostly
produce
evidence-based
reproducible
work.
This
is
a
charter.
Please
subscribe
to
the
mail
list.
If
this
sounds
interesting
come
to
the
session
tomorrow,
we
do
a
lot
of
different
things.
That's
a
slide,
please
just
to
give
a
glance
like
this,
for
example,
is
from
one
of
the
papers
that
we
have
been
working
on,
and
this
is
the
call
for
ship
craft.
You
can
see
how
it's
a
group
of
people
that
write
a
lot
of
drafts
together.
G
You
can
see
also
some
points
in
the
middle.
You
can
guess.
Maybe
why
that's
happening
again?
Data
no
judgment
next
slide.
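The kind of co-authorship graph shown on the slide can be sketched in a few lines: nodes are authors, and edge weights count the drafts a pair has co-written, so clusters of frequent co-authors show up as heavy edges. The draft names and authors below are invented for illustration; the group's real pipeline works over the IETF Datatracker data.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative co-authorship graph: edge weight = number of drafts a
# pair of authors has written together.  Data here is made up.
drafts = {
    "draft-example-a": ["alice", "bob", "carol"],
    "draft-example-b": ["alice", "bob"],
    "draft-example-c": ["dave", "erin"],
}

edges = defaultdict(int)
for authors in drafts.values():
    for a, b in combinations(sorted(authors), 2):
        edges[(a, b)] += 1        # one unit of weight per shared draft

# Heavy edges reveal the tightly collaborating groups on the slide.
heavy = {pair: w for pair, w in edges.items() if w >= 2}
print(heavy)                      # {('alice', 'bob'): 2}
```

From here, standard graph tooling can lay the graph out and compute clusters, which is essentially what the slide visualizes.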
G: That's the interaction graph of the working groups, and you can see how things cluster nicely around the different areas. This is just to give you a glimpse; we do many other things, and this is not an exhaustive list. If you have questions, or you have ideas, or you want us to look into something, please drop by. This is what we do.
G: What we don't do is make hierarchical comparisons between standard-setting organizations, saying which one is better; nor, similar to what Jonathan said, do we define how the operational processes of the IETF should work. People are welcome to use the data that we might produce to make those judgments by themselves, but that's not our goal. Next slide. So please drop by on Thursday: we have, I believe, a fantastic session, going from ethnography to large language models in understanding the IETF. Please come to agree, disagree, propose, or just discuss. The research group is starting.
E: Okay. Well, if you are interested in understanding how the IETF works, please go along to the meeting on Thursday. Thank you.
E: All right. So, with any luck, you now have control of the slides. Okay. I'm very pleased to introduce the first of the Applied Networking Research Prize winners for this meeting.
E: The first speaker is Boris Pismenny. Boris is a PhD student at the Technion computer science department in Israel and is currently visiting EPFL; he's also employed by NVIDIA as a software architect. His research is focused on improving system software performance by enhancing NIC controller hardware. In recent years he's worked on accelerating QUIC with UDP segmentation offload and receive-side coalescing offload, and he's also worked on accelerating encryption for QUIC, TLS, and IPsec.
D: The paper describes a software architecture that enables NICs to accelerate layer-5 protocol computations transparently to the software TCP stack. We're going to focus on layer-5 protocols over TCP; here are some examples. Specifically, we're going to focus on the TLS protocol and its encryption, decryption, and authentication ("digest", as it's called in the paper). In the paper we also present NVMe-TCP with its digest and copy offloads, and the combination of the two. So, a brief overview of TLS.
D: The next approach is to use hardware, such as the on-CPU acceleration available in Intel CPUs, called AES-NI. This is very efficient: it uses fast instructions and cache memory and has a relatively low overhead. Nevertheless, a CPU core doing TLS can spend more than 50 percent of the core on just encryption.
D
So
looking
further,
we
can
consider
an
off
CPU
accelerator,
such
as
a
pcie
Calder.
Does
encryption
like
the
Intel,
quick
assist,
call,
and
the
benefit
is
that
the
CPU
oval
head
is
independent
of
the
data
says
being
encrypted,
but
the
problem
is
that
significant
parallelism
is
required
to
outperform
the
on
CPU
acceleration,
and
this
can
be
problematic
and
sometimes
applications
need
to
be
redesigned
to
make
use
of
that.
D: Looking forward, we can ideally place the encryption on the NIC itself: data needs to traverse the NIC anyway. The problem is that current approaches to do this depend on offloading TCP, IP, routing, and quality of service (essentially the entire network stack) into the hardware. This introduces a lot of complexity and security problems, and has thus far shown itself to be undesirable and impractical.
D: Before kernel TLS, what we had is a baseline application using a TLS library. It calls the library with its data, the library encrypts the data and puts it in a TLS record, and then the record moves to the kernel, where TCP sends it. So we have an encryption pass and a copy pass: the encryption when passing from the application to the TLS library, and the copy when passing from the TLS library to the kernel.
D: Finally, in kernel TLS, the TLS layer can communicate directly with the NIC driver, and this is what we require for autonomous offloads; this direct communication allows us to make some optimizations. So, with autonomous TLS, we eliminate the encryption pass from software and move it into the hardware. When the application sends its data using kernel TLS, we just do a copy; we don't do the encryption, but we still create a record, and this time some parts of the record, for instance the MAC (the authentication), are left for the hardware to compute.
D: So, in a nutshell, this is the solution. Now we'll get into how we implement the offload, starting from the transmit side. In order to offload data that is in sequence, we don't need to do much, because the NIC uses state that is updated incrementally: we simply send the packet with an indication, and the hardware performs its operation, encrypting the data. The NIC uses two contexts to accomplish this operation: a static state, which is relatively constant for the whole connection,
D: The static state holds, for example, the encryption keys. The dynamic state is updated for each and every packet: it holds the state for the next expected TCP sequence number, so it tells the hardware how to encrypt at the next TCP sequence number position, and it holds things like the current offset within the record, the record size, the IV, and the rolling authentication (ICV) state.
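The two contexts can be pictured roughly as follows. This is an illustrative Python model of the state the talk describes, with invented field names and a simplifying assumption of fixed-size records; the real NIC interface and the Linux kTLS offload code are of course different.

```python
from dataclasses import dataclass

# Illustrative model of the per-connection TLS offload contexts:
# a static part set up once, and a dynamic part advanced per packet.

@dataclass(frozen=True)
class StaticState:            # roughly constant for the whole connection
    key: bytes                # encryption key
    salt: bytes               # per-connection IV salt

@dataclass
class DynamicState:           # advanced by every in-sequence packet
    expected_tcp_seq: int     # next TCP sequence number the NIC expects
    record_seq: int           # TLS record sequence number being processed
    record_offset: int        # bytes of the current record already seen

def on_in_sequence_packet(dyn, payload_len, record_len):
    """Advance the dynamic state as an in-sequence packet is encrypted.
    Assumes fixed-size records for simplicity of the sketch."""
    dyn.expected_tcp_seq += payload_len
    dyn.record_offset += payload_len
    while dyn.record_offset >= record_len:      # crossed a record boundary
        dyn.record_offset -= record_len
        dyn.record_seq += 1

dyn = DynamicState(expected_tcp_seq=1000, record_seq=0, record_offset=0)
on_in_sequence_packet(dyn, payload_len=1448, record_len=1000)
print(dyn.expected_tcp_seq, dyn.record_seq, dyn.record_offset)  # 2448 1 448
```

A packet that arrives with a TCP sequence number other than `expected_tcp_seq` is exactly the out-of-sequence case the talk turns to next: the dynamic state no longer matches, and it has to be rebuilt.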
D: The problem is when we want to send something that is out of sequence. For example, assume that we send packets one through eight, and then software decides to retransmit packet five. The hardware doesn't have the correct dynamic state to accomplish that, because it expects packet nine, and it needs to transmit packet five. So what happens is that the driver identifies this problem:
D: it compares the sequence numbers between the dynamic state, of which it keeps a shadow copy, and the packet being transmitted, and it performs a flow we call the recovery flow to transmit packet five. In the recovery flow, what we need to do is pass to the hardware the TLS record prefix of TCP packet 5, which is marked in the dashed lines
D: on the figure. After passing this information to the NIC hardware, we can send packet five, because the dynamic state will have been adjusted for packet five. To accomplish this, the driver communicates with the TLS layer directly, asking it for the TLS record prefix, and the TLS layer holds a mapping which allows it to provide this information.
D: The only problem is that the TLS layer needs to hold this information all the time: we can't release the data as soon as packets are acknowledged, because we might release part of a record. So we hold an extra reference for the packets that combine into a record, and we release those only when the entire TLS record is acknowledged.
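A minimal sketch of this recovery bookkeeping might look as follows: the TLS layer remembers where each record starts in TCP sequence space and keeps record bytes alive until the whole record is acknowledged, so that on a retransmit the driver can fetch the record prefix that precedes the retransmitted bytes. This is a pure-Python illustration with invented names, not the actual kernel code.

```python
import bisect

# Illustrative retransmit-recovery bookkeeping for the TX offload:
# map TCP sequence numbers to TLS record boundaries, and release a
# record only once all of its bytes are acknowledged.

class RecordMap:
    def __init__(self):
        self.starts = []      # record start sequence numbers, sorted
        self.records = {}     # start seq -> full record bytes

    def add_record(self, start_seq, record_bytes):
        bisect.insort(self.starts, start_seq)
        self.records[start_seq] = record_bytes

    def prefix_for_retransmit(self, seq):
        """Record bytes preceding `seq` (assumed to fall inside a tracked
        record), which the driver feeds to the NIC so it can rebuild the
        dynamic state before re-encrypting the retransmitted packet."""
        i = bisect.bisect_right(self.starts, seq) - 1
        start = self.starts[i]
        return self.records[start][: seq - start]

    def release_acked(self, acked_seq):
        # Drop only records whose *end* is acknowledged, never a part.
        while (self.starts and
               self.starts[0] + len(self.records[self.starts[0]]) <= acked_seq):
            del self.records[self.starts.pop(0)]

m = RecordMap()
m.add_record(1000, b"R" * 500)              # record covers seq 1000..1499
print(len(m.prefix_for_retransmit(1200)))   # 200 bytes of record prefix
m.release_acked(1400)                       # record not fully acked: kept
print(1000 in m.records)                    # True
```

The `release_acked` rule is the point of the last paragraph: an ACK in the middle of a record must not free it, because a later retransmit may still need its prefix.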
D: So that's it for the transmit path; moving on to the receive path. Again, the in-sequence flow is very straightforward: hardware decrypts incrementally as packets go through, and indicates for each packet whether it was decrypted and authenticated successfully.
D: Now, problems begin when we have out-of-order transmissions. For instance, in this example we receive packets one, two, three, four, then packet two again (so packet two is a retransmission), and then packet five. What happens is that the hardware skips the decryption of packet two, identifying that it was received before, because it expects packet five. The benefit here is that hardware continues its acceleration for packet five and onwards; it doesn't stop because of packet two.
D: Another problem that may occur is TLS record data reordering. Here, what we have is that packet 2 is not received.
D: Now, the real problem happens when, on the receive path, the TLS record headers themselves are reordered. When the TLS record header arrives out of order, the hardware can't use this trick, because it relies on the length field of the TLS record header to tell it the position of the next record. After hardware sees packet four, it knows it didn't see packet three, and the missing packet could have contained any number of TLS record headers, so it no longer knows where records begin. As a result, it stops offloading until it is resynchronized.
D: Okay. To solve this problem, we could just let software notify hardware every time it sees a TLS record header. But packets may keep arriving all the time, so we would always have a race condition between software and hardware, where software is chasing hardware while hardware keeps receiving new packets. So we need a solution that is not pure software for streaming TCP workloads, and the solution we devised is a software-hardware collaboration.
D: So suppose we have TLS header reordering. What the hardware does is speculatively search for what we call a header magic pattern; for TLS it's the hex bytes 17 03 03, which represent the record type and version. Once hardware identifies this pattern in the TCP byte stream, it asks software: is this a TLS record header, or is it something else? Meanwhile, packets continue to arrive, and the hardware tracks where it expects to see subsequent TLS record headers based on the length field.
D: So here we first found a header candidate in packet five, and then in packet seven we check and verify again that a header is where we predicted. If it's not there, we go back to step one, because the speculation was wrong. But if hardware was correct in its speculation, eventually software will be able to confirm that, indeed, packet 5 contained a TLS record header in the position the NIC asked about, and this allows hardware to synchronize and resume its acceleration from whichever point it is currently tracking.
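The speculation step can be sketched as a scan of the reassembled TCP byte stream: find the TLS 1.2 record-header magic pattern, then hop by the length field to predict where later headers should fall, rescanning whenever a prediction fails. This is a pure-Python illustration of the idea; in the real system this logic is split between NIC hardware and the kernel TLS layer, which also confirms each candidate.

```python
import struct

MAGIC = b"\x17\x03\x03"   # TLS 1.2 application-data type + version bytes

def find_candidate_headers(stream):
    """Speculatively locate TLS record headers in a byte stream:
    find the magic pattern, then hop header-plus-length to predict
    the next header; if the prediction fails, rescan (sketch only)."""
    candidates = []
    i = stream.find(MAGIC)
    while 0 <= i <= len(stream) - 5:
        if stream[i:i + 3] != MAGIC:           # prediction failed: rescan
            i = stream.find(MAGIC, i + 1)
            continue
        (length,) = struct.unpack(">H", stream[i + 3:i + 5])
        candidates.append(i)
        i += 5 + length                        # header + payload -> next header
    return candidates

# Two back-to-back records with 4-byte and 2-byte payloads:
stream = b"\x17\x03\x03\x00\x04AAAA" + b"\x17\x03\x03\x00\x02BB"
print(find_candidate_headers(stream))          # [0, 9]
```

The pattern can of course also occur inside encrypted payload by chance, which is exactly why the hardware only treats these as candidates and asks software for confirmation before resuming the offload.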
D: Taking a step back, we asked ourselves what protocol computations are autonomously offloadable, and tried to define the properties that make them such. We find that most computations and protocols are offloadable, but not all. Looking at the properties that make computations autonomously offloadable: on the transmit side, the computation must be size-preserving, and this precludes acceleration of compression and encapsulation. To explain the intuition behind this, we have here an example with encapsulation. So suppose we do:
D: Okay. So host A sends the first packet, and the NIC accelerates it by encapsulating it. By doing so it adds some additional payload bytes, and as a result an additional packet is required because the MTU is exceeded: two packets are sent on the wire instead of one. Host B receives those two and sends ACKs, but the second ACK is lost. As a result, host A thinks that all the data it sent has been successfully received, because it got an ACK for the 150 bytes it sent.
D: But in fact some data was lost, and the NIC would need to be responsible for its retransmission. Retransmission and TCP logic are exactly what we wanted to avoid placing in the NIC, and this is why we consider this undesirable. The next required property is that the computation is computable on TCP packets of any size.
D: So it can't require any bytes from future packets, and this precludes some block ciphers such as AES-CBC, which operates on blocks of 16 bytes: some packets may not contain all 16 of those bytes, and we need to pass them on. We don't want to start stalling packets in hardware to perform this operation.
D: Next, to do the recovery, we require that the state needed to compute the operation is of constant size and is message-independent, up to maybe some metadata such as message sequence numbers; it cannot depend on earlier payload or on future bytes. We find that most protocols adhere to this.
D: Here we compare the offload, with zero copy, against HTTPS just using OpenSSL. What we can see in the figures, starting from the left figure with the throughput, is that the yellow line and the blue line follow the same shape. The yellow line is plain HTTP, the best we can hope for when doing HTTPS with offloading, and we see that the gap is very small. The numbers below show the comparison between plain HTTPS and the offload with zero copy.
D: That's it. In conclusion, autonomous NIC offloads are a framework for accelerating layer-5 protocol computations efficiently while cooperating with the software TCP stack. The approach is applicable to most protocols and computations, and the evaluation shows that we can improve throughput by up to 3.3x, CPU utilization by up to 60 percent, and latency by up to 30 percent.
E: All right, so I have a question. Obviously this approach assumes certain properties of the protocols: that they're size-preserving, incrementally computable, have constant state, and so on.

E: These fit some protocols, but not necessarily all of them. To what extent are these fundamental limitations of the approach, versus limitations of the current work that might potentially be resolved in a future iteration of the ideas?
D: Almost all of them are fundamental. The only one that can be somewhat softened is the requirement excluding CBC: it's possible to think of a solution that stalls some bytes to enable offloading the partial block, the one that doesn't contain the full AES block, but this would be more complicated, and AES-CBC is deprecated, so there is not much interest.
A: Hi, Dave Oran, MIT. Do the trade-offs change at all if you have a user-mode TCP stack, as you would with DPDK or something like that?

D: Not...
E: Okay, so one other thing. Obviously this is an IETF meeting that you're presenting at: is there any guidance, or are there any issues to consider, that the IETF should be paying attention to when designing future protocols, to make this type of approach, or similar approaches, work better?
D: Yes, actually, that's a great question. So this approach also works for TLS 1.3, which was finalized more or less when we were finalizing the hardware, and one of the things that happened in TLS 1.3 is that the trailer started to carry the real content type, and when using an offload such as this, that created some problems. In general it worked well, because the format of the record remained the same, so keeping the formatting of records helped. But to do data placement we need to assume it's application data, and if it's actually a handshake, then that placement is bogus, and this creates quite a lot of complexity that we didn't anticipate and that doesn't exist with TLS 1.2. So this is somewhat unfortunate. Similarly, the padding also makes things more complicated than they could have been.
E: Okay, thank you. So it sounds like there are maybe some lessons that can be learned from the way the protocol design changed, to simplify offloading.
F: A corollary question is: if somebody offers you one parameter of the system to change that might make things better, how would that affect your performance? Say somebody offers you double the bandwidth, or twice as many cores, or something.
D
And
that's
a
great
question
so
so
these
are
the
holidays
of
this
technology
in
particular.
So
it's
how
to
predict
what's
going
to
happen
in
the
future
right
now,
we'll
see
Soviet
option,
but
it's
not
like
everybody
needs
to
do
800,
gigabit
TLS.
So
it's
not
obvious
that
it's
applicable
to
to
all
use
cases
yeah.
There
are
impressive
numbers.
D: The numbers are great, but not everybody needs those numbers. So there is a trade-off, though, and making the most out of it requires quite a lot of work in software. I think the performance in FreeBSD today is somewhat better than Linux, because the Netflix guys did a lot of work to make it so, and I think we'll see over time how it evolves.
E: Okay, so the final talk today is the other Applied Networking Research Prize winning talk. It is by Arthur Selle Jacobs. Arthur has a PhD in computer science from the Federal University of Rio Grande do Sul in Brazil, and he's worked with Jennifer Rexford's group in Princeton and with Walter Willinger. He's currently working as a senior software engineer for Nomad Health, I believe, and his paper today is entitled "AI/ML for Network Security: The Emperor Has No Clothes". I believe this was originally presented at the ACM Conference on Computer and Communications Security in November 2022. Yes? Okay, you should have control over the slides.
I: Into the mic, okay. Thanks, Colin, for the introduction. Hi everybody, my name is Arthur and I'm here today to present to you our work entitled "AI/ML for Network Security: The Emperor Has No Clothes".
I
So
in
recent
years,
we've
seen
exciting
advances
in
machine
learning
and
AI
in
fields
such
as
facial
recognition,
recommendation
systems
or
even
spam
detection
among
many
other
areas
of
computer
science
and
other
areas
in.
But
let's
take
a
look
at
what's
causing
all
that
excitement
and
what
we
call
the
traditional
AIML
development
pipeline.
Usually,
if
you
want
to
develop
a
new
machine
learning
model,
you
start
by
collecting
some
data
and
selecting
which
model
you
want
to
use.
I
You
didn't
use
that
data
to
train
your
selected
model
and
evaluate
it
using
traditional
evaluation,
metrics
such
as
Precision,
recall
or
F1
score.
Then,
if
you
have
a
high
enough
iPhone
score,
that
usually
means
your
job
is
done.
You
can
claim
your
model
Works,
deploy
it
in
a
network
product
in
a
production
and
you
move
on.
Otherwise
you
go
back,
you
collect
more
data
or
you
collect
better
data
and
re-evaluate
your
model
selection.
I: Now, we claim that this traditional AI/ML pipeline is good enough for low-stakes decision making, such as recommendation systems or spam detection. But what about high-stakes decision-making scenarios, such as self-driving cars or network security, in which a wrong decision can have a direct impact on people's lives or on companies' revenues and reputations? In these scenarios, we argue, being able to claim that a model works is not good enough.
I: We need to be able to tell why a model works and when the model does not work, especially considering that it is well documented that machine learning models can suffer from underspecification issues, such as shortcut learning, where the model takes shortcuts to classify the data rather than actually learning to solve the problem; or the model might suffer from out-of-distribution samples; or the model might simply be overfitted to spurious correlations in the data and not be learning anything.
I
Answer
that
question
we
propose
trustee
trustee
is
a
novel,
explainability
explainable
AI
technique
that
produces
Global
explanations
from
any
machine
learning,
Black
Box
model
in
the
form
of
low
Fidelity,
sorry,
High,
Fidelity
and
low
complexity,
decision
decision,
trees,
trustee
augments,
the
traditional
AIML
Pipeline,
with
two
new
steps,
the
first
one
to
extract
a
decision
tree
from
any
Black
Box
model
and
the
second
one
to
analyze.
That
decision
tree
for
any
issues
that
might
impact
be
impairing
the
the
model
to
make
the
correct
classifications.
I
And
finally,
our
last
requirement
was
for
Tracy
to
be
able
to
produce
stable
explanations
that
is
produced
roughly
the
same
explanation
for
the
same
input
on
multiple
executions,
so
trustees
algorithm
starts
receiving
as
input
to
a
data
set
in
a
black
box
machine
learning
model.
It
then
starts
by
splitting
that
data
set
into
a
training
and
testing
data
sets
using
a
given
split
such
as
70
and
30
percent.
I
It
then
uses
that
Black
Box
model
to
as
an
oracle
to
produce
the
expected
output
for
the
training
data,
which
will
then
be
used
to
guide
the
training
of
the
decision.
Trees
notice
that,
since
any
machine
learning
model,
can
be
used
to
produce
the
expected
output,
we
achieve
our
area.
Our
first
design
requirement
of
model
agnosticity,
then
trustee
selects
an
M
number
of
samples
from
this
training
data
set
in
expected
output
and
further
splits
it
into
a
training
and
testing
set.
I
The
training
set
is
then
used
to
produce
a
decision
tree
using
the
traditional
algorithm,
cart
called
classification
and
regression
trees
and
then
evaluate
it.
Using
the
tests
and
data
sets
to
produce
an
explanation
output
which
we
can
then
use
to
measure
the
Fidelity
of
the
of
the
produced
explanation
with
the
expected
output.
This
process
is
repeated
another
time
and
a
num
number
of
times
with
different
samples
from
the
training
data
set
in
which
we
call
trustee
inner
inner
loop.
That
runs
any
number
of
times
this
inner
loop
produces
an
output.
I
Then
one
thing
to
notice
here
is
that
it
is
not
uncommon
for
cards
algorithm
to
produce
decision,
trees
of
hundreds
or
thousands
of
nodes,
and
for
for
a
human
to
be
able
to
parse
it.
This
needs
to
be
much
much
smaller,
so
the
size
of
the
explanation
here
matters
to
circumvent
this
problem.
We
propose
a
new
pruning
algorithm.
I
Now,
finally,
given
that
trustee
uses
a
subsample
of
data
to
train
decision
trees,
it
is
possible
that
with
multiple
executions,
different
decision
trees
will
be
generated
and
so
to
mitigate
that
issue,
we
added
an
outer
loop
to
trustee
that
runs
the
inner
loop
for
an
s
number
of
times
and
calculates
the
pairwise
agreement
of
the
decision
trees.
The
decision
trees
produced
that
is
trustee
measures,
whether
or
not
they
produce
decision
trees
make
agree
on
the
decisions
made
for
the
same
samples
and
then
Returns
the
decision
tree
with
the
highest
main
agreement
amongst
all
of
them.
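The inner/outer loop just described can be sketched in a few lines. This is a drastically simplified reconstruction, not the authors' implementation: real Trustee fits CART decision trees, while here a one-feature threshold "stump" stands in as the interpretable surrogate, and `black_box` is any function mapping a feature row to a label.

```python
import random

def fit_stump(X, y):
    """Pick the (feature, threshold) pair whose split best matches labels y."""
    best = None  # (accuracy, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            pred = [1 if row[f] > t else 0 for row in X]
            acc = sum(p == label for p, label in zip(pred, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    _, f, t = best
    return lambda row: 1 if row[f] > t else 0

def trustee_sketch(black_box, X, inner=5, frac=0.7, seed=0):
    rng = random.Random(seed)
    oracle = [black_box(row) for row in X]   # black box acts as the oracle
    trees = []
    for _ in range(inner):                   # inner loop: resample and refit
        idx = rng.sample(range(len(X)), int(frac * len(X)))
        trees.append(fit_stump([X[i] for i in idx], [oracle[i] for i in idx]))
    # outer-loop idea: keep the tree that agrees most with the other trees
    def total_agreement(t):
        return sum(t(row) == u(row) for u in trees for row in X)
    tree = max(trees, key=total_agreement)
    fidelity = sum(tree(r) == o for r, o in zip(X, oracle)) / len(X)
    return tree, fidelity

# Toy black box with a planted shortcut: "attack" iff feature 1 is large.
bb = lambda row: 1 if row[1] > 50 else 0
data = [[i % 3, (i % 2) * 100] for i in range(20)]
tree, fidelity = trustee_sketch(bb, data)
assert fidelity == 1.0                        # surrogate mimics the black box
assert tree([0, 100]) == 1 and tree([0, 0]) == 0
```

The fidelity computed at the end is the agreement between the surrogate tree and the oracle labels, which is the quantity the talk's loop maximizes and reports.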
I: Now, for the second analysis step of the augmented development pipeline, we introduce a novel method we call trust reports. Trust reports basically automate part of the analysis process, to try to identify the three underspecification issues that I mentioned before: shortcut learning, out-of-distribution samples, and spurious correlations.
I
We
basically
the
trust
report,
summarizes
important
information
from
the
decision
tree
explanations,
such
as
the
size
of
the
decision
tree
the
depth.
The
number
of
input
features
from
the
model
that
were
actually
used
to
classify
the
data,
the
Fidelity
amongst
many
other
things
and
and
small
experiments
to
see
how
much
the
explanation
is
staple
to
the
actual
Black
Box.
I
On
top
of
that,
the
trust
report
produces
useful
plots
on
the
on
the
decision.
Tree
explanations,
such
as
the
number
of
samples
classified
at
each
level
of
the
decision
tree
for
optimal
pruning,
the
number
of
the
number,
the
number
of
samples
and
classes
that
a
specific
Branch
classifies
and
then
the
number
of
samples
that
each
feature
is
responsible
for
in
the
decision
tree
classification.
I
Now
it
is
important
to
notice,
though
sorry,
that
the
trust
report
does
not
automatically
tell
you
which,
under
specification
issue,
your
model
suffers
from.
It
still
requires
a
human
to
look
at
them
and
I
try
to
identify
it.
Since
this
under
specification
issues
are
ultimately
domain
dependent.
So
it's
really
hard
to
automate.
I: We then used this model to extract a decision tree out of it using Trustee, with a fidelity of precisely one, and no pruning was required, since it only had seven nodes, which you can see here on the screen. Now, as you can see, the decision tree is telling us that the model is using bytes 49, 43, and 47 from the input bytes to make the classification between VPN traffic and non-VPN traffic. To understand this decision tree, we need to first understand the data that it's coming from.
I
So
if
you
go
when
we,
when
we
went
to
Dove
deep
into
the
pcaps
that
were
being
used,
we
quickly
noticed
that
there
was
a
split
in
the
data
that
is
all
the
non-vpn
traffic
pcaps
had
ethernet
headers
on
them,
while
the
VPN
trap,
pcaps
traffic
Recaps
did
not
have
internet
header
on
them.
This
created
a
mismatch.
I: So with that knowledge in mind, we can go back to the decision tree and see that the first decision this model is making is basically comparing whether, for VPN traffic, it is using the UDP or TCP protocols, values 6 and 17 in the IPv4 protocol field, against a random byte from a source MAC address, which in this data is always larger than 17. So that basically splits almost all of the data perfectly.
I
The
second
level
of
the
decision
three
is
basically
showing
that
to
weed
out
the
remaining
few
samples
that
do
not
follow
these
rules.
Like
one
percent,
the
the
model
is
picking
up
on
different
headers,
such
as
the
fragment
offset
against
a
random
bike
from
the
render
from
the
source
Mac
address
for
the
one
for
vpns
on
right
side
and
on
the
left.
Side
is
looking
at
the
destination
Mac
address,
which
is
always
zero.
I
That
is
that,
if
you
go
back
to
the
data
you
could
you
quickly
realize
that
the
developers
of
the
model
didn't
remove
the
pcap
metadata
from
the
from
the
pcapps
before
reading
the
features
so
they're
actually
reading
features
feature
values
from
the
pcap
metadata
for
the
first
40
bytes,
which
includes
a
lot
of
potential
potential
shortcuts
for
the
model,
including
by
3023,
which
indicates
whether
or
not
the
ethernet
header
is
present
or
not
in
the
pcap.
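The shortcut described here is easy to see in the file format itself. As a sketch (our reconstruction, not the paper's code): a classic pcap file starts with a 24-byte global header whose last field, bytes 20-23, is the link-layer type, so a model fed raw file bytes can "classify" VPN versus non-VPN from that field alone, e.g. LINKTYPE_ETHERNET (1) versus a raw-IP link type such as 101.

```python
import struct

def make_pcap_global_header(linktype):
    # magic number, version 2.4, thiszone, sigfigs, snaplen, link-layer type
    return struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype)

def linktype_of(header_bytes):
    # the field the model was (accidentally) reading: bytes 20-23
    return struct.unpack_from("<I", header_bytes, 20)[0]

non_vpn_like = make_pcap_global_header(1)    # Ethernet headers present
vpn_like = make_pcap_global_header(101)      # raw IP, no Ethernet header

assert len(vpn_like) == 24
assert linktype_of(non_vpn_like) == 1
assert linktype_of(vpn_like) == 101
```

Stripping these 24 bytes (plus the 16-byte per-packet record headers, hence the "first 40 bytes" mentioned above) before feature extraction removes the shortcut.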
I: So the takeaway is that this model is suffering from blatant shortcut learning: it didn't learn to classify the data at all, and is simply picking up on shortcuts present in the feature values. Now, for the second use case, I want to revisit the random forest example that I showed before. We saw many, many papers that relied on the CIC-IDS-2017 data set for classification; this data set is really popular.
I
This
data
set
contains
traffic
from
13
different
types
of
attacks
aside
from
benign
traffic,
including
Port
skins,
DDOS
and
Heartbleed
in
this,
and
this
data
set
comes
with
a
set
of
78
pre-computed
features
from
flow
statistics
such
as
flow
duration,
mean,
inter
arrival
time
number
of
packets
sent
and
received
in
each
flow
and
a
lot
of
most
of
the
Publications.
We
we
found
they
used
the
data
set,
reported
F1
score
numbers
of
0.99,
which
were
very
easily
able
to
reproduce
with
a
random
Forest
classifier.
I
By
simply
looking
whether
or
not
the
maximum
response,
the
maximum
length
of
response
packet
size
is
bigger
or
smaller
than
12K.
This
model
is
able
to
determine
whether
it's
a
heartbeat
attack
or
not.
The
reason
for
that
in
using
this
distribution
plots
from
the
trust
report.
We
can
see
that
this
feature
specifically
a
perfectly
splits.
I
The
entire
heart
bleed
flows
from
all
of
the
rest,
because
the
response
packet
size
for
heart
bleed
that
flows
are
is
always
bigger
than
12K
and
it's
always
smaller
than
12K
for
all
of
the
other
classes,
and
we
can
see
that
same
behavior
in
other
features
such
as
the
inter-arrival
time
response,
inter
arrival
time,
which
almost
perfectly
splits
heart
bleed
from
all
others.
All
of
the
other
classes.
I
Now
to
understand
why
this
happens,
we
need
to
First
understand
how
the
heartbeat
attack
works.
A
heartbeat
attack
and
I
feel
like
this
is
preaching
to
the
choir,
but
heartbeat
attack.
Works
happens
when
I
in
a
malicious
actor
sends
an
https
heartbeat
message
with
a
to
a
vulnerable
server
with
a
value
in
the
size
field
bigger
than
the
actual
package.
So,
basically,
you
can
send
a
16k
byte
packet
and
specify
it
as
64k
bytes
for
the
server
a
vulnerable
server
will
respond
with
a
message
with.
I
We
have
a
heartbeat
response
with
the
same
size
as
the
incoming
packet
by
copying
the
contacts
of
the
incoming
packet
into
the
into
the
response
packet.
So,
but
since
the
pack,
incoming
packet
only
has
16k
bytes,
the
response
packet
will
have
48k
48k
bytes
from
the
server
memory,
which
may
include
credit
card
information,
usernames
passwords
that
sort
of
thing
now
in
the
seek
IDs
2017
data
set.
Specifically,
we
notice
that
for
the
30
minutes,
duration
of
the
heartbeat
at
the
heart
bleed
attacks
generated.
I
They
didn't
close
the
connections
once
we
generated
huge
numbers
for
feature
values
related
to
response
packet,
sizes
and
Inter
arrival
times
in
in
those
flows
which
made
it
abundantly
easy
for
the
for
the
model
to
pick
up
on
those
values.
I: So with that in mind, we set out to generate a validation data set for our explanation, consisting of a thousand new Heartbleed flows with out-of-distribution values, in which we simply closed the HTTPS connection after every heartbeat message sent. This generated feature values for response packet sizes and inter-arrival times much more similar to benign traffic, and, as expected, the random forest classifier was unable to identify a single one of those thousand new Heartbleed flows as Heartbleed.
I: Now, the third use case I want to show is another example of a paper that uses the CIC-IDS-2017 data set. This paper was published at CCS 2020 and proposes a model called nPrintML. It uses an AutoML model for an intrusion detection system, with 4480 features taking the values minus one, zero, or one, which correspond to a bit-level representation of the packets' protocol headers, read from the first five packets of each flow.
I
The
reason
for
that
is
that,
because,
in
this
data
set
all
of
the
attack,
traffic
was
generated
in
an
outs
from
an
outside
computer,
one
hop
away
from
the
measurement
point,
and
so
this
is
basically
checking
whether
the
the
attack
was
inside
the
network
benign
or
outside
the
network,
which
is
malicious.
So
it's
basically
able
to
tell
the
difference
based
on
that
most
of
the
attack
traffic
was
generating
using
Kali
Linux,
which
is
the
initial
TTL
value,
is
64.
I: Minus one hop, that's 63, so that bit is one, and with that you get to split all the benign traffic from the rest. Now, the second decision is also looking at the TTL, but it's splitting off all of the DDoS attacks, which were generated using Windows, Windows 8.1 to be specific, which has an initial TTL value of 128, or minus one hop, 127. So it looks at a bit of the TTL of the second packet, or it could be the first packet, whether it's one or zero, and with that it's able to identify all of the DDoS samples in this data set.
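The TTL shortcut described above is simple to reproduce (our reconstruction, not the paper's code): Kali Linux sends packets with an initial TTL of 64 and Windows 8.1 with 128, so one hop away from the measurement point they arrive as 63 and 127, and a single high-order TTL bit separates the two attack sources from each other and from local benign traffic.

```python
def observed_ttl(initial_ttl, hops):
    """Each router on the path decrements the IP TTL by one."""
    return initial_ttl - hops

kali = observed_ttl(64, 1)      # 63 = 0b0111111
windows = observed_ttl(128, 1)  # 127 = 0b1111111

assert kali == 63 and windows == 127
# bit 6 (value 64) of the observed TTL cleanly separates the two senders,
# which is exactly the kind of single-bit split nPrint features expose
assert (kali >> 6) & 1 == 0
assert (windows >> 6) & 1 == 1
```

A bit that encodes the operating system of the traffic generator, rather than anything about the attack, is a textbook shortcut feature.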
I: The third decision here is also interesting, because if you notice, the decision tree is checking whether or not the value is negative. This means that the model is checking whether there is a second packet or not in the observed flow.
I
As
most
of
you
probably
know,
board
scans
are
usually
not
responded
to
by
by
an
attack
by
by
it
a
victim.
So
all
of
the
board
scan
flows
in
this
data
set
only
had
one
packet,
so
basically
the
model
was
able
to
check
whether
or
not
it
was
a
port
scanned
by
the
number
of
checking
the
numbers
by
checking
the
number
of
packets
in
the
flow.
I
And
finally,
this
was
our
last
observation.
We
noticed
that
this
decision
tree
relied
heavily
on
random
bits
from
the
TCP
options
headers
in
the
in
the
flows
and
using
the
using
the
sorry
and
using
the
trophy
part
we
iteratively
removed
features
from
this
data
set
until
there
was
only
TCP
options.
Fields
for
this
model
to
pick
up
on,
and
still
this
model
reported
an
F1
score
of
0.99
using
only
TCP
options
bits.
So
this
this
was
very
interesting
to
us
and
indicated
that
this
is
actually.
I: this model was not learning anything, but was picking up on spurious correlations in the data. We set out to validate this explanation by curating a balanced data set of 4047 flows from real-world traffic from the UCSB network, and used the Suricata IDS to label those flows as benign, denial-of-service attacks, or port scans, which were the three classes that we were able to collect in the short span that we captured. As you can see,
I: when we used the nPrintML model to classify that data, it was unable to identify a single one of the denial-of-service attacks, and it was able to identify only very few of the port scan attacks. The reason for that is, as I mentioned before, that the UCSB hosts also didn't respond to port scan attacks, so it was able to pick up on a few of those attacks based on the number of packets in the flow; but, as you can see, it failed when put under even the minimal stress of real-world traffic.
I
So
aside
from
these
three
data
set
three
use
cases
that
are
presented.
We
looked
at
four
other
different
use
cases
with
diff
with
reproducibility
artifacts
available
and
found
similar
issues
in
all
of
them,
including
kitsunis
Ensemble
of
neural
networks
for
anomaly,
detection
and
pensive's
reinforcement
learning
model
for
adaptive,
bitrate
selection,.
I
In
the
paper,
you
also
find
an
algorithm
description
of
trustee,
an
ablation
study
on
all
of
the
design
requirements
that
I
presented
and
more
information,
as
well
as
a
user
guide
on
the
trust
report.
That
I
am
that
I
mentioned
before
trustee
was
also
packaged
into
a
python
package
that
can
be
downloaded
today.
It's
available
for
anyone
to
use
and
has
received
surprisingly
number
of
amount
of
downloads
already
and
finally,
machine
learning.
I
High
stakes
scenarios
requires
a
level
of
trust
that
the
traditional
AIML
pipeline
simply
cannot
give
us,
but
trustee
helps
by
improving
the
trust
and
providing
High,
Fidelity
and
low
complexity
explanations
in
the
Forum
decision.
Trees
trustee
can
be
used
today
by
anyone
and
be
downloaded
in
that
website
or
using
pip
and
yeah.
So
just
download
the
python
package
go
analyze,
So
yeah.
Thank
you.
Thank
you.
E
All
right,
thank
you
very
much
excellent
talk.
Does
anyone
have
any
questions.
J
Hi
Stephen
Farrell,
yeah
I
read
the
paper.
It's
a
really
good
paper.
Thanks
I
enjoyed
it
I'd,
say
I
enjoyed
reading
the
paper
when
I
got
towards
the
end.
The
beginning
was
a
little
bit
harder
work
for
me
because
it's
something
I
feel,
but
the
end
actually
was
very
good
when
you
used
it
with
the
case
studies.
So
I
was
wondering
all
of
this
depends
on
having
access
to
the
training
data.
Essentially
right,
if
you
don't,
and
perhaps
in
in
a
lot
of
real
world
cases,
you
won't.
I
So
you
technically
don't
need
the
exact
training
data
to
produce
an
explanation
using
trustee
you
just
get
better
insights
using
the
training
data,
but
if
you
have
access
to
the
model
say
through
an
API
that
makes
the
classification-
and
you
have
data
to
to
that-
you
are
able
to
use
to
test
it.
You
can
produce
explanations
for
that
data
for
using
that
model,
but
it
I.
That
is
a
hard
challenge,
because
not
everyone
makes
that
model
available
like
that
through
an
API
for.
I: Yeah, so this is part of what I mentioned before about trying to automate this analysis as much as possible. It's really hard, because it's hard to tell whether or not the model is actually suffering from shortcut learning, for example, or whether the problem might just be super easy and not need machine learning at all.
I
So
it's
hard
to
make
that
call
unless
you're
actually
familiar
with
the
domain,
but
based
we
did
have
like
a
bunch
of
guidelines
based
on
the
values
we
we
produce
on
the
trust
report
that
could
indicate
whether
or
not
there
might
be
something
going
on
with
the
model.
So,
for
instance,
if
you,
if
your
model
has
4
000
features
and
you're
using
one
or
two
percent
of
it
to
make
a
a
perfect
classification
using
a
decision
tree.
That
is
probably
an
indication
that
there's
some
problem
going
on.
I: I would say the tree is not unique and probably not optimal. We optimize for fidelity, but there are different decision trees that might result in the same fidelity. We achieve the best possible fidelity for the use cases we analyzed, but if you have correlated features in the data set, you could generate a decision tree of the same fidelity using different features, for instance.
I
So
we
chose
not
to
tap
that
information,
basically
by
adding
that
outer
loop
to
trustee
to
produce
a
decision
tree,
that's
roughly
stable,
but
one
of
the
things
that
we
discussed
while
developing
is
that
there
is
knowledge
in
the
different
decision.
Trees
that
you
may
produce
the
ex.
The
expressivity
of
a
decision
tree
is
naturally
lower
than
this
than
the
neural
network.
For
instance,
a
neural
network
has
more
power
expressivity
power
than
a
decision
tree.
I
So
and
that's
the
reason
we
moved
away
from
decision
trees
in
the
first
place,
so
you
can
produce
different
decision,
trees,
explanation
for
the
same
neural
network
and
they
all
might
be
true,
it's
hard
to
say
it's
hard
to
say,
which
one
is
the
the
one.
True
decision
green
problem,
there
isn't
one
it's
amalgamation
of
all
of
them
and
that's
something:
we're
kind
of
working
on
trying
to
tap
on
that.
The
information
of
different
decision
trees
that
might
be
produced
because
they
are
they
they
can
be.
I
There
can
be
valuable
information
there
to
explain
how
the
neural
network
might
be
working,
for
instance,
does
that
answer
your
question?
Yes,
okay,.
E: Okay, I see Brian Trammell, who I think is remote.
L
A
L
Brian
Trammell
good
morning
from
Zurich,
so
I
noticed
something
in
thanks
a
lot
for
the
for
the
talk.
This
was
great,
unlike
Stephen,
I
haven't
read
the
paper
yet,
but
I'm
going
to
fix
that
this
week.
L
I
noticed
something
in
all
of
your
examples
where,
basically,
you
went
into
the
decision
tree
and
were
essentially
doing
analyzes
based
on
fairly
deep
domain
knowledge
of
how
the
sender's
in
a
network
operate
right.
This
is
something
I
think
that's
been
missing
from
at
least
a
lot
of
the
of
the
ml
literature
on
apply
ml
to
network
security
in
the
past.
So
so
thanks
a
lot
for
that.
L
Is
it
accurate
to
characterize
the
work
that
you've
done
here?
Is
basically
automation,
assisted
analysis
of
of
these
decision
trees
right
like
so
the
the
what
trustee
gets.
You
is
over
the
first
time
to
point
out:
hey
you
should
look
here
for
an
overfitting
or
an
over
classification.
Is
that
a
is
that
an
accurate
have
I
understood
that
correctly.
L: So one of the things that popped into my head when Stephen was talking, and this might be something to look at in future work, for being able to do verification without training data, is that you can essentially generate synthetic network data based on your knowledge of how these stacks work, right? The set that you have to explore, if you're trying to extract something from an API, is significantly reduced, even just in the examples that you have here.
L
You
have
a
lot
of
examples
where
it's
like,
oh
I
can
tell
I
have
peacock
metadata
versus
not
or
I.
Have
the
ethernet
header
versus
not
I
mean
there's,
there's
a
highly
restricted
set,
so
I
think
this
is,
is
more
of
a
for
follow-up
work.
L
I'd
really
like
to
see
some
sort
of
like
an
extractor
come
out
of
that,
so
that
would
actually
get
you
to
the
next
step
of
of
of
being
able
to
automate
this
analysis
because
it
did
look
a
lot
like
you
know
it
looked
a
lot
like
okay,
we
use
the
automated
thing
and
then
a
human
had
to
go.
Look
at
it,
which
is
super
useful,
but
you
know,
has
some
scalability
issues.
I: Yeah, understandably. That's actually a very good idea. As a reference, I can point to at least a HotNets paper that came out in 2021 that did something similar using ALE plots, but they didn't generate the data; they used ALE plots to sort of guide the collection of more data, to cover more data points. So it's something similar to what you mentioned, yeah, and I could see something like that being automated.
H
Okay,
so
the
the
start
of
the
paper.
Thank
you
for
this
talk.
It
was
really
nice.
The
start
of
the
paper
is
about
how
you're,
explaining
the
models
right,
but
as
I
go
further
on
it's
more
like
you're,
also
commenting
on
the
data
the
data
set
so
that
just
because
I'm
familiar
with
the
data
Excellence
work
when
you
must
have
heard
the
data
sheet
for
data
sets.
H: I just think that this has a lot of applicability in knowing how to do not just model explanation but also characterizing the data set itself; that, I think, is very new. And another thing: for the fidelity score that you have, you've shown it with precision, recall, and F1 score; have you considered using a specific information-gain measure?
I: I have not, but that's a good point, I can look into it. Regarding the data, yeah, we did notice that most of the problems we identified with these models arose from bad data, or from using the data wrongly, basically. We had this debate a lot while developing: whether we were finding issues with the model or with the data. But in the VPN/non-VPN example, for me, choosing which features to use is part of developing the model.
H: Because, like, I'm coming from an NLP domain, and there are these very well-known benchmark data sets which have been really exceeded by current algorithms. But then those data sets, as you have shown with the network data sets, are also in a sense biased. So this is interesting work.
B: Is this on? No? Okay. So, Diego Lopez. It's related precisely to these comments on the data sets, etc. When we started to work with AI models, I insisted very much that, since we were network management practitioners, etc., we should focus more on the data than on the models, and we have been trying to set up mechanisms for publishing data sets that are usable for this. And you make this reflection about getting the model plus the data, because otherwise Trustee cannot make, or it is much more difficult for it to make, any sense of things.
B
I
was
wondering
whether
well
not
necessarily
in
the
idea
in
the
ITF
or
the
irtf
or
whatever,
whether
it
will
be
advisable
that
we
try
to
push
for
a
set
of
public
data
that
could
be
equivalent
to
opening
I,
don't
know
how
to
call
it
open
source,
AI
or
open
data
AI
or
whatever.
So
just
I
I
felt
myself.
I
Thank
you
so
much
that
that
is
part
of.
We
do
comment
on
that
in
the
paper.
That
sharing
data
in
networking
is
different
from
other
areas
that
took
advantage
of
the
EI,
such
as
images
or
text,
because
there's
a
lot
of
private
data
in
networking
and
people
are
not
willing
to
share
that,
and
at
this
point,
I
think
we've.
I
There
there's
been
some
initiatives
like
the
people
that
produce
seek
ideas,
2017
to
produce
data
for
machine
learning,
that's
publicly
available,
but
they're
all
fundamentally
broken
that
they
make
them
kind
of
useless
for
us.
So
the
at
least
the
alternative
we
found
was
to
be
for
people
to
work
with
their
universities.
I
We
couldn't
publish,
for
instance,
the
UCSB
data
set,
because
that
was
Private
for
the
the
day
we
did
publish
the
headers,
only
not
the
payloads
of
that
of
the
of
the
traffic,
but
it
allows
the
a
lot
of
it
allowed
us
to
at
least
validate
the
model
that
was
trained
on
the
different
data
set.
So
that
was
the
alternative
we
found
and
I
think
might
be
the
one
that
we
as
researchers
can
take
advantage
of.
K: [question inaudible]
I: Yes, but I would say the idea of these proposals is not to retrain, right? You're publishing a black box, and this is your solution to it. The idea is basically that you take this box and put it in a production network, and it should work; that's sort of how these models are sold to us, not that you need to constantly be retraining them, as you should.
I
Our
idea
with
validating
those
papers
and
that
on
the
data
that
was
used
was
simply
that
curating,
a
different
data
set
to
show
that,
if
you
put
this
in
your
network
as
it
is,
it
will
break
basically,
but
we
did
retrain
if
I'm
not
mistaking
the
the
endprint
ml
model.
We
did.
We
train
using
the
UCSB
data
set
and
he
was
able
to
pick
up
on.
E: All right, that's everything we have for today. Thank you again to Boris and to Arthur for giving the talks, and congratulations on the awards. Both of the prize winners will be here for the rest of the week, so if you have further questions or want to talk about the work, please do go talk to them.
E
Please
also
consider
going
along
to
the
usual
usable,
formal
methods
group
or
the
the
raspberry
later
in
the
week,
and
seeing
what's
going
on
in
those
two
two
new
research
groups
with
that.
Thank
you.
Everybody
and
I'll
see
you
around.