Internet Engineering Task Force 110, 9 Mar 2021

From YouTube: IETF110-SFRAME-20210309-1430

Description

SFRAME meeting session at IETF110
2021/03/09 1430

https://datatracker.ietf.org/meeting/110/proceedings/

A

So I bet that you are all wondering what the countdown was for. Sorry, no great reveal. Cullen, I'm speaking.

A

Hopefully that works. Welcome to SFrame. I think this might be our first, or rather second, formal meeting; it happens like that sometimes. I'll be chairing. My co-chair bobo is probably happily sleeping.

A

I'm sorry I can't say the same. I'll turn my video off so you don't get to see my blurry, sleep-deprived face, and we'll get into the meeting.

A

First up, I'm just going to go through a few slides, we'll talk about the agenda, and then we'll see where we end up. Everyone's read the Note Well, hopefully. It's on the slide, and in case you haven't, it's on the website if you actually care to read it.

A

So I don't have a hugely long session this evening, or this morning, or this afternoon. We have a few short presentations. After some discussion on the list about the big-picture stuff, I thought it might help to get someone to do a bit of an overview of the use cases and the scope. So we actually have two presentations on that: I think Tim's going to be talking about the meta-level questions, and Dr Alex has a presentation on how the technical pieces all fit together.

A

Then Sergio has the draft on a proposal that's now in AVTCORE, and we'll give some time to talk about the MLS integration as well. So nothing all that exciting there. In terms of status updates, not a lot to say. Oh, I should say before I go on: does anyone want to bash the agenda?

A

I thought that since there was so much excitement on the list, there would be someone wanting to say something, but nevertheless we move on. Not a lot has happened in the past little while, and we're kind of hoping that will change.

A

But we had a good meeting last time, so maybe we can have a good one this time too. I'm going to pass off to Tim; I'll drive the slides.

B

And so with any luck, my audio is working. Yeah, I'm going to assume it is.

B

Somebody will tell me if it isn't. So yeah: SFrame, scope and use cases. I'm trying to stand a few thousand feet above this and see what's going on. I based this on a little bit of implementation experience that I've had, but also on listening to what the Jitsi people have done; they've spoken publicly about their implementation experience and what they needed. So it comes out of that. Next slide.

B

So, to my mind, and I think this is something that we can all discuss in a minute (it's just a 10-minute slot and only two slides, so we should have time to discuss this), there is only a single use case, and it is for selective forwarding units.

B

What we're trying to do is to protect. I mean, the really high-level view is that we're protecting your data from your service provider, so that Alice and Bob can have a private conversation without their mutual service provider being in the media stream in the sense of being able to comprehend it. The flip side is also true: you're protecting the service provider from your data, so the service provider is given, let's say, plausible deniability about whether they were exposed to your share-dealing plans or whether they saw your patent before it was filed. So it's two-way protection here. And yes, the clue in "SFU" is the selective forwarding: it's selective about what it's trying to forward.

B

It gets a lot of stuff from the sender and it only forwards some of it to the receivers, so it's doing an impedance match between the source and the sink, in effect. That's typically at the level of either bandwidth, because some recipients have much more downlink than others, or resolution, because some recipients have different screen sizes from others. So basically the SFU is faced with the question: which of these packets I've received do I want to multiplex out, ideally to everybody, and which of them can I drop for somebody who can't cope with all of it? What we're trying to do here is to support that. Can we move to the next slide, please?
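
The forwarding decision described here can be sketched roughly as follows. This is a minimal illustration, not anything specified by SFrame: the `layer` tag, the per-layer bitrates, and the function names are all assumptions made for the example, and the encrypted payload is treated as opaque bytes.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    layer: int       # hypothetical cleartext layer tag (0 = base layer)
    payload: bytes   # encrypted media; the SFU never inspects this


def highest_layer_for(budget_kbps, layer_kbps):
    """Return the highest layer whose cumulative bitrate fits the budget.

    Returns -1 when not even the base layer fits.
    """
    total = 0
    chosen = -1
    for layer, kbps in enumerate(layer_kbps):
        total += kbps
        if total > budget_kbps:
            break
        chosen = layer
    return chosen


def forward(packets, budget_kbps, layer_kbps):
    """Keep only the packets belonging to layers this receiver can afford."""
    max_layer = highest_layer_for(budget_kbps, layer_kbps)
    return [p for p in packets if p.layer <= max_layer]
```

For example, with assumed layer bitrates of 300, 700 and 1500 kbps, a receiver with a 1200 kbps downlink would get layers 0 and 1, while a receiver with 5000 kbps would get everything.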

B

And so that brings us to the scope. To my view, again, what we should be doing here is as little as possible. We're in a hurry. This stuff is shipping, Jitsi have already got people doing this, and the risk is that people are going to hand-roll the encryption for this use case. There are some tremendous foot cannons here, as is true with a lot of encryption, but in this case there are some really wonderful ones.

B

So I think what we need to be doing is describing what unencrypted info the SFU receives, so what it can see.

B

That would then allow it to decide what packets to drop. Potentially it's also useful for it to be able to decide when it can drop those packets, because there are some points where it's more useful to switch layers or something like that. And it may be necessary for it to understand what it should or could cache, in terms of things like keyframes or the last X packets per layer. To me, that's the total scope of what we need to be doing here: just describing these things and deciding what they are.

B

I can't actually think of anything else that should be in scope. So the purpose of this presentation is to put those two questions out there: is there more to the scope than this? Are there more use cases than this? And if so, what are they, and do they fit within the charter? That's pretty much all I wanted to say, I think.

B

So I'm looking to see if anyone.

A

Yeah, that's what we're waiting for.

A

Oh, look at this, it begins. Jonathan.

D

I think I agree, but I think there are probably more foot cannons than you anticipate. Which is to say, when you describe the unencrypted info that the SFU gets, there are probably some foot cannons there too, including sending too much, so that you end up accidentally exposing information. Basically, encrypted media, especially encrypted video, is pretty complicated. Accidentally encrypting something that it turns out the SFU needs, and accidentally not encrypting something, so that you expose something you weren't anticipating exposing, are both potentially complicated issues here.

D

So this is going to be a fairly hard problem, at least to do in generality and with good interoperability, as opposed to: I'm only doing this codec, I know that means I send bytes one through six unprotected with my encoder implementation, and done.

B

Yeah, I totally agree that there are huge pitfalls to fall into here, but I think the one that we've already fallen into is not abstracting enough, which is why I wanted to do this very high-level view. I think we really are looking at what packets to drop, and to be honest, if you don't drop a packet that you should have dropped, or you do drop a packet that you shouldn't have dropped, it doesn't matter as much as you might think, because this is all over UDP anyway. It's not a precision thing, and if you send more packets than you should have done, they're going to get dropped by somebody else.

B

So I think there's been an overemphasis on precision here, which I think is unhelpful. I think we.

D

Are generally less accurate, that's true, but that does imply things. Right now, I think there are certain assumptions that the ultimate decoders make about their media: that there is a certain precision in it, that it's going to be reasonably self-consistent about what's in it, and that gaps mean something is semantically missing. And so I think that, but that's.

B

That's fine. That's what a NACK is for.

D

Not if a NACK is for something that turns out to have been dropped deliberately, and then the other.

E

Side thinks it doesn't.

D

Yeah. If the receiver thinks a gap means something it needed is missing, and the sender thinks the gap is just because I chose not to send that to you, and they disagree, so the receiver doesn't move forward, then you have a problem. I think this imposes more requirements on the ultimate decoders, in terms of understanding potentially messy streams, than we currently have. Right now there's sort of an assumption that the SFUs clean up a lot of mess, which, given this environment, they can't. So I think that needs to be understood.

B

Yeah, but I don't think that tells you how you have to solve that problem.

D

No, it doesn't. It's just something we're going to need to look at, decide, and make explicit: if you have an existing implementation that assumes a reasonably clean stream, well, if you're doing SFrame, sorry, you don't get that. You're going to have to deal with mess.

A

All right, thanks Jonathan. Bernard.

F

To underline Jonathan's point, one thing that I don't think has been written down is that decoders can be very finicky. An example is VP8, where the picture ID not only has to be monotonically increasing, it has to be sequential. What that means is, if you drop a frame, even if it's discardable, it'll cause the decoder to find the stream undecodable, and what that means in practice is that the picture ID has to be rewritten by the SFU.
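
To illustrate the rewriting Bernard describes, here is a minimal sketch of how an SFU might keep forwarded picture IDs sequential when it drops frames. It assumes the 15-bit form of the VP8 PictureID and one call per frame; the class and method names are hypothetical. It only works if the SFU can read and modify that field, which is what end-to-end encryption of the payload would prevent.

```python
PICTURE_ID_MODULUS = 1 << 15  # assumption: the 15-bit PictureID variant


class PictureIdRewriter:
    """Renumbers picture IDs so the forwarded stream has no gaps."""

    def __init__(self):
        self.next_out = None  # next outgoing ID; set on first forwarded frame

    def rewrite(self, incoming_id, forward):
        """Return the outgoing ID for a forwarded frame, or None if dropped."""
        if not forward:
            return None
        if self.next_out is None:
            # Start the outgoing sequence at the first ID we actually forward.
            self.next_out = incoming_id
        out = self.next_out
        self.next_out = (out + 1) % PICTURE_ID_MODULUS
        return out
```

Feeding incoming IDs 10, 11, 12, 13 and dropping the second frame gives outgoing IDs 10, 11, 12, so the receiver's decoder never sees the gap.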

F

So there are a lot of codec-specific complexities here. If you say we have to create metadata that works for all codecs, I don't know that that's really possible.

F

We haven't shown ourselves able to do it yet.

F

And there have been at least three attempts to achieve that. The other thing is about the use case itself, which seems to assume that the service provider is separate from the application provider. The problem is that that's a business model which doesn't really exist to any significant extent.

F

So, if you look at all the major products out there, the SFU provider is basically the same as the application provider, and that creates a whole lot of problems if you're serving JavaScript which actually is in possession of keys, and then you're claiming you get any additional security out of that. That's not really true.

B

I mean, totally. To the second point, totally, and hence the wording about plausible deniability. It's not an absolute; it's merely moving the needle. But it seems to be a moving of the needle that people want. Now, whether they're right to want it or not is kind of almost not.

F

I think the actual real problem is something quite different: it's really attacks on virtual machines by other virtual machines, right? It's Spectre and all of that stuff. That's really what the real issue is. So that probably should be said anyway.

B

And just to cover the first point, I think there's an assumption there that the rewriting has to be done in the SFU. There's this tacit assumption that the SFU will do everything it has to do so that the decoder at the far end doesn't know that SFrame is involved, and I think what we're starting to hear is that that may not be possible.

A

Okay, thanks Bernard. Stefan.

A

Oh, and by the way, I'm cutting the queue here because this session has only a few more minutes left.

G

Good morning. Is that now working?

G

Okay, great. So I toot the same horn here that Bernard and Jonathan did. I would suggest that it's basically impossible to do this in a generic way. I would suggest adding to the requirements that there should be codec-specific restrictions on the codec bitstream complexity. Things like: if you're using HEVC, you shall not use gradual decoder refresh; you want to have a keyframe. Otherwise the concept of keyframes and keyframe caching is meaningless, right? With things like that, you may have a chance. I'm not optimistic even with that, but you may have a chance. Thank you.

A

And next is magnus.

H

Yes, so I want to turn the question around, I think. Is it really "decide what to drop", or is it "decide what to forward"? Because, from my perspective, we are talking about media streams here, and the high-level assumption at least is that there are several media streams. It's not just one set of packets from which you have to select some random subset. We actually have some underlying structure, and that still comes through here.

H

I think that holds even with SFrame, because you need to preserve some information internally and some for the SFU, but also the receiver needs to understand these things. I think that's part of the context here: we need to have a model, and we need to say what general types of structures we are supporting and handling.

H

That's the SFU context, and that's what it comes down to.

B

That's an assertion, but I'm not totally certain that it's true. I mean, these things are happening over a UDP network, and we've built the rest of it such that arbitrary things can be lost and it should still work, so I'm puzzled by the claim that that isn't good enough.

H

On that, that's the other aspect I didn't bring up. Unless you have congestion or, say, a resource shortage in the bandwidth, okay, fine, you can send everything. But the moment you have a constrained amount of resources and you need to conform to that, you need to select a set which works. If you're going to drop random things, you're going to get something that doesn't work. If you drop completely at random, you have a certain probability of being able to repair it in time; otherwise it's late, and that's what you end up with.

B

Right, so what we're trying to do is improve on random, and my point is that we don't have to be perfect. We just have to be better than random, and that might be good enough.

H

Yes, but being able to repair means that you actually need to understand whether something that was lost is useful to you, which means you have to have structures that tell you that the thing that didn't arrive was something you needed. That is a trickier part of the problem here, because if you don't know that this is an RTP stream at a certain layer and that you lost something, you need to make a choice: do I need this layer? Could I drop the whole layer? Or is this the base layer of the stream I actually want to show, in which case I need to request it again? You need to understand the difference between those two to be able to recover.

B

You need some categorization, which could be opaque, but it needs to be categorized.

H

Yes, and you have to have that structure showing up. That's what RTP does for you if you use it correctly today, and that needs to be preserved when you put SFrames in RTP payloads, so you can understand which things you cared about or not, and whether the loss was related to something you cared about, even if it's only the receiver or the sender that knows whether it's useful and in which contexts.

H

But I think that's enough.

A

I think that's a good point. Thanks magnus.

A

Being passionate, it's.

I

It's not often that I get to make an end-to-end argument here, but I'm not working in WebRTC or whatever; I'm doing stuff that is entirely greenfield research.

I

I don't see the reason why the reflector needs to know anything about the semantics of the streams, provided that the data is packaged up into different streams so that a selection can be made at some point, and provided there is some virtual or singular authority somewhere

I

that is saying what the streams are, in such a way that the consumers of those streams can make a choice. I think that is sufficient to solve this problem. So say I've got a phone that's on a weak link: it can make the choice of which streams it is going to consume, and it's only that end device that can make decisions like,

I

"Oh, I didn't get this packet, but I'm not going to ask for a repeat of that packet, because I've already shown the next frame to the end user", or "I'm going to drop all that traffic and just resynchronize". So I think casting this in a more end-to-end fashion, and looking at the service provider as just having a black box that doesn't understand anything about the data that's traveling through it, other than that it happens to be tagged into these separate pieces,

I

is a better way of looking at it. It certainly helped me.
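
The black-box model described here can be illustrated with a small sketch. It assumes nothing beyond what was said: the relay sees only an opaque tag per payload and a set of subscriptions chosen by each end device. The class name, method names, and tag strings are invented for the example.

```python
from collections import defaultdict


class OpaqueRelay:
    """Forwards payloads purely by tag; it never interprets the payload."""

    def __init__(self):
        # receiver name -> set of tags that receiver has asked for
        self.subscriptions = defaultdict(set)

    def subscribe(self, receiver, tag):
        self.subscriptions[receiver].add(tag)

    def unsubscribe(self, receiver, tag):
        self.subscriptions[receiver].discard(tag)

    def route(self, tag, payload):
        """Return the receivers that should be sent this payload."""
        return sorted(r for r, tags in self.subscriptions.items() if tag in tags)
```

A phone on a weak link would subscribe only to a low-rate tag, and could unsubscribe and resubscribe to resynchronize, without the relay learning what the tags mean.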

I

And that's it yeah.

A

Thanks Phil. I think you're the last in the queue before we move on to the next topic.

J

Yeah, so a higher-level question about this slide, the scope. Looking ahead at the other presentations, I sense a little mismatch, and I'd like to make sure that the group understands which way we're going. Clearly this all started from being able to support end-to-end encryption, but I think a lot of the mechanisms that are being proposed are really about insertable streams in general, transforms on the media in general, packetization formats for RTP in general.

J

So I wonder whether the group is going to favor general solutions, or maybe even prohibit a general solution. Are you really only focused on providing the end-to-end secure media solution? Is it a goal, or a non-goal, or do you explicitly not want to support things that are more generic than.

B

I mean, to the extent that's a question for me, which it kind of isn't, because it's a question for the group.

E

I guess we should take it to the list.

B

But my opinion is that we're in a hurry and we should limit this as far as possible. If it happens that an easy solution that's generic drops out, then that's great, but I think the idea that we should track insertable streams and make them a dependency of this would be a mistake.

B

I'm hoping for another opinion.

A

I'm thinking we might have to take this to the mailing list, and hopefully Dr Alex can get himself in the queue and we can have a discussion about the next thing. That's unfortunate, because that was going well. Thanks, Tim, you got things rolling. I would encourage people to write down their thoughts and put them into an email that we can talk about, because no emails is much harder to work with than emails.

K

Right, so as one of the authors of the original document, of all the subsequent documents since, and of the charter: we were requested during the chartering of SFrame to give a big-picture document and to show which parts belong to which group. As many people pointed out in the past 20 minutes, there are parts that are media encryption, parts that are RTP packetization and payload questions, and so on and so forth. I'm sorry I couldn't come up with an informational draft yet, but it's going to come next, and that's why

K

the first slide is about AVTCORE, SFrame and in part WHIP, where we're going to put all the excellent questions and problems we have within their respective working groups, within a global picture, and hopefully within drafts. Some of the drafts didn't exist last time; they are going to be cited and spoken about later in this session, by Sergio and by Richard, on specific points. But the big picture, I think, is needed to avoid the 20 minutes we just spent. Next slide.

K

So, just looking at the original WebRTC, which was supposed to be P2P, we had something like this, very simplified: a source, an encoder, an RTP packetizer, the SRTP part with the key creation and the key exchange, and then a transport. It's simplified; the real thing is a little bit more complicated. I put in some of the RFCs, obviously not all of them, and this is what the RTP redundancy and congestion control can look like, with PLI, SLI and everything that fits into the profile. For simplification,

K

I also show only a single stream. I don't go into simulcast and SVC until the last slide. Next slide, please.

K

So, very quickly, we had to add the media server. It's a truncated view of the media path where we don't put the receiver: we just put a sender, a media server and a receiver, and each of sender, media server and receiver acts like a WebRTC peer. So what was end-to-end encryption in P2P becomes hop-by-hop, and we can see exactly the same thing. For illustration purposes,

K

I show a little red arrow to position where WHIP, which was the discussion two hours ago, should be: it is the signaling protocol on top of JSEP for when the sender and the media server are sending one way. You have exactly the same thing. In that case you have a small modification, in the sense that if the media server is an SFU, you do not need to manipulate the media anymore; what you do is mainly duplicate the RTP packets and modify the RTP headers, handling RTP header extensions and a certain list of things.

K

So, in addition to the original WebRTC spec, now we have trickle, we have pack, we have ICE lite. We can also cite RFC 6904, which was the first header extension encryption, and the cryptex draft proposed by Justin that will be presented at AVTCORE. Next slide.

K

Now, if we want to add the end-to-end encryption of the webrtc, multiple steps are needed right.

K

First, you need to add an additional filter between the codec and the RTP payload, and that's what the draft in AVTCORE about codec-agnostic packetization is made for. There will be a full presentation by Sergio and Youenn on the topic. This is the IETF point of view; there is nothing about the web here. This is what's going to be presented at AVTCORE. Next slide.

K

Now, if you assume that the media is encrypted, the existing media servers might not be able to do the job they're doing today, because they depend on access to the RTP payload header. So you need to decide which information the SFU needs and where to put it. So far, the idea was to take the information needed, in a structured way, and put it in an RTP header extension. One or two years ago, at AVTCORE,

K

the decision was that any new RTP payload should support frame marking. There is a question mark today over whether frame marking is enough or not, and yes, there are some open questions about what information should be put there.

K

Are we backward compatible or forward compatible? And with the new RTP payload for H.266, by Stefan and the other authors, or AV1, we come to SVC codecs that have a layer of complexity which needs to be dealt with in the SFU, was designed for that, and cannot really be dealt with end-to-end.

K

So now you have two antagonistic things: I want encryption as much as possible, but I still want an SFU. If I don't have an SFU in the loop, I'm still P2P and end-to-end encrypted, and then I'm done, right? So one of the first questions, still unsolved, that will require discussion at AVTCORE is: what do we put in that RTP header extension? Next slide.

K

The second question is: now I need to exchange a key end-to-end, as opposed to exchanging a key hop-by-hop. How do I do that? Here, the exchange of the key, how you rotate the key and so on, is really orthogonal to the media encryption. So the SFrame group was chartered to define a media encryption independently of the use case and independently of the key exchange, a little bit like in the PERC group.

K

There you have a difference between the double document and the three other documents that were all together designing a system. So SFrame is media encryption only. Now, depending on your application and on the use case, video conferencing being very different from one-way streaming,

K

For example, you might want to use different system, but in all the case, what is clear is you need a key management system that is separated from the end points on one hand and from the sfu. On the other hand, you need to have the three of them separated.

K

So there is a proposal by Richard, from Cisco, to do the key exchange using MLS.

K

There is a corresponding implementation in Safari, with additional security measures because of the web threat model, and there is another implementation using Olm, by Jitsi. I think Saul is in the call today; maybe he will tell us a little bit about that. And I'm pretty sure that Webex is doing things pretty differently, but they all fit that diagram, where you need to have a secure way to exchange the key and the key management needs to be external. Next slide.

K

Now, if we go specifically into insertable streams (and by the way, the name has just changed, so someone can tell us what the new name is; I'm not sure what it is anymore), the idea is this. In a native application you control everything, but in a web application the trust model is different: you do not trust the JavaScript. With the traditional WebRTC implementation, the key is generated by the user agent.

K

It's never passed on to the JavaScript, and so we're good. Now we need to find a way to apply that SFrame transform and to get the key directly into the user agent, right? So here we create an API, called the insertable streams API until this week, that allows us to inject encrypted content into RTP. But we haven't solved the problem of doing the encryption and the exchange of the key outside of the JavaScript.

K

So this is where the Safari proposal for an implementation in a native worker that is not accessible from JavaScript makes sense. Next slide, which is the last one. If you put everything together and you put it in font size 6, which is absolutely unreadable,

K

this is what you get. The little difference is the dotted white block, which is the insertable streams API, where you're going to plug in the SFrame transform, which is protected in the C++ part of the browser, in the user agent, and then the external key management. So now I'm going to let Sergio and Youenn explain a little bit more about the RTP packetizer problem, and I'm going to let Richard speak about MLS for key exchange.

K

We can skip that for now. That was a slide to answer Magnus's and Colin's question about whether this disrupts any of the RTP mechanisms. We went through every single RFC one by one and implemented them, and it doesn't look like it, but we might have missed a few.

A

Okay, thank you. Are there any questions based on this, or is this just reiterating what people already understand?

A

Great. Sergio, grab a mic.

L

These are the slides that are going to be presented tomorrow at AVTCORE, so I'm not sure how it will go, whether we will have the same discussion today and tomorrow. I hope that we focus on different angles, but so I will.

A

Yeah, maybe, given that you have half an hour tomorrow, I think it is, you can go through it then, and here maybe you can talk about the high-level concepts and we can try to focus in on those questions that we were talking about earlier with Tim's presentation.

L

Okay, so this is light.

L

So the first is like the continuation of the presentation that Alex has done. Really, what we are doing with SFrame is inserting a new element in the RTP media chain as defined in RFC 7656, the RTP taxonomy. What it introduces is a new element that does a transformation: it transforms the encoded stream that comes from the encoder before it gets to the packetizer.

L

Typically, the packetizer expects the stream to come directly from the media encoder.

L

So it parses the media stream, frame by frame, for codec-specific boundaries and information, and it then transforms it into several RTP packets. If we apply SFrame, or any other transformation that changes, or in this case encrypts, the content of the media frame, then the current packetizers cannot work with the byte stream anymore, and what you need to do is several hacks, specific to each codec, to make it work with SFrame, as I presented at the last IETF, in the last SFrame meeting.
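
The ordering Sergio describes, with the transform sitting between the encoder and the packetizer, can be sketched as follows. This is only an illustration of why a codec-aware packetizer breaks: `toy_transform` is a stand-in that scrambles the frame and is not real encryption or the SFrame format, and all names and sizes are assumptions. Once the frame is opaque, the only thing a packetizer can still do generically is split on size.

```python
def toy_transform(frame: bytes) -> bytes:
    """Stand-in for a per-frame transform: header + scrambled body + tag.

    NOT real encryption and NOT the SFrame wire format; it only makes the
    codec structure unreadable to whatever comes after it.
    """
    header = b"\x01"                        # hypothetical 1-byte header
    body = bytes(b ^ 0xAA for b in frame)   # placeholder for ciphertext
    tag = b"\x00" * 4                       # placeholder for an auth tag
    return header + body + tag


def packetize(blob: bytes, max_payload: int) -> list:
    """Codec-agnostic packetization: split on size, not on codec boundaries."""
    return [blob[i:i + max_payload] for i in range(0, len(blob), max_payload)]


def depacketize(chunks: list) -> bytes:
    """Reassemble the transformed frame on the receiving side."""
    return b"".join(chunks)
```

A 10-byte frame becomes a 15-byte blob, which splits into chunks of 6, 6 and 3 bytes with a 6-byte payload limit; the receiver must reassemble the whole blob before it can reverse the transform.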

L

Yes, tim is on the queue.

B

Yes, so can you just back up and tell us why you have to do the packetization after you've done the transform? Why can't you do that the other way around? I'm sure there are good reasons, but I think they're worth stating.

L

This is something that, I mean, this is how SFrame works. I should not be the one to explain it, but there are several reasons. This is how SFrame works.

L

Now, I can explain why SFrame works like that. It is because it has less overhead when sending over the wire: the metadata, the encryption overhead associated with the frame, is sent only once per frame and not once per packet, so the overhead is lower. And also because, as Alex has said, in this way SFrame is agnostic to the media transport and can work with RTP and, for example, with QUIC. This was one of the reasons that SFrame works on a per-frame basis and not on a per-packet basis.
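
The overhead argument is simple arithmetic, sketched here with made-up numbers. The per-application overhead, frame size, and payload size are assumptions for illustration, and the calculation ignores the small effect of the overhead itself taking up payload room.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def encryption_overhead(frame_bytes: int, max_payload: int, per_unit_overhead: int):
    """Return (per_frame_total, per_packet_total) overhead in bytes.

    per_unit_overhead is the header-plus-tag cost of applying the
    encryption once (a hypothetical figure supplied by the caller).
    """
    packets = ceil_div(frame_bytes, max_payload)
    per_frame_total = per_unit_overhead              # applied once per frame
    per_packet_total = packets * per_unit_overhead   # applied to every packet
    return per_frame_total, per_packet_total
```

With an assumed 12000-byte keyframe, 1200-byte payloads and 20 bytes of overhead per application, per-frame encryption costs 20 bytes while per-packet encryption costs 200 bytes for the same frame.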

F

My question is: when you say the current media packetizer doesn't support this, is it really because of the packetizer, or is it because of the SFU that we have these hacks? I think it's more the SFU, right?

L

It really is both. For example, VP8 and VP9 are easy cases, because there is not much mangling of bytes when you do the packetizing. But with H.264, for example, you cannot apply it directly to the frame, because you have to parse the NAL units. So you have to do a lot of codec-specific things on the media frame in order to create the packets. So it is a bit of both: the things that are needed in the SFU, and also things that happen in the packetizer itself.

F

Yeah. Also, I think it's assumed here that the transformer doesn't really communicate with the packetizer in any formal way. I'm just questioning whether that's a hard requirement. That's how insertable streams works, but it also creates a number of problems, in that if you increase the data size for some reason, the packetizer doesn't know what's going on. So I would just question that: I think this is a slide about the actual architecture that's shipping, rather than necessarily the architecture that we need.

F

So that's just a question.

L

Yeah, I mean, this is how it is implemented today. We could have it other ways, but the one that we have today is this one.

H

Yeah, Magnus. So yes, I think this picture is oversimplified, and what we really have in this case is a packetization step, in some sense, prior to the media transform.

H

And then you have a second packetizer afterwards. You say the media encoder is outputting some data, but in reality, if you're talking about scalable video codecs, for example, you're not putting out one encoded stream; you're outputting multiple encoded streams, and you need to packetize those individually, with the right amount of metadata, even internally, to be able to figure out where each one belongs. Then you transform it into the encrypted, protected form, and then you packetize it again for the transport, to fit RTP in that sense.

H

So I think we have to be very aware that we actually have several steps here, and what you packetize before you encode is an important question. Okay, in this case, especially in WebRTC, it often ends up in the implementation, maybe in the JavaScript domain, but it's highly relevant.

L

You mean that this picture is not correct, or should be more defined?

H

I think it would be good to take some case and actually provide more detail: take a scalable video codec and look at how you're actually going to do this and what it looks like, because then you end up with a fork immediately after the media encoder step, because...

L

The media encoder, yeah. SVC and simulcast are specifically worked out in the draft, and I think that I have a slide on that later on. Okay, good.

M

Colin. Yeah, I mean, this is certainly one possible way one could implement it.

M

One of the decisions we made fairly early in the design of RTP, and that we have reflected in all of the payload formats that have been defined, is specifically not to try to build things in a codec-agnostic way, and specifically not to try to do the packetization in a codec-agnostic way, because you need the information that's codec-specific to do this effectively and to make something that's robust. And I'm wondering if there are ways which would fit the RTP model better, which did the encryption of the contents in a way which reflected knowledge of the particular payload formats. That would then simplify the rest of the design, by encrypting in a codec-aware way rather than trying to do this in a codec-agnostic way.

L

Because you will have to specify how to encrypt each of the video and audio codecs, so we will have to do a lot of work just to specify how we need to do frame encryption in, let's say, H.264, H.265, H.266, VP8, VP9, AV1, and maybe others, I mean.

M

Yes, this is true, but I think you're going to do that work anyway, and you're just going to do it piecemeal over a long period of time as you realize the robustness problems. I think if you assume that's the model up front, you are at least aware of the complexity that's needed to make this work effectively.

L

The only thing about the robustness is that, at least in WebRTC implementations, we are not using that so much. While RTP has defined it in great detail, for example for H.264 slices, I have not seen any implementation that is really using it in a very codec-specific way. Yes, with RTP NACKs, FEC and things like that, but I have not seen much traction for a per-codec robustness thing. So this is...

M

But I'm seeing a bunch of people in the chat, and the comments Magnus was putting in were saying that this is needed to make the selective forwarding work. So I'm not sure this is necessarily accurate.

M

Well, you have to remember that just because some implementations are of poor quality, I think, doesn't necessarily mean that they all need to be.

L

I would love to hear more about how it is used, and how or where it is done, because, at least in the SFUs that I know, it is not done. So if Magnus can provide more details about how this codec-specific robustness is used in SFUs and so on, I would really like to see it.

L

I mean, because right now, in all the SFUs that I know, it is not the case. So if...

H

Sorry for talking, but I'm trying to understand. I think we're misunderstanding each other here, maybe, so I want to understand what you are saying that I have said.

H

I didn't really understand, because when I'm talking about detecting losses, I've talked about that in classic RTP, which is on the SSRC level: you can see a gap for a particular SSRC in the sequence number space, and then you know that you're missing something. And if you mapped one SSRC to one stream or layer, in certain cases you would know: oh, I'm missing this. I know that people have in many cases implemented this with scalable codecs.

H

They do maintain part of the structure, saying okay, at least in the higher levels, and in some cases you smash everything together and say: okay, I'm going to repair, and assume that the SFU has done the right thing. But with SFrame you don't have that insight into this as easily. You probably want to have more SSRCs, so you can see that you lose a particular layer that you're forwarding, so you can try to decide if you're going to repair this, and know that this is a particular layer for a particular source.

H

That's what I mean.
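The SSRC-level loss detection described in this turn can be sketched as follows. This is a minimal sketch, assuming one SSRC per stream or layer and the 16-bit wrapping RTP sequence number space; the `LossTracker` class and its method names are hypothetical and only illustrate the idea of spotting a gap per SSRC.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around

class LossTracker:
    """Tracks the newest sequence number seen per SSRC and reports gaps."""

    def __init__(self):
        self.highest_seq = {}  # ssrc -> newest sequence number seen so far

    def on_packet(self, ssrc, seq):
        """Return the sequence numbers skipped on this SSRC, if any."""
        prev = self.highest_seq.get(ssrc)
        if prev is None:
            self.highest_seq[ssrc] = seq
            return []
        gap = (seq - prev) % SEQ_MOD
        if gap == 0 or gap >= SEQ_MOD // 2:
            # Duplicate, or an old/reordered packet: not a new gap.
            return []
        self.highest_seq[ssrc] = seq
        return [(prev + i) % SEQ_MOD for i in range(1, gap)]

tracker = LossTracker()
tracker.on_packet(ssrc=0xA, seq=10)
missing = tracker.on_packet(ssrc=0xA, seq=13)
print(missing)
```

If each layer has its own SSRC, the SSRC on which the gap appears tells the forwarder which layer lost data, without looking inside the encrypted payload.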

L

Yeah, and I agree with that, and this should be covered by this packetization format. I mean, I hope that we are able to cover it.

N

Let me just try to reiterate the point here. I think what I heard from Magnus, and I totally agree with it, is that we just need to think about it this way: in your picture, if you zoom in on where the encoded stream goes into the media transformer, it's not really a stream.

N

It's already been basically packetized at that point. That encoded stream actually is packetized in some form before it even comes in, and that packetization is inevitably codec-specific at some level. I mean, you have to implement different code for different things there. I get that audio is separate; I'm talking about scalable video type things. So I think, if we just reflect that that's already sort of happening, it maybe changes how you frame and think about this conversation.

L

Well, yes and no. I mean, I agree that what comes out of the media encoder is packetized, but in a frame way. So it's an array of bytes. I don't really need to know exactly what's going on inside these bytes.

N

Well, you need to know a little bit more than it's a chunk of bytes.

L

I need to know some metadata, but I don't need to know the actual format. So, yes, I need to know this metadata: whether it is an I-frame, and some other things that come later in the presentation. But I don't need to know the actual format or the syntax of the bytes. But I for sure need to...

E

Get something back.

L

I need to get some information, some metadata, and it is also the metadata that the SFU needs in order to perform the layer selection.

N

So I think really focusing on the gap between what the SFU needs to do in a good implementation and what the other things need to do is the right thing to do. And I don't think that arguing "this is how we already implement it" is really compelling for people here. Really trying to get the right design for SFrame is what we're trying to do here. So, I mean, pull those apart. The form of argument that the SFUs that are already implemented do it this way, therefore we have to implement the clients like this, is not very compelling. Both the SFUs and the clients need to change to implement SFrame, and we're defining what both of them do here. So let's...

C

Get it right. Yes, sure.

A

So, unfortunately, we're out of time. I apologize to those people who just screamed and jumped into the queue. If you have something very quick... I'd like to give time now to Richard, and I encourage everyone to go to the AVTCORE meeting to discuss this.

A

Mr. Barnes. All right, am I audible here?

C

Yep, all right. Yeah, I have audio but no video, sorry. So yeah, this is about SFrame and MLS, and thanks Alex for the cue-up here. Next slide, please. I'm going to breeze through this pretty quickly; this should be pretty straightforward and non-controversial. So, like Alex said, SFrame needs a way to get keys, and MLS is a way to provide keys. In SRTP, we needed a way to get keys.

C

We use DTLS to provide those keys, but DTLS is point-to-point, and for these conferencing cases where SFrame is really useful, we need something that does groups, that does multi-party key exchange, and that's what MLS does. MLS's core competency is that it does group stuff natively, unlike DTLS. So the idea here is to take this group key exchange primitive and swap it in for the role DTLS plays in DTLS-SRTP, and then take the keys that come out of it and use them for SFrame.

C

The bottom part here is just noting that we do have a working group deliverable on this, so we should probably adopt a document to fulfill it, and the proposal here is going to be that we do it with this document. Next slide, please.

C

So there are really only two things in this document. It tells you how you take the keys that MLS generates for you and put them into SFrame, how you signal them, etc. We'll talk about that in a second. And then there's also some negotiation of other parameters that you need in order to use SFrame, and how you negotiate those using MLS. Next slide, please.

C

So on the keying side, if you look at what SFrame needs and what MLS provides: the SFrame encryption framing requires a mapping where the sender and receiver can look up a key based on a key ID that is sent in the SFrame header.

C

These keys need to be unique per sender so that you don't have nonce reuse problems: you don't want different senders sending with the same key and nonce pair, giving you nonce collisions that can cause AEAD algorithms to fail.

C

Now, on the MLS side, what MLS provides is a sequence of group keys. MLS divides time into epochs. Roughly, whenever someone joins or leaves the group, you get a new key, so that the people who left are locked out and the people who joined are let in. So you end up with a sequence of group keys.

C

So you need to map the sequence of group keys into the key ID space, and you also need to make them per-sender keys and signal which sender you're using. So what the draft defines is how you derive these per-sender keys and how you signal them using the key ID field. Next slide, please. To do this, we just do the obvious thing. For deriving per-sender keys, we first export a key from the MLS context, exactly the same as DTLS-SRTP does with TLS.

C

They use an exporter to get a key, and then the per-sender keys are just derived by HKDF-ing things off of that master secret, that epoch secret. The index that's input here is something we assume is configured. One of the things MLS also provides is an index: each participant in the group has a unique index for that participant's location in the group.
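The per-sender derivation described here can be sketched with HKDF-Expand (RFC 5869) over an exported epoch secret. This is a minimal sketch under stated assumptions, not the draft's exact construction: the label `b"example sframe sender key"`, the 4-byte big-endian index encoding, the SHA-256 hash, and the 16-byte key length are all hypothetical choices, and the epoch secret is assumed to be the already pseudorandom output of the MLS exporter.

```python
import hashlib
import hmac

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand from RFC 5869 with SHA-256."""
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        # T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]

def derive_sender_key(epoch_secret: bytes, sender_index: int, key_len: int = 16) -> bytes:
    """Derive a key unique to one sender from the shared epoch secret."""
    # Hypothetical label and index encoding, used only for illustration.
    info = b"example sframe sender key" + sender_index.to_bytes(4, "big")
    return hkdf_expand(epoch_secret, info, key_len)

epoch_secret = bytes(range(32))  # stand-in for a secret exported from MLS
key_sender_0 = derive_sender_key(epoch_secret, 0)
key_sender_1 = derive_sender_key(epoch_secret, 1)
print(key_sender_0 != key_sender_1)
```

Because the sender index is part of the HKDF input, two senders in the same epoch get different keys, which is what avoids two senders using the same key and nonce pair.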

C

So we can reuse that index here: we just encode that index and use it as an input to HKDF, so that we get unique keys per sender. Then, to signal this in key IDs, we just take the two integers. We truncate the epoch to a certain number of bits, and we shift the sender index over by that number of bits and put it on the left-hand side. So here we're using the extensible nature of the key ID field.

C

That's a variable-length integer of up to, I think, 64 bits, and we're taking advantage of that to put two numbers in there instead of just one opaque number. Now, the idea of this capital E number here is that we're only going to carry a certain number of bits of epoch.
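The key ID layout described here, with the truncated epoch in the low E bits and the sender index shifted to the left of it, can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and the 64-bit ceiling follows the speaker's hedged "I think 64 bits" rather than a checked value from the specification.

```python
MAX_KID_BITS = 64  # assumed upper bound on the key ID size

def encode_kid(epoch: int, sender_index: int, epoch_bits: int) -> int:
    """Pack the low `epoch_bits` bits of the epoch and the sender index into a key ID."""
    if epoch < 0 or sender_index < 0 or epoch_bits < 0:
        raise ValueError("inputs must be non-negative")
    epoch_mask = (1 << epoch_bits) - 1
    kid = (sender_index << epoch_bits) | (epoch & epoch_mask)
    if kid.bit_length() > MAX_KID_BITS:
        raise ValueError("key ID does not fit in the assumed 64 bits")
    return kid

def decode_kid(kid: int, epoch_bits: int):
    """Split a key ID into (truncated epoch, sender index)."""
    epoch_mask = (1 << epoch_bits) - 1
    return kid & epoch_mask, kid >> epoch_bits

kid = encode_kid(epoch=0x123, sender_index=5, epoch_bits=8)
print(hex(kid), decode_kid(kid, 8))
```

Note that decoding only recovers the truncated epoch, so both sides must agree on `epoch_bits` to split the two numbers correctly.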

C

I think, you know, the fact that you're only using a certain number of bits implies that you're going to have rollover. That's okay. It's unlike, say, rollover of the sequence number in SRTP.

C

What this epoch does is guide your key selection, so you're not going to have, say, nonce reuse like you would have with sequence number rollover in SRTP, but it means you'll have decryption failures if you have a rollover and people aren't keeping up. So basically, the width that the application chooses for this epoch field is going to define how much reordering you can tolerate, how quickly people have to keep up with epoch changes in order to not have decryption failures.
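The rollover behaviour described here can be sketched from the receiver's side. This is a minimal sketch with hypothetical names, assuming the receiver retains keys for a set of recent full epoch numbers and picks the newest retained epoch whose low bits match the truncated epoch in the key ID.

```python
def resolve_epoch(truncated_epoch: int, known_epochs, epoch_bits: int):
    """Return the newest known epoch whose low bits match, or None if there is none."""
    mask = (1 << epoch_bits) - 1
    matches = [epoch for epoch in known_epochs if epoch & mask == truncated_epoch]
    return max(matches) if matches else None

EPOCH_BITS = 4
sender_epoch = 33
truncated = sender_epoch & ((1 << EPOCH_BITS) - 1)

# A receiver that has kept up selects the right epoch key.
up_to_date = resolve_epoch(truncated, [30, 31, 32, 33], EPOCH_BITS)

# A receiver that fell a full rollover behind selects an old epoch. The key
# is wrong, so decryption fails, but no key and nonce pair is reused.
stale = resolve_epoch(truncated, [14, 15, 16, 17], EPOCH_BITS)

print(up_to_date, stale)
```

A wider epoch field makes the stale case less likely, at the cost of a larger key ID.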

C

Since epochs only change on joins and leaves, it may not be a huge problem, except during the high-churn times of the group. So I think we're not going to need a ton of bits here in practice, but I think that's something we can leave up to the application to manage. Next slide, please.

C

The only other technical content here is negotiation of the aforementioned things. Obviously, if you're going to do SFrame encryption, you need a cipher suite you're going to encrypt with. The SFrame spec defines a collection of cipher suites that we just reference here. They have IDs, and it's a typical offer/selection paradigm, as in TLS.

C

The difference is that in MLS, the participants put this offer in a key package that describes their capabilities, and then, when they are welcomed into a group, they find out what the group is using, the specific choices for these parameters. The only other parameter besides the cipher suite is the number of epoch bits. The epoch_bits field here is the same as the E field, the number of bits in the epoch, on the previous slide. Everyone needs to agree on that so that people can decode the key IDs, so we just signal that in this welcome extension. One more slide here, I think.
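The offer and selection flow described here can be sketched as follows. This is a minimal sketch with hypothetical names throughout: the `SFrameParameters` type, the integer cipher suite IDs, and the rule of picking the most preferred suite that every member offered are assumptions for illustration, not the draft's wire format or selection algorithm. Only the `epoch_bits` parameter name comes from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SFrameParameters:
    """Parameters the group agrees on, as a welcome message might carry them."""
    cipher_suite: int
    epoch_bits: int

def select_parameters(offers, preferred_suites, epoch_bits):
    """Pick the most preferred cipher suite that every member's key package offers."""
    if not offers:
        raise ValueError("no offers to select from")
    common = set.intersection(*(set(offer) for offer in offers))
    for suite in preferred_suites:
        if suite in common:
            return SFrameParameters(cipher_suite=suite, epoch_bits=epoch_bits)
    raise ValueError("no cipher suite is supported by every member")

# Placeholder suite IDs offered in three members' key packages.
offers = [[1, 2, 3], [2, 3], [3, 2]]
params = select_parameters(offers, preferred_suites=[3, 2, 1], epoch_bits=4)
print(params)
```

Every member reads the same `epoch_bits` value from the agreed parameters, which is what lets all of them split key IDs at the same bit position.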

C

Yeah, so we have implemented this key management part in the SFrame repo there. This is the scheme that we're working on putting into Webex; we're doing the SFrame implementation there. Document-wise, I think it's pretty okay; it's obviously functional, at least at a basic level. I think Raphael, I don't know if he's on the call this morning, was thinking about adding some recommendations about how you manage the MLS groups that you use for SFrame.

C

For example, if you have a messaging group that you've got a long-lived MLS group for, and you're going to have a temporary MLS group just for a call associated with that messaging group, you might use some PSKs to connect those two groups and prove that the folks in the call were also part of the messaging session. So that's advisory stuff about how you manage an extra meta level of MLS management.

C

But I think mostly, like I said, at this point it's pretty functional, and mostly it's just going to stay abreast of SFrame as SFrame evolves towards standardization. And that's all I have. So yeah, I think this is in pretty good shape, and I would like to propose that the working group take it on for that deliverable I mentioned at the top.

A

Thanks, Richard. Well, in the last minute we're not going to have a lot of opportunity to discuss this, but what I think we might do with this one is take it to the list, and we can have a discussion about it there. It's unfortunate that the other work that we have sort of doesn't really exist, in that it's not formally adopted. So it's almost like putting the cart before the horse in a way, but we can probably have that discussion on the list, because it looks good. And with that, I think we are all done. I notice my co-chair has arrived. Welcome. Thanks to our minute taker, Watson, who has produced some excellent notes here, and thanks to everyone for coming and having such a good discussion. Maybe next time we'll have more time to discuss.

E

Bye. Yeah, thank you, Martin. Thanks also to Watson for note taking. My sincerest apologies: I saw the UTC+1 time zone and misinterpreted it as UTC. So here I am, three minutes early for a meeting that's actually an hour late. I'm so sorry about that. Thank you, everyone, and I look forward to catching up with the...

E