From YouTube: WebRTC WG interim 2021-11-24
Description
See also the minutes of the meeting at https://www.w3.org/2021/11/24-webrtc-minutes.html
A: Yeah, it doesn't have to be very elaborate, just the basic decisions that we reach. That's about it. Okay, thank you. All right, so the code of conduct: we operate under the W3C Code of Ethics and Professional Conduct, and we're all passionate about improving whatever we see, but let's try to keep things cordial and professional.
A: So just a few things about the tools: we're going to be managing a queue. If you type +q and -q you'll get into and out of the speaker queue. And please use headphones so that we don't have a lot of echo, and state your name.
A: I don't think we're going to need a poll, but if so, we can use that. All right, so the meeting recording policy: I did send a message to the list that we are recording this particular meeting, and that's been our policy, but we are bringing a poll to the list to decide whether to continue that or change it. The options are: continue to record and publish; record but keep the links private, accessible to working group members only; or don't record.
A: That requires a call for adoption, and editors' drafts don't necessarily represent working group consensus. The working group drafts do, although now with the continuous publication policy they're pretty much the same thing, which is a little bit confusing. It is possible to merge PRs that lack consensus, but it's good to attach a note indicating what's controversial if you do that. All right, so here's what's on the agenda: we have media capture transform with Harald and Jan-Ivar.
A: Then Elad will talk about region capture, and then we'll get into the WebRTC-NV use cases and the face tracking stuff. All right, we'll try to keep strict time so everyone gets their share, and give a warning about two minutes before the time is up. So, Harald and Jan-Ivar, you are on.
C: I would call out the consensus that we had three things to decide about: the issues with WHATWG streams, and two different proposals for the API shape if WHATWG streams are deemed acceptable.
C: So we had a meeting, some of us, with representatives of the WHATWG streams folks — and do you want to speak to that? This slide.
D: Actually it's mine. I did a summary of it; I didn't know where else to put the slide, so I don't necessarily need to take up your time, but if people want to look it over, or if we want, I can talk to it for a minute or two.
D: All right, cool. So we had a meeting with the WHATWG working group that covers the Streams spec, and I think overall they are receptive to making ReadableStream work with video frames. We've filed various issues and even discussed solutions that have been presented in those issues, and I'm listing them here. I use a thumbs-up to mean "theoretically solved" — that's mostly a positive indication, a sense that this has been worked out.
D: At least the discussion seems to have landed in a place where it sounds like people think this is solvable, and then I marked a number of construction hats to represent difficulty — but this is just spitballing. So, in order to stream video frames, we need to solve a couple of things. We want to avoid buffering in transforms, and there's a proposal to allow a high-water mark of zero in writable streams and transform streams, and that seems fairly simple.
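The buffering concern can be illustrated outside the browser: a manual pump forwards one chunk at a time and awaits the sink before reading the next, which is the behavior a high-water mark of zero is meant to enable inside TransformStream itself. This is a sketch of the idea under discussion, not the WHATWG proposal.

```javascript
// Manual pump: read one chunk, transform it, and await the sink's
// write before reading the next, so nothing accumulates between
// source and sink. (Node 18+ exposes the web streams as globals.)
async function pumpWithTransform(readable, writable, transform) {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      // One chunk in flight at a time: zero internal queuing.
      await writer.write(transform(value));
    }
    await writer.close();
  } finally {
    reader.releaseLock();
    writer.releaseLock();
  }
}
```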
D: But the biggest remaining issue, I think, is around cleanup — the chunk life cycle, as we often call it — which is that we want to avoid relying on garbage collection for video frames when chunks are dropped or when the stream is errored. And I think there are two proposals being bandied about there.
D: One is that, when you're in a condition where you're dropping or erroring, you pass the remaining chunks that have been building up in queues to the sink. The other one, which is perhaps a more ambitious but better solution, might be to imagine a new WebIDL type, so that an object can be closable — which has some parallels to the TC39 proposal to expose this to JavaScript using a simple dispose method.
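The dispose idea can be sketched without streams at all: the chunk exposes an explicit close(), and callers release it deterministically with try/finally instead of waiting for garbage collection. FakeFrame below is a stand-in for VideoFrame, not a real API.

```javascript
// Stand-in for a resource-holding chunk such as a VideoFrame.
class FakeFrame {
  constructor(data) {
    this.data = data;
    this.closed = false;
  }
  close() {            // the "dispose" method discussed above
    this.closed = true;
    this.data = null;  // release the underlying resource
  }
}

// Deterministic cleanup: the frame is closed as soon as the caller
// is done with it, whether or not `fn` throws.
function withFrame(frame, fn) {
  try {
    return fn(frame);
  } finally {
    frame.close();
  }
}
```

TC39's explicit-resource-management proposal would let a `using` declaration express the same pattern with language support.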
D: And so that's a nice tie-in there, if we can go that route. So those are the main issues, and then at the bottom I put the two issues that we need to solve in order to make tee work: we need to be able to clone chunks, because VideoFrame requires that, and there's a simple structured-clone proposal — a boolean.
D: That would be trivial to add to solve that. And then the other problem is to avoid drift, because two branches can drift in time, and the early idea there was just to have a "synchronized: true" option. Even better would be a real-time mode where we can drop chunks instead, and that would be preferable — but there's some difficulty in implementing that well. It would be simpler to do if we had the closable WebIDL thing.
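The clone-on-tee idea can be approximated with today's tee() by structured-cloning the second branch's chunks, so the two branches stop sharing chunk state. A native boolean option would clone before queuing, so this after-the-fact version is only an illustration.

```javascript
// Tee a stream and structured-clone every chunk on the second
// branch, approximating the proposed clone-on-tee boolean.
function cloningTee(readable) {
  const [original, shared] = readable.tee();
  const cloned = shared.pipeThrough(new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(structuredClone(chunk));
    },
  }));
  // Caveat: tee() still queues without bound if one branch reads
  // more slowly — the "drift" problem a real-time, chunk-dropping
  // mode would address.
  return [original, cloned];
}
```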
D: So overall, I think we're happy to see progress there, and it looks like the working group is interested in solving these. Okay.
C: So at the moment, yes, we have issues that need solving, but we're fairly confident that they can be solved. So we can proceed to work out how the API should work when streams are part of the solution. Next slide.
C: So from the last meeting we had the comparison between the proposal that Jan-Ivar wrote up and the one that I wrote up, and we've discussed this a little in issues on the mediacapture-extensions repo, to see what we can come up with as a possible starting point for further working group work.
C: We document in the code — that is, in the WebIDL — the parts that have consensus, and we add notes to reflect the parts that don't have consensus, and we ask the working group to adopt this document as a starting point, aiming for as little as needed before we're ready to call for a First Public Working Draft. Next slide.
C: We start from the alvestrand repo, mediacapture-transform. We mark all the generating and consuming APIs as available in dedicated workers only, because we all agree that, no matter what else, it needs to be in a dedicated worker. And we modify the MediaStreamTrackGenerator in line with the VideoTrackSource proposal — the names are subject to bikeshedding, but the principle should be clear, because that's what we seem to agree on.
C: So we'll add the following notes: that there is no consensus on not adding window exposure to this; that there's no consensus on whether or not audio processing should be added; and, for those who are worried about backwards compatibility, a note that we think a MediaStreamTrackGenerator under the old name can be implemented on top of the new API, and we'll insert some sample code in the backwards-compatibility section, or something that says so.
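A rough sketch of what such backwards-compatibility sample code might look like. VideoTrackSource, dispatchFrame and stop are placeholder names from the discussion, not shipped APIs, so treat this as pseudocode:

```javascript
// Hypothetical shim: the old MediaStreamTrackGenerator surface
// implemented on top of a new-style source object. Every name on
// the "new" side (VideoTrackSource, dispatchFrame) is illustrative.
class MediaStreamTrackGenerator {
  constructor({ kind }) {
    const source = new VideoTrackSource({ kind });  // placeholder API
    this.track = source.track;
    this.writable = new WritableStream({
      write(frame) { source.dispatchFrame(frame); },
      close() { source.stop(); },
      abort() { source.stop(); },
    });
  }
}
```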
C: And if that is successful, we will create a w3c mediacapture-transform repo based on the alvestrand version, and we will then proceed with the usual working group process of raising bugs and making changes until we think that this is ready for First Public Working Draft publication.
F: Okay — yeah, I'm not sure if the WebIDL can say that audio is out for MediaStreamTrackProcessor.
F: But if we have consensus there, then I think it's good. The second thing, on the streams: my understanding was that we have the ear of the WHATWG streams folks, so that's good, but we still do not have — when we asked the question "are you confident that we can solve these issues?", the answer was "we may be able to solve those issues", and I want something stronger. And we have a meeting next week.
F: So maybe next week we'll make progress, but I think the answer to the question should be something like: yes, we need to work on it, but we're confident that we will fix the issues that you identified. And if we are getting there, then I feel confident that we adopt this proposal.
E: Yes, on the WHATWG discussion: I agree that the level of confidence at the last meeting was still maybe not as high as we want. I think, at the end of the day, we need to build the level of confidence on the path to solutions more than on the path to the —
E: So I think really the question is: is there a path toward it? And building on what Jan-Ivar was saying earlier, I think the major question is really about the life cycle of chunks. So once we've convinced ourselves — and ideally convinced the WHATWG editors, but, at the end of the day, first and foremost ourselves, and in particular the implementers — that this is manageable, then I think that's good enough.
E: I would go even as far as saying that, if the WebRTC working group and WebRTC implementers want the streams to do these things, the pressure on getting these things done is likely to be very high, given the interest in the technology and the particular piece we're discussing. So, yeah — that's my assessment of how we should make that evaluation.
C: Yeah, I put myself in the queue there just to make an argument from the floor, so to speak.
G: Yeah, maybe this is the wrong venue for the question, but digging into the closable thing: what happens if you don't close it? If I understand it correctly, you're adding a method that needs to be called when the developer decides that they're finished with this frame — what happens if they don't call it?
G: So forgive me — this is probably too deep into the weeds here — but I feel like we need to be very clear about what the expectations are on the developers here, and allow them to be told somehow that they're not doing it right, rather than just having kind of random browser slowdowns or weird behaviors. And I totally agree that this is probably not —
G
This
is
something
that
will
crop
up
later
but
again,
but
I
want
to
raise
it
now
anyway,
and
now
leave
it
yeah.
C: I'm not opposed to doing that, and I do think that bug trackers are better at tracking issues than notes in documents, but we have a tradition of notes in documents referencing bug trackers, sure.
F: Well, yeah — my understanding is that, honestly, as long as we do not have a good life cycle, as long as we do not have a good API that says "hey, you're the owner of this", we really cannot make progress. So I feel much more positive now that it will be solved, but definitely, as Dom said, we need to be convinced that we will have a solution there. We cannot just say, "hey —
F
We
might
be
able
to
survey
in
a
year
from
now.
Let's,
let's
wait
for
somebody
to
to
do
it,
I
I
I
don't
think
we
we
should
do
that.
We
should
first
state
okay.
We
are
confident
that
we
will
solve
this
problem.
We
have
this
forward.
We
do
not
have
to
wait
for
this
path
to
actually
ship
in
the
streams
spec
to
make
progress
for
sure,
but
we
we
need
to
be
confident
that
we
will
solve
it.
F: So hopefully one of them will fare well. But when I asked the question at the last meeting, the optimism was very low for this particular issue. Since then we've filed issues, we've discussed, and we've started to make proposals, and I feel more confident — and I want to validate that this optimism that we seem to have is actually shared by everybody. And if that's the case, then that will unblock a lot of things for me.
A: Yeah, I just wanted to make the point that we haven't been doing a great job of having sample code for a lot of these things. In particular, I know Jan-Ivar has written some samples and I've written some samples, and looking over the samples, we haven't always cleaned up things correctly — we're doing the basic cleanups, but not for all of the error conditions.
C: You know, my first experience with trying to use streams for something: I wrote code that tried to detach a stream from its sink and attach it somewhere else, and that was after the spec had been adopted and had been out for a while, and there was said to be production code available.
C: Having the API to experiment on is definitely good for demonstrating situations in which bad things can happen.
A: Yeah, actually, part of the problem is that the samples involve multiple APIs — there's not a lot you can do just with mediacapture-transform — but samples showing the whole pipeline, I think, would be useful. I'm not sure where they ought to go, but they'd also involve WebCodecs and probably WebTransport — a whole bunch of stuff, yeah.
C: We have samples involving WebCodecs; we don't have samples involving WebTransport yet. As you say, yes, these cross multiple specs, yeah.
A: Okay, so did we get the basic items here? In terms of conclusions, I think we're going to do a call for adoption, right?
C: I have not heard anyone object — except, well, Youenn has been warning us that we might not succeed, but —
A: Yeah, well, I don't think any of that is happening instantaneously anyway. So the waiting — we're very, very good at waiting, Youenn.
A: We excel at that. Okay, so I think we have the action items for the minutes. All right — so now I think we're ready for Elad.
H: Yes, hello — can everybody hear me? Well, yes, at least one person. Okay. So thank you very much for your time, and here is what I'm going to be presenting about today: region capture. We discussed this a few months ago, and now I'm going to reintroduce the subject. So imagine that you've got one tab running an application that's actually a composite of two applications.
H: For example, you could have a presentation and a video conferencing application merged into one — which would have been convenient right now, for example. So in this case, you would want to have a "share the presentation" button. You click that button, you capture the current tab, but then you don't want to transmit everything remotely; rather, you want to crop it to the content area only. So in the example that we have here, on the left we've got the presentation, and you don't even want all of the left side.
H: You want a subsection of that, because you don't want to capture the speaker notes, for example. So obviously we would want an API that would allow you to do that, and, as usual, you would want to be able to do it performantly, robustly and ergonomically. On robustness, note that the user might change the size of the window.
H: They might scroll, might zoom, and there could be layout changes — and whichever way you choose where to crop to, that needs to update really quickly so that you don't miscrop some of the frames. Because if the remote users see your speaker notes, for example, that could be a bit embarrassing for the user; and if they can see their own videos again, well, that's embarrassing for the application, for not having the appropriate level of polish.
H: So we want to take care of those. Now, note that in this example it is intentionally unclear whether the presentation is the top-level document and it embeds the video conference, or the other way around, or something different altogether — and I claim that a good API would not actually care about that. Any kind of application, whichever way it structures itself, would be able to use a good API. Next slide, please.
H: Thank you very much. So I just want to take a step back, and let's just remember that it is currently possible to capture the current tab. You can call getDisplayMedia, and if the user wants to share the current tab, they can choose the current tab. You can also do that with an extension, and hopefully, in the somewhat near future, getViewportMedia will be yet another way that you can get that.
H: So we take it as an article of faith that the user has already started capturing the current tab — or rather allowed the application to do that — and when that happens, the application gets absolutely all of the pixels. We are just trying to give it a way to cut that down, to opt into removing some of that. Next slide, please — and maybe one more.
H: And, because some people I've spoken to about this have misunderstood, I want to make it very, very clear that I'm not proposing an API by which the captured content will be able to censor what can actually be captured. The captured content is not the controller here, and if anybody wants to suggest some kind of mechanism for such control in the future, be my guest, but that's not part of the current proposal. Previous slide, please.
H: So when we speak about this, you might ask: okay, but why do we need a new API? I mean, frames contain pixels, and frames can be edited — just do that in the application. And I claim that this would not be nearly as good. First of all, performance: I hope it does not need much elaboration to say that the browser can do the cropping a bit better, because the browser can do it before it starts shuttling frames around.
H: In the case of Chrome's architecture, that means before frames go from the browser process or GPU process to the renderer process — you then send smaller frames, so you've already gained some performance. That's kind of a given, and there could be other arguments there. Robustness is the key point: as I mentioned, when you scroll, when you zoom, when you change the window size — and if you've got multiple documents in the tab from different origins, they're in different processes, and for them to start communicating "oh, I just got scrolled, so my new coordinates are —"
H: So if you start communicating between the various documents about every single frame, and you delay transmitting it remotely until you've gained some kind of confidence that it is from before any scroll event etc. happened — maybe you can work around this, but if you want all three, the browser needs to step in. And ergonomics I've just spoken about: obviously, if there is an API and it's not too complicated to use, that's easy for the application. Next slide — 2x, please. Thank you.
H: So one more thing to look at here, which I think I've already mentioned: we don't really want to crop to an iframe, because you might want to crop to some div or some other thing inside of that iframe. So when you call getDisplayMedia from document X, you might want to actually crop to something that's deeply nested several iframes in. Next slide, please.
H: So I propose this API, which I hope is relatively simple — later I will explain some of the key design decisions here — but basically, when there is something you know you might want to crop to, you've got some kind of content area; let's say that everything goes in a div.
H: You produce a crop ID for it — a string — and you pass that on to whoever has the track: you can postMessage it, or, if it's inside this very same document, you just hand it over, and they call cropTo on the track with that ID. So, next slide, please. So imagine that in the capture target — in our case it would be the presentation —
H: It knows there is some main content area that does not contain the speaker notes, so it says: okay, produce a crop ID for that, and then it just gives the crop ID to whoever needs it. "Send crop ID", the third line here, might just be a direct call to start the cropped capture, if everything is in the same document; but otherwise it could be a postMessage to a child iframe — it could be anything. And then on the capturing side, which again could be the same document or could be different —
H: You just call cropTo(cropId) on the track, and then it starts cropping, and it keeps on tracking the size. So whenever the div gets laid out again and things like that, it keeps up, and every single frame is cropped exactly to those coordinates. Next slide, please. So you might ask: okay, why not coordinates? I think I've covered that. You might ask —
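The flow just described could look roughly like this. produceCropId and cropTo are the proposal's names and do not ship anywhere yet, so treat this as pseudocode against the draft; the element id and target origin are made up for illustration:

```javascript
// In the captured document (the presentation): mint an opaque ID
// for the element to crop to, and hand it to whoever holds the track.
const mainContentArea = document.getElementById('main-content');  // hypothetical element
const cropId = await navigator.mediaDevices.produceCropId(mainContentArea);
// Could be a direct call if same-document; here, a postMessage.
window.parent.postMessage({ cropId }, 'https://example.invalid');

// In the capturing document: apply the crop to the tab-capture track.
// The returned promise resolves once subsequent frames are cropped.
const [track] = stream.getVideoTracks();
await track.cropTo(cropId);
```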
H: Yeah — sorry, this one I think we've fully covered. So one more thing: you might say, "hey, it's really nice, but who told you that whatever you called produceCropId on is even rectangular?" In which case I say: that's okay — in that case we'll just take the bounding box of whatever it is. This is not really the intended use case; it's a tool, and when you get a tool, you should use it sensibly.
H
If
you
use
it
unsensibly,
there
will
be
some
kind
of
very
clear
explanation
of
what
the
browser
is
going
to
do,
but
you
do
that
at
your
own
peril
and
you're
not
really
gonna
gain
anything
from
you
know
doing
silly
things
and
good
luck.
So,
let's
for
the
rest
of
the
discussion,
I
think
that
we
can
pretend
that
the
that
we've
got
a
simple
rectangular
target
next
slide.
H: Please. Now you might ask: okay, but why crop IDs? The answer is that you might want to go several iframes deep, and in this case you don't actually have an HTML element that you can give to cropTo, because it's in a different document. And if for some reason we decide that it's really much more ergonomic to actually call cropTo on an HTML element, then we can always implement that as an overloaded interface that, behind the scenes, just produces a crop ID and immediately calls cropTo.
H: Here you might want to say: okay, but why not just transfer the tracks? Produce a track, transfer it to whatever you want to crop to, call cropTo on it there, transfer it back. And I would claim that that's not really ergonomic or safe. You're kind of pushing applications towards a pattern in which they transfer a track that — remember — can actually capture the entire tab at any given moment, and you pass that through several other documents, and I don't think that's safe.
H: I think it's much, much better if you just say you trust embedded content to hand you over a crop ID — which would be meaningless if it's wrong, and if they happen to guess it, that's not a problem — rather than actually giving them the track and therefore access to all of the pixels in the tab. Next slide, please. And last, there are two promises.
H: If we actually go back a couple of slides — I'll tell you when to stop; it's the slide where I show the interface, my proposed API. It should be slide number —
H: So here we've got two promises, and I think that the second one should be relatively simple. When you call cropTo, you get a promise that tells you: when this promise resolves, cropping is actually going to take place. Because before that, you might get a couple of frames that are either uncropped or cropped to whatever target you set previously — remember that you can also change targets whenever you want. So when this promise resolves, you know that cropping has been updated.
H: So, for example, you might want to start transmitting frames remotely only at that point. produceCropId also returns a promise, and the reason is that, in the implementation for Chrome, we're going to produce those IDs in the browser process, not in the renderer process, and that's going to take some time — all of this propagation of state needs to take place. And I imagine that Firefox and Safari might have similar architectures, where you would not want to promise that you create the ID immediately.
F: Yeah, so a couple of observations. First, for the iframe thing, you can use a MessageChannel — message ports are transferable — and it should work, so there's no issue there. The issue about transferables is that you call getDisplayMedia or getViewportMedia and then you're blocked on a given element after transferring it.
F: I think there are solutions to make getViewportMedia more dynamic there — by using, for instance, element decoration: you pass the element decoration to getViewportMedia, and then you can change which elements you decorate, and this will be very dynamic. And you can just use a message port to ask the capturing iframe to change its decoration, and that should work fine without any IDs.
H: I know it was mostly a joke. So I would say: first, we don't want to use this only for getViewportMedia. As stated, people are using getDisplayMedia today, and we would like this to work with that as well — that's number one. We want it to work with extensions — that's number two. So all of this should work, and changes to getViewportMedia are a bit out of scope here.
H
Third,
is
that
after
we
hopefully
get
to
this
point,
and
we
say
we
managed
to
get
this
standardized,
I
would
like
to
also
offer
that
hey.
Why
should
oma
only
for
the
current
tab?
What
about
the
next
tab?
What,
if
I've
captured
from
emit.com
tab?
I
captured
the
slides.google.com
tab.
H
Yeah,
that
is,
that
means
that
we
will
wait
several
years
until
all
of
those
things
are
done
and
we
will
not
make
progress
as
mentioned.
We
have
started
this
discussion
of
region
capture
a
few
months
ago,
and
you
know
applications
want
to
use
that
in
the
foreseeable
future.
F
I
understand
that
the
the
main
benefit
I
see
from
this
proposal
is
that
it
will.
It
would
apply
to
get
display
media
equally
to
get
report
media.
That's
that's
a
clear
advantage
and
I
haven't
thought
of
ideas
that
would
allow
to
get
to
the
same
kind
of
feature
level
by
not
your
new
music
proprieties.
Maybe
ideas
are
needed
for
that.
I
will
need
you
to
think
a
little
bit
more
there
to
know
for
sure.
H: Yes — thank you for asking that question. I will answer that specific question, but beforehand I will say that, because of the limitations of time, I have not presented absolutely everything about the proposal. There is a spec draft on my own GitHub page — I will link to it; actually, you can just look it up, it's eladalon1983/region-capture — and all of the edge cases I could think of are addressed there, and specifically this one.
F: Yeah, I just want to mention as well that, on adding features like cropping to getDisplayMedia: we know that capturing browser tabs with getDisplayMedia is a very sensitive thing, and we want people to move away from it. If we add support to getDisplayMedia, then people might stick with getDisplayMedia while there's getViewportMedia, which is safer. So the more we provide benefits in getViewportMedia, and the less in getDisplayMedia, the safer the web is — and that's important to keep in mind.
H: I agree, and thank you, but I think that it has been around one year since we first started discussing getViewportMedia. In this time a lot of people are using screen sharing, and it is important to address some of their concerns and some of the things they need, and not only keep our eyes on the distant future.
D: Right. So, well, first, thanks a lot for presenting this. I think you're capturing a problem that we should solve — in the sense that, I mean, earlier in this same meeting we discussed mediacapture-transform, where you could actually do this kind of cropping in JavaScript, but I think you make a good point that it's not so much performance; it's that we want to not have a disconnect and accidentally over-share anything.
D: So I like that we're solving the underlying problem. I had three specific asks of things I would like to change in the API — I hope that's not going too deep. One is: I think we should avoid the name "id" in any API that we add, or otherwise suffer the wrath of the PING working group descending on us.
H: Could you say that again? I'm sorry, I didn't get that.
D: Using the word "id", or exposing an ID.
D
It's
usually
something
that
the
ping
working
group
looks
that
might
invite
a
lot
of
scrutiny,
fair
or
otherwise,
and
I
would
actually
propose
that,
instead
of
using
a
string
as
an
id
that
we
add
an
interface-
and
I
think
you
and
I
have
talked
about
this
having
if
the
id-
if
we
could
use
the
element
itself,
if
it
were
transferable,
I
think
we
would
have
that
it
would
have
seen
obvious.
D
But
all
we
need
here
is
a
handle
that
can
be
moved
around
as
a
placeholder
for
the
element
as
a
reference
to
the
element,
but
it
doesn't
need
to
leave
the
user
agent
at
any
point.
So
having
it
it'll
be
an
interface.
Let's
say
I
produce
a
hundred
ids,
a
thousand
ids
if
they're
strings.
The
use
agent
has
no
way
to
know
when
it
can
forget
about
them,
but
if
they're
interfaces
they
can
be
garbage
collected.
D
So
we
don't
need
to
keep
track
of
that
anymore
and
having
an
interface
would
also
assuage
any
concerns
about
the
string
id
leaving
the
browser
and
coming
back
and
being
an
id
from
somewhere
else.
H: Sure — is it okay if I answer this one, or would you like to — or do they tie in together in a way that you would like to present all of them together?
D: Well, let me mention —
D
The
third
one,
the
third
one
would
be-
and
maybe
this
isn't
popular
would
be-
to
use,
apply
constraints
for
this,
because
we
already
have
constraints
for
cropping,
constraining
the
image,
if
you
will
in
width
and
height
and
resize
mode.
So
I
might,
I
would
have
questions
about.
D
How
does
this
cropping
interact
with
those
things
and
also
apply
constraints
returns
a
promise,
and
it
also
conveniently
is
an
api
both
on
the
track,
as
well
as
on
get
display
media
and
probably
even
the
future,
get
viewport
media,
so
it
would
be
accessible,
perhaps
where
you
could,
for
instance,
in
your
current
example,
you
have
to
you
get
the
track
and
then
you
crop
it.
So
does
that
mean
that
if
you
assign
it
to
a
source
object,
is
there
a
moment
where
it's
uncropped,
for
example?
H: Yeah, that was my last point. Well, yes, but that's why cropTo returns a promise. So if you wanted to, you could just wait until you get it: you call cropTo, you wait until the cropTo promise resolves, and then, when you put it into a srcObject, it will never produce any pre-crop frames. It might —
H: I agree that tools that cannot be misused are better; there is a trade-off there. I think it's difficult for me to imagine a developer who calls cropTo and imagines that it just happens immediately, especially if it returns a promise — but we can discuss that. In terms of using applyConstraints: I don't think that anybody is happy with constraints. Constraints are already very, very complicated, and I don't think that we would be making the problem any better by leaning more into them.
H
Additionally,
as
far
as
I
know,
apply
constraints,
currently
it
does
not
guarantee
that
the
promise
only
gets
resolved
when
the
constraints.
Actually,
you
know
when
frames,
all
subsequent
frames
are
guaranteed
to
be
according
to
that,
but
rather
it
just
says:
okay,
you
resolve
when
you,
when
you
know
it's
okay
right
and
we,
if
we.
H: — changed the meaning of the promise, things could go a bit unexpectedly, especially given that constraints are such a vast field. You know, making sure that absolutely every single one of those actually applies to the next frames might end up being a bit more difficult, implementation-wise, than we initially envision.
D
But
I
I
would
you
bring
up
a
good
point
about
apply
constraints,
but
it's
also
a
promise,
and
it's
perhaps
underspecified
when
applied
constraints
should
resolve
whether
it
should
wait
to
ensure
and
provide
the
guarantee
that
effects
have
been
applied.
At
that
point,
I
think
that's
a
good
idea.
I
think
we
should
fix
supply
constraints.
Otherwise
you
end
up
with
timing
issues.
Let's
say
I
want
to
crop
to
a
different
element,
but
because,
let's
say
I'm
changing
from
cropping
element
a
to
element
b,
but
because
element,
a
and
b
are
different
sizes.
D
I'm also going to apply a different downscaling to them using applyConstraints. So now you have to coordinate cropTo with the applyConstraints — but if they're all in one, there's a single promise; there's the synchronization point.
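The coordination problem described here — a crop change from element A to element B plus a matching downscale via applyConstraints — can be sketched with the returned promises as the synchronization point. The method names follow the proposals under discussion; the mock track below is invented for illustration only.

```javascript
// Mock track illustrating promises as the sync point when a crop change
// must be coordinated with a constraints change. Behavior is a stand-in,
// not the real browser implementation.
class MockTrack {
  constructor() {
    this.cropTarget = 'elementA';
    this.downscale = 1.0;
  }
  cropTo(target) {
    return Promise.resolve().then(() => {
      this.cropTarget = target;
    });
  }
  applyConstraints(constraints) {
    return Promise.resolve().then(() => {
      this.downscale = constraints.downscale;
    });
  }
}

async function switchCrop(track) {
  // Await both operations before letting a consumer read frames, so it
  // never sees element B's crop at element A's scale, or vice versa.
  await Promise.all([
    track.cropTo('elementB'),
    track.applyConstraints({ downscale: 0.5 }),
  ]);
  return { cropTarget: track.cropTarget, downscale: track.downscale };
}
```

If both effects instead rode on a single promise, as suggested, the application would not need to join the two itself.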
A
Yeah — we have about 15 minutes left in this segment. I'm just wondering if we understand what the action items are. I don't want to go too deep into the proposal; I want to understand what the next steps forward are, and make sure we get that into the minutes and are clear about that. Are you asking for the working group to do something, Elad?
H
I guess not — and it is unclear to me if these are discussions over optimizing the API, or whether any of those concerns are blocking for the people who have raised the various—
A
Issues — I'd like to invite comment from the working group at large, rather than just a few people.
G
So, I like this — I've had a use for it today, yesterday, the day before. I think we should be doing this, and we should be doing this expeditiously. So I see the merit in potentially not waiting for other things, and I think, specifically around the — I like that.
G
I mean, I take the point about not calling it an ID, but I like the idea of it being an opaque token of some sort — either an interface or a string — because you can have multiple of them. So the application that is being captured can offer multiple things, and then the capturer can switch between them in a way that's convenient. So you capture the whole area, and then you say: okay —
G
well, I'm going to zoom in on Fred, or I'm going to zoom in on the slide. And again, in the application I'm working on at the moment, that would be really useful: okay, I want to be able to crop things off, and move and change things around, and not have to round-trip to the other application
G
to say: hey, would you mind changing the stream that you're sending me — even though it's exactly the same, like it's capturing the same tab, but I want to change the cropping area to a different thing. I think being able to do that really quickly and easily is a merit of this being an opaque token rather than transferring a stream. So I do like this, and I think, you know, it could—
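The scenario just described — the captured application offers several opaque crop tokens up front, and the capturer switches between them without a round trip — might look roughly like this. In the real proposal each token would come from something like `CropTarget.fromElement(element)`; the frozen objects and helper functions below are illustrative stand-ins.

```javascript
// Illustrative stand-in for a captured page offering multiple opaque crop
// tokens, one per region of interest.
function offerCropTargets(regionNames) {
  const targets = new Map();
  for (const name of regionNames) {
    targets.set(name, Object.freeze({ region: name }));
  }
  return targets;
}

// The capturer simply selects a previously received token to re-crop — no
// message back to the captured application is needed to switch regions.
function pickCropTarget(targets, name) {
  return targets.get(name);
}
```

A capturer holding the whole-tab stream could then pass the picked token to something like `track.cropTo(...)` to zoom in on the slide or on a speaker.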
H
Thank you, Tim — and obviously I'm very happy to hear that. Quick question: if I understood correctly, you said that you were also in favor of using an interface rather than strings, and I would like to ask both of you, as well as Jan-Ivar: what are we trying to guard against? Because it seems to me that if the string escapes the user agent and comes back later — what does it really matter? I mean, the user agent will just recognize it as an unknown string, I think. I'm—
G
I think I'm speaking for Jan-Ivar as well here, which is that what we're trying to avoid is — PING. We're trying to avoid a long discussion about whether these strings are safe. And we may be wrong: okay, we may think they're safe, and then you'll find a sneaky crypto guru who will tell you that they're not, for a reason that we haven't thought of. So if they can't escape the browser, then that conversation doesn't crop up.
H
G
G
H
I appreciate that sentiment, and I would not push back on it at all — if it weren't actually blocking my next proposal, which would be: okay, but allow this to also work when you capture another tab. And when I capture another tab, I want it to be relatively easy for the other tab to just export some kind of message saying: hey, here's an ID — I know, not an ID.
H
G
A
Yeah, I was gonna mention something along the lines of what Tim said, which is: very often we talk about opaque tokens which turn out not to be so opaque. The question of how you exactly specify them, and whether they're implemented the same in every browser, is kind of tricky — and often the differences can leak privacy information more than you might think.
A
So, you know, just because something — I'm sure PING will want to get into the details, and that will be a long conversation, and avoiding the conversation is worth something. And PING often brings up conversations even when you don't think there should be any. So there might be more here than you might think.
H
Sure. I would just mention that I was thinking of using a UUID, which at least would have resolved some compatibility issues, I think. But if you think that using something completely opaque — okay, I see a chat message. Dom, would you like to grab the mic and explain why not — or maybe this is out of scope?
E
In the interest of time, I think we should not dive into this. But yes: UUIDs are one of the specific pain points for PING, because they can be used to identify users coming back to a website. So, right.
E
Just — I mean, again, for your own benefit, I think that's a discussion that is worth having in an issue: what exactly is the token that is used to transfer the cropped region?
E
And it may be that we are, you know, over-concerned about this and you're right — but I don't think we are going to solve this, and I don't think we need to solve this for your purpose. I understand there is a phase two that you are worried about, but, to Tim's point, I think you should — let's figure out phase one first, before—
H
Sure. I just want to give you the pleasure of being able to say "we told you so" in a bit, and say that I would be surprised if PING would find something here, because this UUID would be produced uniquely for the element and would not outlive the session in any way — if you reload the tab, you would get a new one. And you will tell me "we told you so" when I find out why I was mistaken, and it's going to be fun for everybody involved.
F
I was in the queue, and I just wanted to say: if we go down to API talks, I'm plus one for an abstract control, minus one for applyConstraints — we should not go there. And with regard to returning a promise when the capture has changed: it's a difficult topic, both for applyConstraints and for this particular API. The only thing that you can guarantee is that, at the time the promise is resolved,
F
the capture has changed for the next frame — but the capture might have changed already for the previous frame, and it's very—
E
F
—difficult to guarantee things there, maybe.
H
So the intention is not for the promise to say that all of the previous frames were cropped this way or that way. We only guarantee that, after the promise resolves, you will not get any more frames that were cropped to the old target, right? You might never get another frame, ever — but if you do get one, it will be cropped to either the new target or to an even later crop target. That's the promise, and—
A
H
These nuances are reflected in the spec draft, with a bit more rigor.
C
H
My comment: I agree, and I think that they naturally will live only — like, they are randomly assigned, and then, when you reload the page, even if you try to produce the same one for the very same element, you will just randomly assign a different one. So I think that satisfies the requirement you've just cited.
A
H
H
Probably not — so I'm calling on people more experienced than me in matters of the process here to tell me what the next step should be.
C
E
E
Well, I've heard clear support for the use case, and some shape concerns — but there are always shape concerns. So I guess a call for adoption might just be in order, although it would be useful to know if people think it's too early, or if they want to think more about it and raise issues beforehand, or—
A
Can we have a call for review prior to adoption? I'm just trying to understand how we get something before the working group and, you know, get the comments out on the table — as opposed to waiting for them to show up 14 months from now or something.
H
Is that a blocking concern? Because it's unclear to me if we've finalized that discussion, but that seems like something that could be easily changed, you know, with a PR.
D
Yes, perhaps — but the use cases you have presented do not require strings, so—
H
H
D
So, yeah — the only other thing I would perhaps push for was applyConstraints, but I hear Youenn was not very fond of that, so I'll concede that we probably won't be able to do that. I have some concerns with interaction, but other than that, I don't see a reason we shouldn't try to resolve that within GitHub issues.
E
E
A
All right, so we're moving on to the next work item, which is WebRTC-NV use cases. It seemed apropos to have a little quote from Lewis Carroll. Alice asks: "Would you tell me, please, which way I ought to go from here?" "That depends a good deal on where you want to get to," said the Cat. "I don't much care where," said Alice. "Then it doesn't matter which way you go."
A
So, a little bit of the history of WebRTC-NV use cases. I looked it up: the first public working draft was December 11, 2018, and that seems like a very, very long time ago — 27 months before the pandemic began in March of 2020, and 28 months before the first public working draft of WebCodecs. So WebCodecs was at that point still a gleam in the eye of a few folks.
A
A
Video conferencing: we've all seen these enormous increases in the usage of video conferencing during the pandemic. But also, video streaming services were not entirely mass market prior to the pandemic — and they most certainly are now. Game streaming services were either just on the verge of being launched or small scale before the pandemic, and now they're widely used. Large-scale webinars and online classes of 100,000 people — and now some services are claiming up to a million participants in these things. And then online performances:
A
you know, a lot of musicians haven't been touring, so they're having live concerts online. So all of this has happened since we had a first public working draft, and it raises some questions. First: does the current document reflect what the industry cares about? Secondly: does the document reflect the current state of technology? We've had considerable technology advances since the first public working draft. And the third thing is:
A
are we on track to actually enabling the use cases that we have — let alone, perhaps, the ones that the industry is interested in at this point? So I thought we'd review what we've got in the document, just to refresh everybody's memory, but also to look at where we are given the things that we've got in the document, and then ask some of the questions again.
A
So we have three existing use cases — and by "existing" we mean use cases that were in the original WebRTC use cases document that perhaps we think we can do a little bit better: sections 2.1, 2.2 and 2.3. Just some observations — what I've tried to do here is characterize the type of requirements; I'm not listing them all. One of the things is that there aren't many API proposals that actually address these existing use cases.
A
The other thing, looking at these three existing use cases: video conferencing certainly has become mass market, but the other two — you know, a lot of things have happened during the pandemic. I don't know that mobile calling services are the next big thing; we certainly have game streaming.
A
I don't know how big multi-party with voice communications is. So it's a question about whether these use cases are actually compelling. Are the ICE requirements real? Has the fact that we've not done much on these ICE things really constrained the adoption of WebRTC? If it has, are those addressed by the WebRTC-ICE document or not? And the other thing is, in the video conferencing space:
A
A
the question is: is there an alternative way to go about this via WebCodecs, and should we have been considering those requirements — and not just a WebRTC extension approach, which seems to be what the document is focusing on? Then we have new use cases, a couple of those. I looked first at sections 3.1 through 3.5, and if you look at these, they are use
A
cases that are pretty heavily dependent on data exchange, but the only API proposal I could find was RTCDataChannel in workers. We haven't done much with data transport in service workers.
A
A
A
Caching: this use case does mention a commercial implementation, so it is quite real, but we're not doing much to address it, just in terms of activity. And then we have another set of use cases:
A
3.6, 3.7 and 3.8. For those we do have quite a bit of activity in the working group — media capture transform is in there, for example, and some of the work in the machine learning working groups relates to that. So quite a bit of W3C activity on these particular ones — although not for the case of section 3.9, the reduced complexity signaling. We've talked a little bit about some of the questions that remain — the machine—
A
A
A
Looking over section 3.8, which is about video conferencing: there are some security requirements which seem a little bit out of date. One of them is non-repudiation, which I don't think SFrame actually provides; another is the use of the term "perfect forward secrecy", and I think that term has gone out of use, because there's no such thing as perfection in terms of forward secrecy.
A
A
Does it reflect the current state of technology — in particular, the WebCodecs approach to building apps? And as to whether we're on track to enabling these use cases, I have a couple of observations. It appears to me that no use case has all of its requirements met by API proposals. So if you're looking to implement any of these use cases using the APIs that we've been proposing, the answer is: you can't implement any of them.
A
Only four of the 11 use cases have any API proposals, and that raises the question: is it because we're not doing what we said we were going to do, or maybe some of these use cases aren't compelling enough, or don't have consensus? With respect to that, we could call for consensus on some of the use cases and figure out if anybody's interested.
A
I would also note that there are some areas of recent interest, like media ingestion, that aren't covered. So maybe we have use cases that we don't care about, and also there are use cases we do care about that aren't in the document — that's possible. But also, in terms of the biggest gaps, I would note that data transport is mentioned in seven of the 11 use cases, but there are very few proposals. That would include things like, you know, ways to solve
A
A
retransmission, and details like that. And the last question is: this document doesn't really talk about what long-term architecture we're moving towards. So, that goal I talked about — where are we trying to go? Are we trying to extend WebRTC? The document seems heavily oriented towards that. But, given that we have WebCodecs, are there alternative ways to get to the same place? Are we trying to do WebCodecs over the data channel, or WebTransport, or WebCodecs over RTP?
A
G
You came in and out — yeah, I had actually taken myself off the queue, but I have a couple of things to say. I think one area you missed, in the list of changes in the environment over the 28 months or whatever, is that much more of this is happening on mobile than before. Certainly, in my practice, a lot of people are using mobile for the sorts of targets that we always used to assume were going to be laptops, and I don't think this document in any way reflects that. So that would be an additional kind of change. And you're talking—
G
Mobile browsers — yeah, yeah, you know.
A
G
Yeah — but also, you know, people joining small family video conferences, all sorts of things. There's a ton of people who will do that happily on an iPad but wouldn't get their laptop out for it. That sort of thing.
F
Yeah, thanks for the presentation, Bernard. One: what we're seeing with WebTransport and so on is that a lot of flexibility is given, whereas WebRTC is a really integrated system, which has benefits in terms of efficiency and so on. But clearly it seems that people might have requirements and needs for some opening up, and some opening has been done. In some cases I don't see the benefit — like replacing codecs, I'm not sure
F
I see the benefits. But one case that I do see is opening up things like metadata: if you're in a metaverse, VR, AR, you want to have metadata synchronized, and that means that you need to somehow open the gate to RTP packets and RTP headers. And I don't know—
E
F
—if we have the use cases there, and it would be great if we had proposals in that area: header generation and parsing. Why should RTP header extensions be fully implemented in browsers for applications to use a particular RTP header extension? RTP header extensions are usually cheap, so it's not like you need hardware RTP header extension implementations, for instance. So that would be something that would be good to have. With regards to mobile browsers, as Tim mentioned:
F
I fully agree with that, and if you look at iOS and Android, they're somehow different from other OSes, and that should surface somehow in browsers. It would be great if we had more interop between iOS and Android at the browser level, for instance. One example I have: when you have a phone call and the webpage is capturing audio, things happen — typically, the phone call will take precedence and the webpage will be muted — but the handling of muting
F
there is very different, and that's an area which is user-agent specific somehow. It would be good if we could provide guidelines and progressively align in those areas. And maybe there are other areas that are specific to mobile browsers that we should also try to work on, to make things more consistent between OSes — yeah, between browsers.
A
Yeah, with respect to that mobility issue: I get more complaints from developers about mobile platforms, for a bunch of reasons. One is some of the things you mentioned, but the other is, you know, when developers tell me: hey, you guys are creating seven different APIs —
A
A
so it's better to have a smaller set of things that everybody's going to do than a whole bunch of APIs that will be on every desktop but won't be on every mobile device. Harald, you're muted.
C
One of the things that is not immediately obvious is that there's a lot of stuff going on that uses the WebRTC implementation but isn't in the browser.
C
C
C
A
The reason I mentioned architecture, Harald, is that the document seems created with the mindset that we're going to extend WebRTC — which seems to result in endless numbers of extension requirements, which don't seem to have generated motivation on the part of the browser vendors to actually do them. As opposed to the WebCodecs approach, where the answer is: yes, if you want this extension, go write it yourself — which seems much more likely to actually happen.
A
C
Yeah, a lot of them have been implemented. And the other way that extensions get proposed is: a browser vendor sees a small need, writes a proposal to satisfy that small need, and then comes to the working group to ask for that to be blessed as an extension. I'm guilty of several of those, whether or not they fit the architecture. It's a way of bringing the usability forward, but it doesn't lead to a coherent architecture — and it especially doesn't lead to evaluation of basic principles, like: do we expose RTP?
A
Yeah — and the other thing is, it leads to some of the problems that Tim mentioned with mobile devices, because you're then asking mobile to implement these zillions of extensions that you're creating, and the odds that you'll have them all seem low. Whereas if you're asking the mobile device to do, you know, two things — I want WebCodecs from you and I want RTP—
Yeah
yeah
thanks,
so
I'm
sick
because
weber
with
machine
learning
working
group
chair,
so
I
I
wanted
to
wanted
to
let
the
group
know
that
we're
very
much
interested
in
working
with
you
without
this
working
group
to
to
understand
your
use
cases
for
enemy,
better
and
actually
dom
dom
opened
at
this
constraint
recently
in
our
group
and
we're
we're
building
a
prototype
that
integrates
the
media
capture,
transform
api,
whatever
it
will
be
or
more
morphs
into
and
integrate
that
with
a
web
neural
network
api
that
is
able
to
accelerate
inference
of
machine
learning
models,
so
so
dom
gave
us
a
use
case,
which
was
background
blur
so
yeah.
I
We hope to have some results that we can share with you, and I will drop a link on IRC to the GitHub issues. So if you have thoughts, suggestions, metrics we should track, questions — whatever, let us know. I think this is an interesting area, and I like this proposal that you've been baking recently, with the WHATWG Streams integration.
I
There seem to be many moving parts still, so let's see how we can pull off this prototype — but yeah, we'll do our best. That was my shameless plug. Thanks.
D
Yeah, just two comments. I think, apart from funny hats, it seems like most of these use cases have focused around peer connection — but with funny hats it does suggest it's also supposed to cover media capture. I know, for example, Elad has proposed a lot of interesting use cases in the screen sharing space. So I think it's the case that we have been working on things that maybe aren't covered in this document, so it might need to be updated.
D
My other comment was more of an editorial concern, in that I know Mozilla takes use cases pretty seriously. I see a couple of use cases here have a note that says they have not completed a call for consensus — not just in the editor's draft, but also in the technical report, because we do auto-publishing now.
D
So I'm a bit concerned — we should perhaps be a little more cautious about what we include in the document, so that people don't mistake it. Yeah, as you mentioned at the beginning of the meeting, it can be confusing for people to read these documents.
A
Okay — so we now have Riju talking about face detection.
J
Hi, thanks, Bernard. I think your document of the use cases was very useful, because from the funny hats section, I think in the next few months I'll try to present at least four out of the eight ones presented. So today we'll talk about the face detection API we presented in the WebRTC-NV use cases breakout session at TPAC last month. Okay — next slide, yeah.
J
The basic ask was to make the API a bit more general and forward-looking — something for the future as well — and not to get fixated on only what the present platform APIs are giving. I think now we have reworked the API and it's general enough, and we want to hear what features can be marked for the future, something like that. So here we are with the reworked version of the face detection API.
J
Hopefully some of you folks had the chance to look at the details I sent to the mailing list; here's the GitHub page link. Next slide. I'll quickly go over the stuff in five minutes, so that we keep the majority of the time for discussion. So, in short, the first ask — I think it was from Harald — was: instead of a bounding box, can we return a mask? Because that is something that Meet is doing.
J
We tried to reason about it a bit, and even though right now there's no platform API across all the OSes supporting a mask, we tried to accommodate this concept in the API by returning a contour. The number of points describing the contour can be user-defined using a setting — let's say, faceDetectionNumContourPoints; we can bikeshed about naming later on. Implementation-wise, what will happen is: right now we can default to a four-point rectangle.
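The four-point default just described can be illustrated as synthesizing a rectangular contour from the detector's bounding box. The box and point shapes below are illustrative only — they are not the proposed IDL.

```javascript
// Sketch of the default: with no platform contour support, an
// implementation could derive a four-point contour from the detector's
// bounding box, clockwise from the top-left corner.
function boundingBoxToContour(box) {
  const { x, y, width, height } = box;
  return [
    { x: x, y: y },
    { x: x + width, y: y },
    { x: x + width, y: y + height },
    { x: x, y: y + height },
  ];
}
```

A richer implementation could later return more points along the actual face outline while keeping the same contour shape for consumers.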
J
And maybe in the future, if we can get a proper outline contour, we could — you know, this is sort of extensible. Because what happens for normal face detection in camera ISPs is: the code in the driver runs a very efficient algorithm — very fast, quite accurate, no DNN stuff, just very simple ML classifiers on CPU using SSE, or maybe sometimes on GPUs. The camera stack always runs this face detection, every time, for 3A algorithm optimization, so the algorithms are usually Haar cascades or local binary patterns.
J
I'm talking about Chrome OS and Windows specifically. This brings me to my next point: there was also a comment about face mesh. I think TensorFlow and other ML frameworks are returning quite a good, high-value, 468-landmark face mesh — well, most DNNs can return something similar, depending on the model. But from a practical point of view:
J
what's in the platform today, and what is going to come in the next iteration — the next one or two years? No platform APIs are going to come up with a face mesh unless you, you know, use a neural network framework. So this is not going to be implemented in a driver in the very near future; there's always a possibility of it happening, but it's a bit far off. Still, for the sake of extensibility, we have kept the mesh as an option.
J
We can, you know, remove it, or add a note about implementation support — something like that, which I learned from you guys. So, yeah — landmarks. Okay, so there is also this face landmark set that is sort of supported by all platforms now. Next slide.
J
So the extensibility is up for discussion. Regarding face expressions: blink and smile are something which is universally supported across all platforms. But I've added a few more — sorry, previous slide, please; yeah, thanks. I've added a few more, like anger, disgust, fear — these kinds of things, which, you know, are optional.
J
Why I added them is: I could do these things without using DNN stuff — using simple classifiers on top of the Haar cascade or something like that. So these things are still in the realm of possibility to be implemented within drivers in the near future. I mean, we can restrict them — like every other enum, we can restrict them a bit — but what we have tried to do is give you all the options, and then maybe we can remove something. But yeah, I'll open it up for discussion now.
J
I think that's the most important part. I'd like to ask all of you — especially Harald: does the contour sound good? Yeah, here it is — please, let's start the discussion.
J
C
I do worry about this way of thinking — framing this strictly in terms of drivers, because—
C
C
C
I would like to imagine an API where we can have production, or consumption, or even refinement of the annotation of the video track being passed along — right? So that, for instance, you could do: a media stream track feeds into a processor.
C
J
J
Yeah, the background blur part — maybe in early January I'll try to present the entire background blur and replacement stuff, because, as I showed you in the demo of how we are working on it, it's still a bit of a work in progress. Yes —
I understood your point: you wanted face detection, and then — so you wanted that, when we do background blur, we do the face detection and take over the, you know, value which face detection produced.
J
But the way we are trying to do it, this is free of computation: when your camera is on, if it's in auto mode, this already happens in the driver.
J
So I think — yes, Tim, please.
G
Yeah — I mean, I second Harald's point that it would be nice to be able to use this API to do successive refinement. You know, it's great to have it for free, computation-free, but then we're going to want to layer in another layer of computation
G
in, you know, something else — some other API that's in the browser — to do that. But my other point was: I'm nervous about the inclusion of the expression enum. The others are reasonably factual, and that one is very much a subjective value judgment, and I think it kind of doesn't sit—
J
Sure — are you worried about the list, or do you not want it at all, given that blink and smile are across all platforms? Do you want to restrict the list, or do you want to take it out?
G
I'm worried about them being wrong — and possibly even more likely to be wrong for different subgroups, population groups. You know, these things are famously inaccurate for particular subgroups or particular use cases, and so the more subjective the API point, the more that worries me.
J
Yes, I agree that it does depend on a few things about the training, yeah. Okay — we can keep a note that this is something, you know—
J
D
J
So, last time I put it up in image capture, and then Youenn or somebody commented that we should move this to the media capture extension spec. So there was that — a logistics ask. Another ask was: make it a bit more general.
J
That was my understanding. I might have made it a bit too general, but I'm trying to maybe throw out the stuff that people do not like, and then we can put up a PR in the extension spec — the media capture extension spec — and then, of course, the Chromium implementation would follow.
D
Yeah — and just process-wise, I think at some point you need to ask the working group to adopt this or not, so I'm concerned—
E
J
D
J
So — we can definitely put up a PR right away. I just wanted, before putting up the PR, to check the pulse — whether it's okay or not. I mean, the PRs would be ready the next day.
D
Okay, right, yeah — I'm also wondering a little about how to deal with this as well.
A
Yeah, I think this is pretty interesting, but I also share Tim's concerns about the emotion analysis.
A
I've been involved in several studies of that, and I do think we see different accuracies with different populations, so that could be a concern. I guess I also would want to talk about the way in which this would be used.
A
A
The way I think of this is as potentially part of a transform stream that you might be developing to do the background blur, for example. And so, at least the way I think of it—
A
in my mind is: I've got this video frame, potentially in a GPU buffer — and I guess the machine learning folks will tell us how to do the processing quickly — but I'm getting information here that would potentially feed into that background blur model and help make it execute faster. And I guess my question is about the information that's provided — for example, the contours and so forth:
A
It's designed to help me process this thing that's in the GPU buffer, and I guess one of my concerns, particularly in doing effects, is the performance effects that might be created. I'm presuming these dictionaries are operating in main memory, and I've got this GPU, and I'm switching back and forth and invalidating caches and stuff like that.
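The pipeline A is describing, per-frame face data feeding a background-blur step, can be sketched against the then-proposed media capture transform model. This is purely illustrative: `frame.faceContour`, the contour point layout, and the `blurOutsideBox` callback are assumptions for the sketch, not any shipped or specified API.

```javascript
// Pure helper: bounding box around a contour, padded by `pad` pixels
// and clamped to the frame dimensions. Contour points are assumed to
// be [{x, y}, ...] in frame pixel coordinates (an assumption, not spec).
function boundingBoxFromContour(points, frameWidth, frameHeight, pad = 0) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const { x, y } of points) {
    minX = Math.min(minX, x); minY = Math.min(minY, y);
    maxX = Math.max(maxX, x); maxY = Math.max(maxY, y);
  }
  const left = Math.max(0, minX - pad);
  const top = Math.max(0, minY - pad);
  return {
    x: left,
    y: top,
    width: Math.min(frameWidth, maxX + pad) - left,
    height: Math.min(frameHeight, maxY + pad) - top,
  };
}

// Sketch of the transform-stream wiring: frames coming out of a
// MediaStreamTrackProcessor-style source flow through a TransformStream
// that blurs everything outside the face box. `frame.faceContour` is a
// placeholder for wherever per-frame detection data would actually live.
function makeBlurTransform(blurOutsideBox) {
  return new TransformStream({
    transform(frame, controller) {
      const box = boundingBoxFromContour(
        frame.faceContour, frame.codedWidth, frame.codedHeight, 16);
      controller.enqueue(blurOutsideBox(frame, box));
    },
  });
}
```

Whether the contour arrives in main memory while the pixels stay in a GPU buffer, which is exactly the cache and copy concern raised above, is independent of this shape.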
J
There is a possibility that part of the computation will be done on the GPU and, of course, in the future there are other accelerators, obviously. So if your concern is performance, we can do some metrics, and I think I can show that this would be one of the most performant ones.
J
A
A question about the API shape: right now it's an extension to the MediaStreamTrack interface, yes, and what I'm saying is, I'm thinking of the machine learning, the background blur, and it seems to me like I want this to work on a video frame. Basically, what I'm looking at is: I've got this video frame that's come out of media capture transform. I want to give it to you, and you give me back the contour, and then I'm going to go and use that on my end.
A
J
So if I try to show a background blur on MediaStreamTrack with, like, very good performance, will that convince the group?
E
A
F
Yeah, MediaStreamTrack only, that was the point. If I may jump in, I had very similar feedback to Bernard. To me, the thing is, you have driver data that you want to expose, and the decision to expose it should be at the MediaStreamTrack level, like: yeah, I want to opt in to getting this face detection from the driver, so please generate it. Now, the generated data should not be on the MediaStreamTrack; it should be synchronized with video frames.
J
F
Either you can get it from the video frame itself, or you should be able to get it whenever you get a new video frame through media capture transform. So this model makes sense to me. The other comment I had was that, given the idea is to expose driver information, it's good to be as specific as possible, and there are things that the driver already generates, and that's what we want to describe, once the application has it.
F
They can use something like a generic metadata mechanism to attach metadata to video frames and pass it on to further transforms and so on. But we can separate the issues, and if we look only at the driver-specific exposure, then we should be specific and not try to generalize too much, because if we generalize too much it will become very hard. So I would look at what drivers are producing now and try to describe that, and the application will produce a mask or whatever from it.
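F's model, driver metadata synchronized with the video frames rather than surfaced separately on the track, could look roughly like the sketch below. All names here are hypothetical (a detection's `timestamp` field, the detection queue, the enqueued object shape); this only illustrates the synchronization idea, not a proposed API.

```javascript
// Pure helper: pick the detection whose timestamp (microseconds) is
// closest to the frame's, or null if nothing is within `tolerance`.
function matchDetectionToFrame(frameTimestamp, detections, tolerance) {
  let best = null;
  for (const d of detections) {
    const dist = Math.abs(d.timestamp - frameTimestamp);
    if (dist <= tolerance &&
        (best === null || dist < Math.abs(best.timestamp - frameTimestamp))) {
      best = d;
    }
  }
  return best;
}

// In a media-capture-transform style pipeline, each frame would then be
// forwarded together with the driver-produced metadata that matches it,
// so downstream transforms (blur, cropping, ...) see both at once.
function makeAnnotatingTransform(detectionQueue, toleranceUs = 10000) {
  return new TransformStream({
    transform(frame, controller) {
      controller.enqueue({
        frame,
        faces: matchDetectionToFrame(frame.timestamp, detectionQueue, toleranceUs),
      });
    },
  });
}
```

The point of the sketch is only that the detection data rides alongside the frame through the pipeline, instead of arriving as an unsynchronized event on the track.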
A
J
Well, I was thinking about the next step. I could show you some... I think I showed you some demo of how it works, or maybe I can get some numbers, performance numbers or something like that. And I think what Harald is suggesting is to use the face detection as a pipe to background blur. I will try to experiment with that part, and I will also try to put up a pure background blur and try to get some numbers from there.
E
So what I'm hearing isn't so much the pipe to background blur, but more of an architectural point: that the data should be exposed at the frame level, so it can then be repurposed for background blur or for any other kind of transform. I think that seems to be the most consistent feedback I've heard across the group.
J
Okay, so it has to be... so which interface, sorry, I missed it, which interface should I be looking at then, instead of MediaStreamTrack?
E
So you want to look at the not-yet-adopted media capture transform proposal, I guess.
J
E
D
Have to raise the issue in WebCodecs.
D
H
J
C
F
Okay, but we can certainly extend these objects. I think we should get the discussions into media capture extensions and try to make the architecture points precise there, and for each point we should file another issue and dig into the how. And so, have you filed... I think you filed an issue on GitHub on media capture extensions, and we should probably take the time to provide the feedback we gave, at least the architecture points, and describe them properly so that you can work on them.
J
Anything, okay, yeah, anything. And we have a sort of... if we go back one or two slides, we have a document, we have a link to our GitHub page, where we have explained everything against the media capture extension. So you can open that page from the link.
J
Okay, we are a bit out of time, sorry for that, but yep, looking forward.
J
And yeah, I'll have a look at video frames, whether it fits there. I will try out the PoC and adapt it a bit to fit there, and maybe I'll come back and give you implementation feedback.